Formula 1 is a sport like no other. Anything can happen, and the smallest of errors can mean the difference between finishing first and crashing into a wall of tires. The cost of any mistake is further amplified by major factors outside the driver’s control, from the ever-changing weather to the quality of the car provided by the constructor.
With so many variables in play, it’s no wonder that the greatest driver of all time is so fiercely debated. To make matters worse, F1 rules have changed drastically over the years and it is difficult to make a fair comparison between drivers whose careers happened decades apart. With all of this to consider, how can we give a reasonable, meaningful answer to who the best Formula 1 driver of all time is? Here at OST, we set out to help you find your answer based on whatever criteria are most important to you.
Here’s how we did it.
We created an RDFox-powered web app that lets you effortlessly compare drivers and play around with different scoring systems and filters. If you would like to try it out yourself, just go to f1.rdfox.tech and you will have your answers in just a couple of clicks!
The app allows the user to assign a score to each driver by choosing from a range of options according to their definition of what makes a good driver. The results are then returned in milliseconds thanks to the unparalleled performance of its underlying database — RDFox.
But in order to take advantage of this, we had to bring Formula 1 data from various sources into the world of graph. Thankfully, tasks like this are made easy with RDFox’s rich integration features.
If we want to compare drivers, first and foremost we need information about how they did in races. Luckily, historical and recent Formula 1 data is available from the Ergast API. The web service only supports REST-style queries, though, and we wanted to perform some rather complex operations on the dataset. That is why we decided to download the data in tabular format and integrate it into RDFox. To achieve this, we used the data source registration feature and created tuple tables based on the CSV files. Then we just wrote a couple of Datalog rules to map these tables into RDF triples.
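To make the idea of mapping table rows to triples concrete, here is a minimal Python sketch. It is not the RDFox data source mechanism itself: the column names (`resultId`, `grid`, `position`) and the properties `f1:grid` and `f1:position` are assumptions made for illustration, and only the class `f1:PrimaryResult` is taken from the rule described later in this post.

```python
import csv
import io


def rows_to_triples(csv_text):
    """Turn each CSV row into (subject, predicate, object) triples.

    Column names and the f1:grid / f1:position properties are
    hypothetical; they stand in for whatever the real mapping uses.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"f1:result_{row['resultId']}"
        triples.append((subject, "rdf:type", "f1:PrimaryResult"))
        triples.append((subject, "f1:grid", int(row["grid"])))
        triples.append((subject, "f1:position", int(row["position"])))
    return triples


# Made-up sample row, not real race data.
sample = "resultId,grid,position\n1,3,1\n"
for triple in rows_to_triples(sample):
    print(triple)
```

Each row yields one node with a type and two properties, which is the general shape a rule-based mapping from a tuple table to a graph produces.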
We spoke about our app with an ex-CEO of a Formula 1 team and they said that one of the biggest challenges in the sport is driving in the rain. When the track is wet and slippery, it takes a lot of skill to avoid collisions and still come out on top.
To find out whether it was raining during a given race, we turned to Wikipedia (we chose this approach because the Ergast data already contained Wikipedia links). Using Python and basic NLP, we sorted through the descriptions of race conditions and divided the races into three categories: “dry”, “mixed” (partially wet) and “wet”.
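The classification step can be approximated with simple keyword matching. The sketch below is only an illustration of that idea, not the code behind the app: the keyword lists are invented for this example, and treating a description with no recognised keywords as “dry” is an assumption.

```python
import re

# Hypothetical keyword lists; the real lists are not given in this post.
WET_WORDS = {"rain", "rainy", "wet", "drizzle", "showers", "downpour"}
DRY_WORDS = {"dry", "sunny", "clear", "warm", "hot", "fine"}


def classify_conditions(description):
    """Return "dry", "mixed" or "wet" for a free-text weather description."""
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    has_wet = bool(tokens & WET_WORDS)
    has_dry = bool(tokens & DRY_WORDS)
    if has_wet and has_dry:
        return "mixed"
    if has_wet:
        return "wet"
    # Assumption: no wet keywords means the race is treated as dry.
    return "dry"


print(classify_conditions("Dry at the start, rain in the closing laps"))
```

A real pipeline would need to handle negations such as “no rain” and less common wording, which plain keyword matching does not.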
We wanted the users of our app to be able to choose from a range of scoring options, but also to see the results as fast as possible. Because of this, we decided to use RDFox reasoning to have the most important inferred information ready when the users’ requests arrive.
Since Formula 1 championships have had vastly different scoring systems over the years, we wanted to make things fair and assigned a “modern equivalent”* score to each race result. For example, the rule for the winner of each race looks like this:
This can be read as “if a node of type f1:PrimaryResult has position 1, it should also have modern points equal to 25”. The declarative nature of Datalog means that RDFox will ensure the system remains consistent and the inferred triples always stay up to date with the explicit facts present in the database. Thanks to this, the “modern points” will automatically be calculated whenever new results are added, even if the scoring system were to change in the future.
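For readers who think more easily in imperative code, the same mapping can be sketched in Python. Only the value 25 for first place comes from the rule described above; the remaining values follow the current top-ten points table (25, 18, 15, 12, 10, 8, 6, 4, 2, 1), and ignoring bonus points such as the one for fastest lap is an assumption of this sketch.

```python
# Current top-ten points table; bonus points are deliberately left out.
MODERN_POINTS = {1: 25, 2: 18, 3: 15, 4: 12, 5: 10, 6: 8, 7: 6, 8: 4, 9: 2, 10: 1}


def modern_points(position):
    """Modern-equivalent points for a finishing position.

    Positions outside the top ten, including 0 (did not finish),
    score no points.
    """
    return MODERN_POINTS.get(position, 0)


print(modern_points(1))
```

Unlike this function, which has to be called for every result, a Datalog rule states the relationship once and leaves it to the reasoner to keep the derived values in step with the data.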
We now have one way of comparing competitors, but looking at our data we found that in recent years the winner started from pole position in over 50% of Formula 1 races. A driver’s starting position depends on how they did in the qualifying round, which in turn depends on how fast they can do a lap on the circuit, without worrying about overtaking others or being overtaken. According to many fans this means that, barring any serious mistakes, the driver’s performance depends less on their skill, and more on how fast their car is.
To mitigate this, we decided to give our users the option to look at how many positions the driver managed to move up during the race instead. We can do this more easily if we add a rule like this:
Note that, since in our data position 0 means the driver did not complete the race, we need to consider these situations separately:
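The logic of these two cases can be sketched in Python as follows. Returning `None` for a driver who did not finish is an assumption of this sketch; how such results are actually scored in the app is not spelled out here.

```python
from typing import Optional


def positions_gained(grid: int, position: int) -> Optional[int]:
    """Places gained between the starting grid and the finish.

    A finishing position of 0 means the driver did not complete the
    race, so no meaningful difference exists and None is returned
    (an assumption made for this example).
    """
    if position == 0:
        return None
    return grid - position


print(positions_gained(5, 2))
```

A positive value means the driver moved up the order, and a negative value means they lost places.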
But what if we only want to compare drivers who race in the same kind of car? Thankfully, nowadays there are usually two participants from each constructor’s team, so RDFox reasoning can help us check how drivers are doing against their teammates.
First, we introduced an index for nodes connected to a “race node” and a “constructor node”.
This is a trick that will help us match up teammates’ results with each other faster.
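The effect of such an index is similar to grouping results under a composite key. In the Python sketch below, the dictionary keys `race`, `constructor` and `driver` are hypothetical field names, and the sample records are made up.

```python
from collections import defaultdict


def index_by_race_and_constructor(results):
    """Group result records under a (race, constructor) key."""
    index = defaultdict(list)
    for result in results:
        index[(result["race"], result["constructor"])].append(result)
    return index


# Made-up sample records.
results = [
    {"race": "r1", "constructor": "team_x", "driver": "a"},
    {"race": "r1", "constructor": "team_x", "driver": "b"},
    {"race": "r1", "constructor": "team_y", "driver": "c"},
]
index = index_by_race_and_constructor(results)
print([r["driver"] for r in index[("r1", "team_x")]])
```

With the grouping in place, finding a driver's teammates is a single lookup instead of a scan over every result of the race.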
Now we can use this relationship to calculate the best teammate result, both scoring based on “modern points”* and on positions gained during the race:
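As an illustration of what “best teammate result” means, the following Python sketch computes it for an arbitrary scoring field. The field names are hypothetical, and skipping teammates whose value is missing (for example, a driver who did not finish) is an assumption of this example.

```python
from collections import defaultdict


def best_teammate_scores(results, key):
    """For each (race, driver), find the best value of `key` among teammates.

    Teammates are the other drivers with the same race and constructor.
    Drivers without a scored teammate in a race are left out.
    """
    teams = defaultdict(list)
    for result in results:
        teams[(result["race"], result["constructor"])].append(result)

    best = {}
    for team in teams.values():
        for result in team:
            others = [
                other[key]
                for other in team
                if other["driver"] != result["driver"] and other[key] is not None
            ]
            if others:
                best[(result["race"], result["driver"])] = max(others)
    return best
```

The same function works for modern points and for positions gained, because in both cases a larger number is the better result.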
The facts we inferred will make it easy for us to write queries and list best drivers according to our chosen criteria. For example, if we wanted to calculate the all-time champion when it comes to gaining (on average) more positions in wet weather than their teammate, we could just write a query like this:
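The shape of that aggregation can be sketched in Python: filter to wet races, take the difference between a driver's gain and their best teammate's gain, average per driver, and sort. This is a stand-in for the query rather than the query itself; the field names are hypothetical and rows with a missing value on either side are skipped by assumption.

```python
from collections import defaultdict
from statistics import mean


def rank_by_wet_gain_over_teammate(rows):
    """Rank drivers by average (own gain - best teammate gain) in wet races."""
    differences = defaultdict(list)
    for row in rows:
        if row["weather"] != "wet":
            continue
        if row["gained"] is None or row["teammate_gained"] is None:
            continue
        differences[row["driver"]].append(row["gained"] - row["teammate_gained"])

    averages = {driver: mean(values) for driver, values in differences.items()}
    # Highest average first.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

Changing the scoring method only means changing which fields are subtracted, and changing the weather filter only means changing one condition.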
If we wanted to use a different scoring method, all we’d need to do is swap out a couple of lines. That is exactly what our app does when the user selects their scoring criteria, which makes the code for it quick to write and easy to maintain.
At the end of the day, Formula 1 fans will always disagree on who the best driver is, and the answer will never be free from subjectivity. RDFox helps you analyse facts and form your own opinion. Perhaps you think a win is less meaningful when you drive the fastest car, perhaps you think wet weather is the ultimate test of skill. Either way, our app is there to help you find your answers.
We hoped to demonstrate how reasoning can be used to solve complex problems. With just a couple of rules and the power of RDFox, we have created a way for our users to go beyond basic data exploration and into the realm of complex aggregation, all without sacrificing query performance. RDFox reasoning ensures that our system is robust and remains consistent when we add new results, something which would simply not be possible with other technologies.
If you have a problem you would like to solve, you can try RDFox for free! Contact us and together we can build a solution that meets the needs of your business. If you’re looking for inspiration, head over to our blog where you can see how RDFox is being used in industry today.
The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Enterprises (OSE) and Oxford University Innovation (OUI).