Have you ever wondered how to find the right recipe for the ingredients you have at home? Have you felt very frustrated when you searched for recipes based on one ingredient but you didn’t have the other ingredients in the recipe at home?
Oxford Semantic Technologies found a recipe for a successful semantic search using PoolParty powered by RDFox. They showed us their implementation at SEMANTiCS 2022.
Oxford Semantic Technologies’ use case was based on the recipe data from the BBC Good Food website. I have used this website many times before to find a recipe that I could make in the evening.
I would usually type in just one ingredient and get back a list of recipes with that ingredient in the title. The results wouldn’t include recipes with similar text in the title or in the ingredients. Most recipes were not suitable because I wouldn’t have many of the ingredients at home. And it was not possible to filter the ingredients out that I didn’t have at home.
One of the solutions to this problem is to introduce a taxonomy. A taxonomy makes it easier to find the list of recipes that a human would expect to find. It does this by organising and categorising human knowledge into a hierarchy of so-called concepts. A concept is not just a piece of text or a string. It is a category that encapsulates human knowledge. In this case, Fish would be an ingredient concept and not just a piece of text. But when I look for recipes with “fish” without a taxonomy, BBC Good Food doesn’t give me recipes with “smoked salmon”, for example.
Peter Crocker (CEO of Oxford Semantic Technologies, OST) and Valerio Cocchi (Senior Knowledge Engineer at Oxford Semantic Technologies) explained this problem in more detail. When you search for an ingredient on BBC Good Food, the website does a text search. The search result only includes the recipes that contain exactly the same text as your search text. It won’t find recipes where the ingredient is merely similar to your search text, or where it is not mentioned in the title. If you search for more than one ingredient, the search results will simply ignore the second ingredient.
This is not the kind of intelligent search result a human would expect.
Valerio presented a demo to show us that it is possible to improve a text-only search experience by adding semantics to the data. In his demo, he showed us that when you search for “fish” recipes, you get intelligent search results that include more than just recipes with the string “fish” in the title. You get recipes with anchovies or with smoked salmon, both of which are types of fish, but neither of which is similar to the string “fish”.
In this approach, the recipes are no longer annotated simply with strings, but with objects, and the relationships between these objects are stored as well (e.g. anchovy is a fish, or cheddar is a cheese). The demo uses a PoolParty taxonomy (we call this a PoolParty Project) in which they modelled the BBC Good Food recipe data as hierarchies of ingredient concepts. You can also search for more than one ingredient, such as “fish and alcoholic beverages”. This search would return recipes with “smoked salmon and wine” because the search is looking for concepts and not text.
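To make the difference between the two kinds of search concrete, here is a minimal sketch in Python. It is not the demo's implementation: the recipe titles, the tiny taxonomy and the function names are all made up for illustration, and the real system stores this knowledge in PoolParty and RDFox rather than in dictionaries.

```python
# Hypothetical mini-taxonomy: concept -> its broader (parent) concept.
BROADER = {
    "Smoked salmon": "Fish",
    "Anchovy": "Fish",
    "Wine": "Alcoholic beverages",
    "Cheddar": "Cheese",
}

# Hypothetical recipes, each annotated with ingredient concepts (not strings).
RECIPES = {
    "Smoked salmon and wine risotto": {"Smoked salmon", "Wine"},
    "Anchovy toast": {"Anchovy"},
    "Fish pie": {"Fish", "Cheddar"},
}


def is_a(concept, target):
    """True if `concept` equals `target` or sits below it in the hierarchy."""
    while concept is not None:
        if concept == target:
            return True
        concept = BROADER.get(concept)
    return False


def text_search(term):
    """Baseline: case-insensitive substring match on the recipe title only."""
    return sorted(title for title in RECIPES if term.lower() in title.lower())


def concept_search(*targets):
    """Recipes with at least one ingredient under every target concept."""
    return sorted(
        title
        for title, ingredients in RECIPES.items()
        if all(any(is_a(i, t) for i in ingredients) for t in targets)
    )


print(text_search("fish"))     # ['Fish pie']
print(concept_search("Fish"))  # all three recipes
print(concept_search("Fish", "Alcoholic beverages"))
# ['Smoked salmon and wine risotto']
```

The text search only finds the recipe with “fish” in its title, while the concept search also finds the anchovy and smoked salmon recipes, and it can combine two concepts in one query.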
This is the key to a search result that a human would expect.
To improve the search, Oxford Semantic Technologies (OST) told us that they first had to find a way to encode the domain knowledge that we have as humans about recipes. They then needed an engine that could take this knowledge, apply it to the data and exploit it for better search results.
The components they used:
OST encoded this knowledge in the HeLiS Ontology that was adapted to conform to SKOS. They then used PoolParty to manage it.
In PoolParty they set up the taxonomy with a hierarchy of concepts. The top concept is called Food and they modelled the subsequent food groups as concepts. The relations between these concepts are the SKOS broader/narrower relations. For example, parmesan is a type of cheese, which is a dairy product. In PoolParty this is expressed as the “Parmesan” concept, which is a narrower concept of the “Milk and Dairy Products” concept.
This is really useful for finding recipes not just with a specific ingredient, but also with something similar (using the Alternative labels in PoolParty) or without certain ingredients. For example, if you have a food allergy, you may want to find a recipe suitable for a dairy-free diet. Below, you can see what this taxonomy looks like in PoolParty and in RDFox’s visualization.
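As a small illustration of the broader/narrower idea, the sketch below stores only the broader links and derives the narrower ones from them. The concept names follow the parmesan example; the intermediate “Cheese” concept, the “Cheddar” sibling and the helper names are assumptions made for this sketch, not taken from the actual HeLiS taxonomy.

```python
# concept -> broader concept, read as skos:broader (hypothetical excerpt).
BROADER = {
    "Milk and Dairy Products": "Food",
    "Cheese": "Milk and Dairy Products",
    "Parmesan": "Cheese",
    "Cheddar": "Cheese",
}


def narrower_index(broader):
    """Invert the broader map: concept -> sorted list of direct narrower concepts."""
    index = {}
    for child, parent in broader.items():
        index.setdefault(parent, []).append(child)
    return {parent: sorted(children) for parent, children in index.items()}


def path_to_top(concept, broader):
    """Follow broader links from a concept up to the top concept."""
    path = [concept]
    while path[-1] in broader:
        path.append(broader[path[-1]])
    return path


NARROWER = narrower_index(BROADER)
print(NARROWER["Cheese"])  # ['Cheddar', 'Parmesan']
print(path_to_top("Parmesan", BROADER))
# ['Parmesan', 'Cheese', 'Milk and Dairy Products', 'Food']
```

Because narrower is simply the inverse of broader, only one direction has to be maintained by hand.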
As the next step, they scraped the BBC Good Food recipe data from the web and then annotated it using PoolParty Extractor. PoolParty generated an RDF file with the relevant concepts from the taxonomy, together with probabilities indicating how confident the system was that an ingredient actually appeared in the recipe.
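One way to picture these annotations is as records that link a recipe to a concept with a confidence score. The sketch below is only an assumption about how such data could be consumed: the record layout, the scores and the 0.5 cut-off are invented here, and the talk did not say whether or how a threshold was applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    recipe: str
    concept: str
    confidence: float  # hypothetical scale from 0.0 to 1.0


# Hypothetical extractor output for a single recipe.
ANNOTATIONS = [
    Annotation("Fish pie", "Fish", 0.95),
    Annotation("Fish pie", "Cheddar", 0.80),
    Annotation("Fish pie", "Wine", 0.20),  # low score: probably a false hit
]


def confident_ingredients(annotations, threshold=0.5):
    """Group concepts by recipe, keeping only annotations at or above the threshold."""
    result = {}
    for a in annotations:
        if a.confidence >= threshold:
            result.setdefault(a.recipe, set()).add(a.concept)
    return result


kept = confident_ingredients(ANNOTATIONS)
print(sorted(kept["Fish pie"]))  # ['Cheddar', 'Fish']
```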
They then put the annotated BBC Good Food data into RDFox along with the HeLiS taxonomy. RDFox ran rules to simplify the annotations graph by adding a direct “Ingredient” relation between a recipe and a concept. It also followed the broader relation. For example, the “Salmon” concept is a narrower concept of “Fresh Fish”, which in turn is a narrower concept of “Fish”, so a salmon recipe is also a fish recipe. Using reasoning, they were also able to derive new concepts for allergies, such as a dairy allergy. For example, because “Cheese” is a narrower concept of “Dairy”, any recipe containing cheese contains dairy, and a recipe for which no dairy ingredient can be derived can be classified under the new concept “Dairy free”. Thanks to these rules, RDFox knew how to derive concepts and add triples incrementally. Below is an example of a broader relation.
In this example, they show us how they set up a rule for vegan recipes. They used a negation as failure (NAF) rule to get the recipes that do not contain, for example, milk and dairy, or eggs. This rule captures the essential knowledge, broken into chunks of logic, as you can see below.
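To give a feel for what such rules do, here is a rough Python sketch of the two derivations described above: adding a direct ingredient relation, and following the broader relation upwards. These are not the rules OST wrote for RDFox. The predicate names (`hasAnnotation`, `mentions`, `ingredient`, `broader`) are placeholders, and the loop simply recomputes everything until nothing new appears, whereas RDFox maintains its derived triples incrementally.

```python
def materialise(facts):
    """Apply both rules repeatedly until no new triples appear (naive fixpoint)."""
    facts = set(facts)
    while True:
        new = set()
        for (s, p, o) in facts:
            # Rule 1: recipe hasAnnotation a, a mentions c => recipe ingredient c
            if p == "hasAnnotation":
                for (s2, p2, o2) in facts:
                    if s2 == o and p2 == "mentions":
                        new.add((s, "ingredient", o2))
            # Rule 2: recipe ingredient c, c broader d => recipe ingredient d
            if p == "ingredient":
                for (s2, p2, o2) in facts:
                    if s2 == o and p2 == "broader":
                        new.add((s, "ingredient", o2))
        if new <= facts:  # nothing new was derived in this round
            return facts
        facts |= new


# Hypothetical input triples for one recipe and a slice of the taxonomy.
FACTS = {
    ("recipe1", "hasAnnotation", "ann1"),
    ("ann1", "mentions", "Salmon"),
    ("Salmon", "broader", "Fresh Fish"),
    ("Fresh Fish", "broader", "Fish"),
}

derived = materialise(FACTS)
print(sorted(o for (s, p, o) in derived if s == "recipe1" and p == "ingredient"))
# ['Fish', 'Fresh Fish', 'Salmon']
```

After materialisation, a search for “Fish” can find the recipe with a single lookup, because the link from the recipe to “Fish” is already stored.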
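The idea behind negation as failure can be sketched in a few lines of Python: a recipe counts as vegan when no excluded ingredient can be found among its known ingredients. This is a simplified illustration and not the RDFox rule from the talk. The recipes are made up, the ingredient sets are assumed to be already expanded along the broader relation, and only the two excluded categories named above are checked.

```python
# Categories named in the talk as examples; a full vegan rule would need more.
EXCLUDED = {"Milk and Dairy Products", "Eggs"}

# Hypothetical recipes with ingredient concepts already expanded upwards.
INGREDIENTS = {
    "Cheese omelette": {"Cheddar", "Cheese", "Milk and Dairy Products", "Eggs"},
    "Lentil soup": {"Lentils", "Legumes", "Carrot", "Vegetables"},
}


def vegan_recipes(ingredients, excluded=EXCLUDED):
    """Negation as failure: keep a recipe if no excluded concept is known for it."""
    return sorted(
        recipe
        for recipe, concepts in ingredients.items()
        if not (concepts & excluded)
    )


print(vegan_recipes(INGREDIENTS))  # ['Lentil soup']
```

Note that the conclusion depends on the absence of a fact: if an egg annotation is added to “Lentil soup” later, the recipe has to lose its vegan label again.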
It is not easy to explain complex technical concepts in a language that everyone understands. I know because I try to do the same as a technical writer, every day. I also really enjoyed how they took a foodie example to explain complex technical concepts.
Another very satisfying thing about this talk was that Oxford Semantic Technologies came to the same conclusion as us at Semantic Web Company: text-only search is not good enough to achieve intelligent search results that humans expect. Search needs to improve. The ingredients for this are a great taxonomy managed in PoolParty fired up by RDFox.
The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Enterprises (OSE) and Oxford University Innovation (OUI).