Reasoning is probably the most powerful feature and main selling point of RDF Graph Databases.
RDF Graph Databases with advanced reasoning capabilities are believed to be the future of AI, as Mike Tung put it in his Forbes article “Knowledge Graphs Will Lead To Trustworthy AI”:
“The era of black-box AI systems is over. Next-generation systems will optimize the explainability and trustworthiness of the overall human-AI system, and knowledge graphs will serve as a key ingredient that makes these systems more explainable, inspectable, auditable and, ultimately, controllable.”
Semantic reasoning is the ability of a system to infer new facts from existing data based on inference rules or ontologies. In simple terms, rules add new information to the existing dataset, adding context, knowledge, and valuable insights.
This is the first of a series of articles on reasoning with the Northwind sample database. In this article we are going to create inference rules to simplify and optimise queries and data management.
We are going to use RDFox, a high-performance in-memory knowledge graph and semantic reasoning engine. RDFox uses the Datalog rule language to express rules.
This is a learning-by-example article, so not much theory is covered here.
For more details on RDFox Reasoning, Datalog, and Rules, as well as the Northwind sample database, please refer to the links in the “References” section at the end of this article.
If you would like to set up the environment and execute the queries yourself, please refer to the “Setting up the demo environment” section further down in this article. Otherwise, you can just browse the queries and screenshots below.
Rules define conditions to be matched in the data in order to infer new triples that become available to queries. They provide a mechanism that allows tailor-made performance improvements to specific queries.
In this section we are going to introduce three practical examples (use cases) to explain how rules work.
Each use case contains an original query, a rule, and a modified version of the query that uses the rule and produces the same result.
Original query
The original SPARQL query returns a list of customers who bought product-61.
Original query result:
Note: The query above completed in 12 ms; the screenshot displays 4 of the 22 results.
Using a SPARQL property path, we can easily show the path that needs to be traversed to answer the question.
That’s quite a long way to go to answer such a typical question. We want to create a shortcut, which will not only speed things up but also make the query more intuitive and easier to maintain. This is achieved by rule 01 below.
Rule 01 — boughtProduct
A rule that defines which product was bought by a customer.
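As a rough illustration of what a shortcut rule like this does, here is a minimal Python sketch (not RDFox Datalog). It joins three hops into a single derived boughtProduct triple. The predicate names placedOrder, hasOrderDetail and hasProduct, and the identifiers such as customer-A, are assumptions made up for this sketch; they are not taken from the Northwind data.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# All predicate names and identifiers here are hypothetical.
facts = {
    ("customer-A", "placedOrder", "order-1"),
    ("order-1", "hasOrderDetail", "detail-1"),
    ("detail-1", "hasProduct", "product-61"),
    ("customer-B", "placedOrder", "order-2"),
    ("order-2", "hasOrderDetail", "detail-2"),
    ("detail-2", "hasProduct", "product-7"),
}


def infer_bought_product(facts):
    """Derive (customer, boughtProduct, product) by joining the three hops."""
    derived = set()
    for customer, p1, order in facts:
        if p1 != "placedOrder":
            continue
        for o, p2, detail in facts:
            if p2 != "hasOrderDetail" or o != order:
                continue
            for d, p3, product in facts:
                if p3 == "hasProduct" and d == detail:
                    derived.add((customer, "boughtProduct", product))
    return derived


# Materialise: the derived triples sit next to the explicit ones.
materialised = facts | infer_bought_product(facts)

# The "modified query" now needs a single triple pattern instead of three.
buyers = sorted(
    s for s, p, o in materialised if p == "boughtProduct" and o == "product-61"
)
print(buyers)
```

The point of the sketch is the shape of the trade-off: the join is paid for once, when the rule is materialised, and every later query reads the shortcut triple directly.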
Rule definition
Add rule to the data store
There are many ways of adding rules to an RDFox data store. The following example uses curl to call the RDFox REST API.
Note that the destination named graph for the rule is specified in the curl command.
For those who set up the environment with RDFox running in a Docker container, the required authentication will need to be added to the curl command: -u admin:admin
Modified query
The original query was modified to use the triples derived by the rule we have just created. The modified query produces the same result.
Modified query result:
Note: The query above completed in 8 ms; the screenshot displays 4 of the 22 results.
Since version 5.6, it’s possible to highlight reasoning on the RDFox web console. The following shows the new derived fact, which is materialised in RDFox as a new triple in the graph.
Original query
Lists the top 5 customers by product count.
Original query result:
Note: The query above executed in 12 ms.
Create Rule 02 — hasProductCount
The following rule defines relations based on the result of an aggregate calculation.
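To show the idea of deriving a fact from an aggregate, here is a small Python sketch (again, not RDFox Datalog). It assumes the count is over distinct products per customer, which the article does not state explicitly, and it uses made-up customer and product identifiers.

```python
from collections import defaultdict

# Hypothetical (customer, product) pairs, e.g. the output of a boughtProduct rule.
bought = {
    ("customer-A", "product-1"),
    ("customer-A", "product-2"),
    ("customer-B", "product-1"),
    ("customer-C", "product-1"),
    ("customer-C", "product-2"),
    ("customer-C", "product-3"),
}


def infer_product_counts(bought):
    """Derive one (customer, hasProductCount, n) triple per customer."""
    products = defaultdict(set)
    for customer, product in bought:
        products[customer].add(product)
    return {(c, "hasProductCount", len(ps)) for c, ps in products.items()}


def top_customers(derived, limit):
    """Query side: no aggregation left to do, just sort and cut."""
    rows = [(c, n) for c, p, n in derived if p == "hasProductCount"]
    return sorted(rows, key=lambda row: (-row[1], row[0]))[:limit]


derived = infer_product_counts(bought)
top = top_customers(derived, 2)
print(top)
```

Once the count is stored as a derived triple, the query only has to order and limit the results; the grouping work has moved into the rule.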
Rule definition
Add rule to the data store
Modified query
Modified query result:
Note: It’s very important to define the types and make rules as selective as possible to improve rule materialisation and query answering times. For example, adding the types [?customer, a, :Customer], [?order, a, :Order] and [?orderDetail, a, :OrderDetail] to the previous rule brought query execution time down from 10 ms to 3 ms. More guidelines on how to create rules can be found in The Do’s and Don’ts of Rule and Query Writing article.
The following illustration highlights the inferred facts (in cyan) as a result of rules 01 and 02.
Use Case 03 — customers who never placed an order
Original query
Lists the customers who never placed an order.
Original query result:
The SPARQL query above can be rewritten using MINUS or FILTER NOT EXISTS, producing the same result. For the differences in how these constructs are evaluated, please refer to the comments in the file queries/03–1-customers-who-never-placed-an-order-before-rule-03.sparql from the demo GitHub repo.
Create Rule 03 — CustomerWithoutOrder
Negation as failure is a very powerful feature of rules in RDFox.
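Negation as failure means that a fact is treated as false when it cannot be found or derived. The following Python sketch illustrates the idea with a set difference. The hasCustomer predicate, the use of "type" as a stand-in for rdf:type, and the identifiers are assumptions for this sketch only.

```python
# Hypothetical toy data: two customers, only one of whom has an order.
facts = {
    ("customer-A", "type", "Customer"),
    ("customer-B", "type", "Customer"),
    ("order-1", "type", "Order"),
    ("order-1", "hasCustomer", "customer-A"),
}


def infer_customers_without_order(facts):
    """A customer is a CustomerWithoutOrder if no order pointing at it is found."""
    customers = {s for s, p, o in facts if p == "type" and o == "Customer"}
    with_order = {o for s, p, o in facts if p == "hasCustomer"}
    # Absence of a matching order is taken as "has no order".
    return {(c, "type", "CustomerWithoutOrder") for c in customers - with_order}


derived = infer_customers_without_order(facts)
print(sorted(derived))
```

Note that the conclusion depends on what is absent from the data, so it can stop holding as soon as new facts arrive.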
Rule definition
Add rule to the data store
Modified query
Modified query result:
The following illustration highlights the derived facts (in cyan) as a result of rule 03.
And, finally, the following highlights (in cyan) the derived facts as a result of all previous rules created so far.
Let’s see what happens if a CustomerWithoutOrder places an order.
When we execute the modified query a second time, :customer-FISSA is not returned. That’s because the derived fact CustomerWithoutOrder was retracted when that customer placed an order.
And, what if we delete rule 03 from the Northwind data store altogether?
Then, query 03 will not produce any results.
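The two behaviours just described can be imitated with a small Python sketch. It recomputes the derived facts from scratch after every change, which is only a stand-in for illustration and says nothing about how RDFox maintains its materialisation internally. The hasCustomer predicate and the order identifiers are hypothetical.

```python
def customers_without_order(facts):
    """Rule 03 stand-in: customers with no order pointing at them."""
    customers = {s for s, p, o in facts if p == "type" and o == "Customer"}
    with_order = {o for s, p, o in facts if p == "hasCustomer"}
    return {(c, "type", "CustomerWithoutOrder") for c in customers - with_order}


def materialise(explicit, rules):
    """Explicit facts plus whatever the currently installed rules derive."""
    derived = set()
    for rule in rules:
        derived |= rule(explicit)
    return explicit | derived


def query(store):
    """Modified query 03: read the derived class directly."""
    return sorted(
        s for s, p, o in store if p == "type" and o == "CustomerWithoutOrder"
    )


explicit = {
    ("customer-FISSA", "type", "Customer"),
    ("customer-A", "type", "Customer"),
    ("order-1", "hasCustomer", "customer-A"),
}
rules = [customers_without_order]

before = query(materialise(explicit, rules))

# The customer places an order: the derived fact no longer holds.
new_order = ("order-2", "hasCustomer", "customer-FISSA")
after_order = query(materialise(explicit | {new_order}, rules))

# The rule is removed: nothing is derived, so the query returns nothing,
# even for a store in which the customer has no order.
after_rule_removed = query(materialise(explicit, []))

print(before, after_order, after_rule_removed)
```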
The behaviours above are considered to be among the most desirable features of an advanced reasoning engine.
We started our journey with a simple demonstration of how inference rules can enrich an existing triplestore. We are planning to extend the reasoning capabilities of the Northwind sample database by adding axioms, an ontology and additional rules to answer more complex questions. Stay tuned!
If you choose to run the queries in this demonstration, please follow the steps below to set up the demo environment.
The following GitHub repository contains the sample data, queries and rules used in this demonstration.
IMPORTANT! If you are on macOS, you may choose to follow the instructions in the GitHub repo above and skip the remaining steps in this section. The repo will start a persisted instance of RDFox in a Docker container with the Northwind data store already loaded and configured.
With this option, the only thing that changes when you execute the steps in the demo is the curl commands that add rules: you will need to append the authentication option -u admin:admin before executing them.
Request an RDFox license here. You will need a commercial or academic email.
Download the appropriate version of RDFox onto your machine.
Copy the license file RDFox.lic to the directory where the RDFox executable is located.
In a terminal, from the same directory as above, execute ./RDFox sandbox on macOS/Linux or RDFox.exe sandbox on Windows to launch RDFox.
macOS only
If you get a warning message saying that RDFox is not from an identified developer, click Cancel.
Go to System Preferences > Security and Privacy > General tab, click on Allow Anyway as illustrated below, and then run the sandbox command again.
If you get another warning message, choose Open to start the RDFox shell.
If everything goes fine, you should get the following message in the terminal:
In the Shell, execute the following to expose the RDFox REST API, which includes a SPARQL over HTTP endpoint.
macOS only
If you get the following message, choose Allow.
You should get the following message: The REST endpoint was successfully started at port number/service name 12110 with XX threads.
Warning! Do not close the terminal window as that would stop the RDFox server. Also, any of the commands to add rules in this demo must be executed in a separate terminal window.
At this point you should be able to navigate to the RDFox web console at http://localhost:12110/console/
On the Console UI, click on + Create data store and name it “Northwind”.
Cancel the Import Content popup as we need to create a graph before importing the data.
Execute the following query on the RDFox web console to create the dataGraph where we are going to store the data and rules.
From the … menu, choose Add content.
Select dataGraph from the drop-down and then select the northwind.nt file under the northwind/data directory in your local copy of the repository, or download it from the GitHub repo.
You should get a confirmation message saying that 30780 facts were added to the data store.
Now, go to the beginning of this article for the instructions on how to create rules and run the SPARQL queries.
Once you are done with this demonstration, you can stop the RDFox Server by executing the command quit in the original terminal window.
References:
Exploring an RDF Graph Database
The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Enterprises (OSE) and Oxford University Innovation (OUI).