OWLReasoner Benchmark Framework: The tests

What kind of reasoners do we test?

The benchmark targets reasoners implementing the OWLReasoner interface as defined in the OWL API, version 3.1 and newer. This includes reasoners written in languages other than Java, as long as a Java binding to the interface is available (e.g., FaCT++ and its JNI interface).
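For illustration, all the benchmark needs from a reasoner is an OWLReasonerFactory implementation. A minimal sketch, assuming the factory class name is supplied externally (ReasonerLoader and the reflective loading are illustrative, not the framework's actual code):

    import org.semanticweb.owlapi.model.OWLOntology;
    import org.semanticweb.owlapi.reasoner.OWLReasoner;
    import org.semanticweb.owlapi.reasoner.OWLReasonerFactory;

    public class ReasonerLoader {
        // Instantiate a reasoner factory by class name, so reasoners with a
        // Java binding (e.g., FaCT++ through its JNI interface) can be plugged
        // in without a compile-time dependency on any particular reasoner.
        public static OWLReasoner createReasoner(String factoryClassName,
                                                 OWLOntology ontology) throws Exception {
            OWLReasonerFactory factory = (OWLReasonerFactory)
                    Class.forName(factoryClassName).getDeclaredConstructor().newInstance();
            return factory.createReasoner(ontology);
        }
    }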

What do we measure?

The code is structured so that each run loads one ontology and uses one reasoner to reason over it; more ontologies and more reasoners can be tested by running separate processes. This isolation prevents a single large ontology, or a memory leak in one reasoner, from slowing down the tests for the rest of the ontologies or reasoners.
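One way to realize this isolation (a sketch only; BenchmarkRunner and the heap size below are hypothetical placeholders, not the framework's actual launcher) is to spawn a fresh JVM per reasoner/ontology pair:

    import java.io.File;
    import java.util.Arrays;
    import java.util.List;

    public class IsolatedRunner {
        // Run one reasoner/ontology pair in its own JVM, so an OutOfMemoryError
        // or a pathological ontology only affects that single run.
        public static int runIsolated(String factoryClassName, File ontologyFile)
                throws Exception {
            List<String> cmd = Arrays.asList(
                    "java", "-Xmx4g",                 // per-run heap cap (illustrative)
                    "-cp", System.getProperty("java.class.path"),
                    "BenchmarkRunner",                // hypothetical single-run entry point
                    factoryClassName, ontologyFile.getAbsolutePath());
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            return p.waitFor();                       // nonzero exit = failed run
        }
    }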

The current test code executes the following operations:

  1. Load the target ontology (not timed)
  2. Create a new target reasoner (timed)
  3. Precompute class hierarchy (timed and memory profiled; see the setup sketch after this list)
  4. Run a series of operations meant to simulate the expected use of an OWLReasoner (all method calls to OWLReasoner are timed independently; a condensed sketch of tests a-d follows this list). The most important tests are:
  • a - Consistency check
  • b - List unsatisfiable classes
  • c - For each class in the ontology, check if it is satisfiable
  • d - For each class C in the ontology, get inferred superclasses SC and subclasses sC, and verify that the axioms SubClassOf(C, SC) and SubClassOf(sC, C) are in fact entailed
  • e - For all classes, data properties, and object properties, exercise the methods that retrieve sub-, super-, disjoint, and equivalent classes/properties
  • f - For all individuals, get types, same individuals, different individuals, and object and data property values
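Steps 1-3 map directly onto OWL API calls. The following is a minimal sketch of how they can be timed; the heap-delta measurement is only one rough way to profile memory, not necessarily the framework's exact method:

    import java.io.File;
    import org.semanticweb.owlapi.apibinding.OWLManager;
    import org.semanticweb.owlapi.model.OWLOntology;
    import org.semanticweb.owlapi.model.OWLOntologyManager;
    import org.semanticweb.owlapi.reasoner.InferenceType;
    import org.semanticweb.owlapi.reasoner.OWLReasoner;
    import org.semanticweb.owlapi.reasoner.OWLReasonerFactory;

    public class SetupTimer {
        public static OWLReasoner timedSetup(OWLReasonerFactory factory, File file)
                throws Exception {
            // 1. Load the target ontology (deliberately not timed).
            OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
            OWLOntology ontology = manager.loadOntologyFromOntologyDocument(file);

            // 2. Create the target reasoner (timed).
            long t0 = System.nanoTime();
            OWLReasoner reasoner = factory.createReasoner(ontology);
            System.out.println("creation (ms): " + (System.nanoTime() - t0) / 1000000);

            // 3. Precompute the class hierarchy (timed and memory profiled).
            Runtime rt = Runtime.getRuntime();
            long before = rt.totalMemory() - rt.freeMemory();
            long t1 = System.nanoTime();
            reasoner.precomputeInferences(InferenceType.CLASS_HIERARCHY);
            System.out.println("precompute (ms): " + (System.nanoTime() - t1) / 1000000);
            long after = rt.totalMemory() - rt.freeMemory();
            System.out.println("approx. heap delta (bytes): " + (after - before));
            return reasoner;
        }
    }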
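And a condensed sketch of tests a-d. In the actual harness every OWLReasoner call is timed independently, and a real run would also guard against reasoners that do not support a given entailment check; timing and guards are omitted here for brevity:

    import org.semanticweb.owlapi.model.OWLClass;
    import org.semanticweb.owlapi.model.OWLDataFactory;
    import org.semanticweb.owlapi.model.OWLOntology;
    import org.semanticweb.owlapi.reasoner.OWLReasoner;

    public class QueryChecks {
        public static void classChecks(OWLReasoner reasoner, OWLOntology ontology,
                                       OWLDataFactory df) {
            // a. Consistency check.
            System.out.println("consistent: " + reasoner.isConsistent());

            // b. Unsatisfiable classes: the bottom node minus owl:Nothing itself.
            System.out.println("unsatisfiable: "
                    + reasoner.getUnsatisfiableClasses().getEntitiesMinusBottom());

            for (OWLClass c : ontology.getClassesInSignature()) {
                // c. Per-class satisfiability.
                if (!reasoner.isSatisfiable(c)) continue;

                // d. Direct superclasses and subclasses, each verified to be an
                //    entailed SubClassOf axiom.
                for (OWLClass sup : reasoner.getSuperClasses(c, true).getFlattened()) {
                    assert reasoner.isEntailed(df.getOWLSubClassOfAxiom(c, sup));
                }
                for (OWLClass sub : reasoner.getSubClasses(c, true).getFlattened()) {
                    assert reasoner.isEntailed(df.getOWLSubClassOfAxiom(sub, c));
                }
            }
        }
    }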

Dataset

Currently, the BioPortal ontologies are used.

Extensions of the tests/datasets

Suggestions on how to enhance or extend these tests, as well as pointers to ontologies/datasets that would be interesting to include, are very welcome. Please post them on the mailing list or in the issue tracker.