What kind of reasoners do we test?
The benchmark targets reasoners that implement the OWLReasoner interface as defined in the OWL API version 3.1 and later; this includes reasoners written in languages other than Java, provided an interface is available (e.g., FaCT++ and its JNI bindings).
What do we measure?
The code is structured so that one ontology is loaded and one reasoner is used to reason over it; additional ontologies and reasoners can be tested by running separate processes. This isolation prevents a large ontology or a memory leak from slowing down the tests for an entire set of ontologies or reasoners.
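The process-per-run isolation described above can be sketched as follows. This is a minimal JDK-only illustration, not the benchmark's actual code: `BenchmarkRunner`, the ontology file names, and the reasoner factory names are hypothetical placeholders.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch: launch one JVM per (ontology, reasoner) pair so that
 * an out-of-memory error or a slow run cannot affect the other runs.
 * BenchmarkRunner and all arguments below are hypothetical placeholders.
 */
public class IsolatedRuns {
    public static void main(String[] args) throws Exception {
        List<String> ontologies = Arrays.asList("pizza.owl", "go.owl");
        List<String> reasoners = Arrays.asList("HermiTFactory", "FaCTPlusPlusFactory");
        for (String onto : ontologies) {
            for (String reasoner : reasoners) {
                ProcessBuilder pb = new ProcessBuilder(
                        "java", "-Xmx2G", "BenchmarkRunner", onto, reasoner);
                pb.inheritIO();
                // In a real harness: int exit = pb.start().waitFor();
                System.out.println("would launch: " + pb.command());
            }
        }
    }
}
```

Because each pair runs in its own JVM, a crash or timeout can simply be recorded and the harness can move on to the next combination.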
The current test code executes the following operations:
- Load the target ontology (not timed)
- Create a new target reasoner (timed)
- Precompute the class hierarchy (timed and memory profiled)
- Run a series of operations meant to simulate the expected use of an OWLReasoner (each method call to OWLReasoner is timed independently). The most important tests are:
- a - Consistency check
- b - List unsatisfiable classes
- c - For each class in the ontology, check if it is satisfiable
- d - For each class C in the ontology, get inferred superclasses SC and subclasses sC, and verify that the axioms SubClassOf(C, SC) and SubClassOf(sC, C) are in fact entailed
- e - For all classes, data properties, and object properties, exercise the methods that retrieve sub/super/disjoint/equivalent classes and properties
- f - For all individuals, get types, same individuals, different individuals, and object and data property values
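One way to time each call independently, as the steps above require, is a small stopwatch wrapper around each operation, with a memory snapshot taken around the hierarchy precomputation. The JDK-only sketch below stands in for the real harness; the operation names and the `Runnable`-based interface are assumptions for illustration, and the empty lambdas mark where the actual OWLReasoner calls (e.g., `precomputeInferences`, `isConsistent`) would go.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal sketch of per-operation timing and memory profiling (JDK only). */
public class TimedOps {
    private final Map<String, Long> timesNs = new LinkedHashMap<>();

    /** Run one operation and record its wall-clock duration. */
    void time(String name, Runnable op) {
        long start = System.nanoTime();
        op.run();
        timesNs.put(name, System.nanoTime() - start);
    }

    /** Rough heap-usage snapshot via the Runtime API. */
    long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        TimedOps t = new TimedOps();
        // Empty lambdas are placeholders for the real reasoner calls
        // (createReasoner, precomputeInferences, isConsistent, ...).
        t.time("create reasoner", () -> {});
        long before = t.usedMemory();
        t.time("precompute class hierarchy", () -> {});
        long after = t.usedMemory();
        t.time("consistency check", () -> {});
        t.timesNs.forEach((name, ns) ->
                System.out.println(name + ": " + ns + " ns"));
        System.out.println("hierarchy memory delta: " + (after - before) + " bytes");
    }
}
```

Keeping one timer per method call, rather than one total, is what allows the benchmark to report which individual OWLReasoner operations dominate the runtime.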
Dataset
Currently the BioPortal ontologies are used.
Extensions of the tests/datasets
Suggestions on how to enhance or add to these tests, and pointers to ontologies or datasets that would be interesting to include, are very welcome. Please post them on the mailing list or in the issue tracker.