OWLReasoner Benchmark Framework: The tests

What kind of reasoners do we test?

The benchmark targets reasoners implementing the OWLReasoner interface as defined in the OWL API, version 3.1 and newer. This includes reasoners written in languages other than Java, as long as a Java binding to the interface is available (e.g., FaCT++ and its JNI interface).
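For illustration, all the benchmark needs from a reasoner is an OWLReasonerFactory implementation. A minimal sketch, assuming the factory class name is supplied externally (ReasonerLoader and the reflective loading are illustrative, not the framework's actual code):

    import org.semanticweb.owlapi.model.OWLOntology;
    import org.semanticweb.owlapi.reasoner.OWLReasoner;
    import org.semanticweb.owlapi.reasoner.OWLReasonerFactory;

    public class ReasonerLoader {
        // Instantiate a reasoner factory by class name, so reasoners with a
        // Java binding (e.g., FaCT++ through its JNI interface) can be plugged
        // in without a compile-time dependency on any particular reasoner.
        public static OWLReasoner createReasoner(String factoryClassName,
                                                 OWLOntology ontology) throws Exception {
            OWLReasonerFactory factory = (OWLReasonerFactory)
                    Class.forName(factoryClassName).getDeclaredConstructor().newInstance();
            return factory.createReasoner(ontology);
        }
    }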

What do we measure?

The code is structured so that each run loads one ontology and uses one reasoner to reason over it; more ontologies and more reasoners can be tested by running separate processes. This isolation prevents a single large ontology, or a memory leak in one reasoner, from slowing down the tests for the rest of the ontologies or reasoners.
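One way to realize this isolation (a sketch only; BenchmarkRunner and the heap size below are hypothetical placeholders, not the framework's actual launcher) is to spawn a fresh JVM per reasoner/ontology pair:

    import java.io.File;
    import java.util.Arrays;
    import java.util.List;

    public class IsolatedRunner {
        // Run one reasoner/ontology pair in its own JVM, so an OutOfMemoryError
        // or a pathological ontology only affects that single run.
        public static int runIsolated(String factoryClassName, File ontologyFile)
                throws Exception {
            List<String> cmd = Arrays.asList(
                    "java", "-Xmx4g",                 // per-run heap cap (illustrative)
                    "-cp", System.getProperty("java.class.path"),
                    "BenchmarkRunner",                // hypothetical single-run entry point
                    factoryClassName, ontologyFile.getAbsolutePath());
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            return p.waitFor();                       // nonzero exit = failed run
        }
    }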

The current test code executes the following operations:

  1. Load the target ontology (not timed)
  2. Create a new target reasoner (timed)
  3. Precompute class hierarchy (timed and memory profiled; see the setup sketch after this list)
  4. Run a series of operations meant to simulate the expected use of an OWLReasoner (all method calls to OWLReasoner are timed independently; a condensed sketch of tests a-d follows this list). The most important tests are:
  • a - Consistency check
  • b - List unsatisfiable classes
  • c - For each class in the ontology, check if it is satisfiable
  • d - For each class C in the ontology, get inferred superclasses SC and subclasses sC, and verify that the axioms SubClassOf(C, SC) and SubClassOf(sC, C) are in fact entailed
  • e - For all classes, data properties, and object properties, exercise the methods that retrieve sub-, super-, disjoint, and equivalent classes/properties
  • f - For all individuals, get types, same individuals, different individuals, and object and data property values
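Steps 1-3 map directly onto OWL API calls. The following is a minimal sketch of how they can be timed; the heap-delta measurement is only one rough way to profile memory, not necessarily the framework's exact method:

    import java.io.File;
    import org.semanticweb.owlapi.apibinding.OWLManager;
    import org.semanticweb.owlapi.model.OWLOntology;
    import org.semanticweb.owlapi.model.OWLOntologyManager;
    import org.semanticweb.owlapi.reasoner.InferenceType;
    import org.semanticweb.owlapi.reasoner.OWLReasoner;
    import org.semanticweb.owlapi.reasoner.OWLReasonerFactory;

    public class SetupTimer {
        public static OWLReasoner timedSetup(OWLReasonerFactory factory, File file)
                throws Exception {
            // 1. Load the target ontology (deliberately not timed).
            OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
            OWLOntology ontology = manager.loadOntologyFromOntologyDocument(file);

            // 2. Create the target reasoner (timed).
            long t0 = System.nanoTime();
            OWLReasoner reasoner = factory.createReasoner(ontology);
            System.out.println("creation (ms): " + (System.nanoTime() - t0) / 1000000);

            // 3. Precompute the class hierarchy (timed and memory profiled).
            Runtime rt = Runtime.getRuntime();
            long before = rt.totalMemory() - rt.freeMemory();
            long t1 = System.nanoTime();
            reasoner.precomputeInferences(InferenceType.CLASS_HIERARCHY);
            System.out.println("precompute (ms): " + (System.nanoTime() - t1) / 1000000);
            long after = rt.totalMemory() - rt.freeMemory();
            System.out.println("approx. heap delta (bytes): " + (after - before));
            return reasoner;
        }
    }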
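And a condensed sketch of tests a-d. In the actual harness every OWLReasoner call is timed independently, and a real run would also guard against reasoners that do not support a given entailment check; timing and guards are omitted here for brevity:

    import org.semanticweb.owlapi.model.OWLClass;
    import org.semanticweb.owlapi.model.OWLDataFactory;
    import org.semanticweb.owlapi.model.OWLOntology;
    import org.semanticweb.owlapi.reasoner.OWLReasoner;

    public class QueryChecks {
        public static void classChecks(OWLReasoner reasoner, OWLOntology ontology,
                                       OWLDataFactory df) {
            // a. Consistency check.
            System.out.println("consistent: " + reasoner.isConsistent());

            // b. Unsatisfiable classes: the bottom node minus owl:Nothing itself.
            System.out.println("unsatisfiable: "
                    + reasoner.getUnsatisfiableClasses().getEntitiesMinusBottom());

            for (OWLClass c : ontology.getClassesInSignature()) {
                // c. Per-class satisfiability.
                if (!reasoner.isSatisfiable(c)) continue;

                // d. Direct superclasses and subclasses, each verified to be an
                //    entailed SubClassOf axiom.
                for (OWLClass sup : reasoner.getSuperClasses(c, true).getFlattened()) {
                    assert reasoner.isEntailed(df.getOWLSubClassOfAxiom(c, sup));
                }
                for (OWLClass sub : reasoner.getSubClasses(c, true).getFlattened()) {
                    assert reasoner.isEntailed(df.getOWLSubClassOfAxiom(sub, c));
                }
            }
        }
    }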

Dataset

Currently, the BioPortal ontologies are used.

Extensions of the tests/datasets

Suggestions on how to enhance or extend these tests, as well as pointers to ontologies/datasets that would be interesting to include, are very welcome. Please post them on the mailing list or in the issue tracker.