Benchmarking

The project integrates the KROWN benchmark framework to measure inversion performance across different data scales and mapping complexities. KROWN generates synthetic relational data and mappings, runs the inversion, and collects execution times.

Initialize the submodule:

Terminal window
git submodule update --init --recursive

Benchmarks run through a dedicated Docker Compose file that spins up PostgreSQL and (optionally) a Virtuoso SPARQL endpoint:

Terminal window
docker compose -f docker-compose.benchmark.yml up

For statistically meaningful results, run multiple iterations. The framework computes the mean, median, standard deviation, and confidence intervals, and generates box plots:

Terminal window
docker compose -f docker-compose.benchmark.yml run benchmark benchmark --iterations 10
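
To make the summary statistics above concrete, here is a minimal sketch of how they could be derived from a list of per-iteration times. It is not the framework's own code: the function name `summarize` and the normal-approximation 95% interval (z = 1.96) are assumptions, and the framework's actual confidence-interval method is not stated here. With only 10 iterations, a Student's t critical value would give a somewhat wider interval.

```python
import math
import statistics


def summarize(times, z=1.96):
    """Summarize execution times (seconds) from repeated benchmark runs.

    Hypothetical helper: z=1.96 gives a normal-approximation 95% interval,
    which is an assumption and not necessarily what the framework uses.
    """
    n = len(times)
    mean = statistics.mean(times)
    # The sample standard deviation needs at least two observations.
    stdev = statistics.stdev(times) if n > 1 else 0.0
    half_width = z * stdev / math.sqrt(n)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(times),
        "stdev": stdev,
        "ci95": (mean - half_width, mean + half_width),
    }


if __name__ == "__main__":
    # Made-up timings, for illustration only.
    print(summarize([1.0, 2.0, 3.0, 4.0, 5.0]))
```

The interval half-width shrinks with the square root of the iteration count, which is why more iterations tighten the estimate of the mean even when individual runs stay noisy.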

To skip the Virtuoso endpoint and query the RDF files directly in memory with pyoxigraph, pass --no-virtuoso:

Terminal window
docker compose -f docker-compose.benchmark.yml run benchmark benchmark --no-virtuoso

Stop and remove the containers when finished:

Terminal window
docker compose -f docker-compose.benchmark.yml down

Benchmark output goes to benchmarks/krown/results/ and includes:

  • Execution times per scenario (JSON)
  • Statistical summaries when running multiple iterations
  • Box plot visualizations (PNG) for time distributions
  • Data and mapping file sizes
  • Counts of triples maps and predicate-object maps per scenario

Each benchmark scenario consists of a generated relational database, an R2RML mapping, and the RDF graph produced by the forward transformation. The benchmark measures the time to parse the mapping, generate the SPARQL queries, execute them against the RDF graph, and reconstruct the relational output. The forward transformation time (Morph-KGC) is measured separately to isolate the inversion overhead.
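
The four measured phases can be pictured with a small timing harness. This is only a sketch: the `PhaseTimer` class and the phase names are hypothetical, and the empty `with` bodies stand in for the real inversion steps, which are not shown here.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase (hypothetical helper)."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the phase raises.
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    @property
    def total(self):
        return sum(self.durations.values())


timer = PhaseTimer()
# Assumed phase names mirroring the description above.
for name in ("parse_mapping", "generate_queries",
             "execute_queries", "reconstruct_tables"):
    with timer.phase(name):
        pass  # placeholder for the real work of this phase

if __name__ == "__main__":
    for name, seconds in timer.durations.items():
        print(f"{name}: {seconds:.6f}s")
    print(f"total: {timer.total:.6f}s")
```

Timing the forward transformation in a separate timer, outside these phases, keeps its cost from being folded into the inversion figure.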

Scenarios scale along two axes: the number of rows in the source tables and the complexity of the mapping (number of triples maps and predicate-object maps).

The following results come from 100 iterations with Virtuoso as the SPARQL endpoint, on three scenarios of increasing complexity. The scenario name gives the number of triples maps and the number of predicate-object maps per triples map (e.g., 3x2 = 3 triples maps with 2 predicate-object maps each):

Scenario   Triples      Morph-KGC       Inversion        Total
3x2        6,000        0.89 ± 0.06s    7.33 ± 3.98s     8.31 ± 4.02s
5x3        150,000      1.22 ± 0.06s    3.05 ± 0.23s     4.61 ± 0.27s
8x5        2,000,000    5.63 ± 0.71s    22.68 ± 4.23s    30.73 ± 4.24s
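
The scenario naming convention can be decoded mechanically. The sketch below is illustrative only: `ScenarioShape` and `parse_scenario` are hypothetical names, not part of the benchmark framework, and the code assumes names always take the form `<triples maps>x<predicate-object maps per triples map>`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioShape:
    """Mapping complexity encoded in a scenario name (hypothetical type)."""
    triples_maps: int
    poms_per_map: int

    @property
    def total_poms(self):
        # Total predicate-object maps across all triples maps.
        return self.triples_maps * self.poms_per_map


def parse_scenario(name):
    """Parse a name such as '3x2' into its two complexity counts."""
    left, sep, right = name.partition("x")
    if not sep or not left.isdigit() or not right.isdigit():
        raise ValueError(f"unexpected scenario name: {name!r}")
    return ScenarioShape(int(left), int(right))


if __name__ == "__main__":
    for name in ("3x2", "5x3", "8x5"):
        shape = parse_scenario(name)
        print(name, shape.triples_maps, shape.poms_per_map, shape.total_poms)
```

Under this reading, 3x2, 5x3, and 8x5 correspond to 6, 15, and 40 predicate-object maps in total.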