Benchmarking
The project integrates the KROWN benchmark framework to measure inversion performance across different data scales and mapping complexities. KROWN generates synthetic relational data and mappings, runs the inversion, and collects execution times.
Initialize the submodule:
git submodule update --init --recursive
Running benchmarks
Benchmarks run through a dedicated Docker Compose file that spins up PostgreSQL and (optionally) a Virtuoso SPARQL endpoint:
docker compose -f docker-compose.benchmark.yml up
Multiple iterations
For statistically meaningful results, run multiple iterations. The framework computes the mean, median, standard deviation, and confidence intervals of the execution times, and generates box plots:
docker compose -f docker-compose.benchmark.yml run benchmark benchmark --iterations 10
Without Virtuoso
To skip the Virtuoso endpoint and query RDF files directly in memory with pyoxigraph:
docker compose -f docker-compose.benchmark.yml run benchmark benchmark --no-virtuoso
Stopping services
docker compose -f docker-compose.benchmark.yml down
Results
Benchmark output goes to benchmarks/krown/results/ and includes:
- Execution times per scenario (JSON)
- Statistical summaries when running multiple iterations
- Box plot visualizations (PNG) for time distributions
- Data and mapping file sizes
- Counts of triples maps and predicate-object maps per scenario
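
The statistical summaries listed above can be reproduced by hand from a list of per-iteration times. The following is a minimal sketch using only the Python standard library. The function name `summarize`, the sample timings, and the normal-approximation confidence interval are assumptions for illustration; the framework may compute its intervals differently.

```python
import math
import statistics


def summarize(times, confidence=0.95):
    """Summarize execution times (in seconds) from repeated iterations."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two iterations to estimate spread")
    mean = statistics.mean(times)
    stdev = statistics.stdev(times)  # sample standard deviation (n - 1)
    # Normal-approximation interval: an assumption, not necessarily
    # the method the benchmark framework uses.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev / math.sqrt(n)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(times),
        "stdev": stdev,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }


# Hypothetical timings in seconds, for illustration only.
sample_times = [3.1, 2.9, 3.4, 3.0, 3.2]
summary = summarize(sample_times)
```

With few iterations, a Student's t interval would be wider and arguably more appropriate; the normal approximation is used here only to keep the sketch dependency-free.
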
What gets measured
Each benchmark scenario consists of a generated relational database, an R2RML mapping, and the RDF graph produced by the forward transformation. The benchmark measures the time to parse the mapping, generate the SPARQL queries, execute them against the RDF graph, and reconstruct the relational output. The forward transformation time (Morph-KGC) is measured separately to isolate the inversion overhead.
Scenarios scale along two axes: the number of rows in the source tables and the complexity of the mapping (number of triples maps and predicate-object maps).
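
One way to picture the measurement described above is a timer wrapped around each inversion phase. This is a rough sketch, not the project's actual instrumentation: `timed`, `run_inversion`, the phase names, and the stand-in callables are all hypothetical and only mirror the four phases named in the text.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(phase, timings):
    """Add the wall-clock duration of the enclosed block to timings[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[phase] = timings.get(phase, 0.0) + elapsed


def run_inversion(mapping_text, parse, generate, execute, reconstruct):
    """Run the four phases in order and record how long each one takes."""
    timings = {}
    with timed("parse_mapping", timings):
        mapping = parse(mapping_text)
    with timed("generate_queries", timings):
        queries = generate(mapping)
    with timed("execute_queries", timings):
        results = [execute(query) for query in queries]
    with timed("reconstruct_output", timings):
        tables = reconstruct(results)
    # Sum of the four phases recorded so far.
    timings["inversion_total"] = sum(timings.values())
    return tables, timings


# Stand-in callables (hypothetical) so the sketch runs on its own.
tables, timings = run_inversion(
    "mapping",
    parse=lambda text: {"source": text},
    generate=lambda mapping: ["q1", "q2"],
    execute=lambda query: [query.upper()],
    reconstruct=lambda results: {"rows": [row for r in results for row in r]},
)
```

Keeping the forward transformation outside `run_inversion` matches the idea of timing it separately, so its cost does not leak into the inversion figures.
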
Latest results
Results from 100 iterations with Virtuoso as the SPARQL endpoint, on three scenarios of increasing complexity. The scenario name indicates the number of triples maps and predicate-object maps per triples map (e.g., 3x2 = 3 triples maps with 2 predicate-object maps each):
| Scenario | Triples | Morph-KGC | Inversion | Total |
|---|---|---|---|---|
| 3x2 | 6,000 | 0.89 ± 0.06s | 7.33 ± 3.98s | 8.31 ± 4.02s |
| 5x3 | 150,000 | 1.22 ± 0.06s | 3.05 ± 0.23s | 4.61 ± 0.27s |
| 8x5 | 2,000,000 | 5.63 ± 0.71s | 22.68 ± 4.23s | 30.73 ± 4.24s |
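
To compare scenarios of different sizes, the mean values in the table can be turned into derived figures. The sketch below uses only the means and ignores the reported spread. It computes inversion throughput in triples per second and the part of the total not covered by the Morph-KGC and inversion columns. The field names are illustrative, and the table does not say what accounts for that remainder.

```python
# (scenario, triples, morph_kgc_s, inversion_s, total_s): means from the table.
rows = [
    ("3x2", 6_000, 0.89, 7.33, 8.31),
    ("5x3", 150_000, 1.22, 3.05, 4.61),
    ("8x5", 2_000_000, 5.63, 22.68, 30.73),
]

derived = {}
for scenario, triples, morph_s, inversion_s, total_s in rows:
    derived[scenario] = {
        # Triples processed per second of inversion time.
        "inversion_triples_per_s": triples / inversion_s,
        # Share of the total run spent in the inversion.
        "inversion_share": inversion_s / total_s,
        # Time in the total that neither column accounts for.
        "other_s": total_s - (morph_s + inversion_s),
    }

for scenario, values in derived.items():
    print(
        f"{scenario}: {values['inversion_triples_per_s']:.0f} triples/s, "
        f"inversion share {values['inversion_share']:.0%}, "
        f"other {values['other_s']:.2f}s"
    )
```
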