index
Where the database test cases are
Section titled “Where the database test cases are”There are two repositories to consider:
- Archived repository (https://github.com/kg-construct/rml-test-cases). This repo is marked as deprecated (“new test cases are published per module”) but it remains the only source for database-specific RML test cases.
- Test case directories spanning RMLTC0000 through RMLTC0020
- New modular repository (https://github.com/kg-construct/rml-core): currently only JSON variants.
- Contains 76 test cases spanning RMLTC0000 through RMLTC0031. Test cases RMLTC0021 through RMLTC0031 are entirely new and have no equivalent in the archived repo or in R2RML. Compared to the archived repo, the new repo also drops several test cases from the 0000-0020 range (detailed below).
Comparison with R2RML test cases
Section titled “Comparison with R2RML test cases”Our project currently uses 62 R2RML test cases (R2RMLTC0000 through R2RMLTC0020), matching the kg-construct/r2rml-test-cases-support repository. The W3C spec (https://www.w3.org/2001/sw/rdb2rdf/test-cases/) defines 63 test cases, but R2RMLTC0003a (“undefined SQL Version identifier”, an error test case) was never implemented in the support repository. The archived RML repo covers the same range (RMLTC0000 through RMLTC0020) with 60 PostgreSQL test cases.
Direct equivalents
Section titled “Direct equivalents”Most RML-PostgreSQL archived test cases map directly to R2RML test cases and test the same features:
| Feature area | R2RML IDs | RML IDs (archived) |
|---|---|---|
| Empty table | R2RMLTC0000 | RMLTC0000 |
| One column mapping | R2RMLTC0001a-b | RMLTC0001a-b |
| Two column variants | R2RMLTC0002a-j | RMLTC0002a-j |
| Three column mapping | R2RMLTC0003b-c | RMLTC0003a-c (RML adds 0003a) |
| Multiple triples from one row | R2RMLTC0004a-b | RMLTC0004a-b |
| Resource typing | R2RMLTC0005a-b | RMLTC0005a-b |
| Constants | R2RMLTC0006a | RMLTC0006a |
| Named graphs and typing | R2RMLTC0007a-h | RMLTC0007a-h |
| Composite keys, ref object maps | R2RMLTC0008a-c | RMLTC0008a-c |
| Foreign keys | R2RMLTC0009a-b | RMLTC0009a-b |
| Special characters | R2RMLTC0010a-c | RMLTC0010a-c |
| Many-to-many | R2RMLTC0011a-b | RMLTC0011a-b |
| Blank nodes | R2RMLTC0012a-e | RMLTC0012a-e |
| Null values | R2RMLTC0013a | RMLTC0013a |
| Language tags | R2RMLTC0015a-b | RMLTC0015a-b |
| SQL datatypes | R2RMLTC0016a-e | RMLTC0016a-e |
| CHAR type | R2RMLTC0018a | RMLTC0018a |
| IRI values | R2RMLTC0019a-b | RMLTC0019a-b |
| IRI errors | R2RMLTC0020a-b | RMLTC0020a-b |
The coverage is nearly complete. The legacy RML repo adapted R2RML features to RML vocabulary: for instance, RMLTC0009c-d use rml:query instead of R2RML’s rr:sqlQuery for SQL queries as logical sources (verified in https://github.com/kg-construct/rml-test-cases/blob/master/test-cases/RMLTC0009c-PostgreSQL/mapping.ttl). Similarly, RMLTC0002f-j exist as PostgreSQL variants and test the same SQL identifier scenarios as their R2RML counterparts.
The only R2RML test cases with no equivalent in either the legacy or new RML repos are R2RMLTC0014a-c, which test rr:inverseExpression, a construct that exists only in R2RML.
Conversely, RMLTC0003a exists in the RML legacy repo and corresponds to R2RMLTC0003a, which is defined in the W3C spec but was never added to the r2rml-test-cases-support repository.
The RML legacy repo therefore has 60 PostgreSQL test cases: 62 - 3 (no 0014a-c) + 1 (adds 0003a) = 60.
Test cases dropped from the new rml-core repo
Section titled “Test cases dropped from the new rml-core repo”The new rml-core repo (JSON-only) includes fewer test cases from the 0000-0020 range than the legacy repo.
Most dropped test cases depend on SQL-specific features with no JSON equivalent: RMLTC0002c-d, 0002f, 0002h-j, 0003a-b, 0009c-d, 0011a, 0014d, 0016a-e, and 0018a.
R2RMLTC0014a-c test rr:inverseExpression, an R2RML-only optimization construct that RML dropped entirely. R2RMLTC0005b tests “default mapping” generation for tables without a primary key, a concept from R2RML’s direct mapping that RML does not define — every RML mapping must be written explicitly (right?).
Whether the SQL-specific tests will reappear as database variants in a future release remains to be seen.
New test cases in rml-core (JSON only, no R2RML or legacy equivalent)
Section titled “New test cases in rml-core (JSON only, no R2RML or legacy equivalent)”The new rml-core repo introduces test cases RMLTC0021 through RMLTC0031 that have no counterpart in either R2RML or the legacy RML repo. After cross-referencing each test case description (from https://kg-construct.github.io/rml-core/test-cases/docs/) with the KGI inversion code:
| ID | What it tests | Would our system handle it? |
|---|---|---|
| RMLTC0021a | Self-join: RefObjectMap where parentTriplesMap points to the same TriplesMap, with a join on the same field (Sport = Sport) | Yes, the join extraction in triples.py is generic and does not check whether child and parent are the same source |
| RMLTC0022a | Fixed constant datatype on an object map (rml:datatype xsd:string) | Yes, morph-kgc parses this; the inversion pipeline ignores datatypes during query generation |
| RMLTC0022b-e | Dynamic datatypes via rml:datatypeMap: the datatype IRI is read from the data (reference), built from a template, or set as a constant through a separate map | No, no datatypeMap handling exists in the codebase |
| RMLTC0023a-e | Invalid IRI templates (5 error cases expecting failure) | Unclear, our system has an IRI validator in utils.py but it is never called during query generation |
| RMLTC0023f | Valid IRI template with backslash escape (expects success, not an error) | Probably yes |
| RMLTC0024a | Constant term map with conflicting explicit term type (rml:constant "School" + rml:termType rml:BlankNode) | No, my system does not validate term type consistency |
| RMLTC0025a-c | Array references in JSON (extracting values from nested arrays) | JSON-specific, not applicable to databases |
| RMLTC0026a-d | Base IRI resolution: relative IRIs in templates resolved against rml:baseIRI at the TriplesMap level | No, no base IRI handling in the codebase; all IRIs are treated as absolute |
| RMLTC0027a-c | Term types rml:IRI and rml:UnsafeIRI for subjects and objects (UnsafeIRI skips IRI validation) | Partially, our system generates IRI-based SPARQL patterns but does not distinguish between IRI and UnsafeIRI |
| RMLTC0028a | Constant boolean object (rml:object true): the processor must preserve the implicit xsd:boolean datatype from Turtle syntax | Yes, treated as a constant literal |
| RMLTC0028b | Mixed graph assignment within one TriplesMap: subjectMap targets a named graph while the predicateObjectMap uses rml:defaultGraph, so the same triple appears in both the default and named graph | Unclear, needs testing against our graph map handling |
| RMLTC0028c | Constant object with explicit language tag ("Venus"@en) | Yes, treated as a constant literal |
| RMLTC0029a | Shortcut syntax rml:subject ex:example (equivalent to a constant subject map) with a reference-valued object | No, SubjectTriple.generate() has no case for RML_CONSTANT subjects. The existing check_for_constant_only_mappings() only rejects mappings where every component is a constant, so this mapping would pass validation (the object is a reference) but fail during SPARQL generation |
| RMLTC0030a-f | Joins using rml:parentMap/rml:childMap (reference-valued, template-valued, and constant-valued variants) | No, the current join handling reads rml:parent/rml:child as simple field references; parentMap/childMap with templates or constants is a new construct |
| RMLTC0031a-c | Dynamic language tags via rml:languageMap (constant, reference, or template-valued) | No, no languageMap handling in the codebase |
Vocabulary differences between legacy and new RML
Section titled “Vocabulary differences between legacy and new RML”The legacy RML mappings use a hybrid vocabulary that mixes R2RML and old RML namespaces. Here is the full mapping for RMLTC0001a-JSON (https://github.com/kg-construct/rml-test-cases/blob/master/test-cases/RMLTC0001a-JSON/mapping.ttl):
@prefix rr: <http://www.w3.org/ns/r2rml#> .@prefix foaf: <http://xmlns.com/foaf/0.1/> .@prefix ex: <http://example.com/> .@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .@prefix rml: <http://semweb.mmlab.be/ns/rml#> .@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@base <http://example.com/base/> .
<TriplesMap1> a rr:TriplesMap; rml:logicalSource [ rml:source "student.json"; rml:referenceFormulation ql:JSONPath; rml:iterator "$.students[*]" ] ; rr:subjectMap [ rr:template "http://example.com/{Name}" ]; rr:predicateObjectMap [ rr:predicate foaf:name ; rr:objectMap [ rml:reference "Name" ] ].The new RML mappings are syntactically very different. The same test case (RMLTC0001a-JSON, https://github.com/kg-construct/rml-core/blob/main/test-cases/RMLTC0001a-JSON/mapping.ttl):
@prefix foaf: <http://xmlns.com/foaf/0.1/> .@prefix rml: <http://w3id.org/rml/> .
<http://example.com/base/TriplesMap1> a rml:TriplesMap; rml:logicalSource [ a rml:LogicalSource; rml:iterator "$.students[*]"; rml:referenceFormulation rml:JSONPath; rml:source [ a rml:RelativePathSource; rml:root rml:MappingDirectory; rml:path "student.json" ] ]; rml:predicateObjectMap [ rml:objectMap [ rml:reference "$.Name" ]; rml:predicate foaf:name ]; rml:subjectMap [ rml:template "http://example.com/{$.Name}" ] .| Aspect | Legacy | New |
|---|---|---|
| RML namespace | http://semweb.mmlab.be/ns/rml# | http://w3id.org/rml/ |
| TriplesMap type | rr:TriplesMap | rml:TriplesMap |
| Subject/predicate/object map properties | rr:subjectMap, rr:predicateObjectMap, rr:objectMap | rml:subjectMap, rml:predicateObjectMap, rml:objectMap |
| Template property | rr:template | rml:template |
| Predicate property | rr:predicate | rml:predicate |
| Reference formulation vocabulary | ql:JSONPath (separate namespace) | rml:JSONPath (same namespace) |
| Reference syntax | "Name" (plain field name) | "$.Name" (full JSONPath expression) |
| Template placeholders | {Name} | {$.Name} |
| Source declaration | rml:source "student.json" (string literal) | rml:RelativePathSource with rml:root + rml:path (structured) |
| TriplesMap IRI | <TriplesMap1> (relative to @base) | <http://example.com/base/TriplesMap1> (absolute) |
Luckily, our system never reads mapping files directly (except for two validation functions discussed below). The mapping parsing is delegated to morph-kgc, which (hopefully) normalizes both the legacy and new vocabularies to the same internal DataFrame representation
Reviewing the KGI codebase, the inversion pipeline is largely vocabulary-agnostic because morph-kgc handles normalization. There are, however, two functions in core.py that parse mapping files directly using the R2RML namespace:
check_for_sql_queries(): looks forrr:sqlQuerytriplescheck_for_multiple_subject_maps(): looks forrr:TriplesMapandrr:subjectMap
Implementation
Section titled “Implementation”refactor: introduce test suite abstraction for multi-suite support
Extract a TestSuite base class with R2RML and RML implementations, replacing hardcoded R2RML paths and manifest queries throughout app.py and test.py. The web interface now supports suite selection and runs tests from either or both suites. MySQL support dropped in favour of PostgreSQL-only operation.
Performance improvements
Section titled “Performance improvements”refactor: replace SPARQLWrapper with sparqlite
https://opencitations.github.io/sparqlite/architecture/benchmarks/
perf: replace rdflib with pyoxigraph for RDF parsing and SPARQL queries on local store
Replacing RDFLib with pyoxigraph for local RDF parsing introduced stricter IRI validation. pyoxigraph enforces RFC 3986 compliance when constructing NamedNode instances, while RDFLib’s URIRef silently accepted malformed IRIs without any validation.
<Juan Daniel> <http://xmlns.com/foaf/0.1/name> "Juan Daniel" .<Carlos> <http://xmlns.com/foaf/0.1/name> "Carlos" .Documentation on GitHub
Section titled “Documentation on GitHub”https://arcangelo7.github.io/knowledge-graphs-inversion/
Questions
Section titled “Questions”Should we allow inversion of malformed RDF? pyoxigraph now correctly rejects these inputs, and the tool classifies them as “non-invertible”. However, morph-kgc itself is permissive in both directions: it does not fully validate R2RML/RML mapping input, and it does not validate the RDF it produces as output. Invalid mappings can be processed, and the resulting RDF may contain malformed IRIs.