Skip to content

index

There are two repositories to consider:

  1. Archived repository (https://github.com/kg-construct/rml-test-cases). This repo is marked as deprecated (“new test cases are published per module”) but it remains the only source for database-specific RML test cases.
    • Test case directories spanning RMLTC0000 through RMLTC0020
  2. New modular repository (https://github.com/kg-construct/rml-core): currently only JSON variants.
    • Contains 76 test cases spanning RMLTC0000 through RMLTC0031. Test cases RMLTC0021 through RMLTC0031 are entirely new and have no equivalent in the archived repo or in R2RML. Compared to the archived repo, the new repo also drops several test cases from the 0000-0020 range (detailed below).

Our project currently uses 62 R2RML test cases (R2RMLTC0000 through R2RMLTC0020), matching the kg-construct/r2rml-test-cases-support repository. The W3C spec (https://www.w3.org/2001/sw/rdb2rdf/test-cases/) defines 63 test cases, but R2RMLTC0003a (“undefined SQL Version identifier”, an error test case) was never implemented in the support repository. The archived RML repo covers the same range (RMLTC0000 through RMLTC0020) with 60 PostgreSQL test cases.

Most RML-PostgreSQL archived test cases map directly to R2RML test cases and test the same features:

Feature areaR2RML IDsRML IDs (archived)
Empty tableR2RMLTC0000RMLTC0000
One column mappingR2RMLTC0001a-bRMLTC0001a-b
Two column variantsR2RMLTC0002a-jRMLTC0002a-j
Three column mappingR2RMLTC0003b-cRMLTC0003a-c (RML adds 0003a)
Multiple triples from one rowR2RMLTC0004a-bRMLTC0004a-b
Resource typingR2RMLTC0005a-bRMLTC0005a-b
ConstantsR2RMLTC0006aRMLTC0006a
Named graphs and typingR2RMLTC0007a-hRMLTC0007a-h
Composite keys, ref object mapsR2RMLTC0008a-cRMLTC0008a-c
Foreign keysR2RMLTC0009a-bRMLTC0009a-b
Special charactersR2RMLTC0010a-cRMLTC0010a-c
Many-to-manyR2RMLTC0011a-bRMLTC0011a-b
Blank nodesR2RMLTC0012a-eRMLTC0012a-e
Null valuesR2RMLTC0013aRMLTC0013a
Language tagsR2RMLTC0015a-bRMLTC0015a-b
SQL datatypesR2RMLTC0016a-eRMLTC0016a-e
CHAR typeR2RMLTC0018aRMLTC0018a
IRI valuesR2RMLTC0019a-bRMLTC0019a-b
IRI errorsR2RMLTC0020a-bRMLTC0020a-b

The coverage is nearly complete. The legacy RML repo adapted R2RML features to RML vocabulary: for instance, RMLTC0009c-d use rml:query instead of R2RML’s rr:sqlQuery for SQL queries as logical sources (verified in https://github.com/kg-construct/rml-test-cases/blob/master/test-cases/RMLTC0009c-PostgreSQL/mapping.ttl). Similarly, RMLTC0002f-j exist as PostgreSQL variants and test the same SQL identifier scenarios as their R2RML counterparts.

The only R2RML test cases with no equivalent in either the legacy or new RML repos are R2RMLTC0014a-c, which test rr:inverseExpression, a construct that exists only in R2RML.

Conversely, RMLTC0003a exists in the RML legacy repo and corresponds to R2RMLTC0003a, which is defined in the W3C spec but was never added to the r2rml-test-cases-support repository.

The RML legacy repo therefore has 60 PostgreSQL test cases: 62 - 3 (no 0014a-c) + 1 (adds 0003a) = 60.

Test cases dropped from the new rml-core repo

Section titled “Test cases dropped from the new rml-core repo”

The new rml-core repo (JSON-only) includes fewer test cases from the 0000-0020 range than the legacy repo.

Most dropped test cases depend on SQL-specific features with no JSON equivalent: RMLTC0002c-d, 0002f, 0002h-j, 0003a-b, 0009c-d, 0011a, 0014d, 0016a-e, and 0018a.

R2RMLTC0014a-c test rr:inverseExpression, an R2RML-only optimization construct that RML dropped entirely. R2RMLTC0005b tests “default mapping” generation for tables without a primary key, a concept from R2RML’s direct mapping that RML does not define — every RML mapping must be written explicitly (right?).

Whether the SQL-specific tests will reappear as database variants in a future release remains to be seen.

New test cases in rml-core (JSON only, no R2RML or legacy equivalent)

Section titled “New test cases in rml-core (JSON only, no R2RML or legacy equivalent)”

The new rml-core repo introduces test cases RMLTC0021 through RMLTC0031 that have no counterpart in either R2RML or the legacy RML repo. After cross-referencing each test case description (from https://kg-construct.github.io/rml-core/test-cases/docs/) with the KGI inversion code:

IDWhat it testsWould our system handle it?
RMLTC0021aSelf-join: RefObjectMap where parentTriplesMap points to the same TriplesMap, with a join on the same field (Sport = Sport)Yes, the join extraction in triples.py is generic and does not check whether child and parent are the same source
RMLTC0022aFixed constant datatype on an object map (rml:datatype xsd:string)Yes, morph-kgc parses this; the inversion pipeline ignores datatypes during query generation
RMLTC0022b-eDynamic datatypes via rml:datatypeMap: the datatype IRI is read from the data (reference), built from a template, or set as a constant through a separate mapNo, no datatypeMap handling exists in the codebase
RMLTC0023a-eInvalid IRI templates (5 error cases expecting failure)Unclear, our system has an IRI validator in utils.py but it is never called during query generation
RMLTC0023fValid IRI template with backslash escape (expects success, not an error)Probably yes
RMLTC0024aConstant term map with conflicting explicit term type (rml:constant "School" + rml:termType rml:BlankNode)No, my system does not validate term type consistency
RMLTC0025a-cArray references in JSON (extracting values from nested arrays)JSON-specific, not applicable to databases
RMLTC0026a-dBase IRI resolution: relative IRIs in templates resolved against rml:baseIRI at the TriplesMap levelNo, no base IRI handling in the codebase; all IRIs are treated as absolute
RMLTC0027a-cTerm types rml:IRI and rml:UnsafeIRI for subjects and objects (UnsafeIRI skips IRI validation)Partially, our system generates IRI-based SPARQL patterns but does not distinguish between IRI and UnsafeIRI
RMLTC0028aConstant boolean object (rml:object true): the processor must preserve the implicit xsd:boolean datatype from Turtle syntaxYes, treated as a constant literal
RMLTC0028bMixed graph assignment within one TriplesMap: subjectMap targets a named graph while the predicateObjectMap uses rml:defaultGraph, so the same triple appears in both the default and named graphUnclear, needs testing against our graph map handling
RMLTC0028cConstant object with explicit language tag ("Venus"@en)Yes, treated as a constant literal
RMLTC0029aShortcut syntax rml:subject ex:example (equivalent to a constant subject map) with a reference-valued objectNo, SubjectTriple.generate() has no case for RML_CONSTANT subjects. The existing check_for_constant_only_mappings() only rejects mappings where every component is a constant, so this mapping would pass validation (the object is a reference) but fail during SPARQL generation
RMLTC0030a-fJoins using rml:parentMap/rml:childMap (reference-valued, template-valued, and constant-valued variants)No, the current join handling reads rml:parent/rml:child as simple field references; parentMap/childMap with templates or constants is a new construct
RMLTC0031a-cDynamic language tags via rml:languageMap (constant, reference, or template-valued)No, no languageMap handling in the codebase

Vocabulary differences between legacy and new RML

Section titled “Vocabulary differences between legacy and new RML”

The legacy RML mappings use a hybrid vocabulary that mixes R2RML and old RML namespaces. Here is the full mapping for RMLTC0001a-JSON (https://github.com/kg-construct/rml-test-cases/blob/master/test-cases/RMLTC0001a-JSON/mapping.ttl):

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.com/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@base <http://example.com/base/> .
<TriplesMap1>
a rr:TriplesMap;
rml:logicalSource [
rml:source "student.json";
rml:referenceFormulation ql:JSONPath;
rml:iterator "$.students[*]"
] ;
rr:subjectMap [
rr:template "http://example.com/{Name}"
];
rr:predicateObjectMap [
rr:predicate foaf:name ;
rr:objectMap [
rml:reference "Name"
]
].

The new RML mappings are syntactically very different. The same test case (RMLTC0001a-JSON, https://github.com/kg-construct/rml-core/blob/main/test-cases/RMLTC0001a-JSON/mapping.ttl):

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rml: <http://w3id.org/rml/> .
<http://example.com/base/TriplesMap1> a rml:TriplesMap;
rml:logicalSource [ a rml:LogicalSource;
rml:iterator "$.students[*]";
rml:referenceFormulation rml:JSONPath;
rml:source [ a rml:RelativePathSource;
rml:root rml:MappingDirectory;
rml:path "student.json"
]
];
rml:predicateObjectMap [
rml:objectMap [
rml:reference "$.Name"
];
rml:predicate foaf:name
];
rml:subjectMap [
rml:template "http://example.com/{$.Name}"
] .
AspectLegacyNew
RML namespacehttp://semweb.mmlab.be/ns/rml#http://w3id.org/rml/
TriplesMap typerr:TriplesMaprml:TriplesMap
Subject/predicate/object map propertiesrr:subjectMap, rr:predicateObjectMap, rr:objectMaprml:subjectMap, rml:predicateObjectMap, rml:objectMap
Template propertyrr:templaterml:template
Predicate propertyrr:predicaterml:predicate
Reference formulation vocabularyql:JSONPath (separate namespace)rml:JSONPath (same namespace)
Reference syntax"Name" (plain field name)"$.Name" (full JSONPath expression)
Template placeholders{Name}{$.Name}
Source declarationrml:source "student.json" (string literal)rml:RelativePathSource with rml:root + rml:path (structured)
TriplesMap IRI<TriplesMap1> (relative to @base)<http://example.com/base/TriplesMap1> (absolute)

Luckily, our system never reads mapping files directly (except for two validation functions discussed below). The mapping parsing is delegated to morph-kgc, which (hopefully) normalizes both the legacy and new vocabularies to the same internal DataFrame representation

Reviewing the KGI codebase, the inversion pipeline is largely vocabulary-agnostic because morph-kgc handles normalization. There are, however, two functions in core.py that parse mapping files directly using the R2RML namespace:

  • check_for_sql_queries(): looks for rr:sqlQuery triples
  • check_for_multiple_subject_maps(): looks for rr:TriplesMap and rr:subjectMap
arcangelo7
arcangelo7Mar 18, 2026 · arcangelo7/knowledge-graphs-inversion

refactor: introduce test suite abstraction for multi-suite support

Extract a TestSuite base class with R2RML and RML implementations, replacing hardcoded R2RML paths and manifest queries throughout app.py and test.py. The web interface now supports suite selection and runs tests from either or both suites. MySQL support dropped in favour of PostgreSQL-only operation.

+702-598f5a38d8
arcangelo7
arcangelo7Mar 19, 2026 · arcangelo7/knowledge-graphs-inversion

refactor: replace SPARQLWrapper with sparqlite

+138-3935c33fda

https://opencitations.github.io/sparqlite/architecture/benchmarks/

arcangelo7
arcangelo7Mar 20, 2026 · arcangelo7/knowledge-graphs-inversion

perf: replace rdflib with pyoxigraph for RDF parsing and SPARQL queries on local store

+285-231fa02827

Replacing RDFLib with pyoxigraph for local RDF parsing introduced stricter IRI validation. pyoxigraph enforces RFC 3986 compliance when constructing NamedNode instances, while RDFLib’s URIRef silently accepted malformed IRIs without any validation.

<Juan Daniel> <http://xmlns.com/foaf/0.1/name> "Juan Daniel" .
<Carlos> <http://xmlns.com/foaf/0.1/name> "Carlos" .

https://arcangelo7.github.io/knowledge-graphs-inversion/

Should we allow inversion of malformed RDF? pyoxigraph now correctly rejects these inputs, and the tool classifies them as “non-invertible”. However, morph-kgc itself is permissive in both directions: it does not fully validate R2RML/RML mapping input, and it does not validate the RDF it produces as output. Invalid mappings can be processed, and the resulting RDF may contain malformed IRIs.