Usage

The tool exposes a single entry point: the reconstruct() function. It takes an R2RML or RML mapping file and the RDF graph that the mapping produced, inverts the mapping, and returns the reconstructed source data.

import kgi

result = kgi.reconstruct(
    mapping="mapping.ttl",
    rdf_graph="output.nq",
)

The function returns a dictionary keyed by source table name. Each value is a ReconstructedTable with three attributes:

  • sparql_query: the SPARQL query that was executed against the RDF graph to extract the data.
  • sql: the SQL statements (CREATE TABLE + INSERT) that reproduce the original table.
  • data: a pandas DataFrame with the reconstructed rows.

To print every part of each reconstructed table:

for table_name, table in result.items():
    print(f"--- {table_name} ---")
    print(table.sparql_query)
    print(table.sql)
    print(table.data)
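If you want a single script to review or run by hand, the per-table sql strings can be concatenated. The following is a minimal sketch under stated assumptions: combine_sql and FakeTable are hypothetical helpers, not part of kgi. FakeTable only mimics the three attributes listed above so the snippet runs without the tool installed, and the dictionary's iteration order is kept as-is, which may need adjusting if tables reference each other through foreign keys.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class FakeTable:
    """Hypothetical stand-in exposing the same attributes as ReconstructedTable."""
    sparql_query: str
    sql: str
    data: Any = None  # a pandas DataFrame in the real result


def combine_sql(result: Mapping[str, Any]) -> str:
    """Join each table's SQL into one script, keeping the dict's order."""
    parts = []
    for table_name, table in result.items():
        # Label each section so the combined script is easy to navigate.
        parts.append(f"-- table: {table_name}\n{table.sql.strip()}\n")
    return "\n".join(parts)


# Usage with stand-in tables; with the real tool, pass the reconstruct() result.
demo = {
    "dept": FakeTable("SELECT ...", "CREATE TABLE dept (id INTEGER);"),
    "emp": FakeTable("SELECT ...", "CREATE TABLE emp (id INTEGER);"),
}
script = combine_sql(demo)
print(script)
```

Because combine_sql only reads the sql attribute, it works equally on the real dictionary returned by kgi.reconstruct().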

The algorithm infers column types and ordering on its own, but when the source database is reachable you can pass its SQLAlchemy URL as source_db_url, and the original schema is read to resolve ambiguous cases:

result = kgi.reconstruct(
    mapping="mapping.ttl",
    rdf_graph="output.nq",
    source_db_url="postgresql+psycopg2://user:password@localhost:5432/source_db",
)
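Because source_db_url is a SQLAlchemy URL, credentials containing characters such as @, : or / must be percent-encoded, or the URL will be parsed incorrectly. SQLAlchemy's documentation recommends urllib.parse.quote_plus for this (sqlalchemy.engine.URL.create is an alternative that builds the URL object for you). A minimal sketch; make_pg_url is a hypothetical helper, and the psycopg2 driver default is an assumption taken from the example above:

```python
from urllib.parse import quote_plus


def make_pg_url(user: str, password: str, host: str, port: int,
                database: str, driver: str = "psycopg2") -> str:
    """Build a postgresql+<driver> SQLAlchemy URL with encoded credentials."""
    # quote_plus() encodes '@', ':', '/' and other reserved characters, so they
    # are not mistaken for URL delimiters when SQLAlchemy parses the string.
    return (f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{database}")


source_db_url = make_pg_url("user", "p@ss:w/rd", "localhost", 5432, "source_db")
print(source_db_url)
# postgresql+psycopg2://user:p%40ss%3Aw%2Frd@localhost:5432/source_db
```

The resulting string is what you would pass as source_db_url.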

For RML mappings that contain a D2RQ database definition, the connection details are read directly from the mapping file. Given a block like this:

<#DB_source> a d2rq:Database;
    d2rq:jdbcDSN "jdbc:postgresql://localhost:5432/mydb";
    d2rq:username "user";
    d2rq:password "pass" .

the JDBC DSN, username, and password are converted to a SQLAlchemy URL and used as source_db_url automatically. If you pass source_db_url explicitly, it takes precedence over whatever the mapping says.
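The conversion described above might look roughly like the following. This is a hypothetical sketch, not the tool's implementation: the function names, the driver chosen for each JDBC subprotocol (psycopg2 for PostgreSQL, PyMySQL for MySQL), and the percent-encoding of credentials are all assumptions. It also encodes the precedence rule: an explicit source_db_url always wins over the DSN found in the mapping.

```python
from __future__ import annotations

from urllib.parse import quote_plus

# Assumed SQLAlchemy dialect+driver for each JDBC subprotocol (hypothetical).
_DRIVERS = {"postgresql": "postgresql+psycopg2", "mysql": "mysql+pymysql"}


def jdbc_to_sqlalchemy(jdbc_dsn: str, username: str, password: str) -> str:
    """Convert e.g. 'jdbc:postgresql://host:5432/db' into a SQLAlchemy URL."""
    if not jdbc_dsn.startswith("jdbc:"):
        raise ValueError(f"not a JDBC DSN: {jdbc_dsn!r}")
    # 'postgresql://localhost:5432/mydb' -> ('postgresql', '://', 'localhost:5432/mydb')
    subprotocol, sep, rest = jdbc_dsn[len("jdbc:"):].partition("://")
    if not sep or subprotocol not in _DRIVERS:
        raise ValueError(f"unsupported JDBC DSN: {jdbc_dsn!r}")
    creds = f"{quote_plus(username)}:{quote_plus(password)}"
    return f"{_DRIVERS[subprotocol]}://{creds}@{rest}"


def resolve_source_db_url(explicit_url: str | None, jdbc_dsn: str | None,
                          username: str = "", password: str = "") -> str | None:
    """An explicitly passed URL takes precedence over the mapping's DSN."""
    if explicit_url is not None:
        return explicit_url
    if jdbc_dsn is not None:
        return jdbc_to_sqlalchemy(jdbc_dsn, username, password)
    return None


print(resolve_source_db_url(None, "jdbc:postgresql://localhost:5432/mydb", "user", "pass"))
# postgresql+psycopg2://user:pass@localhost:5432/mydb
```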

To write the reconstructed tables to a database, pass a SQLAlchemy connection string via dest_db_url:

result = kgi.reconstruct(
    mapping="mapping.ttl",
    rdf_graph="output.nq",
    dest_db_url="postgresql+psycopg2://user:password@localhost:5432/restored_db",
)
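dest_db_url lets the tool write the tables for you. If you would rather apply a table's generated sql yourself, for example to a throwaway SQLite database for a quick look, Python's built-in sqlite3 module is enough. This sketch assumes the generated statements use SQL that SQLite accepts (they may be dialect-specific) and contain no transaction statements of their own; apply_sql is a hypothetical helper that runs the whole script in one transaction and rolls back on error.

```python
import sqlite3


def apply_sql(conn: sqlite3.Connection, sql_script: str) -> None:
    """Run a CREATE TABLE + INSERT script atomically on an SQLite connection."""
    try:
        # executescript() commits any pending transaction first but adds no
        # transaction control of its own, so wrap the script in BEGIN/COMMIT.
        conn.executescript("BEGIN;\n" + sql_script + "\nCOMMIT;")
    except sqlite3.Error:
        # A failing statement leaves the explicit transaction open; undo it.
        conn.rollback()
        raise


conn = sqlite3.connect(":memory:")
apply_sql(conn, "CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT);\n"
                "INSERT INTO emp VALUES (1, 'Alice');")
print(conn.execute("SELECT name FROM emp").fetchall())  # [('Alice',)]
```

SQLite treats CREATE TABLE as transactional, so a failure part-way through leaves no half-created tables behind.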

By default, the function reads RDF from the local file given by rdf_graph. To query a remote SPARQL endpoint instead, pass its URL as sparql_endpoint:

result = kgi.reconstruct(
    mapping="mapping.ttl",
    rdf_graph="output.nq",
    sparql_endpoint="http://localhost:8890/sparql",
)
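With sparql_endpoint, the SPARQL queries are run against the remote endpoint rather than the local file. How the tool issues those requests internally is not documented here; as a rough, hypothetical illustration of the standard SPARQL 1.1 Protocol and the SPARQL 1.1 JSON results format, the helpers below build a query-via-GET request and flatten a JSON response into rows. Both function names are invented for this sketch.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def parse_sparql_json(payload: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one dict per solution; unbound vars are skipped."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]


# Offline example using a hand-written response body.
sample = ('{"head": {"vars": ["id", "name"]}, "results": {"bindings": ['
          '{"id": {"type": "literal", "value": "1"}, "name": {"type": "literal", "value": "Alice"}},'
          '{"id": {"type": "literal", "value": "2"}}]}}')
print(parse_sparql_json(sample))
# [{'id': '1', 'name': 'Alice'}, {'id': '2'}]
```

Against a live endpoint you would pass the request to urllib.request.urlopen and feed the decoded response body to parse_sparql_json.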

Not every mapping can be inverted. When the function encounters a mapping it cannot handle, or finds no data to reconstruct, it raises one of the following exceptions:

import kgi
from kgi import MappingError, UnsupportedMappingError, NonInvertibleError, NoDataError

try:
    result = kgi.reconstruct(mapping="mapping.ttl", rdf_graph="output.nq")
except UnsupportedMappingError as e:
    print(f"Unsupported: {e}")
except MappingError as e:
    print(f"Invalid mapping: {e}")
except NonInvertibleError as e:
    print(f"Non-invertible: {e}")
except NoDataError as e:
    print(f"No data: {e}")

The exceptions are:

| Exception | Meaning |
| --- | --- |
| UnsupportedMappingError | The mapping uses SQL queries as logical tables, which the algorithm does not handle. |
| MappingError | The mapping document is syntactically invalid or violates the R2RML specification. |
| NonInvertibleError | The mapping is valid but the transformation is not reversible. See limitations. |
| NoDataError | The SPARQL queries returned no results, or the RDF input file does not exist. |
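For intuition about why a valid mapping can still be non-invertible, consider one step an inversion of this kind plausibly needs: recovering column values from IRIs produced by an R2RML rr:template. The sketch below is hypothetical and is not the tool's algorithm. compile_template and invert_template are invented names; the sketch ignores R2RML's backslash-escaped braces and undoes the percent-encoding that R2RML applies to values in IRI templates with urllib.parse.unquote. A template such as http://ex.com/{A}{B} has no separator between its placeholders, so A=1, B=23 and A=12, B=3 yield the same IRI, and the values cannot be told apart.

```python
from __future__ import annotations

import re
from urllib.parse import unquote

# Matches one "{column}" placeholder in an rr:template string.
_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def compile_template(template: str) -> tuple[re.Pattern[str], list[str]]:
    """Turn a template into a regex with one capture group per placeholder."""
    # split() with a capturing group alternates literal text and column names:
    # [literal, column, literal, column, ..., literal]
    pieces = _PLACEHOLDER.split(template)
    pattern, columns = "", []
    for i, piece in enumerate(pieces):
        if i % 2 == 0:
            pattern += re.escape(piece)  # literal text, matched verbatim
        else:
            if i >= 3 and pieces[i - 1] == "":
                # No literal text between two placeholders: ambiguous split.
                raise ValueError(f"adjacent placeholders in {template!r}")
            pattern += "(.*?)"
            columns.append(piece)
    return re.compile(pattern), columns


def invert_template(template: str, iri: str) -> dict[str, str] | None:
    """Recover column values from an IRI, or None if it doesn't fit the template."""
    regex, columns = compile_template(template)
    match = regex.fullmatch(iri)
    if match is None:
        return None
    # R2RML percent-encodes IRI-unsafe characters; decode them back.
    return {col: unquote(val) for col, val in zip(columns, match.groups())}


print(invert_template("http://ex.com/emp/{EMPNO}", "http://ex.com/emp/7369"))
# {'EMPNO': '7369'}
```

Raising ValueError for adjacent placeholders stands in for the kind of condition the table above reports as NonInvertibleError.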