# Usage
The tool exposes a single entry point: the `reconstruct()` function. It takes an R2RML or RML mapping file and an RDF graph, runs the inversion, and returns the reconstructed data.
## Basic invocation

```python
import kgi

result = kgi.reconstruct(
    mapping="mapping.ttl",
    rdf_graph="output.nq",
)
```

The function returns a dictionary keyed by source table name. Each value is a `ReconstructedTable` with three attributes:
- `sparql_query`: the SPARQL query that was executed against the RDF graph to extract the data.
- `sql`: the SQL statements (`CREATE TABLE` + `INSERT`) that reproduce the original table.
- `data`: a pandas DataFrame with the reconstructed rows.
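Because `data` is an ordinary pandas DataFrame, the reconstructed rows can be handed to any standard pandas output method. A minimal sketch with stand-in data (this frame is only for illustration, not output of the library):

```python
import io

import pandas as pd

# Stand-in for a ReconstructedTable.data attribute; real values come
# from the reconstruction.
data = pd.DataFrame({"id": [1, 2], "name": ["alice", "bob"]})

# Write the rows as CSV; the same call accepts a file path instead of
# an in-memory buffer.
buf = io.StringIO()
data.to_csv(buf, index=False)
print(buf.getvalue())
# id,name
# 1,alice
# 2,bob
```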
```python
for table_name, table in result.items():
    print(f"--- {table_name} ---")
    print(table.sparql_query)
    print(table.sql)
    print(table.data)
```

## Using the source database schema
The algorithm infers column types and ordering on its own, but when a source database is available it can read the original schema to resolve ambiguous cases:
```python
result = kgi.reconstruct(
    mapping="mapping.ttl",
    rdf_graph="output.nq",
    source_db_url="postgresql+psycopg2://user:password@localhost:5432/source_db",
)
```

With RML mappings that contain D2RQ database definitions, the connection info is extracted directly from the mapping file. When the mapping includes a block like this:
```turtle
<#DB_source> a d2rq:Database;
    d2rq:jdbcDSN "jdbc:postgresql://localhost:5432/mydb";
    d2rq:username "user";
    d2rq:password "pass" .
```

the JDBC DSN, username, and password are converted to a SQLAlchemy URL and used as `source_db_url` automatically. If you pass `source_db_url` explicitly, it takes precedence over whatever the mapping says.
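The DSN-to-URL conversion can be pictured with a short sketch. Note that the helper name and the dialect table below are assumptions for illustration, not the library's actual internals:

```python
from urllib.parse import quote_plus


def jdbc_to_sqlalchemy_url(jdbc_dsn: str, username: str, password: str) -> str:
    # Hypothetical helper: "jdbc:postgresql://localhost:5432/mydb"
    # splits into the scheme "postgresql" and the host/db part.
    prefix = "jdbc:"
    if not jdbc_dsn.startswith(prefix):
        raise ValueError(f"not a JDBC DSN: {jdbc_dsn!r}")
    scheme, _, rest = jdbc_dsn[len(prefix):].partition("://")

    # Assumed mapping from JDBC scheme to SQLAlchemy dialect+driver.
    dialects = {"postgresql": "postgresql+psycopg2", "mysql": "mysql+pymysql"}
    dialect = dialects.get(scheme, scheme)

    # Percent-encode credentials so special characters survive in the URL.
    return f"{dialect}://{quote_plus(username)}:{quote_plus(password)}@{rest}"


print(jdbc_to_sqlalchemy_url("jdbc:postgresql://localhost:5432/mydb", "user", "pass"))
# postgresql+psycopg2://user:pass@localhost:5432/mydb
```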
## Writing output to a separate database
To write the reconstructed tables to a database, pass a SQLAlchemy connection string via `dest_db_url`:
```python
result = kgi.reconstruct(
    mapping="mapping.ttl",
    rdf_graph="output.nq",
    dest_db_url="postgresql+psycopg2://user:password@localhost:5432/restored_db",
)
```

## Querying a remote SPARQL endpoint
By default, the function reads RDF from the local file specified by `rdf_graph`. To query a remote endpoint instead:
```python
result = kgi.reconstruct(
    mapping="mapping.ttl",
    rdf_graph="output.nq",
    sparql_endpoint="http://localhost:8890/sparql",
)
```

## Error handling
Not every mapping can be inverted. When the function encounters an unsupported or non-invertible case, it raises an exception:
```python
from kgi import MappingError, UnsupportedMappingError, NonInvertibleError, NoDataError

try:
    result = kgi.reconstruct(mapping="mapping.ttl", rdf_graph="output.nq")
except UnsupportedMappingError as e:
    print(f"Unsupported: {e}")
except MappingError as e:
    print(f"Invalid mapping: {e}")
except NonInvertibleError as e:
    print(f"Non-invertible: {e}")
except NoDataError as e:
    print(f"No data: {e}")
```

The exceptions are:
| Exception | Meaning |
|---|---|
| `UnsupportedMappingError` | The mapping uses SQL queries as logical tables, which the algorithm does not handle. |
| `MappingError` | The mapping document is syntactically invalid or violates the R2RML specification. |
| `NonInvertibleError` | The mapping is valid but the transformation is not reversible. See limitations. |
| `NoDataError` | The SPARQL queries returned no results, or the RDF input file does not exist. |
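Since `NoDataError` also covers a missing RDF input file, a pre-flight existence check can surface that case with a more specific error before the inversion runs. A plain-Python sketch (the helper name is hypothetical, not part of the library):

```python
from pathlib import Path


def check_rdf_input(rdf_graph: str) -> Path:
    # Fail fast with a precise message instead of waiting for
    # reconstruct() to raise NoDataError on a missing file.
    path = Path(rdf_graph)
    if not path.is_file():
        raise FileNotFoundError(f"RDF input not found: {rdf_graph}")
    return path
```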