2026-05-05 Triplelite in C
La Novitade
RAMOSE
refactor(skgif): use full URIs as product identifiers instead of OMID shorthand
feat: add #default_format field to override CSV default per operation
feat: add #custom_params field for addon-handled query parameters
Allow operations to declare custom query string parameters that are processed by addon functions instead of the built-in pipeline. Each parameter specifies a handler, processing phase (preprocess or postprocess), and description.
Preprocess handlers generate SPARQL fragments injected via [[name]] placeholders. Postprocess handlers transform the result table after built-in filters. When a custom parameter name collides with a built-in (filter, sort, require), the built-in behavior is disabled.
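The two phases can be sketched as follows. This is a minimal illustration of the mechanism described above, not RAMOSE's actual addon API: the handler names (`from_year`, `only_journals`) and the helper `apply_preprocess` are hypothetical.

```python
def from_year(value):
    # Preprocess handler: returns a SPARQL fragment that replaces the
    # [[from_year]] placeholder in the operation's #sparql block.
    return 'FILTER(YEAR(?date) >= %s)' % int(value)

def only_journals(table, value):
    # Postprocess handler: transforms the result table (header row +
    # data rows) after the built-in filters have run.
    header, rows = table[0], table[1:]
    col = header.index("type")
    return [header] + [r for r in rows if r[col] == "journal article"]

def apply_preprocess(sparql_template, params, handlers):
    # Replace each [[name]] placeholder with the fragment produced by
    # the corresponding preprocess handler.
    for name, value in params.items():
        sparql_template = sparql_template.replace(
            "[[%s]]" % name, handlers[name](value))
    return sparql_template

query = apply_preprocess(
    "SELECT ?title ?date WHERE { ?p dct:title ?title . [[from_year]] }",
    {"from_year": "2020"},
    {"from_year": from_year})
```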
feat(skgif): expand product filter with contributor and type criteria
Support filtering by contributor attributes (family name, given name, ORCID, identifier scheme, local identifier, organization name) and by product type.
test(skgif): validate converter output against SHACL shapes
Also, the converter now normalizes partial dates to full dates before output.
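The normalization presumably pads partial dates out to full ones; a minimal sketch, assuming missing months and days default to the first (the real converter's rules may differ):

```python
def normalize_date(value):
    # Pad "YYYY" or "YYYY-MM" out to a full "YYYY-MM-DD" date,
    # defaulting missing components to "01".
    parts = value.split("-")
    while len(parts) < 3:
        parts.append("01")
    return "-".join(parts)
```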
feat(skgif): add citation filters via directive injection into query templates
Placeholders can be placed anywhere in the #sparql block, and the engine resolves them before checking for @@ directives.
Four new citation filters (cf.cites, cf.cited_by, cf.cites_doi, cf.cited_by_doi) use this mechanism to federate across Meta and Index endpoints.
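The resolution order matters: placeholders are expanded first, so an injected fragment can itself contribute to how the template is later split on @@ directives. A sketch of that ordering, with hypothetical names (`resolve_template`, the `cites` fragment):

```python
import re

def resolve_template(template, fragments):
    # 1. Resolve [[name]] placeholders anywhere in the #sparql block.
    resolved = re.sub(r"\[\[(\w+)\]\]",
                      lambda m: fragments.get(m.group(1), ""), template)
    # 2. Only afterwards split on @@ directives, so placeholder
    #    resolution always happens before directive handling.
    return resolved.split("@@")

sections = resolve_template(
    "SELECT ?c WHERE { [[cites]] } @@ SELECT ?x WHERE { ?x ?p ?o }",
    {"cites": "?c cito:cites ?cited ."})
```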
fix(api_manager): return 400 instead of 404 for invalid parameter values
When an operation exists but the parameter value doesn't match the expected type regex (e.g. a DOI where an ORCID is expected), the error now correctly reports an invalid parameter (400) rather than a missing operation (404). Empty parameters are also caught with a specific message.
Closes: #19
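The dispatch order described above can be sketched like this; the function, the type table, and the ORCID regex are illustrative stand-ins, not the api_manager's actual code:

```python
import re

# Hypothetical type-regex table; real operations declare their own.
TYPES = {"orcid": r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]"}

def validate(operation_exists, param_type, value):
    # Operation lookup happens first: only a missing operation is a 404.
    if not operation_exists:
        return 404, "operation not found"
    # Empty parameters get their own specific 400 message.
    if value == "":
        return 400, "empty value for parameter"
    # A value that fails the declared type regex is a client error (400),
    # not a missing operation (404).
    if not re.fullmatch(TYPES[param_type], value):
        return 400, "invalid value for parameter of type '%s'" % param_type
    return 200, "ok"
```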
The OpenAlex ingestion ran out of memory because the Redis database used for the data and provenance counters exceeded 100 GB of RAM. That is no longer sustainable; better to go back to the old file-based system. Let's see how much performance degrades. Worst case, I'll replace the files with a relational database.
refactor!: replace redis counters with filesystem-based counter handler
Uno più uno
I can do what I've always done:
```sparql
?br_uri datacite:hasIdentifier ?id .
?id literal:hasLiteralValue "9781402096327" .
```

Or I can use a blank node:

```sparql
?br_uri datacite:hasIdentifier [ literal:hasLiteralValue "9781402096327" ] .
```

Triplelite
build: add initial _core.c with primitives (StringArray, RDFTermArray)
Also switch from hatchling to meson-python build system to support C compilation
feat: add RDFTerm hashmap and string/RDFTerm interners
feat: add integer set and SPO triple index
Open-addressing IntSet with linear probing
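The IntSet itself is C, but the technique reads the same in a short Python sketch: open addressing with linear probing and growth at a 75% load factor (the load factor and sentinel value are assumptions, not details from the C code):

```python
EMPTY = -1  # empty-slot sentinel; assumes stored keys are non-negative

class IntSet:
    """Open-addressing integer set with linear probing (a Python
    sketch of the C data structure, not the implementation itself)."""

    def __init__(self, capacity=8):
        self.slots = [EMPTY] * capacity
        self.size = 0

    def _probe(self, key):
        # Walk slots linearly from the hash position until we find
        # either the key or an empty slot.
        i = key % len(self.slots)
        while self.slots[i] != EMPTY and self.slots[i] != key:
            i = (i + 1) % len(self.slots)
        return i

    def add(self, key):
        # Grow and rehash at 75% load to keep probe chains short.
        if (self.size + 1) * 4 > len(self.slots) * 3:
            old = [k for k in self.slots if k != EMPTY]
            self.slots = [EMPTY] * (len(self.slots) * 2)
            self.size = 0
            for k in old:
                self.add(k)
        i = self._probe(key)
        if self.slots[i] != key:
            self.slots[i] = key
            self.size += 1

    def __contains__(self, key):
        return self.slots[self._probe(key)] == key

s = IntSet()
for k in range(100):
    s.add(k)
```

Note that deletion is omitted: removing keys from an open-addressed table needs tombstones (or backward-shift deletion) so that probe chains are not broken.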
feat: implement TripleLite C extension with Python bindings
Expose the C engine as a CPython type with add, remove, triples, objects, predicate_objects, subjects, and has_subject methods. Supports len, contains, and iter via a custom iterator that walks the SPO index.
Add memory ownership throughout: strdup in hashmap/dynarray/intset, deep-copy for RDFTerm, and corresponding free functions for all data structures.
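A pure-Python stand-in for the SPO index and the triple-pattern lookups the bindings expose (nested dicts instead of the C hash tables; `None` as a wildcard, rdflib-style):

```python
class SPOIndex:
    """Minimal subject -> predicate -> objects index; a sketch of the
    query surface, not the C implementation."""

    def __init__(self):
        self.spo = {}  # {subject: {predicate: set(objects)}}

    def add(self, s, p, o):
        self.spo.setdefault(s, {}).setdefault(p, set()).add(o)

    def triples(self, s=None, p=None, o=None):
        # None acts as a wildcard in each position of the pattern.
        for s2, po in self.spo.items():
            if s is not None and s2 != s:
                continue
            for p2, objs in po.items():
                if p is not None and p2 != p:
                    continue
                for o2 in objs:
                    if o is None or o2 == o:
                        yield (s2, p2, o2)

    def has_subject(self, s):
        return s in self.spo

g = SPOIndex()
g.add("s1", "p1", "o1")
g.add("s1", "p2", "o2")
g.add("s2", "p1", "o1")
```

A real SPO-only index makes subject-bound lookups fast but forces a full scan for patterns that bind only the predicate or object, which is consistent with the benchmark gaps below.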
| Operation | Python | C | Ratio |
|---|---|---|---|
| add_many | 1.05M triples/s | 19.8K triples/s | C 53× slower |
| add_single | 991K triples/s | 17.9K triples/s | C 55× slower |
| predicate_objects | 1.14 μs/call | 819 μs/call | C 719× slower |
| subjects | 0.75 μs/call | 156 μs/call | C 209× slower |
| objects | 0.78 μs/call | 990 μs/call | C 1,269× slower |
| has_subject | 0.14 μs/call | 71.5 μs/call | C 502× slower |
| contains | 0.50 μs/call | 143 μs/call | C 286× slower |
| full_scan | 4.56M triples/s | 2.05M triples/s | C 2.2× slower |
| Python | C | Ratio |
|---|---|---|
| 373.0 bytes/triple | 295.2 bytes/triple | C 1.3× less |
perf: add dynamic resizing to all hash tables and optimize query lookups
perf: replace chained hash tables with open-addressing
| Operation | Python | C | Ratio |
|---|---|---|---|
| add_many | 1.07M triples/s | 2.16M triples/s | C 2.0× faster |
| add_single | 1.00M triples/s | 1.24M triples/s | C 1.2× faster |
| predicate_objects | 1.38 μs/call | 2.13 μs/call | C 0.7× |
| subjects | 0.76 μs/call | 0.80 μs/call | C 0.9× |
| objects | 0.79 μs/call | 1.01 μs/call | C 0.8× |
| has_subject | 0.14 μs/call | 0.14 μs/call | C 1.0× |
| contains | 0.42 μs/call | 0.52 μs/call | C 0.8× |
| subgraph | 1.58 μs/call | 3.80 μs/call | C 0.4× |
| full_scan | 5.68M triples/s | 2.47M triples/s | C 0.4× |
| Python | C | Ratio |
|---|---|---|
| 373.0 bytes/triple | 167.6 bytes/triple | C 2.2× less |
ci: add cross-platform wheel builds
Replace single-platform uv build with cibuildwheel for building wheels across Linux (x86_64, aarch64, musl), macOS (x86_64, arm64), and Windows (AMD64, ARM64) for Python 3.10-3.13.
Add multi-OS matrix to test workflow.
Domande
Section titled “Domande”- In Matilda ci sono tante risorse identificate solo dall’id arXiv, non riconciliato al DOI della versione pubblicata. Va bene così?
- SKG-IF impose ISO 8601 datetime per le date. Significa che ci vuole anche ora, minuti e secondi. Va bene così?
- https://github.com/opencitations/ramose/issues/2?
- Ilaria
- Performance? https://swsa.semanticweb.org/content/swsa-distinguished-dissertation-award
- ISWC article
- Elia, when you fixed the provenance, did you check that every entity in the data had provenance snapshots? The consecutive OMIDs 0624010177378–0624010177388 and 0624010177865–0624010177868 have no provenance snapshots. They are old, because I haven't used prefixes other than 060 for the last two dumps.
- w3id