
2026-01-13 core

<div>
<strong style="display: block; color: #1f2328;">arcangelo7</strong>
<span style="font-size: 0.85em; color: #656d76;">Dec 27, 2025</span>
<span style="font-size: 0.85em; color: #656d76;"> · </span>
<a href="https://github.com/opencitations/oc_meta" style="font-size: 0.85em; color: #0969da; text-decoration: none;">opencitations/oc_meta</a>
</div>

feat(fixer): add script to detect identifier schema mismatches

Scans RDF files directly to find identifiers where the declared schema does not match the value pattern (e.g., ISSN values marked as ORCID).

+226 -0 49a5051

Identifiers with an ISSN/ISBN format but an ORCID schema. All of them belong to BR entities (journals/books), except the last one, which belongs to an RA (responsible agent) entity.

| Mismatched ID | Wrong value | Entity | Title | Correct value | Correct schema |
|---|---|---|---|---|---|
| 06012393243 | 2790-9344 | br/06012054723 | Pakistan Journal Of Health Sciences | 2790-9344 | issn |
| 06012393328 | 1462-0324 | br/06012054834 | Rheumatology | 1462-0324 | issn |
| 06012393250 | 0962-1067 | br/06012054731 | Journal Of Clinical Nursing | 0962-1067 | issn |
| 06012393478 | 0277-1691 | br/06012055012 | International Journal Of Gynecological Pathology | 0277-1691 | issn |
| 06012393294 | 9783111692456 | br/06012054788 | Women In The Socratic Tradition | 9783111692456 | isbn |
| 06012387883 | 9783031963971 | br/06012050602 | Studies In Childhood And Youth | 9783031963971 | isbn |
| 06012387868 | 1724-6059 | ra/06030984957 | (Responsible Agent) | 1724-6059 | issn |
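Since the fix keeps each value and only changes its schema, it is worth confirming that every value in the "Correct value" column is a well-formed identifier of the new schema. This sketch is not part of the commit; it recomputes the standard check digits. ISSN uses weights 8 down to 2 modulo 11, with X standing for 10, and ISBN-13 uses alternating weights 1 and 3 modulo 10. The function names are illustrative.

```python
import re


def issn_is_valid(issn: str) -> bool:
    """Check an ISSN (with or without hyphen) against its mod-11 check digit."""
    compact = issn.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{7}[0-9X]", compact):
        return False
    # Weights 8, 7, ..., 2 applied to the first seven digits
    total = sum(int(d) * w for d, w in zip(compact[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return compact[7] == ("X" if check == 10 else str(check))


def isbn13_is_valid(isbn: str) -> bool:
    """Check an ISBN-13 (with or without hyphens) against its mod-10 check digit."""
    compact = isbn.replace("-", "")
    if not re.fullmatch(r"[0-9]{13}", compact):
        return False
    # Alternating weights 1 and 3 on the first twelve digits
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(compact[:12]))
    return int(compact[12]) == (10 - total % 10) % 10
```

All five ISSNs and both ISBNs in the table pass this check. That is consistent with them being genuine ISSN/ISBN values mislabelled as ORCID rather than random corruption.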

External check:

All the corruptions date from after December 22, the day the server was shut down. I had forgotten about it and did not stop Meta in time.

Solution: delete all entities created after December 22 and reprocess them.

On second thought, this solution is risky. In the end, I chose to rerun the whole process from scratch.


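```yaml
services:
  # Restarts containers whose Docker health status becomes "unhealthy"
  autoheal:
    image: willfarrell/autoheal:latest
    container_name: oc_meta_autoheal
    restart: unless-stopped
    network_mode: none  # only needs the Docker API, reached through the mounted socket
    environment:
      AUTOHEAL_CONTAINER_LABEL: all  # watch every container, not only those labelled autoheal=true
      AUTOHEAL_INTERVAL: 30          # seconds between checks
      AUTOHEAL_START_PERIOD: 120     # seconds to wait before the first check
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /etc/localtime:/etc/localtime:ro
```
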
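autoheal only restarts containers whose Docker health status is `unhealthy`, so the services it watches need a `HEALTHCHECK` (or a compose `healthcheck:` entry) that exits with 0 when healthy and 1 otherwise. What follows is a stdlib-only sketch of such a probe for a SPARQL endpoint. The endpoint URL (Virtuoso's default port 8890), the `ASK` query and the function name are assumptions, and the monitored image would need a Python interpreter for this to work.

```python
import http.client
import urllib.parse
import urllib.request

# Assumption: Virtuoso's default SPARQL endpoint inside the container
ENDPOINT = "http://localhost:8890/sparql"


def health_exit_code(endpoint: str = ENDPOINT, timeout: float = 10.0) -> int:
    """Return 0 if the endpoint answers a trivial ASK query in time, else 1.

    Docker interprets these exit codes as healthy (0) and unhealthy (1).
    """
    query = urllib.parse.urlencode({"query": "ASK { ?s ?p ?o }"})
    request = urllib.request.Request(
        f"{endpoint}?{query}",
        headers={"Accept": "application/sparql-results+json"},
    )
    try:
        # The timeout applies to each blocking socket operation (connect, read)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 0 if response.status == 200 else 1
    except (OSError, http.client.HTTPException):
        # Refused connection, timeout, reset, or HTTP error status (HTTPError is an OSError)
        return 1
```

A script wrapping this function would end with `sys.exit(health_exit_code())` and be referenced from the monitored service, e.g. `test: ["CMD", "python", "/healthcheck.py"]` (hypothetical path).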
<div>
<strong style="display: block; color: #1f2328;">arcangelo7</strong>
<span style="font-size: 0.85em; color: #656d76;">Dec 20, 2025</span>
<span style="font-size: 0.85em; color: #656d76;"> · </span>
<a href="https://github.com/opencitations/oc_meta" style="font-size: 0.85em; color: #0969da; text-decoration: none;">opencitations/oc_meta</a>
</div>

fix: add timeout to SPARQLClient to handle database unavailability

  • Add timeout=3600 (1 hour) to all SPARQLClient calls in production code
  • Add timeout=60 to all SPARQLClient calls in test code
  • Update sparqlite to 1.1.0 which supports timeout parameter
  • Add exit code checking in meta_process.py for child process failures
  • Add wait_for_virtuoso utility for test database readiness
  • Add database_unavailability_test.py to verify graceful failure handling

When the triplestore becomes unavailable, SPARQLClient now times out instead of hanging indefinitely, allowing proper error propagation.

+262 -75 c59160b
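The exit-code bullet matters because an exception raised inside a worker process does not propagate to the parent on its own: the parent only sees the child's exit code. Below is a minimal sketch of the pattern with `multiprocessing`, assuming a POSIX host because it uses the `fork` start method. `stage_worker` and `run_stage` are hypothetical names, not the code in meta_process.py.

```python
import multiprocessing as mp


def stage_worker(should_fail: bool) -> None:
    """Hypothetical unit of work standing in for a Meta processing stage."""
    if should_fail:
        # e.g. a SPARQL request that hit the new timeout and was not handled
        raise TimeoutError("triplestore did not respond in time")


def run_stage(should_fail: bool = False) -> None:
    """Run one stage in a child process and fail loudly if it did not exit cleanly."""
    ctx = mp.get_context("fork")  # POSIX-only; the child does not re-import __main__
    proc = ctx.Process(target=stage_worker, args=(should_fail,))
    proc.start()
    proc.join()
    # exitcode is 0 on success, 1 after an uncaught exception, -N if killed by signal N
    if proc.exitcode != 0:
        raise RuntimeError(f"child process exited with code {proc.exitcode}")
```

Raising in the parent turns a silent worker crash into a visible failure of the whole run. That is what lets a timeout inside the child actually stop the pipeline.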
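
<div>
<strong style="display: block; color: #1f2328;">arcangelo7</strong>
<span style="font-size: 0.85em; color: #656d76;">Dec 20, 2025</span>
<span style="font-size: 0.85em; color: #656d76;"> · </span>
<a href="https://github.com/opencitations/sparqlite" style="font-size: 0.85em; color: #0969da; text-decoration: none;">opencitations/sparqlite</a>
</div>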

feat: add timeout parameter to SPARQLClient [release]

+85 -1 8bfd31a
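The two commits rely on the same idea: a loop waiting for the triplestore can only give up on schedule if every individual request is bounded by a timeout. Below is a stdlib sketch of what a readiness helper like `wait_for_virtuoso` might do. The name `wait_for_endpoint`, its parameters and the `ASK` probe are assumptions, not the oc_meta or sparqlite API.

```python
import http.client
import time
import urllib.parse
import urllib.request


def wait_for_endpoint(
    endpoint: str,
    max_wait: float = 60.0,
    interval: float = 2.0,
    request_timeout: float = 5.0,
) -> bool:
    """Poll a SPARQL endpoint until it answers or max_wait seconds have passed.

    Each request carries its own timeout, so an endpoint that accepts the
    connection but never replies cannot stall the loop indefinitely.
    """
    query = urllib.parse.urlencode({"query": "ASK { ?s ?p ?o }"})
    url = f"{endpoint}?{query}"
    deadline = time.monotonic() + max_wait
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # not ready yet: refused, reset, timed out or HTTP error status
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

A test fixture could call it once before the suite and abort early if it returns `False`. This avoids every test hanging on a container that never came up.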

Explain which specific system underpins the Paratext DB case study when it is first mentioned

Regarding the ParaText scenario, it is fairly obvious that editing in a database requires a user interface; this is not specific to the complexity of the domain. As a reader, I wonder whether there is another reason why this system is a good example for the contribution of this thesis. The argument about provenance is clearer, and maybe that could be at the centre of the scenario in 3.2.

In the current narrative, it almost seems that ParaText is a system that predates HERITRACE and was used to surface requirements that HERITRACE then satisfied. This misled both reviewers. I therefore made it more explicit that OC Meta is the pre-existing system demonstrating that systematic barriers exist at scale, while ParaText served as a guerrilla testing platform to validate whether the proposed solution, HERITRACE, could solve those problems, which also arise in ParaText. Seen this way, the methodology falls under Design Science Research, a problem-solving paradigm that seeks to extend the boundaries of human and organizational capabilities by creating new and innovative artifacts. It contrasts with behavioural research, which investigates existing behaviour to develop new theories. We instead start from an existing theory to develop a solution.

The research follows a Design Science Research methodology \citep{hevnerDesignScienceInformation2004}, where the construction and evaluation of an artifact (HERITRACE) constitutes the primary contribution. Two case studies serve distinct methodological roles. OpenCitations Meta \citep{massariOpenCitationsMeta2024}, where the author participates as a contributor, provides independent validation that the identified barriers exist at scale in production systems. The ParaText Bibliographical Database was developed as an application case for HERITRACE, serving as a testbed for guerrilla testing \citep{nielsenUsabilityEngineering1993} and iterative refinement throughout the development process. This distinction is methodologically significant: OpenCitations Meta demonstrates that the problem exists independently of HERITRACE, while ParaText enables validation of the solution in a real-world scholarly context.


The RQ could be better developed, possibly by breaking it down into sub-questions.

Why consider technically proficient users, since the RQ only targets domain experts, who are supposedly not technically savvy?

Before:

This challenge motivates the research question: how can we enable domain experts to participate in semantic data curation while maintaining provenance documentation, tracking changes over time, supporting flexible customization, and integrating with existing RDF collections?

After:

This challenge motivates two research questions addressing distinct operational phases:

RQ1: How can we design interfaces that enable domain experts to curate RDF data without requiring technical expertise, while maintaining provenance documentation and change tracking?

RQ2: How can technical staff perform one-time system configuration to adapt the curation environment to specialized domains, ensuring integration with existing RDF collections and flexible customization?

RQ1 addresses the continuous curation workflow where domain experts interact with semantic data daily. RQ2 addresses the initial configuration phase, performed once by technical staff, that prepares the system for domain-specific use.

Folders that do not follow the naming convention

| Category | Count | Example |
|---|---|---|
| Not an object (materials, _files) | 8 | Sala1/materials |
| Sub-item with letter (27a, 74b) | 16 | S2-27a-FICLIT_... |
| Space separator | 1 | S2-39 - Vitello... |
| Special prefix (PT, VS, s.n.) | 6 | S3-PT-DICAM_... |
| NR with underscore | 2 | S6-106_..., S6-114_115_116-... |
| No NR | 2 | S1-CNR_SoffittoSala1 |
| Room 5 - Vetrina | 108 | S5-Vetrina 1 alto N... |
| Room 5 - Manoscritto | 8 | S5-Manoscritto-FICLIT_... |
| Room 5 - Others (A, B, CNR) | 20 | S5-A alto sinistra... |
| Total | 171 | |
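To re-run this count after renaming, the categories can be encoded as ordered regular-expression rules. This sketch is reverse-engineered from the example names in the table. It assumes that a conforming name looks like `S<room>-<NR>-<rest>` with a purely numeric NR, which is an assumption since the convention itself is not spelled out here. The rule order, labels and function name are hypothetical.

```python
import re

# Ordered rules: the first matching pattern wins (hypothetical, derived from the table)
RULES: list[tuple[str, re.Pattern[str]]] = [
    ("Room 5 - Vetrina", re.compile(r"S5-Vetrina\b")),
    ("Room 5 - Manoscritto", re.compile(r"S5-Manoscritto\b")),
    ("Room 5 - Others", re.compile(r"S5-\D")),
    ("Sub-item with letter", re.compile(r"S\d+-\d+[a-z]-")),
    ("Space separator", re.compile(r"S\d+-\d+ - ")),
    ("Special prefix", re.compile(r"S\d+-(?:PT|VS|s\.n\.)-")),
    ("NR with underscore", re.compile(r"S\d+-\d+_")),
    ("Conforming", re.compile(r"S\d+-\d+-\S")),
    ("No NR", re.compile(r"S\d+-\D")),
]


def classify_folder(path: str) -> str:
    """Assign a folder (given by name or path) to one of the table's categories."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    if name == "materials" or name.endswith("_files"):
        return "Not an object"
    for label, pattern in RULES:
        if pattern.match(name):  # anchored at the start of the name
            return label
    return "Unclassified"
```

For instance, `classify_folder("S5-B alto sinistra 2-CNR_Miocene")` lands in "Room 5 - Others", the kind of name where the NR is unclear. The Room 5 rules run first, and "No NR" is the fallback for names that start with a room prefix but carry no number.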

Regarding these:

Pasted image 20260112200707.png

What is the number here?

Same problem with names like S5-B alto sinistra 2-CNR_Miocene, S5-Manoscritto-FICLIT_LexiconRerumInanimatarum, etc. These contain the raw, rawp, dhco and dhcoo folders.
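Folders like these contain the raw, rawp, dhco and dhcoo subfolders. One way to decide whether a badly named folder is an object is therefore to inspect its contents rather than its name. Below is a sketch with `pathlib`, assuming that an object folder contains all four subfolders. `object_folders` is a hypothetical name.

```python
from pathlib import Path

# Assumption: an object folder holds all of these subfolders
EXPECTED_SUBFOLDERS = frozenset({"raw", "rawp", "dhco", "dhcoo"})


def object_folders(root: Path) -> list[Path]:
    """Return directories under root that contain every expected subfolder."""
    found = []
    for candidate in sorted(p for p in root.rglob("*") if p.is_dir()):
        children = {c.name for c in candidate.iterdir() if c.is_dir()}
        if EXPECTED_SUBFOLDERS <= children:  # subset test: all four present
            found.append(candidate)
    return found
```

Comparing this list with the naming-based classification would show which non-conforming names still correspond to real objects.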