ahgRicExplorerPlugin - Technical Documentation
- Version: 1.1.5
- Category: Linked Data / RiC-O
- Dependencies: atom-framework, ahgCorePlugin
Overview
The ahgRicExplorerPlugin implements the ICA's Records in Contexts Ontology (RiC-O) for archival descriptions, providing extraction, storage, visualization, and semantic search capabilities. It transforms AtoM's relational data into RDF linked data and synchronizes it with an Apache Jena Fuseki triplestore.
Architecture
+-------------------------------------------------------------------+
| ahgRicExplorerPlugin |
+-------------------------------------------------------------------+
| |
| +---------------------+ +-----------------------------+ |
| | RiC Extractor v5 | | Fuseki Triplestore | |
| | (Python) |---->| (Apache Jena) | |
| | - MySQL Reader | | - SPARQL Endpoint | |
| | - JSON-LD Writer | | - RDF Storage | |
| +---------------------+ +-----------------------------+ |
| | ^ |
| v | |
| +---------------------+ +-----------------------------+ |
| | RicSyncListener | | ricExplorerActions | |
| | (PHP) |---->| (PHP) | |
| | - Event Handler | | - Graph Data API | |
| | - Queue Manager | | - SPARQL Proxy | |
| +---------------------+ +-----------------------------+ |
| | | |
| v v |
| +---------------------+ +-----------------------------+ |
| | ric_sync_queue | | Visualization Layer | |
| | (MySQL) | | - Cytoscape.js (2D) | |
| | - Pending Ops | | - 3D Force Graph | |
| +---------------------+ +-----------------------------+ |
| |
+-------------------------------------------------------------------+
RiC-O Ontology Mapping
AtoM to RiC-O Entity Mapping
| AtoM Entity | RiC-O Class | Notes |
|---|---|---|
| information_object (fonds/subfonds/series/collection) | rico:RecordSet | Aggregations |
| information_object (item) | rico:Record | Single records |
| information_object (part) | rico:RecordPart | Record components |
| actor (person) | rico:Person | Named individuals |
| actor (corporate body) | rico:CorporateBody | Organizations |
| actor (family) | rico:Family | Family groups |
| repository | rico:CorporateBody | Holding institutions |
| event (creation) | rico:Production | Creation activity |
| event (accumulation) | rico:Accumulation | Collection activity |
| digital_object | rico:Instantiation | Digital representations |
| term (subject) | rico:Thing | Subject access points |
| term (place) | rico:Place | Geographic terms |
| term (genre) | rico:DocumentaryFormType | Form/genre terms |
| rights | rico:Rule | Access/use rules |
| function_object | rico:Function | Business functions |
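The mapping table above can be read as a simple lookup. The following is a minimal sketch, not the extractor's actual code: the `(entity, subtype)` key shape and the function name `ric_class_for` are assumptions made for illustration, while the class names are taken directly from the table.

```python
from typing import Optional

# Lookup table transcribed from the mapping table above.
# The (entity, subtype) key shape is an assumption of this sketch.
RIC_CLASS_MAP = {
    ("information_object", "fonds"): "rico:RecordSet",
    ("information_object", "subfonds"): "rico:RecordSet",
    ("information_object", "series"): "rico:RecordSet",
    ("information_object", "collection"): "rico:RecordSet",
    ("information_object", "item"): "rico:Record",
    ("information_object", "part"): "rico:RecordPart",
    ("actor", "person"): "rico:Person",
    ("actor", "corporate body"): "rico:CorporateBody",
    ("actor", "family"): "rico:Family",
    ("repository", None): "rico:CorporateBody",
    ("event", "creation"): "rico:Production",
    ("event", "accumulation"): "rico:Accumulation",
    ("digital_object", None): "rico:Instantiation",
    ("term", "subject"): "rico:Thing",
    ("term", "place"): "rico:Place",
    ("term", "genre"): "rico:DocumentaryFormType",
    ("rights", None): "rico:Rule",
    ("function_object", None): "rico:Function",
}


def ric_class_for(entity: str, subtype: Optional[str] = None) -> Optional[str]:
    """Return the RiC-O class for an AtoM entity, or None if the table has no row."""
    key = (entity, subtype.lower() if subtype else None)
    # Fall back to the subtype-less row for entities that map to a single class.
    return RIC_CLASS_MAP.get(key) or RIC_CLASS_MAP.get((entity, None))
```

For example, `ric_class_for("information_object", "Series")` resolves to `rico:RecordSet`, while a level that is not listed in the table yields `None` so the caller can decide how to handle it.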
Spectrum/GRAP Extensions
The extractor includes custom namespace extensions:
| Extension | Namespace | Purpose |
|---|---|---|
| Spectrum | spectrum: | Collections Trust activities |
| GRAP | grap: | Heritage asset accounting (GRAP 103) |

| Spectrum Activity Type | Description |
|---|---|
| ConditionCheck | Condition assessments |
| Valuation | Financial valuations |
| LoanOut | Outgoing loans |
| LocationMovement | Physical movements |
Database Schema
ERD Diagram
+---------------------------+ +---------------------------+
| ric_sync_status | | ric_sync_queue |
+---------------------------+ +---------------------------+
| PK id INT | | PK id BIGINT |
| entity_type VARCHAR | | entity_type VARCHAR |
| entity_id INT | | entity_id INT |
| ric_uri VARCHAR(500) | | operation ENUM |
| ric_type VARCHAR | | priority TINYINT |
| sync_status ENUM | | status ENUM |
| last_synced_at DATETIME| | attempts INT |
| sync_error TEXT | | old_parent_id INT |
| retry_count INT | | new_parent_id INT |
| content_hash VARCHAR | | scheduled_at DATETIME |
| parent_id INT | | last_error TEXT |
| hierarchy_path TEXT | | created_at DATETIME |
+---------------------------+ +---------------------------+
+---------------------------+ +---------------------------+
| ric_sync_log | | ric_orphan_tracking |
+---------------------------+ +---------------------------+
| PK id BIGINT | | PK id INT |
| operation ENUM | | ric_uri VARCHAR(500) |
| entity_type VARCHAR | | ric_type VARCHAR |
| entity_id INT | | expected_entity_type |
| ric_uri VARCHAR(500) | | expected_entity_id INT |
| status ENUM | | detected_at DATETIME |
| triples_affected INT | | detection_method ENUM |
| details JSON | | status ENUM |
| error_message TEXT | | resolved_at DATETIME |
| execution_time_ms INT | | resolved_by INT |
| triggered_by ENUM | | resolution_notes TEXT |
| batch_id VARCHAR | | triple_count INT |
| created_at DATETIME | +---------------------------+
+---------------------------+
+---------------------------+
| ric_sync_config |
+---------------------------+
| PK id INT |
| config_key VARCHAR |
| config_value TEXT |
| description TEXT |
| updated_at DATETIME |
+---------------------------+
Database Views
-- ric_sync_summary: Aggregated sync statistics by entity type and status
CREATE VIEW ric_sync_summary AS
SELECT entity_type, sync_status, COUNT(*) as count,
MAX(last_synced_at) as last_sync,
SUM(CASE WHEN retry_count > 0 THEN 1 ELSE 0 END) as with_retries
FROM ric_sync_status
GROUP BY entity_type, sync_status;
-- ric_queue_status: Queue statistics by status
CREATE VIEW ric_queue_status AS
SELECT status, COUNT(*) as count,
MIN(scheduled_at) as oldest, MAX(scheduled_at) as newest
FROM ric_sync_queue
GROUP BY status;
-- ric_recent_operations: Last 100 sync operations
CREATE VIEW ric_recent_operations AS
SELECT * FROM ric_sync_log
ORDER BY created_at DESC LIMIT 100;
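To see what `ric_sync_summary` returns, the sketch below recreates a cut-down `ric_sync_status` table in an in-memory SQLite database and runs the same view definition against it. SQLite stands in for MySQL purely for illustration, and the sample rows and the `synced`/`failed` status values are assumptions (the source does not list the ENUM values).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down version of ric_sync_status: only the columns the view reads.
conn.execute(
    """CREATE TABLE ric_sync_status (
           id INTEGER PRIMARY KEY,
           entity_type TEXT,
           entity_id INTEGER,
           sync_status TEXT,
           last_synced_at TEXT,
           retry_count INTEGER)"""
)

# Hypothetical sample rows; the status values are assumed, not documented.
conn.executemany(
    "INSERT INTO ric_sync_status "
    "(entity_type, entity_id, sync_status, last_synced_at, retry_count) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("informationobject", 776, "synced", "2025-01-30 10:00:00", 0),
        ("informationobject", 777, "synced", "2025-01-30 10:05:00", 2),
        ("actor", 12, "failed", None, 3),
    ],
)

# Same SELECT as the ric_sync_summary view shown above.
conn.execute(
    """CREATE VIEW ric_sync_summary AS
       SELECT entity_type, sync_status, COUNT(*) AS count,
              MAX(last_synced_at) AS last_sync,
              SUM(CASE WHEN retry_count > 0 THEN 1 ELSE 0 END) AS with_retries
       FROM ric_sync_status
       GROUP BY entity_type, sync_status"""
)

summary = conn.execute(
    "SELECT * FROM ric_sync_summary ORDER BY entity_type, sync_status"
).fetchall()
for row in summary:
    print(row)
```

Each output row is one (entity type, status) pair with its row count, latest sync time, and the number of entities that needed at least one retry.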
SPARQL Queries
Common Query Patterns
1. Get all record resources (record sets, records, and record parts). As written, the query is not restricted to a single fonds:
PREFIX rico: <https://www.ica.org/standards/RiC/ontology#>
SELECT ?record ?title ?type ?identifier
WHERE {
?record a ?type .
FILTER(?type IN (rico:RecordSet, rico:Record, rico:RecordPart))
?record rico:title ?title .
OPTIONAL { ?record rico:identifier ?identifier }
}
ORDER BY ?title
2. Find creators of a record:
PREFIX rico: <https://www.ica.org/standards/RiC/ontology#>
SELECT ?record ?creator ?creatorName
WHERE {
?record rico:hasCreator ?creator .
?creator rico:hasAgentName/rico:textualValue ?creatorName .
}
3. Get record hierarchy:
PREFIX rico: <https://www.ica.org/standards/RiC/ontology#>
SELECT ?parent ?child ?parentTitle ?childTitle
WHERE {
?parent rico:includes ?child .
?parent rico:title ?parentTitle .
?child rico:title ?childTitle .
}
4. Find records by subject:
PREFIX rico: <https://www.ica.org/standards/RiC/ontology#>
SELECT ?record ?title ?subject
WHERE {
?record rico:hasOrHadSubject ?subjectUri .
?subjectUri rico:hasOrHadName/rico:textualValue ?subject .
?record rico:title ?title .
FILTER(CONTAINS(LCASE(?subject), "example"))
}
5. Get Spectrum condition checks:
PREFIX rico: <https://www.ica.org/standards/RiC/ontology#>
PREFIX spectrum: <https://collectionstrust.org.uk/spectrum#>
SELECT ?record ?checkDate ?condition ?priority
WHERE {
?activity a rico:Activity ;
rico:hasActivityType "ConditionCheck" ;
rico:resultsOrResultedIn ?record .
OPTIONAL { ?activity rico:isOrWasAssociatedWithDate/rico:beginningDate ?checkDate }
OPTIONAL { ?activity spectrum:overallCondition ?condition }
OPTIONAL { ?activity spectrum:treatmentPriority ?priority }
}
ORDER BY DESC(?checkDate)
6. Count entities by type:
PREFIX rico: <https://www.ica.org/standards/RiC/ontology#>
SELECT ?type (COUNT(?s) as ?count)
WHERE {
?s a ?type .
FILTER(STRSTARTS(STR(?type), "https://www.ica.org"))
}
GROUP BY ?type
ORDER BY DESC(?count)
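Any of the queries above can be sent to the triplestore over the standard SPARQL 1.1 Protocol. The sketch below uses only the Python standard library; the helper names (`build_query_request`, `rows_from_results`, `run_query`) are hypothetical, and the endpoint URL in the usage note is taken from the `app.yml` example in this document rather than from a verified deployment.

```python
import json
import urllib.parse
import urllib.request

# Query 6 from the list above (count entities by type).
COUNT_BY_TYPE = """
SELECT ?type (COUNT(?s) AS ?count)
WHERE {
  ?s a ?type .
  FILTER(STRSTARTS(STR(?type), "https://www.ica.org"))
}
GROUP BY ?type
ORDER BY DESC(?count)
"""


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol POST request with a form-encoded query."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def rows_from_results(results: dict) -> list:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str) -> list:
    """Send the query and return flattened rows (requires a reachable endpoint)."""
    request = build_query_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=30) as response:
        return rows_from_results(json.load(response))
```

With Fuseki running, `run_query("http://localhost:3030/ric/query", COUNT_BY_TYPE)` would return one dict per RiC-O class. Variables left unbound by an `OPTIONAL` clause are simply absent from the corresponding row.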
Fuseki Integration
Configuration
Settings are stored in the ahg_settings table (setting_group = 'fuseki'):
| Setting Key | Default | Description |
|---|---|---|
| fuseki_endpoint | http://localhost:3030/ric | SPARQL endpoint URL |
| fuseki_username | admin | Authentication username |
| fuseki_password | (empty) | Authentication password |
| fuseki_sync_enabled | 1 | Enable automatic sync |
| fuseki_queue_enabled | 1 | Use queue for sync |
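As a rough illustration of how these settings might be consumed, the sketch below merges stored values over the documented defaults and derives per-service URLs and an HTTP Basic `Authorization` header. It assumes that `fuseki_endpoint` is the dataset base URL and that the dataset exposes Fuseki's usual `query`, `update`, and `data` services; the function names are hypothetical and this is not the plugin's own code.

```python
import base64

# Defaults transcribed from the settings table above.
DEFAULTS = {
    "fuseki_endpoint": "http://localhost:3030/ric",
    "fuseki_username": "admin",
    "fuseki_password": "",
    "fuseki_sync_enabled": "1",
    "fuseki_queue_enabled": "1",
}


def effective_settings(stored: dict) -> dict:
    """Overlay values read from ahg_settings on top of the documented defaults."""
    return {**DEFAULTS, **stored}


def service_urls(settings: dict) -> dict:
    """Derive service URLs, assuming the endpoint setting is the dataset base URL."""
    base = settings["fuseki_endpoint"].rstrip("/")
    return {
        "query": f"{base}/query",
        "update": f"{base}/update",
        "data": f"{base}/data",
    }


def basic_auth_header(settings: dict) -> dict:
    """Build an HTTP Basic Authorization header from the stored credentials."""
    token = f'{settings["fuseki_username"]}:{settings["fuseki_password"]}'
    encoded = base64.b64encode(token.encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


settings = effective_settings({"fuseki_password": "admin123"})
urls = service_urls(settings)
```

Here `urls["query"]` matches the `sparql_endpoint` value used in the `app.yml` example later in this document.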
Docker Deployment
# Run Fuseki with Docker
docker run -d --name fuseki \
-p 3030:3030 \
-e ADMIN_PASSWORD=admin123 \
-v fuseki-data:/fuseki \
stain/jena-fuseki
# Create dataset
curl -u admin:admin123 -X POST \
'http://localhost:3030/$/datasets' \
-H 'Content-Type: application/x-www-form-urlencoded' \
-d 'dbName=ric&dbType=tdb2'
Optimization Script
The optimize_fuseki.sh script performs:
- TDB2 compaction
- Statistics regeneration
- Query cache clearing
Graph Visualization
Cytoscape.js (2D)
The plugin uses Cytoscape.js for 2D graph rendering with the following node styles:
| Entity Type | Color |
|---|---|
| RecordSet/Record | #17a2b8 (teal) |
| CorporateBody | #ffc107 (yellow) |
| Person/Family | #dc3545 (red) |
| Production/Activity | #6f42c1 (purple) |
| Place | #fd7e14 (orange) |
| Thing (Subject) | #20c997 (green) |
Layout: COSE (Compound Spring Embedder) with node repulsion.
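The styling rules above amount to a type-to-color lookup applied while building Cytoscape.js elements. The sketch below shows one way a server-side helper could produce that element JSON; the input shape, the function name `to_elements`, and the grey fallback color are assumptions, while the colors themselves come from the table.

```python
import json

# Colors transcribed from the node style table above.
NODE_COLORS = {
    "RecordSet": "#17a2b8",
    "Record": "#17a2b8",
    "CorporateBody": "#ffc107",
    "Person": "#dc3545",
    "Family": "#dc3545",
    "Production": "#6f42c1",
    "Activity": "#6f42c1",
    "Place": "#fd7e14",
    "Thing": "#20c997",
}
DEFAULT_COLOR = "#6c757d"  # assumption: fallback for types missing from the table


def to_elements(nodes: list, edges: list) -> list:
    """Convert simple node dicts and (source, target, label) edges to Cytoscape.js elements."""
    elements = []
    for node in nodes:
        elements.append({
            "group": "nodes",
            "data": {
                "id": node["id"],
                "label": node["label"],
                "type": node["type"],
                "color": NODE_COLORS.get(node["type"], DEFAULT_COLOR),
            },
        })
    for source, target, label in edges:
        elements.append({
            "group": "edges",
            "data": {
                "id": f"{source}->{target}",
                "source": source,
                "target": target,
                "label": label,
            },
        })
    return elements


# Hypothetical sample graph: one fonds and its creator.
elements = to_elements(
    nodes=[
        {"id": "r1", "label": "Fonds Title", "type": "RecordSet"},
        {"id": "a1", "label": "Creator", "type": "Person"},
    ],
    edges=[("r1", "a1", "hasCreator")],
)
print(json.dumps({"elements": elements, "layout": {"name": "cose"}}, indent=2))
```

On the client, a stylesheet entry such as `background-color: data(color)` would pick up the precomputed color, which keeps the type-to-color rules in one place.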
3D Force Graph
Uses three.js and 3d-force-graph for immersive exploration:
- Node labels rendered as sprites
- Directional particles on links
- Interactive camera controls
Event Sync System
RicSyncListener
The listener hooks into AtoM's event dispatcher to capture entity changes:
QubitInformationObject.insert.post --> handleSave()
QubitInformationObject.update.post --> handleSave()
QubitInformationObject.delete.pre --> handleDelete()
QubitActor.insert.post --> handleSave()
QubitActor.update.post --> handleSave()
QubitActor.delete.pre --> handleDelete()
Syncable Entities
| PHP Class | Entity Type Key |
|---|---|
| QubitInformationObject | informationobject |
| QubitActor | actor |
| QubitRepository | repository |
| QubitFunction | function |
| QubitEvent | event |
Queue Operations
Operations are queued with priority levels:
- Priority 1: Delete operations (process first)
- Priority 3: Move operations
- Priority 5: Create/Update operations
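The ordering these priority levels imply can be sketched with an in-memory heap. This is only an illustration of the ordering rule: the real queue lives in the `ric_sync_queue` MySQL table, and the operation names used as keys here (`delete`, `move`, `create`, `update`) are assumed, since the ENUM values are not listed in this document.

```python
import heapq
import itertools

# Priority levels from the list above; lower numbers are processed first.
# The operation names are assumptions of this sketch.
PRIORITY = {"delete": 1, "move": 3, "create": 5, "update": 5}


class SyncQueue:
    """In-memory stand-in for ric_sync_queue that pops by priority, then FIFO."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def enqueue(self, entity_type: str, entity_id: int, operation: str) -> None:
        entry = (PRIORITY[operation], next(self._counter),
                 entity_type, entity_id, operation)
        heapq.heappush(self._heap, entry)

    def drain(self):
        """Yield (entity_type, entity_id, operation) in processing order."""
        while self._heap:
            _, _, entity_type, entity_id, operation = heapq.heappop(self._heap)
            yield entity_type, entity_id, operation


queue = SyncQueue()
queue.enqueue("informationobject", 776, "update")
queue.enqueue("actor", 12, "delete")
queue.enqueue("informationobject", 800, "move")
queue.enqueue("actor", 13, "create")
processing_order = list(queue.drain())
```

Deletes are handled before moves and moves before creates or updates, so a stale entity is removed from the triplestore before any later operation touches related triples. Operations with equal priority keep their arrival order.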
Python Tools
ric_extractor_v5.py
The main extraction tool converts AtoM MySQL data to RiC-O JSON-LD:
# List available fonds
python3 ric_extractor_v5.py --list-fonds
# Extract specific fonds
python3 ric_extractor_v5.py --fonds-id 776 --output output.jsonld --pretty
# List standalone records
python3 ric_extractor_v5.py --list-standalone
Environment Variables:
| Variable | Default | Description |
|---|---|---|
| ATOM_DB_HOST | localhost | MySQL host |
| ATOM_DB_USER | root | MySQL user |
| ATOM_DB_PASSWORD | - | MySQL password |
| ATOM_DB_NAME | archive | Database name |
| RIC_BASE_URI | https://archives.theahg.co.za/ric | Base URI for minted URIs |
| ATOM_INSTANCE_ID | atom-psis | Instance identifier |
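Reading these variables with their documented defaults might look like the sketch below. The `ExtractorConfig` dataclass and `load_config` function are hypothetical names used for illustration, and treating a missing `ATOM_DB_PASSWORD` as an empty string is an assumption; the extractor's real configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ExtractorConfig:
    db_host: str
    db_user: str
    db_password: str
    db_name: str
    base_uri: str
    instance_id: str


def load_config(env: Optional[Mapping[str, str]] = None) -> ExtractorConfig:
    """Read extractor settings from the environment, using the documented defaults."""
    env = os.environ if env is None else env
    return ExtractorConfig(
        db_host=env.get("ATOM_DB_HOST", "localhost"),
        db_user=env.get("ATOM_DB_USER", "root"),
        # Assumption: no documented default, so fall back to an empty password.
        db_password=env.get("ATOM_DB_PASSWORD", ""),
        db_name=env.get("ATOM_DB_NAME", "archive"),
        base_uri=env.get("RIC_BASE_URI", "https://archives.theahg.co.za/ric"),
        instance_id=env.get("ATOM_INSTANCE_ID", "atom-psis"),
    )


# Override only the database name; everything else falls back to the defaults.
config = load_config({"ATOM_DB_NAME": "atom"})
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise without touching the real environment.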
ric_semantic_search.py
Flask-based semantic search API combining Elasticsearch and SPARQL:
# Start search API
python3 ric_semantic_search.py
# API endpoints
GET/POST /api/search?q=<query> # Main search
GET /api/autocomplete?q=<prefix> # Autocomplete
GET /api/suggest # Query suggestions
GET /api/health # Service health check
Search Features:
- Fuzzy text matching via Elasticsearch
- Fallback to SPARQL for RiC-specific queries
- Bilingual support (English/Afrikaans)
- Date range parsing
- Level-of-description filtering
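A client for the main search endpoint could be sketched as follows. The base address `http://localhost:5000` is an assumption (Flask's default development address; the source does not state where the API listens), and the sketch assumes the endpoint returns JSON without relying on any particular response shape.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "http://localhost:5000"  # assumption: Flask's default development address


def search_url(query: str, base: str = API_BASE) -> str:
    """Build the GET URL for /api/search with a properly encoded query string."""
    return f"{base}/api/search?" + urllib.parse.urlencode({"q": query})


def search(query: str, base: str = API_BASE):
    """Call the search API and return the decoded JSON body (requires a running service)."""
    with urllib.request.urlopen(search_url(query, base), timeout=10) as response:
        return json.load(response)


url = search_url("mining records 1900")
```

Encoding the query with `urlencode` matters for free-text searches, because spaces and non-ASCII characters (for example in Afrikaans terms) must be escaped before they are placed in the URL.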
ric_shacl_validator.py
SHACL validation for RiC-O conformance:
# Validate data in Fuseki
python3 ric_shacl_validator.py --validate --summary
# Validate JSON-LD file
python3 ric_shacl_validator.py --file output.jsonld --validate
# Generate HTML report
python3 ric_shacl_validator.py --validate --report -o report.html
Dependencies:
ric_sync.sh
Shell script for batch synchronization:
# Sync all fonds
./bin/ric_sync.sh
# Sync specific fonds
./bin/ric_sync.sh --fonds 776,829
# Clear and resync
./bin/ric_sync.sh --clear
# With validation
./bin/ric_sync.sh --validate
# Show status
./bin/ric_sync.sh --status
Admin Dashboard Routes
| Route | URL | Description |
|---|---|---|
| ric_dashboard_index | /admin/ric | Dashboard home |
| ric_dashboard_sync_status | /admin/ric/sync-status | Entity sync status |
| ric_dashboard_orphans | /admin/ric/orphans | Orphaned triples |
| ric_dashboard_queue | /admin/ric/queue | Sync queue |
| ric_dashboard_logs | /admin/ric/logs | Operation logs |
AJAX Endpoints
| Endpoint | Purpose |
|---|---|
| /admin/ric/ajax/stats | Dashboard statistics |
| /admin/ric/ajax/dashboard | Full dashboard data (cached) |
| /admin/ric/ajax/integrity-check | Run integrity check |
| /admin/ric/ajax/cleanup-orphans | Remove orphaned triples |
| /admin/ric/ajax/resync | Force resync entity |
JSON-LD Output Structure
{
"@context": {
"rico": "https://www.ica.org/standards/RiC/ontology#",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"spectrum": "https://collectionstrust.org.uk/spectrum#",
"grap": "https://www.asb.co.za/grap#"
},
"@graph": [
{
"@id": "https://example.com/ric/instance/recordset/776",
"@type": "rico:RecordSet",
"rico:identifier": "F001",
"rico:title": "Fonds Title",
"rico:hasCreator": {"@id": "..."},
"rico:includes": [{"@id": "..."}]
}
],
"_metadata": {
"extracted": "2025-01-30T10:00:00Z",
"source": "AtoM instance: atom-psis",
"records_count": 150,
"agents_count": 25,
"relations_count": 340
}
}
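A quick way to sanity-check an extracted file is to count the `@graph` nodes by `@type` and compare the totals with `_metadata`. The sketch below does this for an inline sample shaped like the structure above; the function name `count_types` and the sample node identifiers are assumptions, and a real run would load the file written by `--output` instead.

```python
import json
from collections import Counter


def count_types(document: dict) -> Counter:
    """Count @graph nodes per @type, accepting a single type or a list of types."""
    counts = Counter()
    for node in document.get("@graph", []):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        counts.update(types)
    return counts


# Hypothetical sample in the same shape as the extractor output shown above.
sample = json.loads("""
{
  "@context": {"rico": "https://www.ica.org/standards/RiC/ontology#"},
  "@graph": [
    {"@id": "https://example.com/ric/instance/recordset/776",
     "@type": "rico:RecordSet", "rico:title": "Fonds Title"},
    {"@id": "https://example.com/ric/instance/record/777",
     "@type": "rico:Record", "rico:title": "Item A"},
    {"@id": "https://example.com/ric/instance/record/778",
     "@type": "rico:Record", "rico:title": "Item B"}
  ],
  "_metadata": {"records_count": 3}
}
""")

type_counts = count_types(sample)
print(dict(type_counts))
```

To check a real extraction, replace the inline sample with `json.load(open("output.jsonld"))`.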
Configuration
app.yml Settings
all:
ric_explorer:
sparql_endpoint: 'http://localhost:3030/ric/query'
base_uri: 'https://your-domain.com/ric/atom'
explorer_url: '/ric/'
enabled: true
show_related_records: true
related_records_limit: 10
show_mini_graph: true
Troubleshooting
| Issue | Solution |
|---|---|
| Panel not loading | Check Fuseki endpoint accessibility |
| No graph data | Verify JSON-LD was loaded to Fuseki |
| CORS errors | Configure Fuseki CORS or use PHP proxy |
| Sync queue stuck | Check ric_sync_queue for failed items |
| Orphaned triples | Run integrity check from dashboard |
| 3D graph slow | Reduce node count or use 2D mode |
Security Considerations
- Fuseki credentials are stored in ahg_settings; encrypting them at rest is recommended
- Dashboard restricted to administrators
- SPARQL queries sanitized to prevent injection
- API endpoints require authentication
Part of the AtoM AHG Framework