ahgRicExplorerPlugin - Technical Documentation¶

Version: 1.1.5
Category: Linked Data / RiC-O
Dependencies: atom-framework, ahgCorePlugin

Overview¶

The ahgRicExplorerPlugin implements the ICA's Records in Contexts Ontology (RiC-O) for archival descriptions, providing extraction, storage, visualization, and semantic search capabilities. It transforms AtoM's relational data into RDF linked data and synchronizes it with an Apache Jena Fuseki triplestore.

Architecture¶

+-------------------------------------------------------------------+
|                     ahgRicExplorerPlugin                          |
+-------------------------------------------------------------------+
|                                                                   |
|  +---------------------+     +-----------------------------+      |
|  |  RiC Extractor v5   |     |    Fuseki Triplestore       |      |
|  |  (Python)           |---->|    (Apache Jena)            |      |
|  |  - MySQL Reader     |     |    - SPARQL Endpoint        |      |
|  |  - JSON-LD Writer   |     |    - RDF Storage            |      |
|  +---------------------+     +-----------------------------+      |
|           |                              ^                        |
|           v                              |                        |
|  +---------------------+     +-----------------------------+      |
|  |  RicSyncListener    |     |    ricExplorerActions       |      |
|  |  (PHP)              |---->|    (PHP)                    |      |
|  |  - Event Handler    |     |    - Graph Data API         |      |
|  |  - Queue Manager    |     |    - SPARQL Proxy           |      |
|  +---------------------+     +-----------------------------+      |
|           |                              |                        |
|           v                              v                        |
|  +---------------------+     +-----------------------------+      |
|  |  ric_sync_queue     |     |    Visualization Layer      |      |
|  |  (MySQL)            |     |    - Cytoscape.js (2D)      |      |
|  |  - Pending Ops      |     |    - 3D Force Graph         |      |
|  +---------------------+     +-----------------------------+      |
|                                                                   |
+-------------------------------------------------------------------+

RiC-O Ontology Mapping¶

AtoM to RiC-O Entity Mapping¶

AtoM Entity	RiC-O Class	Notes
information_object (fonds/subfonds/series/collection)	rico:RecordSet	Aggregations
information_object (item)	rico:Record	Single records
information_object (part)	rico:RecordPart	Record components
actor (person)	rico:Person	Named individuals
actor (corporate body)	rico:CorporateBody	Organizations
actor (family)	rico:Family	Family groups
repository	rico:CorporateBody	Holding institutions
event (creation)	rico:Production	Creation activity
event (accumulation)	rico:Accumulation	Collection activity
digital_object	rico:Instantiation	Digital representations
term (subject)	rico:Thing	Subject access points
term (place)	rico:Place	Geographic terms
term (genre)	rico:DocumentaryFormType	Form/genre terms
rights	rico:Rule	Access/use rules
function_object	rico:Function	Business functions
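
The mapping above can be read as a simple lookup table. The following is a minimal sketch rather than the extractor's actual code: the names `RIC_CLASS_BY_ENTITY` and `ric_class_for` and the tuple-key layout are hypothetical, and only the class assignments are taken from the table.

```python
# Hypothetical lookup table; the class assignments are copied from the mapping table.
RIC_CLASS_BY_ENTITY = {
    ("information_object", "fonds"): "rico:RecordSet",
    ("information_object", "subfonds"): "rico:RecordSet",
    ("information_object", "series"): "rico:RecordSet",
    ("information_object", "collection"): "rico:RecordSet",
    ("information_object", "item"): "rico:Record",
    ("information_object", "part"): "rico:RecordPart",
    ("actor", "person"): "rico:Person",
    ("actor", "corporate body"): "rico:CorporateBody",
    ("actor", "family"): "rico:Family",
    ("repository", None): "rico:CorporateBody",
    ("event", "creation"): "rico:Production",
    ("event", "accumulation"): "rico:Accumulation",
    ("digital_object", None): "rico:Instantiation",
    ("term", "subject"): "rico:Thing",
    ("term", "place"): "rico:Place",
    ("term", "genre"): "rico:DocumentaryFormType",
    ("rights", None): "rico:Rule",
    ("function_object", None): "rico:Function",
}


def ric_class_for(entity, subtype=None):
    """Return the RiC-O class for an AtoM entity, or None if it is unmapped."""
    key = (entity, subtype.lower() if subtype else None)
    return RIC_CLASS_BY_ENTITY.get(key)
```

For example, `ric_class_for("information_object", "Series")` yields `rico:RecordSet`, while an unlisted combination returns `None` so the caller can decide how to handle it.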

Spectrum/GRAP Extensions¶

The extractor includes custom namespace extensions:

Extension	Namespace	Purpose
Spectrum	`spectrum:`	Collections Trust activities
GRAP	`grap:`	Heritage asset accounting (GRAP 103)

Spectrum Activity Type	Description
ConditionCheck	Condition assessments
Valuation	Financial valuations
LoanOut	Outgoing loans
LocationMovement	Physical movements

Database Schema¶

ERD Diagram¶

+---------------------------+     +---------------------------+
|     ric_sync_status       |     |     ric_sync_queue        |
+---------------------------+     +---------------------------+
| PK id INT                 |     | PK id BIGINT              |
|    entity_type VARCHAR    |     |    entity_type VARCHAR    |
|    entity_id INT          |     |    entity_id INT          |
|    ric_uri VARCHAR(500)   |     |    operation ENUM         |
|    ric_type VARCHAR       |     |    priority TINYINT       |
|    sync_status ENUM       |     |    status ENUM            |
|    last_synced_at DATETIME|     |    attempts INT           |
|    sync_error TEXT        |     |    old_parent_id INT      |
|    retry_count INT        |     |    new_parent_id INT      |
|    content_hash VARCHAR   |     |    scheduled_at DATETIME  |
|    parent_id INT          |     |    last_error TEXT        |
|    hierarchy_path TEXT    |     |    created_at DATETIME    |
+---------------------------+     +---------------------------+

+---------------------------+     +---------------------------+
|     ric_sync_log          |     |   ric_orphan_tracking     |
+---------------------------+     +---------------------------+
| PK id BIGINT              |     | PK id INT                 |
|    operation ENUM         |     |    ric_uri VARCHAR(500)   |
|    entity_type VARCHAR    |     |    ric_type VARCHAR       |
|    entity_id INT          |     |    expected_entity_type   |
|    ric_uri VARCHAR(500)   |     |    expected_entity_id INT |
|    status ENUM            |     |    detected_at DATETIME   |
|    triples_affected INT   |     |    detection_method ENUM  |
|    details JSON           |     |    status ENUM            |
|    error_message TEXT     |     |    resolved_at DATETIME   |
|    execution_time_ms INT  |     |    resolved_by INT        |
|    triggered_by ENUM      |     |    resolution_notes TEXT  |
|    batch_id VARCHAR       |     |    triple_count INT       |
|    created_at DATETIME    |     +---------------------------+
+---------------------------+

+---------------------------+
|     ric_sync_config       |
+---------------------------+
| PK id INT                 |
|    config_key VARCHAR     |
|    config_value TEXT      |
|    description TEXT       |
|    updated_at DATETIME    |
+---------------------------+

Database Views¶

-- ric_sync_summary: Aggregated sync statistics by entity type and status
CREATE VIEW ric_sync_summary AS
SELECT entity_type, sync_status, COUNT(*) as count,
       MAX(last_synced_at) as last_sync,
       SUM(CASE WHEN retry_count > 0 THEN 1 ELSE 0 END) as with_retries
FROM ric_sync_status
GROUP BY entity_type, sync_status;

-- ric_queue_status: Queue statistics by status
CREATE VIEW ric_queue_status AS
SELECT status, COUNT(*) as count,
       MIN(scheduled_at) as oldest, MAX(scheduled_at) as newest
FROM ric_sync_queue
GROUP BY status;

-- ric_recent_operations: Last 100 sync operations
CREATE VIEW ric_recent_operations AS
SELECT * FROM ric_sync_log
ORDER BY created_at DESC LIMIT 100;
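
To see what ric_sync_summary returns, the view can be tried against an in-memory SQLite database. This is only a sketch: SQLite stands in for MySQL, the table is reduced to the columns the view reads, and the status values `synced` and `failed` are assumed, since the ENUM values are not listed here.

```python
import sqlite3

# SQLite stand-in for the MySQL schema; only the columns used by the view are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ric_sync_status (
    id INTEGER PRIMARY KEY,
    entity_type TEXT,
    entity_id INTEGER,
    sync_status TEXT,
    last_synced_at TEXT,
    retry_count INTEGER
);
CREATE VIEW ric_sync_summary AS
SELECT entity_type, sync_status, COUNT(*) AS count,
       MAX(last_synced_at) AS last_sync,
       SUM(CASE WHEN retry_count > 0 THEN 1 ELSE 0 END) AS with_retries
FROM ric_sync_status
GROUP BY entity_type, sync_status;
""")

# Sample rows; the status values are assumptions made for illustration.
conn.executemany(
    "INSERT INTO ric_sync_status "
    "(entity_type, entity_id, sync_status, last_synced_at, retry_count) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("actor", 1, "synced", "2025-01-30 10:00:00", 0),
        ("actor", 2, "synced", "2025-01-30 11:00:00", 2),
        ("informationobject", 3, "failed", "2025-01-29 09:00:00", 3),
    ],
)

# One row per (entity_type, sync_status) pair.
summary = conn.execute(
    "SELECT * FROM ric_sync_summary ORDER BY entity_type"
).fetchall()
```

Each row of `summary` carries the entity type, the status, the number of entities in that state, the most recent sync time, and how many of them needed at least one retry.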

SPARQL Queries¶

Common Query Patterns¶

1. List all record sets, records, and record parts (as written, the query is not restricted to a single fonds):

PREFIX rico: <https://www.ica.org/standards/RiC/ontology#>

SELECT ?record ?title ?type ?identifier
WHERE {
    ?record a ?type .
    FILTER(?type IN (rico:RecordSet, rico:Record, rico:RecordPart))
    ?record rico:title ?title .
    OPTIONAL { ?record rico:identifier ?identifier }
}
ORDER BY ?title

2. Find creators of a record:

PREFIX rico: <https://www.ica.org/standards/RiC/ontology#>

SELECT ?record ?creator ?creatorName
WHERE {
    ?record rico:hasCreator ?creator .
    ?creator rico:hasAgentName/rico:textualValue ?creatorName .
}

3. Get record hierarchy:

PREFIX rico: <https://www.ica.org/standards/RiC/ontology#>

SELECT ?parent ?child ?parentTitle ?childTitle
WHERE {
    ?parent rico:includes ?child .
    ?parent rico:title ?parentTitle .
    ?child rico:title ?childTitle .
}

4. Find records by subject:

PREFIX rico: <https://www.ica.org/standards/RiC/ontology#>

SELECT ?record ?title ?subject
WHERE {
    ?record rico:hasOrHadSubject ?subjectUri .
    ?subjectUri rico:hasOrHadName/rico:textualValue ?subject .
    ?record rico:title ?title .
    FILTER(CONTAINS(LCASE(?subject), "example"))
}

5. Get Spectrum condition checks:

PREFIX rico: <https://www.ica.org/standards/RiC/ontology#>
PREFIX spectrum: <https://collectionstrust.org.uk/spectrum#>

SELECT ?record ?checkDate ?condition ?priority
WHERE {
    ?activity a rico:Activity ;
              rico:hasActivityType "ConditionCheck" ;
              rico:resultsOrResultedIn ?record .
    OPTIONAL { ?activity rico:isOrWasAssociatedWithDate/rico:beginningDate ?checkDate }
    OPTIONAL { ?activity spectrum:overallCondition ?condition }
    OPTIONAL { ?activity spectrum:treatmentPriority ?priority }
}
ORDER BY DESC(?checkDate)

6. Count entities by type:

PREFIX rico: <https://www.ica.org/standards/RiC/ontology#>

SELECT ?type (COUNT(?s) as ?count)
WHERE {
    ?s a ?type .
    FILTER(STRSTARTS(STR(?type), "https://www.ica.org"))
}
GROUP BY ?type
ORDER BY DESC(?count)
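
Any of the queries above can be sent to Fuseki over the standard SPARQL 1.1 Protocol. The sketch below uses only the Python standard library; the helper names `build_sparql_request` and `run_select` are hypothetical, the endpoint is the `sparql_endpoint` value from the app.yml example, and the optional credentials are assumed to correspond to the `fuseki_username` and `fuseki_password` settings.

```python
import base64
import json
import urllib.parse
import urllib.request

# Endpoint taken from the app.yml example in this document; adjust for your install.
SPARQL_ENDPOINT = "http://localhost:3030/ric/query"

COUNT_BY_TYPE = """
PREFIX rico: <https://www.ica.org/standards/RiC/ontology#>
SELECT ?type (COUNT(?s) AS ?count)
WHERE { ?s a ?type . FILTER(STRSTARTS(STR(?type), "https://www.ica.org")) }
GROUP BY ?type
ORDER BY DESC(?count)
"""


def build_sparql_request(query, endpoint=SPARQL_ENDPOINT, username=None, password=None):
    """Build a form-encoded SPARQL 1.1 Protocol POST request (hypothetical helper)."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(endpoint, data=body, method="POST")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    request.add_header("Accept", "application/sparql-results+json")
    if username is not None:
        # HTTP Basic authentication, only added when credentials are supplied.
        token = base64.b64encode(f"{username}:{password or ''}".encode("utf-8"))
        request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def run_select(query, **kwargs):
    """Send a SELECT query and return the result bindings as a list of dicts."""
    request = build_sparql_request(query, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["results"]["bindings"]
```

With a running Fuseki instance, `run_select(COUNT_BY_TYPE)` returns one binding per RiC-O class; each binding maps a variable name to a dict whose `value` key holds the result.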

Fuseki Integration¶

Configuration¶

Settings are stored in the ahg_settings table (setting_group = 'fuseki'):

Setting Key	Default	Description
fuseki_endpoint	http://localhost:3030/ric	SPARQL endpoint URL
fuseki_username	admin	Authentication username
fuseki_password	(empty)	Authentication password
fuseki_sync_enabled	1	Enable automatic sync
fuseki_queue_enabled	1	Use queue for sync

Docker Deployment¶

# Run Fuseki with Docker (admin123 is an example password; replace it before deploying)
docker run -d --name fuseki \
  -p 3030:3030 \
  -e ADMIN_PASSWORD=admin123 \
  -v fuseki-data:/fuseki \
  stain/jena-fuseki

# Create dataset
curl -u admin:admin123 -X POST \
  'http://localhost:3030/$/datasets' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'dbName=ric&dbType=tdb2'

Optimization Script¶

The optimize_fuseki.sh script performs:

- TDB2 compaction
- Statistics regeneration
- Query cache clearing

./bin/optimize_fuseki.sh

Graph Visualization¶

Cytoscape.js (2D)¶

The plugin uses Cytoscape.js for 2D graph rendering with the following node styles:

Entity Type	Color
RecordSet/Record	#17a2b8 (teal)
CorporateBody	#ffc107 (yellow)
Person/Family	#dc3545 (red)
Production/Activity	#6f42c1 (purple)
Place	#fd7e14 (orange)
Thing (Subject)	#20c997 (green)

Layout: COSE (Compound Spring Embedder) with node repulsion.
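
The colour table above translates naturally into the element objects Cytoscape.js consumes (`{"data": {...}}` for nodes and edges). The sketch below is illustrative only: the plugin's Graph Data API is written in PHP, the function names are hypothetical, and the grey fallback colour is an assumption for types the table does not list.

```python
# Colours copied from the node style table.
NODE_COLORS = {
    "RecordSet": "#17a2b8",
    "Record": "#17a2b8",
    "CorporateBody": "#ffc107",
    "Person": "#dc3545",
    "Family": "#dc3545",
    "Production": "#6f42c1",
    "Activity": "#6f42c1",
    "Place": "#fd7e14",
    "Thing": "#20c997",
}
DEFAULT_COLOR = "#6c757d"  # assumption: neutral grey for types not in the table


def local_name(ric_type):
    """Strip a 'rico:' prefix or a full namespace URI down to the class name."""
    return ric_type.rsplit("#", 1)[-1].rsplit(":", 1)[-1]


def cytoscape_node(uri, label, ric_type):
    """Build a Cytoscape.js node element carrying its display colour."""
    name = local_name(ric_type)
    return {
        "data": {
            "id": uri,
            "label": label,
            "type": name,
            "color": NODE_COLORS.get(name, DEFAULT_COLOR),
        }
    }


def cytoscape_edge(source, target, predicate):
    """Build a Cytoscape.js edge element labelled with the RiC-O predicate."""
    return {
        "data": {
            "id": f"{source}|{predicate}|{target}",
            "source": source,
            "target": target,
            "label": predicate,
        }
    }
```

Storing the colour in `data` lets a Cytoscape.js stylesheet pick it up with a `data(color)` mapper, so the styling rule stays generic.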

3D Force Graph¶

Uses three.js and 3d-force-graph for immersive exploration:

- Node labels rendered as sprites
- Directional particles on links
- Interactive camera controls

Event Sync System¶

RicSyncListener¶

The listener hooks into AtoM's event dispatcher to capture entity changes:

QubitInformationObject.insert.post --> handleSave()
QubitInformationObject.update.post --> handleSave()
QubitInformationObject.delete.pre  --> handleDelete()

QubitActor.insert.post --> handleSave()
QubitActor.update.post --> handleSave()
QubitActor.delete.pre  --> handleDelete()

Syncable Entities¶

PHP Class	Entity Type Key
QubitInformationObject	informationobject
QubitActor	actor
QubitRepository	repository
QubitFunction	function
QubitEvent	event
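
The routing described above can be sketched as two lookups: one from the PHP class to its entity type key, one from the event suffix to the handler. This is a Python illustration of the idea, not the PHP listener itself; `route_event` is a hypothetical name, and it assumes every syncable class fires the same three events shown for QubitInformationObject and QubitActor.

```python
# Entity type keys copied from the Syncable Entities table.
ENTITY_TYPE_BY_CLASS = {
    "QubitInformationObject": "informationobject",
    "QubitActor": "actor",
    "QubitRepository": "repository",
    "QubitFunction": "function",
    "QubitEvent": "event",
}

# Event suffix to handler, as listed for the listener.
HANDLER_BY_HOOK = {
    "insert.post": "handleSave",
    "update.post": "handleSave",
    "delete.pre": "handleDelete",
}


def route_event(event_name):
    """Return (entity_type, handler) for an event name, or None if it is not synced."""
    class_name, _, hook = event_name.partition(".")
    entity_type = ENTITY_TYPE_BY_CLASS.get(class_name)
    handler = HANDLER_BY_HOOK.get(hook)
    if entity_type is None or handler is None:
        return None
    return entity_type, handler
```

Deletes are handled on the `pre` hook, presumably so the entity's data is still available when the delete operation is recorded.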

Queue Operations¶

Operations are queued with priority levels:

- Priority 1: Delete operations (process first)
- Priority 3: Move operations
- Priority 5: Create/Update operations
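
The ordering these priorities produce can be illustrated with an in-memory priority queue. The real queue is the ric_sync_queue MySQL table; the `SyncQueue` class and the lowercase operation names below are hypothetical and only demonstrate that lower numbers are processed first, with first-in-first-out order inside one priority level.

```python
import heapq
import itertools

# Priority levels from the list above; the operation names are assumed.
PRIORITY = {"delete": 1, "move": 3, "create": 5, "update": 5}


class SyncQueue:
    """In-memory stand-in for the sync queue, ordered by priority then arrival."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def enqueue(self, entity_type, entity_id, operation):
        entry = (PRIORITY[operation], next(self._counter), entity_type, entity_id, operation)
        heapq.heappush(self._heap, entry)

    def dequeue(self):
        _, _, entity_type, entity_id, operation = heapq.heappop(self._heap)
        return entity_type, entity_id, operation

    def __len__(self):
        return len(self._heap)
```

Processing deletes first means a stale resource is removed from the triplestore before any later create or update for related entities is applied.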

Python Tools¶

ric_extractor_v5.py¶

The main extraction tool converts AtoM MySQL data to RiC-O JSON-LD:

# List available fonds
python3 ric_extractor_v5.py --list-fonds

# Extract specific fonds
python3 ric_extractor_v5.py --fonds-id 776 --output output.jsonld --pretty

# List standalone records
python3 ric_extractor_v5.py --list-standalone

Environment Variables:

Variable	Default	Description
ATOM_DB_HOST	localhost	MySQL host
ATOM_DB_USER	root	MySQL user
ATOM_DB_PASSWORD	-	MySQL password
ATOM_DB_NAME	archive	Database name
RIC_BASE_URI	https://archives.theahg.co.za/ric	Base URI for minted URIs
ATOM_INSTANCE_ID	atom-psis	Instance identifier
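
A script that honours these variables would typically read each one and fall back to the documented default. The sketch below shows that pattern; `load_config` is a hypothetical helper, not a function of the extractor, and it treats the missing password default as an empty string.

```python
import os

# Defaults copied from the environment variable table.
DEFAULTS = {
    "ATOM_DB_HOST": "localhost",
    "ATOM_DB_USER": "root",
    "ATOM_DB_PASSWORD": "",  # assumption: no default password means empty string
    "ATOM_DB_NAME": "archive",
    "RIC_BASE_URI": "https://archives.theahg.co.za/ric",
    "ATOM_INSTANCE_ID": "atom-psis",
}


def load_config(environ=None):
    """Return the effective settings, preferring values found in the environment."""
    env = os.environ if environ is None else environ
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}
```

Passing a plain dict as `environ` makes the function easy to exercise without touching the real process environment, for example `load_config({"ATOM_DB_HOST": "db.internal"})`.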

ric_semantic_search.py¶

Flask-based semantic search API combining Elasticsearch and SPARQL:

# Start search API
python3 ric_semantic_search.py

# API endpoints
GET/POST /api/search?q=<query>     # Main search
GET /api/autocomplete?q=<prefix>   # Autocomplete
GET /api/suggest                   # Query suggestions
GET /api/health                    # Service health check

Search Features:

- Fuzzy text matching via Elasticsearch
- Fallback to SPARQL for RiC-specific queries
- Bilingual support (English/Afrikaans)
- Date range parsing
- Level-of-description filtering
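
A client only needs to URL-encode the query string to call these endpoints. The helpers below are a hypothetical sketch: the base address assumes Flask's default development host and port, which this document does not state, and the response format is deliberately left out because it is not described here.

```python
import urllib.parse

# Assumption: Flask's default development address; change it to match your deployment.
SEARCH_API_BASE = "http://localhost:5000"


def search_url(query, base=SEARCH_API_BASE):
    """Build the GET URL for the main search endpoint."""
    return f"{base}/api/search?" + urllib.parse.urlencode({"q": query})


def autocomplete_url(prefix, base=SEARCH_API_BASE):
    """Build the GET URL for the autocomplete endpoint."""
    return f"{base}/api/autocomplete?" + urllib.parse.urlencode({"q": prefix})
```

Using `urlencode` rather than string concatenation keeps spaces, ampersands, and non-ASCII characters in the query from corrupting the URL.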

ric_shacl_validator.py¶

SHACL validation for RiC-O conformance:

# Validate data in Fuseki
python3 ric_shacl_validator.py --validate --summary

# Validate JSON-LD file
python3 ric_shacl_validator.py --file output.jsonld --validate

# Generate HTML report
python3 ric_shacl_validator.py --validate --report -o report.html

Dependencies:

pip install pyshacl rdflib

ric_sync.sh¶

Shell script for batch synchronization:

# Sync all fonds
./bin/ric_sync.sh

# Sync specific fonds
./bin/ric_sync.sh --fonds 776,829

# Clear and resync
./bin/ric_sync.sh --clear

# With validation
./bin/ric_sync.sh --validate

# Show status
./bin/ric_sync.sh --status

Admin Dashboard Routes¶

Route	URL	Description
ric_dashboard_index	/admin/ric	Dashboard home
ric_dashboard_sync_status	/admin/ric/sync-status	Entity sync status
ric_dashboard_orphans	/admin/ric/orphans	Orphaned triples
ric_dashboard_queue	/admin/ric/queue	Sync queue
ric_dashboard_logs	/admin/ric/logs	Operation logs

AJAX Endpoints¶

Endpoint	Purpose
/admin/ric/ajax/stats	Dashboard statistics
/admin/ric/ajax/dashboard	Full dashboard data (cached)
/admin/ric/ajax/integrity-check	Run integrity check
/admin/ric/ajax/cleanup-orphans	Remove orphaned triples
/admin/ric/ajax/resync	Force resync entity

JSON-LD Output Structure¶

{
  "@context": {
    "rico": "https://www.ica.org/standards/RiC/ontology#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "spectrum": "https://collectionstrust.org.uk/spectrum#",
    "grap": "https://www.asb.co.za/grap#"
  },
  "@graph": [
    {
      "@id": "https://example.com/ric/instance/recordset/776",
      "@type": "rico:RecordSet",
      "rico:identifier": "F001",
      "rico:title": "Fonds Title",
      "rico:hasCreator": {"@id": "..."},
      "rico:includes": [{"@id": "..."}]
    }
  ],
  "_metadata": {
    "extracted": "2025-01-30T10:00:00Z",
    "source": "AtoM instance: atom-psis",
    "records_count": 150,
    "agents_count": 25,
    "relations_count": 340
  }
}
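
A quick way to check an extracted file is to count the nodes in `@graph` by type and compare the totals with `_metadata`. The following is a minimal sketch assuming the structure shown above; `summarize_jsonld` is a hypothetical helper and is not part of the extractor.

```python
import json
from collections import Counter


def summarize_jsonld(document):
    """Count the nodes in @graph by their @type value(s)."""
    counts = Counter()
    for node in document.get("@graph", []):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]  # JSON-LD allows a single type or a list of types
        counts.update(types)
    return dict(counts)


# Small sample in the same shape as the output structure.
sample = json.loads("""
{
  "@graph": [
    {"@id": "https://example.com/ric/instance/recordset/776", "@type": "rico:RecordSet"},
    {"@id": "https://example.com/ric/instance/record/777", "@type": "rico:Record"},
    {"@id": "https://example.com/ric/instance/record/778", "@type": ["rico:Record"]}
  ]
}
""")
type_counts = summarize_jsonld(sample)
```

For a real file, load it with `json.load(open("output.jsonld"))` and pass the result to the same function.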

Configuration¶

app.yml Settings¶

all:
  ric_explorer:
    sparql_endpoint: 'http://localhost:3030/ric/query'
    base_uri: 'https://your-domain.com/ric/atom'
    explorer_url: '/ric/'
    enabled: true
    show_related_records: true
    related_records_limit: 10
    show_mini_graph: true

Troubleshooting¶

Issue	Solution
Panel not loading	Check Fuseki endpoint accessibility
No graph data	Verify JSON-LD was loaded to Fuseki
CORS errors	Configure Fuseki CORS or use PHP proxy
Sync queue stuck	Check ric_sync_queue for failed items
Orphaned triples	Run integrity check from dashboard
3D graph slow	Reduce node count or use 2D mode

Security Considerations¶

Fuseki credentials are stored in ahg_settings (encrypting them is recommended)
Dashboard restricted to administrators
SPARQL queries sanitized to prevent injection
API endpoints require authentication

Part of the AtoM AHG Framework