AtoM AHG Framework - Data Migration Tool¶
User Guide¶
Plugin Version: 1.4.0 Last Updated: 2026-02-03 Plugin: ahgDataMigrationPlugin
Table of Contents¶
- Overview
- Accessing the Tool
- Supported Source Systems
- Web Interface Workflow
- Field Mapping
- Data Validation
- Batch Export
- Sector-Specific Import
- Sample CSV Files
- Preservica Import/Export
- Background Jobs
- CLI Commands
- Gearman Setup
- Troubleshooting
1. Overview¶
The Data Migration Tool enables importing records from various archival and collection management systems into AtoM. It supports:
- CSV and Excel files from multiple source systems
- XML formats (Preservica OPEX, EAD)
- Preservica packages (PAX/XIP with digital objects)
- Sector-specific mappings (Archives, Museum, Library, Gallery, DAM)
- Background processing for large datasets via Gearman
- Field transformation and validation
- Rights import (PREMIS, SecurityDescriptor, dc:rights, MODS, EAD)
- Provenance/history import from OPEX
2. Accessing the Tool¶
Web Interface¶
Navigate to: https://[your-domain]/dataMigration
Or: Admin → Import/Export → Data Migration
Required Permissions¶
- Administrator access
- Or
importpermission in user group
3. Supported Source Systems¶
Collection Management Systems¶
| System | Formats | Target Sector |
|---|---|---|
| ArchivesSpace | CSV/JSON | Archives, Accessions, Agents, Repositories |
| Vernon CMS | CSV/Excel | Museum |
| PastPerfect | CSV | Museum |
| CollectiveAccess | CSV | Multi-sector |
| Filemaker Pro | CSV | Any |
| WDB | CSV | Archives |
| PSIS | Excel (83 fields) | Library |
Preservation Systems¶
| System | Formats | Features |
|---|---|---|
| Preservica | OPEX XML | Metadata, rights, provenance import/export |
| Preservica | PAX/XIP (ZIP) | Metadata + digital objects |
Standard Formats¶
| Format | Use Case |
|---|---|
| CSV | Universal import |
| Excel (.xlsx, .xls) | Spreadsheet data |
| XML | EAD, Dublin Core |
4. Web Interface Workflow¶
Step 1: Upload File¶
- Go to
/dataMigration - Click "Choose File" or drag-and-drop
- Supported:
.csv,.xlsx,.xls,.xml,.opex,.pax,.zip - Click "Upload"
The system auto-detects: - File format (CSV, Excel, XML) - Source system (based on headers/structure) - Sector type (Archives, Museum, Library, etc.)
Step 2: Select or Create Mapping¶
Use Existing Mapping: 1. Select from dropdown (e.g., "Vernon CMS (Museum)") 2. Click "Load Mapping"
Create New Mapping: 1. Click "New Mapping" 2. Enter name (e.g., "My Museum Import") 3. Select target sector 4. Click "Create"
Step 3: Map Fields¶
The mapping interface shows: - Left column: Your source fields (from uploaded file) - Right column: AtoM target fields
For each source field: 1. Click the dropdown 2. Select matching AtoM field 3. Optionally set transformation rules
Field Transformations:
- trim - Remove whitespace
- uppercase / lowercase - Case conversion
- date:Y-m-d - Date formatting
- prepend:/uploads/ - Add prefix to paths
- split:| - Split multi-value fields
Step 4: Preview¶
- Click "Preview"
- Review first 10-20 records
- Check field mappings are correct
- Verify hierarchy (parent-child relationships)
Step 5: Import¶
Option A: Export to AtoM CSV 1. Click "Export AtoM CSV" 2. Download the transformed CSV 3. Use AtoM's built-in CSV Import (Admin → Import → CSV)
Option B: Direct Import (Large Files)
1. Click "Background Job"
2. Job queued to Gearman workers
3. Monitor progress at /dataMigration/jobs
5. Field Mapping¶
Core AtoM Fields¶
| AtoM Field | Description | Required |
|---|---|---|
legacyId |
Unique ID from source system | Yes |
parentId |
Parent's legacyId for hierarchy | No |
title |
Record title | Yes |
identifier |
Reference code | No |
scopeAndContent |
Description/scope | No |
levelOfDescription |
Fonds/Series/File/Item | Yes |
repository |
Repository name or ID | No |
culture |
Language code (en, af, etc.) | No |
Digital Object Fields¶
| Field | Description |
|---|---|
digitalObjectPath |
Path to file (relative or absolute) |
digitalObjectURI |
External URL |
digitalObjectChecksum |
MD5/SHA256 for verification |
Multi-Value Fields¶
Use pipe | separator for multiple values:
subjectAccessPoints: History|World War II|Military
placeAccessPoints: South Africa|Johannesburg
nameAccessPoints: Jan Smuts|Louis Botha
Hierarchy Example¶
legacyId,parentId,title,levelOfDescription
F001,,Municipal Archives,Fonds
S001,F001,Council Minutes,Series
F001-001,S001,Minutes 1950-1960,File
F001-001-001,F001-001,Meeting 1950-01-15,Item
6. Data Validation¶
The validation framework helps you identify and fix data quality issues before importing.
Validation Types¶
| Validation | Description |
|---|---|
| Schema | Required fields, data types, patterns, max lengths |
| Referential | Parent-child relationships, circular reference detection |
| Duplicates | Duplicate detection in file and against existing database records |
| Sector-Specific | Standards-based rules for each GLAM sector |
Validation-Only Mode¶
Test your data without importing:
Web Interface: 1. Upload your file 2. Map fields as usual 3. Click "Validate Only" instead of Import 4. Review errors/warnings by row and column
CLI:
php symfony sector:archives-csv-import /path/to/file.csv --validate-only
php symfony sector:museum-csv-import /path/to/file.csv --validate-only --mapping=10
Understanding Validation Results¶
Results show errors by row and column:
| Severity | Icon | Action Required |
|---|---|---|
| Error | Red | Must fix before import |
| Warning | Yellow | Review recommended |
| Info | Blue | Informational only |
Example output:
Row 3, Column 'identifier': Required field is empty
Row 5, Column 'levelOfDescription': Invalid value 'folder' - must be one of: fonds, series, file, item
Row 8, Column 'parentId': Parent record '999' not found in file or database
Sector-Specific Validation Rules¶
Each sector has specialized validation:
Archives (ISAD-G): - Level of description must be valid (fonds, series, file, item, etc.) - Parent hierarchy must follow ISAD-G rules - Required: identifier, title, levelOfDescription
Museum (Spectrum): - Object number format validation - Acquisition date must be valid date - Required: objectNumber, objectName
Library (MARC/RDA): - ISBN-10 and ISBN-13 checksum validation - ISSN validation - Required: identifier, title
Gallery (CCO): - Work type must be from controlled vocabulary - Creator format validation (Name; Role) - Required: objectNumber, title
DAM (Dublin Core/IPTC): - DC type must be valid (Image, Audio, Video, Document, etc.) - MIME type format validation - GPS coordinate range validation - Required: identifier, title
Live Validation Preview¶
While mapping fields, click "Preview Validation" to see validation results for the first 20 rows without running a full import.
Duplicate Detection Strategies¶
Configure how duplicates are detected:
| Strategy | Matches On |
|---|---|
| Identifier | identifier field |
| Legacy ID | legacyId field |
| Title + Date | Combination of title and date |
| Composite | Multiple configurable fields |
7. Batch Export¶
Export existing AtoM records to sector-specific CSV formats for backup, reporting, or migration to other systems.
Accessing Batch Export¶
Navigate to: https://[your-domain]/dataMigration/batchExport
Or from the Data Migration page, click the "Batch Export" button in the header.
Export Formats¶
| Format | Standard | Best For |
|---|---|---|
| Archives | ISAD(G) | Archival fonds, series, files, items |
| Museum | Spectrum 5.1 | Museum objects with acquisition, location data |
| Library | MARC/RDA | Bibliographic records with ISBN, call numbers |
| Gallery | CCO/VRA | Artworks and visual resources |
| Digital Assets | Dublin Core/IPTC | Digital files with technical metadata |
Filter Options¶
You can narrow down which records to export:
| Filter | Description |
|---|---|
| Repository | Export only records from a specific repository |
| Level of Description | Filter by fonds, series, file, item, etc. (multi-select) |
| Parent Slug | Export children of a specific record |
| Include Descendants | Include all levels below the parent (not just direct children) |
Export Workflow¶
- Select the Sector Format for your CSV columns
- Optionally set filters to narrow the export
- Click "Export CSV"
For small exports (<500 records): - CSV downloads immediately
For large exports (>500 records):
- Export is queued as a background job
- Check progress at /dataMigration/jobs
- Download file when complete
Example Use Cases¶
Backup a collection: 1. Select "Archives (ISAD-G)" format 2. Enter the fonds slug in "Parent Record Slug" 3. Check "Include all descendants" 4. Export
Export museum objects for reporting: 1. Select "Museum (Spectrum 5.1)" format 2. Select your repository 3. Select "Item" level of description 4. Export
Migrate records to another system: 1. Choose the format closest to your target system 2. Export all records or filter by repository 3. Use the CSV in your target system's import tool
8. Sector-Specific Import¶
Import directly using sector-specific CLI commands with validation built-in.
Archives Import (ISAD-G)¶
php symfony sector:archives-csv-import /path/to/archives.csv \
--repository=my-archive \
--update=identifier \
--validate-only # Remove to perform actual import
Museum Import (Spectrum)¶
php symfony sector:museum-csv-import /path/to/museum.csv \
--repository=my-museum \
--update=objectNumber
Library Import (MARC/RDA)¶
php symfony sector:library-csv-import /path/to/library.csv \
--repository=my-library \
--update=identifier
Gallery Import (CCO)¶
php symfony sector:gallery-csv-import /path/to/gallery.csv \
--repository=my-gallery \
--update=objectNumber
DAM Import (Dublin Core/IPTC)¶
php symfony sector:dam-csv-import /path/to/dam.csv \
--repository=my-repository \
--update=identifier
Common Options¶
| Option | Description |
|---|---|
--validate-only |
Validate without importing |
--mapping=ID |
Use specific mapping profile |
--repository=SLUG |
Target repository slug |
--update=FIELD |
Match field for updates (skip if exists) |
9. Sample CSV Files¶
The plugin includes sample CSV files demonstrating correct format for each sector.
Available Samples¶
Located in: atom-ahg-plugins/ahgDataMigrationPlugin/data/samples/
| File | Sector | Records | Description |
|---|---|---|---|
archives_sample.csv |
Archives | 5 | Hierarchical ISAD-G records with parent-child relationships |
museum_sample.csv |
Museum | 5 | Spectrum objects with materials, techniques, locations |
library_sample.csv |
Library | 5 | MARC/RDA records with ISBN, call numbers, subjects |
gallery_sample.csv |
Gallery | 5 | CCO artworks with provenance, credit lines |
dam_sample.csv |
DAM | 5 | Dublin Core assets with technical metadata |
Archives Sample Structure¶
legacyId,parentId,identifier,title,levelOfDescription,repository,...
1,,F001,Smith Family Papers,Fonds,Main Archive,...
2,1,F001/S1,Correspondence,Series,Main Archive,...
3,2,F001/S1/F1,Personal Letters,File,Main Archive,...
Key features:
- legacyId and parentId establish hierarchy
- Parent records must appear before children
- Level of description follows ISAD-G
Museum Sample Structure¶
objectNumber,objectName,title,materials,techniques,dimensions,productionDate,...
OBJ-001,Painting,Landscape with River,Canvas|Oil paint,Oil painting,60 x 80 cm,1935,...
Key features:
- Multi-value fields use pipe | separator
- Spectrum-compliant field names
- Acquisition and location tracking
Using Samples for Testing¶
-
Copy sample to test location:
-
Test validation:
-
Import if validation passes:
10. Preservica Import/Export¶
OPEX Import¶
OPEX (Open Preservation Exchange) is Preservica's XML metadata format.
Web Interface:
1. Upload .opex or .xml file
2. Select "Preservica OPEX" mapping
3. Map fields or use defaults
4. Preview and import
CLI:
php symfony preservica:import /path/to/file.opex
php symfony preservica:import /path/to/file.opex --repository=5
php symfony preservica:import /path/to/file.opex --dry-run
OPEX Rights Extraction:
The importer automatically extracts rights from:
- SecurityDescriptor elements
- dc:rights Dublin Core
- dcterms:license
- MODS <accessCondition>
- EAD <userestrict> and <accessrestrict>
Provenance Import:
OPEX <opex:History> elements are imported to provenance_event table.
PAX/XIP Import¶
PAX packages contain metadata (XIP XML) plus content files.
Web Interface:
1. Upload .pax or .zip file
2. Select "Preservica PAX/XIP" mapping
3. Digital objects extracted automatically
4. Preview and import
CLI:
php symfony preservica:import /path/to/package.pax --format=xip
php symfony preservica:import /path/to/directory --batch
Preservica Export¶
Export AtoM records to Preservica format:
CLI:
# Export single record
php symfony preservica:export 123
# Export with hierarchy
php symfony preservica:export 123 --hierarchy
# Export to XIP/PAX format
php symfony preservica:export 123 --format=xip
# Export entire repository
php symfony preservica:export --repository=5
Output Location: /uploads/exports/preservica/
11. Background Jobs¶
For large imports (1000+ records), use background processing:
Starting a Background Job¶
- Complete field mapping
- Click "Background Job" instead of direct import
- Job queued to Gearman workers
Monitoring Jobs¶
Navigate to: /dataMigration/jobs
| Status | Description |
|---|---|
queued |
Waiting for worker |
running |
Currently processing |
completed |
Finished successfully |
failed |
Error occurred |
Job Details¶
Click any job to see: - Records processed / total - Errors encountered - Processing time - Download results
12. CLI Commands¶
List Available Mappings¶
Output:
ARCHIVES:
[2] ArchivesSpace Resources
[11] Preservica OPEX
[12] Preservica PAX/XIP
MUSEUM:
[10] Vernon CMS (Museum)
LIBRARY:
[8] PSIS Full Import (83 fields)
Import with Mapping¶
# By mapping ID
php symfony migration:import /path/to/file.csv --mapping=10
# By mapping name
php symfony migration:import /path/to/file.csv --mapping="Vernon CMS"
# With options
php symfony migration:import /path/to/file.csv --mapping=10 \
--repository=5 \
--culture=en \
--update
Dry Run (Preview Only)¶
Sector Import Commands¶
# Archives (ISAD-G)
php symfony sector:archives-csv-import /path/to/file.csv \
--repository=SLUG --validate-only --mapping=ID --update=FIELD
# Museum (Spectrum)
php symfony sector:museum-csv-import /path/to/file.csv \
--repository=SLUG --validate-only --mapping=ID --update=FIELD
# Library (MARC/RDA)
php symfony sector:library-csv-import /path/to/file.csv \
--repository=SLUG --validate-only --mapping=ID --update=FIELD
# Gallery (CCO)
php symfony sector:gallery-csv-import /path/to/file.csv \
--repository=SLUG --validate-only --mapping=ID --update=FIELD
# DAM (Dublin Core)
php symfony sector:dam-csv-import /path/to/file.csv \
--repository=SLUG --validate-only --mapping=ID --update=FIELD
Preservica Commands¶
# Show Preservica info
php symfony preservica:info
# Import OPEX
php symfony preservica:import /path/to/file.opex
# Import PAX/XIP
php symfony preservica:import /path/to/package.pax --format=xip
# Export to OPEX
php symfony preservica:export 123 --format=opex
# Export to PAX
php symfony preservica:export 123 --format=xip --hierarchy
13. Gearman Setup¶
Gearman is required for background job processing (large imports/exports).
Quick Install (Ubuntu)¶
# Run the automated setup script
cd /usr/share/nginx/archive/atom-ahg-plugins/ahgDataMigrationPlugin
sudo ./bin/setup-gearman.sh
Manual Install¶
# Install packages
sudo apt-get install -y gearman-job-server php8.3-gearman
# Enable and start
sudo systemctl enable gearman-job-server
sudo systemctl start gearman-job-server
# Restart PHP-FPM
sudo systemctl restart php8.3-fpm
Running the Worker¶
Development (manual):
Production (systemd service):
Verify Installation¶
# Check Gearman status
gearadmin --status
# Check worker service
sudo systemctl status atom-worker
# View worker logs
sudo journalctl -u atom-worker -f
For detailed Gearman configuration and troubleshooting, see:
atom-ahg-plugins/ahgDataMigrationPlugin/docs/GEARMAN.md
14. Troubleshooting¶
File Upload Fails¶
Problem: File too large
Solution: Increase PHP limits in /etc/php/8.3/fpm/php.ini:
Mapping Not Found¶
Problem: Source columns not detected
Solution: Ensure CSV has headers in first row, UTF-8 encoding
Hierarchy Not Working¶
Problem: Parent-child relationships broken
Solution:
- Ensure legacyId is unique
- Ensure parentId matches a valid legacyId
- Parents must appear before children in file
Digital Objects Not Importing¶
Problem: Files not attaching to records
Solution:
- Check digitalObjectPath is correct
- Verify files exist at specified path
- Use absolute paths or paths relative to AtoM root
Background Job Stuck¶
Problem: Job shows "running" but no progress
Solution:
# Check Gearman workers
ps aux | grep jobs:worker
# Restart workers
sudo systemctl restart atom-worker
OPEX Rights Not Importing¶
Problem: Rights not appearing on records
Solution:
- Verify OPEX contains <SecurityDescriptor> or <dc:rights>
- Check ahg_rights_statement table for imported rights
- Ensure ahgRightsPlugin is enabled
Quick Reference¶
| Task | Web UI | CLI |
|---|---|---|
| Import CSV | /dataMigration → Upload |
php symfony migration:import file.csv --mapping=X |
| Validate Only | /dataMigration/validate |
php symfony sector:*-csv-import file.csv --validate-only |
| Import Archives | /dataMigration → Upload |
php symfony sector:archives-csv-import file.csv |
| Import Museum | /dataMigration → Upload |
php symfony sector:museum-csv-import file.csv |
| Import Library | /dataMigration → Upload |
php symfony sector:library-csv-import file.csv |
| Import Gallery | /dataMigration → Upload |
php symfony sector:gallery-csv-import file.csv |
| Import DAM | /dataMigration → Upload |
php symfony sector:dam-csv-import file.csv |
| Import OPEX | /dataMigration → Upload |
php symfony preservica:import file.opex |
| Import PAX | /dataMigration → Upload |
php symfony preservica:import file.pax --format=xip |
| Batch Export | /dataMigration/batchExport |
php symfony sector:export --sector=X |
| Export Mapping | Map page → Export | N/A |
| Import Mapping | Map page → Import | N/A |
| Export OPEX | N/A | php symfony preservica:export 123 |
| Export PAX | N/A | php symfony preservica:export 123 --format=xip |
| View Jobs | /dataMigration/jobs |
N/A |
| List Mappings | Dropdown | php symfony migration:import --list-mappings |
Version History¶
| Version | Changes |
|---|---|
| 1.4.0 | Universal validation framework, sector-specific validators (ISAD-G, Spectrum, MARC/RDA, CCO, Dublin Core), sector CLI import tasks, validation-only mode, sample CSV files, mapping profile export/import |
| 1.3.0 | Added Batch Export UI, Library/Gallery/DAM default mappings, Gearman setup script |
| 1.2.0 | Added Preservica OPEX/PAX support, rights import, provenance import, Gearman jobs |
| 1.1.0 | Added sector-specific CSV exporters |
| 1.0.0 | Initial release with field mapping UI |
Need Help?
- Check
/dataMigration/jobsfor import status - Review error logs:
/log/qubit.log - Contact: support@theahg.co.za