AtoM Heratio — Integrity Assurance Plugin User Manual¶
Version: 1.2.0
Author: The Archive and Heritage Group (Pty) Ltd
Table of Contents¶
- Overview
- Installation
- Dashboard
- Schedules
- Run History
- Verification Ledger
- Dead Letter Queue
- Reports
- Export & Auditor Pack
- Retention Policies
- Legal Holds
- Disposition Queue
- Alerts
- CLI Commands
- REST API
- Cron Setup
- Troubleshooting
Overview¶
The Integrity Assurance plugin automates the verification of digital object integrity in your AtoM Heratio instance. It ensures that archival files have not been corrupted, modified, or lost by comparing stored cryptographic checksums (generated by the Preservation plugin) against current file hashes.
How it works:
- The Preservation plugin generates baseline checksums (SHA-256 or SHA-512) when digital objects are ingested
- The Integrity plugin runs scheduled or ad-hoc verification passes
- Each verification is recorded in an append-only ledger with actor, hostname, and previous hash chain tracking
- Persistent failures are escalated to a dead-letter queue for manual investigation
- Retention policies determine when records become eligible for disposition review
- Legal holds block disposition of records under review
- Threshold-based alerts notify administrators of integrity issues via email or webhook
- Reports and dashboards provide visibility into overall collection health
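The core comparison described above can be illustrated with a minimal Python sketch. It is not the plugin's implementation (the plugin is PHP); the function names `file_digest` and `verify` are hypothetical, and the returned outcome strings simply reuse the ledger outcome names listed later in this manual.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Stream the file through hashlib so large masters are never loaded whole."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str, algorithm: str = "sha256") -> str:
    """Return an outcome string modelled on the ledger outcomes (illustrative only)."""
    if not path.exists():
        return "missing"
    try:
        actual = file_digest(path, algorithm)
    except PermissionError:
        return "permission_error"
    except OSError:
        return "unreadable"
    return "pass" if actual == expected_hex.lower() else "mismatch"


# Demonstration on a throwaway file. In production the baseline would come
# from the Preservation plugin, not from hashing the file a moment earlier.
with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "master.tif"
    master.write_bytes(b"archival payload")
    baseline = file_digest(master)
    outcome_clean = verify(master, baseline)
    master.write_bytes(b"archival payload (altered)")
    outcome_altered = verify(master, baseline)
    outcome_gone = verify(Path(tmp) / "absent.tif", baseline)
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, which matters when masters are multi-gigabyte TIFF or video files.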
Installation¶
Prerequisites¶
- AtoM Heratio v2.8+ with atom-framework v2.8.0+
- ahgPreservationPlugin must be installed and enabled
- MySQL 8.0+
- PHP 8.1+
Steps¶
# 1. Install database tables
mysql -u root archive < atom-ahg-plugins/ahgIntegrityPlugin/database/install.sql
# 2. Enable the plugin
php bin/atom extension:enable ahgIntegrityPlugin
# 3. Clear cache
php symfony cc
# 4. Verify
php symfony integrity:verify --status
The installation creates two default schedules:
- Daily Sample Check (enabled): Verifies 200 random master objects daily
- Weekly Full Scan (disabled): Comprehensive scan of all master objects
Upgrading from v1.0.0 / v1.1.0¶
If upgrading from an earlier version, the plugin will automatically add new columns (actor, hostname, previous_hash on integrity_ledger; object_format on integrity_retention_policy) on first use. You can also run the migration manually:
# Re-run install.sql (safe — uses CREATE TABLE IF NOT EXISTS)
mysql -u root archive < atom-ahg-plugins/ahgIntegrityPlugin/database/install.sql
# The ALTER for actor/hostname is handled programmatically
php symfony integrity:verify --status
Dashboard¶
Access: Admin > Integrity or navigate to /admin/integrity
The dashboard provides an at-a-glance view of your collection's integrity health:
Top Row (6 cards):
- Master Objects: Total number of master digital objects in the system
- Total Verifications: Cumulative verification count across all runs
- Pass Rate: Percentage of verifications that matched baseline checksums
- Open Dead Letters: Number of unresolved persistent failures (requires attention)
- Never Verified (backlog): Master objects that have never been verified
- Throughput (7d): Verification speed in objects/hour and GB/hour

Storage Growth KPI (v1.2.0):
- Total storage scanned over the last 30 days (in GB)
- Average GB/day scan rate

Daily Trend (30 days):
- Interactive Chart.js stacked bar chart showing daily pass (green) and fail (red) counts
- Helps identify trends and anomalies at a glance

Repository Breakdown:
- Per-repository table with total verifications, passed, failed, and pass rate
- Click a repository name to filter all dashboard statistics to that repository (v1.2.0)

Failure Type Breakdown:
- Distribution of failure types (mismatch, missing, unreadable, etc.) over the last 30 days
- Helps prioritize remediation efforts

Format Breakdown (v1.2.0):
- Verification results grouped by file format (from preservation_object_format table)
- Shows total, passed, and failed counts per format

Repository Filter (v1.2.0):
- Dropdown at the top of the dashboard to scope all statistics to a specific repository
- Click "Clear Filter" to return to global view
Navigation buttons in the header provide quick access to: Export, Policies, Holds, Alerts, Schedules, Ledger, Report.
Schedules¶
Access: Admin > Integrity > Schedules
Creating a Schedule¶
- Click New Schedule
- Configure the schedule:
- Name: Descriptive name (e.g., "Monthly Repository X Audit")
- Scope: Global (all objects), Repository (specific institution), or Hierarchy (specific node and descendants)
- Algorithm: SHA-256 (faster) or SHA-512 (more secure)
- Frequency: Daily, Weekly, Monthly, or Ad hoc
- Cron Expression: Optional override for custom schedules (e.g., 0 3 * * 5 for Fridays at 3am)
- Configure concurrency controls:
- Batch Size: Objects per run (0 = unlimited). Use 200-500 for daily, 0 for weekly full scans
- IO Throttle: Milliseconds pause between objects (0-50ms recommended)
- Max Memory: Memory limit in MB (default 512MB)
- Max Runtime: Time limit in minutes (default 120)
- Max Concurrent Runs: Prevents overlapping executions (default 1)
- Configure notifications:
- Notify on failure: Alert when a run fails completely
- Notify on mismatch: Alert when hash mismatches are detected
- Email: Notification recipient address
Managing Schedules¶
From the schedule list, you can:
- Toggle (play/pause icon): Enable or disable a schedule
- Run Now (bolt icon): Execute immediately regardless of schedule
- Edit (pencil icon): Modify schedule settings
- Delete (trash icon): Remove the schedule (blocked if a run is active)
Run History¶
Access: Admin > Integrity > Runs
Each verification run records:
- Status: Running, Completed, Partial (memory limit), Failed, Timeout, Cancelled
- Counters: Objects scanned, passed, failed, missing, error, skipped
- Trigger: Scheduler (automated), Manual (web UI), CLI (command line), API
- Timing: Start and completion timestamps
Click a run ID to view detailed results including all ledger entries for that run.
Verification Ledger¶
Access: Admin > Integrity > Ledger
The ledger is an append-only audit trail. The plugin only inserts entries and never updates or deletes them, so the ledger can serve as durable evidence of verification activity.
Each entry records:
- Digital object ID (survives even if the object is later deleted)
- File path, size, existence, and readability status
- Hash algorithm, expected hash, computed hash, and match result
- Outcome: pass, mismatch, missing, unreadable, permission_error, path_drift, no_baseline, error
- Actor: Who/what triggered the verification (user, system, scheduler, CLI)
- Hostname: Server that performed the verification
- Previous Hash (v1.2.0): The computed hash from the most recent successful verification of the same object, enabling chain-of-custody verification and tamper detection
- Verification timestamp and duration
Database Immutability Advisory (v1.2.0)¶
The integrity_ledger table is designed as an append-only audit trail. To enforce this at the database level, we recommend revoking UPDATE and DELETE privileges on this table for the web application user:
-- Give the dedicated web user read/insert-only access to the ledger
-- (the account must already exist: in MySQL 8.0, GRANT does not create users)
GRANT SELECT, INSERT ON archive.integrity_ledger TO 'atom_web'@'localhost';
-- Revoke any existing table-level UPDATE/DELETE privileges
-- (MySQL reports an error if no such table-level grant exists)
REVOKE UPDATE, DELETE ON archive.integrity_ledger FROM 'atom_web'@'localhost';
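This limits the damage from a compromised application: the web user cannot alter or remove audit entries, provided it holds no UPDATE or DELETE privilege at the database or global level (MySQL privileges are additive, so a table-level REVOKE does not override a broader grant). The previous_hash column provides an additional layer of chain verification: any gap or inconsistency in the hash chain indicates potential ledger tampering.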
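As a rough illustration of what a chain check could look like, the Python sketch below walks ledger entries oldest-first and flags any entry whose previous_hash differs from the computed hash of the last passing entry for the same object. The `LedgerEntry` class and `find_chain_breaks` function are hypothetical; the plugin's own chain validation may differ, and entries written before v1.2.0 would need special handling because they carry no previous_hash.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LedgerEntry:
    # Hypothetical in-memory shape; field names mirror concepts in this manual.
    object_id: int
    outcome: str
    computed_hash: Optional[str]
    previous_hash: Optional[str]


def find_chain_breaks(entries: list[LedgerEntry]) -> list[int]:
    """Return indexes of entries whose previous_hash disagrees with the chain.

    `entries` must be ordered oldest-first.
    """
    last_pass: dict[int, str] = {}
    breaks: list[int] = []
    for index, entry in enumerate(entries):
        expected = last_pass.get(entry.object_id)  # None before the first pass
        if entry.previous_hash != expected:
            breaks.append(index)
        # Only successful verifications advance the chain.
        if entry.outcome == "pass" and entry.computed_hash:
            last_pass[entry.object_id] = entry.computed_hash
    return breaks


ledger = [
    LedgerEntry(7, "pass", "aaa", None),
    LedgerEntry(7, "mismatch", "bbb", "aaa"),
    LedgerEntry(7, "pass", "aaa", "aaa"),
    LedgerEntry(7, "pass", "aaa", "zzz"),  # previous_hash does not match the chain
]
chain_breaks = find_chain_breaks(ledger)
```

In this sample only the last entry is flagged, because its previous_hash does not equal the hash recorded by the preceding pass.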
Filtering¶
- Outcome: Filter by specific outcome type
- Date range: Filter by verification date
- Repository: Filter by repository (uses denormalized repository_id)
Dead Letter Queue¶
Access: Admin > Integrity > Dead Letter
Objects that fail verification 3 or more consecutive times are automatically escalated to the dead letter queue. This prevents known-bad objects from consuming verification resources while ensuring they receive attention.
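The escalation rule can be sketched in Python as a count of the trailing run of non-pass outcomes. This is an illustration only: `should_escalate` is a hypothetical name, and treating every non-pass outcome as a failure is an assumption, since the manual does not say whether outcomes such as no_baseline count towards the threshold.

```python
FAILURE_THRESHOLD = 3  # "3 or more consecutive times", per this manual


def should_escalate(outcomes: list[str], threshold: int = FAILURE_THRESHOLD) -> bool:
    """True when the most recent `threshold` outcomes are all non-pass.

    `outcomes` is ordered oldest-first.
    """
    streak = 0
    for outcome in reversed(outcomes):
        if outcome == "pass":
            break  # a pass resets the consecutive-failure streak
        streak += 1
    return streak >= threshold


escalate_now = should_escalate(["pass", "mismatch", "mismatch", "mismatch"])
interrupted = should_escalate(["mismatch", "mismatch", "pass", "mismatch"])
```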
Workflow States¶
| State | Meaning | Next Actions |
|---|---|---|
| Open | New failure, needs attention | Acknowledge, Investigate, Resolve, Ignore |
| Acknowledged | Someone has seen it | Investigate, Resolve, Ignore |
| Investigating | Under active investigation | Resolve, Ignore |
| Resolved | Issue has been fixed | Reopen (if it fails again) |
| Ignored | Intentionally excluded | Reopen |
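The table above amounts to a small state machine. A minimal Python sketch follows; the lowercase state keys and the `transition` helper are hypothetical, and mapping "Reopen" back to the Open state is an assumption rather than documented behaviour.

```python
# Allowed next states, transcribed from the Workflow States table.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "open": {"acknowledged", "investigating", "resolved", "ignored"},
    "acknowledged": {"investigating", "resolved", "ignored"},
    "investigating": {"resolved", "ignored"},
    "resolved": {"open"},  # assumption: "Reopen" returns the entry to Open
    "ignored": {"open"},   # assumption: "Reopen" returns the entry to Open
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move dead letter from {current!r} to {target!r}")
    return target


state = transition("open", "acknowledged")
state = transition(state, "investigating")
state = transition(state, "resolved")
```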
Common Failure Types¶
| Type | Cause | Resolution |
|---|---|---|
| mismatch | File hash differs from baseline | Investigate: was the file modified intentionally? If corruption, restore from backup |
| missing | File not found at expected path | Check if file was moved, deleted, or if storage mount is offline |
| unreadable | File exists but cannot be read | Check file permissions and ownership |
| permission_error | Access denied to file path | Fix filesystem permissions |
| path_drift | File path has changed | Update digital object record or restore symlinks |
Reports¶
Access: Admin > Integrity > Report
The report page shows:
- Summary statistics (master objects, verifications, pass rate, dead letters)
- Outcome breakdown with percentage bars
- Monthly trend table (12 months) showing pass rates over time
- CLI command reference for generating machine-readable reports
Export & Auditor Pack¶
Access: Admin > Integrity > Export
CSV Export¶
Download the verification ledger as a CSV file with filters:
- Date range (from/to)
- Repository
- Outcome type
The CSV includes all ledger columns: ID, run ID, digital object ID, file path, hashes, outcome, actor, hostname, and timestamp. Maximum 50,000 rows per export.
Auditor Pack (ZIP)¶
Download a self-contained ZIP archive for compliance audits containing:
- summary.html: Standalone HTML report with inline CSS (no external dependencies), showing statistics, schedule configuration, and overall health
- exceptions.csv: All non-pass verification entries
- config-snapshot.json: Complete schedule configuration, dead letter summary, and current statistics
Both export types support the same filter parameters.
CLI Export¶
# Export to CSV file
php symfony integrity:report --export-csv=/tmp/ledger.csv
# Export to stdout (pipe to another tool)
php symfony integrity:report --export-csv=-
# Generate auditor pack
php symfony integrity:report --auditor-pack=/tmp/auditor.zip
# With filters
php symfony integrity:report --export-csv=/tmp/q1.csv --date-from=2026-01-01 --date-to=2026-03-31
Retention Policies¶
Access: Admin > Integrity > Policies
Retention policies define how long records should be kept before becoming eligible for disposition review. This does NOT automatically delete records — it only identifies candidates for human review.
Creating a Policy¶
- Click New Policy
- Configure:
- Name: Descriptive name (e.g., "7-Year Financial Records")
- Retention Period: Number of days (0 = indefinite, never eligible)
- Trigger Type: When the clock starts
  - ingest_date: From when the record was created in AtoM
  - last_modified: From last modification date
  - closure_date: From closure/completion date
  - last_access: From last access date
- Object Format (v1.2.0): Optional MIME type filter (e.g., image/tiff, application/pdf). Leave empty for all formats. Uses prefix matching, so image/ matches all image types.
- Scope: Global, per-repository, or per-hierarchy node
- Enabled: Toggle to activate/deactivate
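The two rules that most often cause surprises, prefix matching on the format filter and the meaning of a zero retention period, can be sketched in Python as follows. The helper names `format_matches` and `eligible_on` are hypothetical, and the sketch assumes the filter holds a single MIME prefix; the plugin's actual query logic may differ.

```python
from datetime import date, timedelta
from typing import Optional


def format_matches(policy_format: Optional[str], mime_type: str) -> bool:
    """An empty filter matches every format; otherwise compare by prefix."""
    if not policy_format:
        return True
    return mime_type.startswith(policy_format)


def eligible_on(trigger_date: date, retention_days: int) -> Optional[date]:
    """Date on which a record becomes eligible for review, or None if never.

    A retention period of 0 means indefinite retention.
    """
    if retention_days <= 0:
        return None
    return trigger_date + timedelta(days=retention_days)


matches_any_image = format_matches("image/", "image/tiff")
review_date = eligible_on(date(2026, 1, 1), 30)
```

Reaching the eligibility date only makes a record a candidate for human review; nothing is deleted as a result.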
Managing Policies¶
From the policy list:
- Toggle: Enable/disable the policy
- Edit: Modify policy settings
- Delete: Remove the policy (also removes its disposition queue entries)
Scanning for Eligible Records¶
Use the Scan for Eligible button on the Disposition page or CLI:
php symfony integrity:retention --scan-eligible
php symfony integrity:retention --scan-eligible --policy-id=1
Legal Holds¶
Access: Admin > Integrity > Holds
Legal holds prevent records from being disposed of, even if they are past their retention period. Use legal holds when records are subject to litigation, investigation, or regulatory review.
Placing a Hold¶
- Click Place Hold
- Enter the Information Object ID
- Provide a reason (required for audit trail)
- Click Place Hold
When a hold is placed:
- The hold is recorded with the placer's name and timestamp
- Any matching disposition queue entries are moved to "held" status
- A ledger entry is created for audit purposes
Releasing a Hold¶
Click the unlock icon next to an active hold. When released:
- The hold is marked as "released" with the releaser's name and timestamp
- If no other active holds exist on the record, disposition queue entries revert to "eligible"
- A ledger entry is created for audit purposes
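The interaction between multiple holds on one record can be sketched in Python as below. The `Hold` class, the `release` function, and the dictionary standing in for the disposition queue are all hypothetical simplifications; audit ledger entries and user attribution are left out.

```python
from dataclasses import dataclass


@dataclass
class Hold:
    hold_id: int
    information_object_id: int
    status: str = "active"  # "active" or "released"


def release(holds: list[Hold], queue_status: dict[int, str], hold_id: int) -> None:
    """Release one hold; revert the record to 'eligible' only if no hold remains active."""
    target = next(h for h in holds if h.hold_id == hold_id)
    target.status = "released"
    record_id = target.information_object_id
    still_held = any(
        h.status == "active" and h.information_object_id == record_id for h in holds
    )
    if not still_held and queue_status.get(record_id) == "held":
        queue_status[record_id] = "eligible"


holds = [Hold(1, 500), Hold(2, 500)]
queue = {500: "held"}
release(holds, queue, 1)
status_after_first = queue[500]   # second hold still active
release(holds, queue, 2)
status_after_second = queue[500]  # no active holds remain
```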
Disposition Queue¶
Access: Admin > Integrity > Disposition
The disposition queue shows records that have passed their retention period and are candidates for review.
Status Flow¶
Reviewing Records¶
- Click the checkmark to approve disposition
- Click the X to reject disposition
- Optionally add review notes
Important: "Disposed" status only marks the record — it does NOT delete anything. Actual deletion (if required) is a separate manual process outside the plugin.
Status Summary¶
The page header shows counts for each status, helping prioritize review work.
Alerts¶
Access: Admin > Integrity > Alerts
Configure threshold-based alerts to be notified when integrity metrics cross defined boundaries.
Alert Types¶
| Type | Description | Example |
|---|---|---|
| Pass rate below | Triggers when pass rate drops | Alert if pass rate < 95% |
| Failure count above | Triggers when failures exceed threshold | Alert if > 10 failures per run |
| Dead letter count above | Triggers when open dead letters exceed threshold | Alert if > 5 open dead letters |
| Backlog above | Triggers when unverified objects exceed threshold | Alert if > 1000 never-verified |
| Run failure | Triggers on any failed/timeout/partial run | Alert on any run failure |
Notification Channels¶
- Email: Sent via the configured SwiftMailer (same as AtoM's email system)
- Webhook: HTTP POST to a URL with JSON payload
  - Optional HMAC-SHA256 signature in X-Signature header for verification
  - Useful for integration with Slack, Teams, PagerDuty, etc.
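On the receiving side, a webhook consumer can check the signature before trusting the payload. The Python sketch below assumes the X-Signature header carries the hex-encoded HMAC-SHA256 of the raw request body with no prefix; that encoding is an assumption and should be confirmed against an actual delivery. The function names are hypothetical.

```python
import hashlib
import hmac


def sign(body: bytes, secret: str) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body (assumed wire format)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def is_authentic(body: bytes, secret: str, signature_header: str) -> bool:
    """Compare in constant time to avoid leaking how many characters matched."""
    expected = sign(body, secret).encode("ascii")
    received = signature_header.strip().lower().encode("utf-8")
    return hmac.compare_digest(expected, received)


payload = b'{"alert": "pass_rate_below", "value": 93.5}'
header_value = sign(payload, "s3cret")
accepted = is_authentic(payload, "s3cret", header_value)
rejected = is_authentic(payload + b" ", "s3cret", header_value)
```

The signature must be computed over the exact bytes received, before any JSON parsing or re-serialisation, otherwise whitespace differences will cause valid deliveries to be rejected.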
Creating an Alert¶
- Click New Alert
- Select the alert type and comparison operator
- Set the threshold value
- Provide email and/or webhook URL
- Optionally add a webhook secret for HMAC signing
- Enable/disable the alert
Alerts are evaluated after each batch verification run. Alert failures are non-fatal — they never break the verification process.
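The evaluation step, including the rule that notification failures must not interrupt verification, can be sketched in Python as follows. The alert dictionaries, metric names, and the `evaluate_alerts` function are hypothetical and do not reflect the plugin's internal schema.

```python
import operator
from typing import Callable

OPERATORS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def evaluate_alerts(
    alerts: list[dict], metrics: dict[str, float], notify: Callable[[dict, float], None]
) -> tuple[list[str], list[str]]:
    """Return (names of alerts that fired, notification errors).

    Errors raised by `notify` are collected rather than propagated, so a
    broken mail server or webhook never aborts the verification run.
    """
    fired: list[str] = []
    errors: list[str] = []
    for alert in alerts:
        if not alert.get("enabled", True):
            continue
        value = metrics.get(alert["metric"])
        if value is None:
            continue
        if OPERATORS[alert["operator"]](value, alert["threshold"]):
            fired.append(alert["name"])
            try:
                notify(alert, value)
            except Exception as exc:  # non-fatal by design
                errors.append(f"{alert['name']}: {exc}")
    return fired, errors


def failing_notify(alert: dict, value: float) -> None:
    raise ConnectionError("smtp down")


alerts = [
    {"name": "Low pass rate", "metric": "pass_rate", "operator": "<", "threshold": 95.0},
    {"name": "Dead letters", "metric": "open_dead_letters", "operator": ">", "threshold": 5},
]
metrics = {"pass_rate": 93.5, "open_dead_letters": 2}
fired, errors = evaluate_alerts(alerts, metrics, failing_notify)
```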
CLI Commands¶
integrity:verify¶
# Show status
php symfony integrity:verify --status
# Verify single object
php symfony integrity:verify --object-id=123
# Run a schedule
php symfony integrity:verify --schedule-id=1
# Verify stale objects (not checked in 14 days)
php symfony integrity:verify --limit=500 --stale-days=14
# Verify all objects in a repository
php symfony integrity:verify --repository-id=5 --limit=1000
# Verify all master objects
php symfony integrity:verify --all --throttle=20
# Dry run (preview only)
php symfony integrity:verify --dry-run --limit=100
integrity:schedule¶
# List all schedules
php symfony integrity:schedule --list
# Show status summary
php symfony integrity:schedule --status
# Run all due schedules (use in cron)
php symfony integrity:schedule --run-due
# Run specific schedule
php symfony integrity:schedule --run-id=1
# Enable/disable
php symfony integrity:schedule --enable=1
php symfony integrity:schedule --disable=2
integrity:report¶
# Summary report
php symfony integrity:report --summary
# Dead letter report
php symfony integrity:report --dead-letter
# Date-filtered ledger report
php symfony integrity:report --date-from=2026-01-01 --date-to=2026-02-28
# JSON output (for monitoring integration)
php symfony integrity:report --summary --format=json
# CSV output (for spreadsheets)
php symfony integrity:report --dead-letter --format=csv
# Export full ledger to CSV
php symfony integrity:report --export-csv=/tmp/ledger.csv
# Generate auditor pack
php symfony integrity:report --auditor-pack=/tmp/auditor.zip
integrity:retention¶
# List all retention policies
php symfony integrity:retention --list
# Show retention & disposition status
php symfony integrity:retention --status
# Scan for eligible disposition candidates
php symfony integrity:retention --scan-eligible
# Scan for specific policy
php symfony integrity:retention --scan-eligible --policy-id=1
# Process approved dispositions (mark as disposed)
php symfony integrity:retention --process-queue
# Place a legal hold
php symfony integrity:retention --hold=12345 --reason="Legal investigation"
# Release a legal hold
php symfony integrity:retention --release=1
REST API¶
The Integrity Assurance plugin provides a comprehensive REST API (v1.2.0) for integration with external monitoring tools, dashboards, and automation systems. All endpoints require administrator authentication via session cookie.
Paginated List Endpoints¶
| Endpoint | Method | Parameters | Description |
|---|---|---|---|
| /api/integrity/ledger | GET | limit, skip, repository_id, outcome, date_from, date_to | Browse verification ledger |
| /api/integrity/runs | GET | limit, skip, status | Browse run history |
| /api/integrity/holds | GET | limit, skip, status | Browse legal holds |
| /api/integrity/policies | GET | limit, skip | Browse retention policies |
All paginated endpoints return: {success: true, total: N, limit: N, skip: N, data: [...]}
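A client can walk any paginated endpoint by advancing skip until total is reached. The Python sketch below shows the loop against a stand-in `fake_fetch` function so that it runs without a server; `iter_all` and `fake_fetch` are hypothetical names, and a real client would replace `fake_fetch` with an authenticated HTTP GET that passes limit and skip as query parameters.

```python
from typing import Callable, Iterator


def iter_all(fetch_page: Callable[[int, int], dict], limit: int = 100) -> Iterator[dict]:
    """Yield every row from a limit/skip endpoint using the documented envelope."""
    skip = 0
    while True:
        page = fetch_page(limit, skip)
        if not page.get("success"):
            raise RuntimeError("API reported failure")
        rows = page["data"]
        yield from rows
        skip += len(rows)
        # Stop on an empty page as well, in case total changes between requests.
        if not rows or skip >= page["total"]:
            break


def fake_fetch(limit: int, skip: int) -> dict:
    """Stand-in for an HTTP call; returns the envelope shape described above."""
    rows = [{"id": i} for i in range(1, 8)]
    return {
        "success": True,
        "total": len(rows),
        "limit": limit,
        "skip": skip,
        "data": rows[skip:skip + limit],
    }


collected = list(iter_all(fake_fetch, limit=3))
```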
Analytics Endpoints¶
| Endpoint | Method | Parameters | Description |
|---|---|---|---|
| /api/integrity/stats | GET | none | Dashboard statistics |
| /api/integrity/daily-trend | GET | days (default: 30) | Daily pass/fail counts |
| /api/integrity/repo-breakdown | GET | none | Per-repository verification stats |
| /api/integrity/format-breakdown | GET | none | Per-format verification stats |
| /api/integrity/throughput | GET | days (default: 7) | Verification throughput |
| /api/integrity/storage-growth | GET | days (default: 30) | Storage scanned over time |
Action Endpoints¶
| Endpoint | Method | Parameters | Description |
|---|---|---|---|
| /api/integrity/verify | POST | object_id | Verify a single digital object |
| /api/integrity/schedule/:id/run | POST | none | Execute a schedule immediately |
| /api/integrity/schedule/:id/toggle | POST | none | Enable/disable schedule |
| /api/integrity/retention/scan | POST | policy_id (optional) | Scan for eligible dispositions |
| /api/integrity/hold/place | POST | information_object_id, reason | Place a legal hold |
| /api/integrity/hold/:id/release | POST | none | Release a legal hold |
Example Usage¶
# Get dashboard stats
curl -s -b cookies.txt https://psis.theahg.co.za/api/integrity/stats | jq .
# Browse ledger (first 10 entries)
curl -s -b cookies.txt 'https://psis.theahg.co.za/api/integrity/ledger?limit=10' | jq .
# Get daily trend for last 7 days
curl -s -b cookies.txt 'https://psis.theahg.co.za/api/integrity/daily-trend?days=7' | jq .
Full OpenAPI 3.0.3 specification: docs/openapi.yaml
Cron Setup¶
Add these entries to your system crontab:
# Run due integrity schedules every 15 minutes
*/15 * * * * cd /usr/share/nginx/archive && php symfony integrity:schedule --run-due >> /var/log/atom/integrity-scheduler.log 2>&1
# Scan for retention-eligible objects daily at 1am
0 1 * * * cd /usr/share/nginx/archive && php symfony integrity:retention --scan-eligible >> /var/log/atom/integrity-retention.log 2>&1
# Process approved dispositions daily at 2am
0 2 * * * cd /usr/share/nginx/archive && php symfony integrity:retention --process-queue >> /var/log/atom/integrity-retention.log 2>&1
# Weekly integrity summary report (Monday 8am)
0 8 * * 1 cd /usr/share/nginx/archive && php symfony integrity:report --summary >> /var/log/atom/integrity-report.log 2>&1
# Weekly auditor pack export (Monday 8:30am)
30 8 * * 1 cd /usr/share/nginx/archive && php symfony integrity:report --auditor-pack=/tmp/integrity_weekly.zip >> /var/log/atom/integrity-report.log 2>&1
These cron entries are also documented in Admin > AHG Settings > Cron Jobs under the "Integrity Assurance" category.
Troubleshooting¶
"Schedule already has running instances"¶
A previous run is still active or was interrupted. Check for stale lock files:
If the process is dead, the lock will auto-recover on the next attempt (PID stale detection).
No baseline checksums found¶
The Preservation plugin must have generated checksums for the digital objects. Run:
The Integrity plugin will also auto-generate baselines on first verification attempt.
"Memory limit reached" (partial status)¶
Increase the schedule's max memory setting or reduce the batch size.
Objects reported as "missing" but files exist¶
Check that the upload path symlink is correct:
The symlink should point to the NAS mount (e.g., /mnt/nas/heratio/archive).
Pass rate declining¶
- Check dead letter queue for patterns (same repository, same failure type)
- Run a targeted verification: php symfony integrity:verify --repository-id=X --limit=50
- Check storage health (NAS connectivity, disk errors)
Alerts not sending¶
- Verify email configuration in AtoM's apps/qubit/config/factories.yml (mailer section)
- Check webhook URL is accessible from the server
- Verify alert is enabled in Admin > Integrity > Alerts
- Check PHP error logs for alert-related exceptions
Retention scan finding no eligible records¶
- Ensure retention_period_days > 0 (0 = indefinite, never eligible)
- Check that the trigger_type column matches your data (e.g., ingest_date requires created_at)
- Verify the policy scope matches your records (repository ID, hierarchy node)
For technical support, contact The Archive and Heritage Group (Pty) Ltd at johan@theahg.co.za