AtoM Heratio — Integrity Assurance Plugin User Manual

Version: 1.2.0
Author: The Archive and Heritage Group (Pty) Ltd


Table of Contents

  1. Overview
  2. Installation
  3. Dashboard
  4. Schedules
  5. Run History
  6. Verification Ledger
  7. Dead Letter Queue
  8. Reports
  9. Export & Auditor Pack
  10. Retention Policies
  11. Legal Holds
  12. Disposition Queue
  13. Alerts
  14. CLI Commands
  15. REST API
  16. Cron Setup
  17. Troubleshooting

Overview

The Integrity Assurance plugin automates the verification of digital object integrity in your AtoM Heratio instance. It ensures that archival files have not been corrupted, modified, or lost by comparing stored cryptographic checksums (generated by the Preservation plugin) against current file hashes.

How it works:

  1. The Preservation plugin generates baseline checksums (SHA-256 or SHA-512) when digital objects are ingested
  2. The Integrity plugin runs scheduled or ad-hoc verification passes
  3. Each verification is recorded in an append-only ledger with actor, hostname, and previous hash chain tracking
  4. Persistent failures are escalated to a dead-letter queue for manual investigation
  5. Retention policies determine when records become eligible for disposition review
  6. Legal holds block disposition of records under review
  7. Threshold-based alerts notify administrators of integrity issues via email or webhook
  8. Reports and dashboards provide visibility into overall collection health
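
The comparison in steps 1 to 3 can be pictured with a minimal Python sketch. It is not the plugin's code (the plugin runs inside AtoM as PHP); the function names `file_digest` and `verify` are hypothetical, and only a few of the outcome labels recorded in the Verification Ledger are modelled.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large master files are never loaded whole."""
    h = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected_hash: str, algorithm: str = "sha256") -> str:
    """Compare the current file hash with the stored baseline.

    Returns an outcome label (hypothetical mapping onto the ledger outcomes).
    """
    if not path.exists():
        return "missing"
    try:
        computed = file_digest(path, algorithm)
    except PermissionError:
        return "permission_error"
    except OSError:
        return "unreadable"
    return "pass" if computed == expected_hash.lower() else "mismatch"
```

Passing `algorithm="sha512"` switches the digest; the baseline must have been generated with the same algorithm for the comparison to be meaningful.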

Installation

Prerequisites

  • AtoM Heratio v2.8+ with atom-framework v2.8.0+
  • ahgPreservationPlugin must be installed and enabled
  • MySQL 8.0+
  • PHP 8.1+

Steps

# 1. Install database tables
mysql -u root archive < atom-ahg-plugins/ahgIntegrityPlugin/database/install.sql

# 2. Enable the plugin
php bin/atom extension:enable ahgIntegrityPlugin

# 3. Clear cache
php symfony cc

# 4. Verify
php symfony integrity:verify --status

The installation creates two default schedules:

  • Daily Sample Check (enabled): Verifies 200 random master objects daily
  • Weekly Full Scan (disabled): Comprehensive scan of all master objects

Upgrading from v1.0.0 / v1.1.0

If upgrading from an earlier version, the plugin will automatically add new columns (actor, hostname, previous_hash on integrity_ledger; object_format on integrity_retention_policy) on first use. You can also run the migration manually:

# Re-run install.sql (safe — uses CREATE TABLE IF NOT EXISTS)
mysql -u root archive < atom-ahg-plugins/ahgIntegrityPlugin/database/install.sql

# The ALTERs for the new columns are handled programmatically on first use
php symfony integrity:verify --status

Dashboard

Access: Admin > Integrity or navigate to /admin/integrity

The dashboard provides an at-a-glance view of your collection's integrity health:

Top Row (6 cards):

  • Master Objects: Total number of master digital objects in the system
  • Total Verifications: Cumulative verification count across all runs
  • Pass Rate: Percentage of verifications that matched baseline checksums
  • Open Dead Letters: Number of unresolved persistent failures (requires attention)
  • Never Verified (backlog): Master objects that have never been verified
  • Throughput (7d): Verification speed in objects/hour and GB/hour

Storage Growth KPI (v1.2.0):

  • Total storage scanned over the last 30 days (in GB)
  • Average GB/day scan rate

Daily Trend (30 days):

  • Interactive Chart.js stacked bar chart showing daily pass (green) and fail (red) counts
  • Helps identify trends and anomalies at a glance

Repository Breakdown:

  • Per-repository table with total verifications, passed, failed, and pass rate
  • Click a repository name to filter all dashboard statistics to that repository (v1.2.0)

Failure Type Breakdown:

  • Distribution of failure types (mismatch, missing, unreadable, etc.) over the last 30 days
  • Helps prioritize remediation efforts

Format Breakdown (v1.2.0):

  • Verification results grouped by file format (from preservation_object_format table)
  • Shows total, passed, and failed counts per format

Repository Filter (v1.2.0):

  • Dropdown at the top of the dashboard to scope all statistics to a specific repository
  • Click "Clear Filter" to return to global view

Navigation buttons in the header provide quick access to: Export, Policies, Holds, Alerts, Schedules, Ledger, Report.

Schedules

Access: Admin > Integrity > Schedules

Creating a Schedule

  1. Click New Schedule
  2. Configure the schedule:
    • Name: Descriptive name (e.g., "Monthly Repository X Audit")
    • Scope: Global (all objects), Repository (specific institution), or Hierarchy (specific node and descendants)
    • Algorithm: SHA-256 (faster) or SHA-512 (more secure)
    • Frequency: Daily, Weekly, Monthly, or Ad hoc
    • Cron Expression: Optional override for custom schedules (e.g., 0 3 * * 5 for Fridays at 3am)
  3. Configure concurrency controls:
    • Batch Size: Objects per run (0 = unlimited). Use 200-500 for daily, 0 for weekly full scans
    • IO Throttle: Milliseconds pause between objects (0-50ms recommended)
    • Max Memory: Memory limit in MB (default 512MB)
    • Max Runtime: Time limit in minutes (default 120)
    • Max Concurrent Runs: Prevents overlapping executions (default 1)
  4. Configure notifications:
    • Notify on failure: Alert when a run fails completely
    • Notify on mismatch: Alert when hash mismatches are detected
    • Email: Notification recipient address

Managing Schedules

From the schedule list, you can:

  • Toggle (play/pause icon): Enable or disable a schedule
  • Run Now (bolt icon): Execute immediately regardless of schedule
  • Edit (pencil icon): Modify schedule settings
  • Delete (trash icon): Remove the schedule (blocked if a run is active)

Run History

Access: Admin > Integrity > Runs

Each verification run records:

  • Status: Running, Completed, Partial (memory limit), Failed, Timeout, Cancelled
  • Counters: Objects scanned, passed, failed, missing, error, skipped
  • Trigger: Scheduler (automated), Manual (web UI), CLI (command line), API
  • Timing: Start and completion timestamps

Click a run ID to view detailed results including all ledger entries for that run.

Verification Ledger

Access: Admin > Integrity > Ledger

The ledger is an append-only audit trail. Entries are never updated or deleted, providing forensic-grade evidence of verification activities.

Each entry records:

  • Digital object ID (survives even if the object is later deleted)
  • File path, size, existence, and readability status
  • Hash algorithm, expected hash, computed hash, and match result
  • Outcome: pass, mismatch, missing, unreadable, permission_error, path_drift, no_baseline, error
  • Actor: Who/what triggered the verification (user, system, scheduler, CLI)
  • Hostname: Server that performed the verification
  • Previous Hash (v1.2.0): The computed hash from the most recent successful verification of the same object, enabling chain-of-custody verification and tamper detection
  • Verification timestamp and duration

Database Immutability Advisory (v1.2.0)

The integrity_ledger table is designed as an append-only audit trail. To enforce this at the database level, we recommend revoking UPDATE and DELETE privileges on this table for the web application user:

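-- Grant the web user read/insert access to the ledger
-- (the account must already exist; in MySQL 8.0, GRANT does not create users)
GRANT SELECT, INSERT ON archive.integrity_ledger TO 'atom_web'@'localhost';

-- Revoke any existing table-level UPDATE/DELETE privileges
-- (fails if none were granted at table level; database-wide grants must be narrowed separately)
REVOKE UPDATE, DELETE ON archive.integrity_ledger FROM 'atom_web'@'localhost';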

Provided the web user holds no broader UPDATE or DELETE privilege (for example, a database-wide grant), this means that even a compromised application cannot alter the verification audit trail through its own database account. The previous_hash column provides an additional layer of chain verification: any gap or inconsistency in the hash chain indicates potential ledger tampering.
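
As an illustration of the chain check, the sketch below walks the ledger rows for one digital object and flags any entry whose previous_hash does not match the computed hash of the most recent earlier pass. It assumes rows are supplied oldest first as dictionaries; the keys `outcome` and `computed_hash` are hypothetical names for the corresponding ledger fields, and the assumption that the first entry carries an empty previous_hash is ours, not a documented guarantee.

```python
from typing import Iterable, Mapping, Optional


def find_chain_breaks(entries: Iterable[Mapping[str, Optional[str]]]) -> list[int]:
    """Return the positions of entries that break the previous_hash chain.

    `entries` must be the ledger rows for ONE digital object, oldest first.
    """
    breaks: list[int] = []
    last_good: Optional[str] = None  # computed hash of the latest "pass" seen so far
    for position, row in enumerate(entries):
        if row["previous_hash"] != last_good:
            breaks.append(position)
        if row["outcome"] == "pass":
            last_good = row["computed_hash"]
    return breaks
```

Rows written before the upgrade to v1.2.0 have no previous_hash value, so a real audit would need to start the walk at the first row recorded after the upgrade.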

Filtering

  • Outcome: Filter by specific outcome type
  • Date range: Filter by verification date
  • Repository: Filter by repository (uses denormalized repository_id)

Dead Letter Queue

Access: Admin > Integrity > Dead Letter

Objects that fail verification 3 or more consecutive times are automatically escalated to the dead letter queue. This prevents known-bad objects from consuming verification resources while ensuring they receive attention.
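
The escalation rule can be sketched as a count of trailing failures in an object's verification history. This is an illustration only: the function names are hypothetical, and treating every non-pass outcome as a failure is an assumption (the plugin may, for instance, handle no_baseline differently).

```python
ESCALATION_THRESHOLD = 3  # "3 or more consecutive times", per this manual


def consecutive_failures(outcomes: list[str]) -> int:
    """Count the unbroken run of non-pass outcomes at the end of the history.

    `outcomes` is ordered oldest first; a single pass resets the streak.
    """
    streak = 0
    for outcome in reversed(outcomes):
        if outcome == "pass":
            break
        streak += 1
    return streak


def should_escalate(outcomes: list[str]) -> bool:
    """True when the object qualifies for the dead letter queue."""
    return consecutive_failures(outcomes) >= ESCALATION_THRESHOLD
```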

Workflow States

State | Meaning | Next Actions
Open | New failure, needs attention | Acknowledge, Investigate, Resolve, Ignore
Acknowledged | Someone has seen it | Investigate, Resolve, Ignore
Investigating | Under active investigation | Resolve, Ignore
Resolved | Issue has been fixed | Reopen (if it fails again)
Ignored | Intentionally excluded | Reopen
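
The workflow above amounts to a small state machine. The sketch below encodes the allowed moves as a lookup table; the lowercase state identifiers are hypothetical, and it assumes that "Reopen" returns an entry to the Open state.

```python
# Allowed target states for each current state, taken from the workflow table.
TRANSITIONS: dict[str, set[str]] = {
    "open": {"acknowledged", "investigating", "resolved", "ignored"},
    "acknowledged": {"investigating", "resolved", "ignored"},
    "investigating": {"resolved", "ignored"},
    "resolved": {"open"},  # Reopen
    "ignored": {"open"},   # Reopen
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move dead letter from {current!r} to {target!r}")
    return target
```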

Common Failure Types

Type | Cause | Resolution
mismatch | File hash differs from baseline | Investigate: was the file modified intentionally? If corruption, restore from backup
missing | File not found at expected path | Check if file was moved, deleted, or if storage mount is offline
unreadable | File exists but cannot be read | Check file permissions and ownership
permission_error | Access denied to file path | Fix filesystem permissions
path_drift | File path has changed | Update digital object record or restore symlinks

Reports

Access: Admin > Integrity > Report

The report page shows:

  • Summary statistics (master objects, verifications, pass rate, dead letters)
  • Outcome breakdown with percentage bars
  • Monthly trend table (12 months) showing pass rates over time
  • CLI command reference for generating machine-readable reports

Export & Auditor Pack

Access: Admin > Integrity > Export

CSV Export

Download the verification ledger as a CSV file with filters:

  • Date range (from/to)
  • Repository
  • Outcome type

The CSV includes all ledger columns: ID, run ID, digital object ID, file path, hashes, outcome, actor, hostname, and timestamp. Maximum 50,000 rows per export.

Auditor Pack (ZIP)

Download a self-contained ZIP archive for compliance audits containing:

  • summary.html: Standalone HTML report with inline CSS (no external dependencies), showing statistics, schedule configuration, and overall health
  • exceptions.csv: All non-pass verification entries
  • config-snapshot.json: Complete schedule configuration, dead letter summary, and current statistics

Both export types support the same filter parameters.

CLI Export

# Export to CSV file
php symfony integrity:report --export-csv=/tmp/ledger.csv

# Export to stdout (pipe to another tool)
php symfony integrity:report --export-csv=-

# Generate auditor pack
php symfony integrity:report --auditor-pack=/tmp/auditor.zip

# With filters
php symfony integrity:report --export-csv=/tmp/q1.csv --date-from=2026-01-01 --date-to=2026-03-31

Retention Policies

Access: Admin > Integrity > Policies

Retention policies define how long records should be kept before becoming eligible for disposition review. This does NOT automatically delete records — it only identifies candidates for human review.

Creating a Policy

  1. Click New Policy
  2. Configure:
    • Name: Descriptive name (e.g., "7-Year Financial Records")
    • Retention Period: Number of days (0 = indefinite, never eligible)
    • Trigger Type: When the clock starts
      • ingest_date: From when the record was created in AtoM
      • last_modified: From last modification date
      • closure_date: From closure/completion date
      • last_access: From last access date
    • Object Format (v1.2.0): Optional MIME type filter (e.g., image/tiff, application/pdf). Leave empty for all formats. Uses prefix matching, so image/ matches all image types.
    • Scope: Global, per-repository, or per-hierarchy node
    • Enabled: Toggle to activate/deactivate
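
To make the retention period and format filter concrete, here is a minimal sketch of how eligibility could be evaluated. The function names are hypothetical, and treating a record as eligible on the exact day its period elapses is an assumption; the plugin's own boundary handling may differ.

```python
from datetime import date, timedelta
from typing import Optional


def format_matches(policy_format: Optional[str], mime_type: str) -> bool:
    """Prefix match on MIME type; an empty policy format matches everything."""
    return not policy_format or mime_type.startswith(policy_format)


def is_eligible(trigger_date: date, retention_days: int, today: date) -> bool:
    """True once the retention period, counted from the trigger date, has elapsed.

    A retention period of 0 means indefinite: the record is never eligible.
    """
    if retention_days <= 0:
        return False
    return trigger_date + timedelta(days=retention_days) <= today
```

For example, a policy with Object Format `image/` applies to both `image/tiff` and `image/png`, while `image/tiff` applies to TIFF files only.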

Managing Policies

From the policy list:

  • Toggle: Enable/disable the policy
  • Edit: Modify policy settings
  • Delete: Remove the policy (also removes its disposition queue entries)

Scanning for Eligible Records

Use the Scan for Eligible button on the Disposition page, or run the scan from the CLI:

php symfony integrity:retention --scan-eligible
php symfony integrity:retention --scan-eligible --policy-id=1

Legal Holds

Access: Admin > Integrity > Holds

Legal holds prevent records from being disposed of, even if they are past their retention period. Use legal holds when records are subject to litigation, investigation, or regulatory review.

Placing a Hold

  1. Click Place Hold
  2. Enter the Information Object ID
  3. Provide a reason (required for audit trail)
  4. Click Place Hold

When a hold is placed:

  • The hold is recorded with the placer's name and timestamp
  • Any matching disposition queue entries are moved to "held" status
  • A ledger entry is created for audit purposes

Releasing a Hold

Click the unlock icon next to an active hold. When released:

  • The hold is marked as "released" with the releaser's name and timestamp
  • If no other active holds exist on the record, disposition queue entries revert to "eligible"
  • A ledger entry is created for audit purposes

Disposition Queue

Access: Admin > Integrity > Disposition

The disposition queue shows records that have passed their retention period and are candidates for review.

Status Flow

eligible → pending_review → approved → disposed
                          → rejected
                          → held (if legal hold placed)

Reviewing Records

  • Click the checkmark to approve disposition
  • Click the X to reject disposition
  • Optionally add review notes

Important: "Disposed" status only marks the record — it does NOT delete anything. Actual deletion (if required) is a separate manual process outside the plugin.

Status Summary

The page header shows counts for each status, helping prioritize review work.

Alerts

Access: Admin > Integrity > Alerts

Configure threshold-based alerts to be notified when integrity metrics cross defined boundaries.

Alert Types

Type | Description | Example
Pass rate below | Triggers when pass rate drops | Alert if pass rate < 95%
Failure count above | Triggers when failures exceed threshold | Alert if > 10 failures per run
Dead letter count above | Triggers when open dead letters exceed threshold | Alert if > 5 open dead letters
Backlog above | Triggers when unverified objects exceed threshold | Alert if > 1000 never-verified
Run failure | Triggers on any failed/timeout/partial run | Alert on any run failure

Notification Channels

  • Email: Sent via the configured SwiftMailer (same as AtoM's email system)
  • Webhook: HTTP POST to a URL with JSON payload
    • Optional HMAC-SHA256 signature in X-Signature header for verification
    • Useful for integration with Slack, Teams, PagerDuty, etc.
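
A webhook receiver can check the signature before trusting a payload. The sketch below assumes the X-Signature header carries the hex-encoded HMAC-SHA256 of the raw request body, keyed with the webhook secret; the exact encoding and any prefix the plugin uses are not stated in this manual, so confirm them against a real delivery. The function name is hypothetical.

```python
import hashlib
import hmac


def signature_is_valid(secret: str, body: bytes, header_value: str) -> bool:
    """Recompute the HMAC over the raw body and compare in constant time.

    Assumes a hex-encoded digest with no prefix in the X-Signature header.
    """
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    received = header_value.strip().lower()
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))
```

Compute the signature over the bytes exactly as received; re-serializing the parsed JSON can change whitespace or key order and make a valid signature fail.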

Creating an Alert

  1. Click New Alert
  2. Select the alert type and comparison operator
  3. Set the threshold value
  4. Provide email and/or webhook URL
  5. Optionally add a webhook secret for HMAC signing
  6. Enable/disable the alert

Alerts are evaluated after each batch verification run. Alert failures are non-fatal — they never break the verification process.

CLI Commands

integrity:verify

# Show status
php symfony integrity:verify --status

# Verify single object
php symfony integrity:verify --object-id=123

# Run a schedule
php symfony integrity:verify --schedule-id=1

# Verify stale objects (not checked in 14 days)
php symfony integrity:verify --limit=500 --stale-days=14

# Verify all objects in a repository
php symfony integrity:verify --repository-id=5 --limit=1000

# Verify all master objects
php symfony integrity:verify --all --throttle=20

# Dry run (preview only)
php symfony integrity:verify --dry-run --limit=100

integrity:schedule

# List all schedules
php symfony integrity:schedule --list

# Show status summary
php symfony integrity:schedule --status

# Run all due schedules (use in cron)
php symfony integrity:schedule --run-due

# Run specific schedule
php symfony integrity:schedule --run-id=1

# Enable/disable
php symfony integrity:schedule --enable=1
php symfony integrity:schedule --disable=2

integrity:report

# Summary report
php symfony integrity:report --summary

# Dead letter report
php symfony integrity:report --dead-letter

# Date-filtered ledger report
php symfony integrity:report --date-from=2026-01-01 --date-to=2026-02-28

# JSON output (for monitoring integration)
php symfony integrity:report --summary --format=json

# CSV output (for spreadsheets)
php symfony integrity:report --dead-letter --format=csv

# Export full ledger to CSV
php symfony integrity:report --export-csv=/tmp/ledger.csv

# Generate auditor pack
php symfony integrity:report --auditor-pack=/tmp/auditor.zip

integrity:retention

# List all retention policies
php symfony integrity:retention --list

# Show retention & disposition status
php symfony integrity:retention --status

# Scan for eligible disposition candidates
php symfony integrity:retention --scan-eligible

# Scan for specific policy
php symfony integrity:retention --scan-eligible --policy-id=1

# Process approved dispositions (mark as disposed)
php symfony integrity:retention --process-queue

# Place a legal hold
php symfony integrity:retention --hold=12345 --reason="Legal investigation"

# Release a legal hold
php symfony integrity:retention --release=1

REST API

The Integrity Assurance plugin provides a comprehensive REST API (v1.2.0) for integration with external monitoring tools, dashboards, and automation systems. All endpoints require administrator authentication via session cookie.

Paginated List Endpoints

Endpoint | Method | Parameters | Description
/api/integrity/ledger | GET | limit, skip, repository_id, outcome, date_from, date_to | Browse verification ledger
/api/integrity/runs | GET | limit, skip, status | Browse run history
/api/integrity/holds | GET | limit, skip, status | Browse legal holds
/api/integrity/policies | GET | limit, skip | Browse retention policies

All paginated endpoints return: {success: true, total: N, limit: N, skip: N, data: [...]}
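
A client can walk any of these endpoints by advancing skip until total is reached. The sketch below is transport-agnostic: `fetch_page` is a hypothetical callable that you implement with your HTTP client of choice (sending the administrator session cookie) and that returns the parsed JSON envelope for a given limit and skip.

```python
from typing import Callable, Iterator


def iter_all(fetch_page: Callable[[int, int], dict], limit: int = 100) -> Iterator[dict]:
    """Yield every row from a paginated endpoint, one page at a time.

    fetch_page(limit, skip) must return the envelope
    {success, total, limit, skip, data} described above.
    """
    skip = 0
    while True:
        page = fetch_page(limit, skip)
        if not page.get("success"):
            raise RuntimeError("API reported failure")
        rows = page.get("data", [])
        yield from rows
        skip += len(rows)
        # Stop on an empty page or once every reported row has been seen.
        if not rows or skip >= page.get("total", 0):
            break
```

Stopping on an empty page as well as on total guards against looping forever if rows are removed from the result set between requests.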

Analytics Endpoints

Endpoint | Method | Parameters | Description
/api/integrity/stats | GET | | Dashboard statistics
/api/integrity/daily-trend | GET | days (default: 30) | Daily pass/fail counts
/api/integrity/repo-breakdown | GET | | Per-repository verification stats
/api/integrity/format-breakdown | GET | | Per-format verification stats
/api/integrity/throughput | GET | days (default: 7) | Verification throughput
/api/integrity/storage-growth | GET | days (default: 30) | Storage scanned over time

Action Endpoints

Endpoint | Method | Parameters | Description
/api/integrity/verify | POST | object_id | Verify a single digital object
/api/integrity/schedule/:id/run | POST | | Execute a schedule immediately
/api/integrity/schedule/:id/toggle | POST | | Enable/disable schedule
/api/integrity/retention/scan | POST | policy_id (optional) | Scan for eligible dispositions
/api/integrity/hold/place | POST | information_object_id, reason | Place a legal hold
/api/integrity/hold/:id/release | POST | | Release a legal hold

Example Usage

# Get dashboard stats
curl -s -b cookies.txt https://psis.theahg.co.za/api/integrity/stats | jq .

# Browse ledger (first 10 entries)
curl -s -b cookies.txt 'https://psis.theahg.co.za/api/integrity/ledger?limit=10' | jq .

# Get daily trend for last 7 days
curl -s -b cookies.txt 'https://psis.theahg.co.za/api/integrity/daily-trend?days=7' | jq .

Full OpenAPI 3.0.3 specification: docs/openapi.yaml

Cron Setup

Add these entries to your system crontab:

# Run due integrity schedules every 15 minutes
*/15 * * * * cd /usr/share/nginx/archive && php symfony integrity:schedule --run-due >> /var/log/atom/integrity-scheduler.log 2>&1

# Scan for retention-eligible objects daily at 1am
0 1 * * * cd /usr/share/nginx/archive && php symfony integrity:retention --scan-eligible >> /var/log/atom/integrity-retention.log 2>&1

# Process approved dispositions daily at 2am
0 2 * * * cd /usr/share/nginx/archive && php symfony integrity:retention --process-queue >> /var/log/atom/integrity-retention.log 2>&1

# Weekly integrity summary report (Monday 8am)
0 8 * * 1 cd /usr/share/nginx/archive && php symfony integrity:report --summary >> /var/log/atom/integrity-report.log 2>&1

# Weekly auditor pack export (Monday 8:30am)
30 8 * * 1 cd /usr/share/nginx/archive && php symfony integrity:report --auditor-pack=/tmp/integrity_weekly.zip >> /var/log/atom/integrity-report.log 2>&1

These cron entries are also documented in Admin > AHG Settings > Cron Jobs under the "Integrity Assurance" category.

Troubleshooting

"Schedule already has running instances"

A previous run is still active or was interrupted. Check for stale lock files:

ls -la cache/integrity_locks/

If the process is dead, the lock will auto-recover on the next attempt (stale-PID detection).

No baseline checksums found

The Preservation plugin must have generated checksums for the digital objects. Run:

php symfony preservation:fixity --age=0 --limit=100

The Integrity plugin will also auto-generate baselines on first verification attempt.

"Memory limit reached" (partial status)

Increase the schedule's max memory setting or reduce the batch size.

Objects reported as "missing" but files exist

Check that the upload path symlink is correct:

ls -la uploads/r

The symlink should point to the NAS mount (e.g., /mnt/nas/heratio/archive).

Pass rate declining

  1. Check dead letter queue for patterns (same repository, same failure type)
  2. Run a targeted verification: php symfony integrity:verify --repository-id=X --limit=50
  3. Check storage health (NAS connectivity, disk errors)

Alerts not sending

  1. Verify email configuration in AtoM's apps/qubit/config/factories.yml (mailer section)
  2. Check webhook URL is accessible from the server
  3. Verify alert is enabled in Admin > Integrity > Alerts
  4. Check PHP error logs for alert-related exceptions

Retention scan finding no eligible records

  1. Ensure retention_period_days > 0 (0 = indefinite, never eligible)
  2. Check that the trigger_type column matches your data (e.g., ingest_date requires created_at)
  3. Verify the policy scope matches your records (repository ID, hierarchy node)

For technical support, contact The Archive and Heritage Group (Pty) Ltd at johan@theahg.co.za