Skip to content

AI Tools (ahgAIPlugin)

User Guide

Powerful AI-powered tools for archival records: Named Entity Recognition (NER), Translation, Summarization, Spellcheck, Handwriting Text Recognition (HTR), and LLM Description Suggestions.


Overview

+-------------------------------------------------------------------------+
|                          AI TOOLS SUITE                                  |
+-------------------------------------------------------------------------+
|                                                                          |
|  +----------+ +----------+ +----------+ +----------+ +----------------+  |
|  |   NER    | | TRANSLATE| | SUMMARIZE| | SUGGEST  | |   SPELLCHECK   |  |
|  +----+-----+ +----+-----+ +----+-----+ +----+-----+ +-------+--------+  |
|       |            |            |            |               |           |
|       v            v            v            v               v           |
|  Extract      Translate    Generate     LLM-powered     Check spelling   |
|  names,       between      AI           description     and grammar      |
|  places,      languages    summaries    suggestions     in metadata      |
|  dates                     from PDFs    from OCR                         |
|                                                                          |
+-------------------------------------------------------------------------+

Features Summary

+--------------------+--------------------------------------------------+
|    Feature         |           Description                            |
+--------------------+--------------------------------------------------+
|  NER               | Extract persons, organizations, places, dates    |
|  Translation       | Offline machine translation (Argos Translate)    |
|  Summarization     | AI-powered text summarization from PDFs          |
|  Suggest Descr.    | LLM-powered scope_and_content from OCR/metadata  |
|  Spellcheck        | Spelling and grammar checking                    |
|  HTR               | Handwriting Text Recognition for images          |
+--------------------+--------------------------------------------------+

Named Entity Recognition (NER)

What is NER?

NER automatically identifies and extracts named entities from your archival records:

+-----------------------------------------------------------------+
|                    ENTITY TYPES DETECTED                         |
+-----------------------------------------------------------------+
|  PERSON     - Individual names (John Smith, Dr. Jane Doe)        |
|  ORG        - Organizations (UNESCO, British Museum)             |
|  GPE        - Places/Locations (London, South Africa)            |
|  DATE       - Dates and time periods (1985, January 2020)        |
+-----------------------------------------------------------------+

Using NER from the Interface

Step 1: Navigate to Record

Go to any archival description (Information Object)

Step 2: Click Extract Entities

+--------------------------------------------------+
|  AI Tools                                         |
|  +--------------------------------------------+  |
|  |  [Generate Summary]                        |  |
|  +--------------------------------------------+  |
|  |  [Extract Entities]  <-- Click here        |  |
|  +--------------------------------------------+  |
+--------------------------------------------------+

Step 3: View Results

+--------------------------------------------------+
|  Extraction Results                               |
|  +--------------------------------------------+  |
|  |  Found 12 entities                         |  |
|  |  [Review & Link ->]                        |  |
|  +--------------------------------------------+  |
+--------------------------------------------------+

Reviewing Extracted Entities

Access the Review Dashboard

Navigate to: Admin -> AI Tools -> NER Review

Review Dashboard

+------------------------------------------------------------+
|  NER Review Dashboard                                       |
+------------------------------------------------------------+
|                                                             |
|  +------------------------+  +-------------------------+    |
|  |        127             |  |          23             |    |
|  |  Entities Pending      |  |  Objects to Review      |    |
|  +------------------------+  +-------------------------+    |
|                                                             |
|  Objects with Pending Entities                              |
|  +------------------------------------------------------+  |
|  | Object                    | Pending | Actions        |  |
|  +------------------------------------------------------+  |
|  | Meeting Minutes 1985-90   |   15    | [Review]       |  |
|  | Personnel Records Box 3   |   12    | [Review]       |  |
|  | Annual Report 2023        |    8    | [Review]       |  |
|  +------------------------------------------------------+  |
|                                                             |
+------------------------------------------------------------+

Entity Review Actions

+----------------------------------------------------------+
|              ACTIONS FOR EACH ENTITY                      |
+----------------------------------------------------------+
|                                                           |
|  CREATE & LINK     Create new actor/place/subject and    |
|                    link to record                         |
|                                                           |
|  LINK TO EXISTING  Link to existing authority record      |
|                    (exact or fuzzy match suggested)       |
|                                                           |
|  APPROVE           Mark as correct but don't link         |
|                                                           |
|  REJECT            Mark as incorrect/not relevant         |
|                                                           |
+----------------------------------------------------------+

Editing Entities Before Saving

+------------------------------------------------------------+
|  Review Extracted Entities                                  |
+------------------------------------------------------------+
|                                                             |
|  People (4)                                                 |
|  +------------------------------------------------------+  |
|  | [J. Smith_______] [Type: Person v] [Create Actor v]  |  |
|  | Match: John Smith                                    |  |
|  +------------------------------------------------------+  |
|  | [Dr. Brown______] [Type: Person v] [Link to: Dr...v] |  |
|  | Similar: Dr. James Brown, Dr. Mary Brown             |  |
|  +------------------------------------------------------+  |
|                                                             |
|  Organizations (2)                                          |
|  +------------------------------------------------------+  |
|  | [UNESCO_________] [Type: Org    v] [Link to: UNE..v] |  |
|  | Exact Match: UNESCO                                  |  |
|  +------------------------------------------------------+  |
|                                                             |
|           [Create All]  [Reject All]  [Save All Decisions] |
+------------------------------------------------------------+

Auto-trigger NER on Document Upload

NER extraction can be automatically triggered when documents are uploaded, saving time and ensuring all uploaded content is processed.

Enabling Auto-trigger

Navigate to: Admin > AHG Settings > AI Services > NER

Enable the "Auto-extract on upload" setting.

How It Works

+------------------------------------------------------------------+
|                  AUTO-TRIGGER WORKFLOW                            |
+------------------------------------------------------------------+
|                                                                   |
|   1. USER UPLOADS DOCUMENT                                        |
|      (PDF, Word, RTF, or text file)                               |
|                |                                                  |
|                v                                                  |
|   2. SYSTEM CHECKS                                                |
|      - Is auto-trigger enabled?                                   |
|      - Is file type processable?                                  |
|                |                                                  |
|                v                                                  |
|   3. NER EXTRACTION QUEUED                                        |
|      - Background processing via Gearman                          |
|      - Or pending queue if Gearman unavailable                    |
|                |                                                  |
|                v                                                  |
|   4. ENTITIES EXTRACTED                                           |
|      - Results available in NER Review Dashboard                  |
|                                                                   |
+------------------------------------------------------------------+

Supported Document Types for Auto-trigger

+----------------------------------------------------------------+
|                PROCESSABLE DOCUMENT TYPES                       |
+----------------------------------------------------------------+
|  Type                              | Description                |
+------------------------------------+----------------------------+
|  application/pdf                   | PDF documents              |
|  text/plain                        | Plain text files           |
|  text/html                         | HTML documents             |
|  application/msword                | Word documents (.doc)      |
|  application/vnd.openxmlformats-   | Word documents (.docx)     |
|    officedocument.wordprocessingml.document                     |
|  application/rtf                   | Rich text format           |
+----------------------------------------------------------------+

Processing Pending Queue

If the background job system (Gearman) is unavailable, uploaded documents are queued for later processing. Run the pending queue processor via cron:

# Process pending NER extractions every 5 minutes
*/5 * * * * cd /usr/share/nginx/atom && php symfony ai:process-pending --limit=20 >> /var/log/atom/ai-pending.log 2>&1

This cron job is available in the Admin > AHG Settings > Cron Jobs page.


Translation

About Translation

Translate archival descriptions between languages using offline machine translation (Argos Translate).

Supported Languages

+---------------------+
| Language  | Code    |
+-----------+---------+
| English   |   en    |
| Afrikaans |   af    |
| French    |   fr    |
| Dutch     |   nl    |
| Portuguese|   pt    |
| Spanish   |   es    |
| German    |   de    |
+-----------+---------+

Using Translation

From the Command Line

# Translate a single record
php symfony ai:translate --from=en --to=af --object=12345

# Translate all records in a repository
php symfony ai:translate --from=en --to=af --repository=5 --limit=50

# Install language package if missing
php symfony ai:translate --from=en --to=af --install-package

# Preview what would be translated
php symfony ai:translate --from=en --to=af --object=12345 --dry-run

Translation Options

+----------------------------------------------------------------+
|                    TRANSLATION OPTIONS                          |
+----------------------------------------------------------------+
|  --from           Source language code (e.g., en)               |
|  --to             Target language code (e.g., af)               |
|  --object         Translate specific object ID                  |
|  --repository     Translate all in repository ID                |
|  --fields         Fields to translate (default: title,scope)    |
|  --limit          Maximum records to translate                  |
|  --dry-run        Preview without making changes                |
|  --install-package Install language package if missing          |
+----------------------------------------------------------------+

Translation Process

     Source Record                     Target Record
     (English)                         (Afrikaans)
         |                                  |
         v                                  v
  +-------------+                    +-------------+
  | Title       | ----Translate----> | Titel       |
  | Scope &     | ----Translate----> | Omvang en   |
  | Content     |                    | Inhoud      |
  +-------------+                    +-------------+

Summarization

About Summarization

Automatically generate summaries for records with attached PDF documents, saving time on cataloging.

Using Summarization

From the Interface

+--------------------------------------------------+
|  AI Tools                                         |
|  +--------------------------------------------+  |
|  |  [Generate Summary]  <-- Click here        |  |
|  +--------------------------------------------+  |
|  |  [Extract Entities]                        |  |
|  +--------------------------------------------+  |
+--------------------------------------------------+

Result

+--------------------------------------------------+
|  Summary Generated                                |
|  +--------------------------------------------+  |
|  |  Summary saved to Scope & Content          |  |
|  |  Processing time: 2345ms                   |  |
|  |  [Refresh Page]                            |  |
|  +--------------------------------------------+  |
+--------------------------------------------------+

From the Command Line

# Summarize a specific record
php symfony ai:summarize --object=12345

# Summarize records with empty Scope & Content
php symfony ai:summarize --all-empty --limit=50

# Summarize records in a repository
php symfony ai:summarize --repository=5 --limit=100

# Preview what would be processed
php symfony ai:summarize --all-empty --dry-run

Summarization Settings

+----------------------------------------------------------------+
|                  SUMMARIZATION OPTIONS                          |
+----------------------------------------------------------------+
|  --object         Summarize specific object ID                  |
|  --repository     Summarize all in repository ID                |
|  --all-empty      Process records with empty summary            |
|  --field          Target field (default: scope_and_content)     |
|  --limit          Maximum records to process                    |
|  --dry-run        Preview without making changes                |
+----------------------------------------------------------------+

How It Works

+------------------------------------------------------------------+
|                   SUMMARIZATION WORKFLOW                          |
+------------------------------------------------------------------+
|                                                                   |
|   1. EXTRACT TEXT                                                 |
|      +----------------+                                           |
|      |  PDF Document  | ---> pdftotext ---> Raw Text              |
|      +----------------+         OR                                |
|      | Metadata Fields| ---> Direct extraction                    |
|      +----------------+                                           |
|                |                                                  |
|                v                                                  |
|   2. ANALYZE & SUMMARIZE                                          |
|      +----------------+                                           |
|      |  AI API        | ---> Generate concise summary             |
|      +----------------+                                           |
|                |                                                  |
|                v                                                  |
|   3. SAVE RESULT                                                  |
|      +----------------+                                           |
|      |  Scope &       | <--- Summary saved                        |
|      |  Content Field |                                           |
|      +----------------+                                           |
|                                                                   |
+------------------------------------------------------------------+

LLM Description Suggestions

About Description Suggestions

Generate intelligent Scope and Content descriptions using Large Language Models (LLMs). The system combines OCR text, metadata, and contextual information to suggest comprehensive archival descriptions.

Supported LLM Providers

+----------------------------------------------------------------+
|                    LLM PROVIDERS                                |
+----------------------------------------------------------------+
|  Provider    | Type    | Description                           |
+--------------+---------+---------------------------------------+
|  Ollama      | Local   | Privacy-focused, runs on your server |
|              |         | Models: llama3.1, mistral, mixtral    |
+--------------+---------+---------------------------------------+
|  OpenAI      | Cloud   | GPT models via API                    |
|              |         | Models: gpt-4o-mini, gpt-4o           |
+--------------+---------+---------------------------------------+
|  Anthropic   | Cloud   | Claude models via API                 |
|              |         | Models: claude-3-haiku, claude-3-sonnet|
+----------------------------------------------------------------+

Using Description Suggestions

From the Interface

Step 1: Navigate to Record

Go to any archival description with OCR text or metadata

Step 2: Click Suggest Description
+--------------------------------------------------+
|  AI Tools                                         |
|  +--------------------------------------------+  |
|  |  [Generate Summary]                        |  |
|  +--------------------------------------------+  |
|  |  [Extract Entities]                        |  |
|  +--------------------------------------------+  |
|  |  [Suggest Description (AI)]  <-- Click     |  |
|  +--------------------------------------------+  |
+--------------------------------------------------+
Step 3: Review Side-by-Side
+------------------------------------------------------------+
|  AI Description Suggestion                                  |
+------------------------------------------------------------+
|                                                             |
|  +------------------------+  +------------------------+     |
|  | CURRENT DESCRIPTION    |  | AI SUGGESTION          |     |
|  +------------------------+  +------------------------+     |
|  |                        |  |                        |     |
|  | [Existing text or      |  | [AI-generated text     |     |
|  |  empty]                |  |  based on OCR and      |     |
|  |                        |  |  metadata - EDITABLE]  |     |
|  |                        |  |                        |     |
|  +------------------------+  +------------------------+     |
|                                                             |
|  Review Notes: [_______________________________________]    |
|                                                             |
|  Model: llama3.1:8b  |  Tokens: 450  |  Time: 2.3s         |
|                                                             |
|              [Approve]  [Edit & Approve]  [Reject]          |
+------------------------------------------------------------+
Step 4: Make Decision
+----------------------------------------------------------------+
|                    DECISION OPTIONS                             |
+----------------------------------------------------------------+
|  APPROVE         Accept suggestion as-is, save to record        |
|  EDIT & APPROVE  Modify suggestion, then save to record         |
|  REJECT          Discard suggestion, add rejection notes        |
+----------------------------------------------------------------+

Review Dashboard

Access via: Admin -> AI Tools -> Suggestion Review

+------------------------------------------------------------+
|  Description Suggestion Review                              |
+------------------------------------------------------------+
|                                                             |
|  +----------+  +----------+  +----------+  +----------+     |
|  |   45     |  |   23     |  |   12     |  |    8     |     |
|  | Pending  |  | Approved |  | Rejected |  |  Edited  |     |
|  +----------+  +----------+  +----------+  +----------+     |
|                                                             |
|  Filter by Repository: [All Repositories        v]          |
|                                                             |
|  Pending Suggestions                                        |
|  +------------------------------------------------------+  |
|  | Record              | Generated    | Model    | Act  |  |
|  +------------------------------------------------------+  |
|  | Annual Report 2023  | 2 hours ago  | llama3.1 | [->] |  |
|  | Meeting Minutes Q1  | 3 hours ago  | llama3.1 | [->] |  |
|  | Personnel File #42  | 1 day ago    | gpt-4o   | [->] |  |
|  +------------------------------------------------------+  |
|                                                             |
+------------------------------------------------------------+

From the Command Line

# Generate suggestion for specific record
php symfony ai:suggest-description --object=12345

# Process records with empty scope_and_content
php symfony ai:suggest-description --empty-only --limit=50

# Process only records with OCR text
php symfony ai:suggest-description --with-ocr --limit=100

# Process records in a specific repository
php symfony ai:suggest-description --repository=5 --limit=50

# Preview what would be processed (dry run)
php symfony ai:suggest-description --empty-only --dry-run

# Use specific prompt template
php symfony ai:suggest-description --template=2 --limit=20

# Use specific LLM configuration
php symfony ai:suggest-description --llm-config=1 --limit=20

Command Options

+----------------------------------------------------------------+
|              SUGGEST DESCRIPTION OPTIONS                        |
+----------------------------------------------------------------+
|  --object=ID        Process specific object ID                  |
|  --repository=ID    Process all in repository ID                |
|  --level=LEVEL      Filter by level (fonds, series, file, item) |
|  --empty-only       Only records with empty scope_and_content   |
|  --with-ocr         Only records that have OCR text available   |
|  --limit=N          Maximum records to process (default: 50)    |
|  --template=ID      Prompt template ID to use                   |
|  --llm-config=ID    LLM configuration ID                        |
|  --dry-run          Preview without generating suggestions      |
|  --delay=MS         Delay between API calls (default: 1000)     |
+----------------------------------------------------------------+

How It Works

+------------------------------------------------------------------+
|                DESCRIPTION SUGGESTION WORKFLOW                    |
+------------------------------------------------------------------+
|                                                                   |
|   1. GATHER CONTEXT                                               |
|      +----------------+                                           |
|      | Record Data    | ---> Title, identifier, dates, level     |
|      | OCR Text       | ---> Full text from digital objects      |
|      | Metadata       | ---> Creator, repository, existing data  |
|      +----------------+                                           |
|                |                                                  |
|                v                                                  |
|   2. SELECT TEMPLATE                                              |
|      +----------------+                                           |
|      | Prompt Template| ---> Based on level/repository/default   |
|      | - System prompt|                                           |
|      | - User template| ---> Variables: {title}, {ocr_text}, etc |
|      +----------------+                                           |
|                |                                                  |
|                v                                                  |
|   3. CALL LLM                                                     |
|      +----------------+                                           |
|      | LLM Provider   | ---> Ollama / OpenAI / Anthropic         |
|      +----------------+                                           |
|                |                                                  |
|                v                                                  |
|   4. SAVE & REVIEW                                                |
|      +----------------+                                           |
|      | Pending        | ---> Custodian reviews suggestion        |
|      | Suggestion     |                                           |
|      +----------------+                                           |
|                |                                                  |
|                v                                                  |
|   5. APPLY (on approval)                                          |
|      +----------------+                                           |
|      | scope_and_     | <--- Approved text saved                  |
|      | content        |                                           |
|      +----------------+                                           |
|                                                                   |
+------------------------------------------------------------------+

Prompt Templates

The system includes default templates that can be customized:

+----------------------------------------------------------------+
|                    DEFAULT TEMPLATES                            |
+----------------------------------------------------------------+
|  Template            | Use Case                                 |
+----------------------+------------------------------------------+
|  Standard Archival   | General archival descriptions            |
|  Item-Level OCR      | Items with OCR text (transcriptions)     |
|  Photograph          | Photographs and image collections        |
+----------------------------------------------------------------+

Template Variables

+----------------------------------------------------------------+
|                  TEMPLATE VARIABLES                             |
+----------------------------------------------------------------+
|  Variable                | Description                          |
+--------------------------+--------------------------------------+
|  {title}                 | Record title                         |
|  {identifier}            | Reference code/identifier            |
|  {level_of_description}  | Level (fonds, series, file, item)    |
|  {date_range}            | Date expression                      |
|  {creator}               | Creator name                         |
|  {repository}            | Repository name                      |
|  {ocr_text}              | Full OCR text from digital objects   |
|  {existing_metadata}     | All available metadata fields        |
+----------------------------------------------------------------+

Cron Job Scheduling

Automate description suggestion generation:

# Generate for empty records (daily at 2am)
0 2 * * * cd /usr/share/nginx/atom && php symfony ai:suggest-description --empty-only --limit=100

# Generate for OCR records (weekly Sunday 3am)
0 3 * * 0 cd /usr/share/nginx/atom && php symfony ai:suggest-description --with-ocr --limit=200

# Cleanup expired suggestions (monthly on 1st at 4am)
0 4 1 * * cd /usr/share/nginx/atom && php symfony ai:suggest-description --cleanup

Best Practices

+------------------------------------------------------------+
|          DESCRIPTION SUGGESTION BEST PRACTICES              |
+------------------------------------------------------------+
|  DO                           |  DON'T                      |
+-------------------------------+-----------------------------+
|  Always review suggestions    |  Auto-approve without review|
|  Use OCR for richer context   |  Ignore OCR text            |
|  Edit suggestions if needed   |  Accept low-quality output  |
|  Choose appropriate template  |  Use wrong template type    |
|  Process in small batches     |  Generate thousands at once |
|  Use local Ollama for privacy |  Send sensitive data to cloud|
+-------------------------------+-----------------------------+

Spellcheck

About Spellcheck

Check spelling and grammar in metadata fields to improve data quality.

Using Spellcheck

From the Command Line

# Check a specific record
php symfony ai:spellcheck --object=12345

# Check all records in a repository
php symfony ai:spellcheck --repository=5 --limit=100

# Check all records
php symfony ai:spellcheck --all --limit=100

# Specify language
php symfony ai:spellcheck --all --language=en_ZA

# Preview what would be checked
php symfony ai:spellcheck --all --dry-run

Spellcheck Options

+----------------------------------------------------------------+
|                    SPELLCHECK OPTIONS                           |
+----------------------------------------------------------------+
|  --object         Check specific object ID                      |
|  --repository     Check all in repository ID                    |
|  --all            Check all objects                             |
|  --language       Language code (default: en_US)                |
|  --limit          Maximum records to check                      |
|  --dry-run        Preview without making changes                |
+----------------------------------------------------------------+

Spellcheck Results

+----------------------------------------------------------------+
|                    SPELLCHECK OUTPUT                            |
+----------------------------------------------------------------+
|  Checked 50 objects (lang: en)                                  |
|  Object 12345: 3 issues                                         |
|  Object 12346: 0 issues                                         |
|  Object 12347: 5 issues                                         |
|  Done: 50 checked, 12 with issues                               |
+----------------------------------------------------------------+

Handwriting Text Recognition (HTR)

About HTR

Extract text from handwritten documents using AI-powered recognition with zone detection.

How HTR Works

+------------------------------------------------------------------+
|                      HTR WORKFLOW                                 |
+------------------------------------------------------------------+
|                                                                   |
|   1. IMAGE INPUT                                                  |
|      +----------------+                                           |
|      |  Scanned Image | ---> Load image file                      |
|      |  (JPG/PNG/TIFF)|                                           |
|      +----------------+                                           |
|                |                                                  |
|                v                                                  |
|   2. ZONE DETECTION                                               |
|      +----------------+                                           |
|      | Detect text    | ---> Identify text line regions           |
|      | zones/lines    |                                           |
|      +----------------+                                           |
|                |                                                  |
|                v                                                  |
|   3. TEXT RECOGNITION                                             |
|      +----------------+                                           |
|      | HTR Models     | ---> Recognize handwritten text           |
|      | (date/digits/  |      in each zone                         |
|      |  letters)      |                                           |
|      +----------------+                                           |
|                |                                                  |
|                v                                                  |
|   4. OUTPUT                                                       |
|      +----------------+                                           |
|      | Extracted Text | ---> Per-zone results with coordinates    |
|      +----------------+                                           |
|                                                                   |
+------------------------------------------------------------------+

HTR Recognition Modes

+----------------------------------------------------------------+
|                      HTR MODES                                  |
+----------------------------------------------------------------+
|  Mode      | Description                                        |
+------------+----------------------------------------------------+
|  all       | Use all models (date, digits, letters) - default   |
|  date      | Optimized for date recognition                     |
|  digits    | Optimized for numeric content                      |
|  letters   | Optimized for alphabetic text                      |
+----------------------------------------------------------------+

CLI Commands Reference

Installation & Setup

# Install plugin database tables
php symfony ai:install

# Uninstall (keeps data by default)
php symfony ai:uninstall

# Uninstall and remove all data
php symfony ai:uninstall --no-keep-data

NER Commands

# Extract entities from all unprocessed records
php symfony ai:ner-extract --all --limit=100

# Extract from specific object
php symfony ai:ner-extract --object=12345

# Extract from objects in a repository
php symfony ai:ner-extract --repository=5 --limit=50

# Extract including PDF text
php symfony ai:ner-extract --all --with-pdf --limit=100

# Queue jobs for background processing
php symfony ai:ner-extract --all --queue

# Preview (dry run)
php symfony ai:ner-extract --all --dry-run

Training Data Sync

# Sync corrections to training server
php symfony ai:ner-sync

# Export corrections to local file
php symfony ai:ner-sync --export-file

# View training statistics
php symfony ai:ner-sync --stats

Note: Training sync requires AHG Central integration to be configured. Go to Admin > AHG Plugin Settings > AHG Central to set up the API URL and key.

Pending Queue Processing

# Process pending NER extractions (fallback for Gearman)
php symfony ai:process-pending --limit=50

# Process pending summarization tasks
php symfony ai:process-pending --task-type=summarize --limit=20

# Preview what would be processed (dry run)
php symfony ai:process-pending --dry-run

Note: This command is needed when Gearman is unavailable. Auto-triggered NER jobs from document uploads are queued to the database and processed by this command.

Translation Commands

# Translate single record
php symfony ai:translate --from=en --to=af --object=12345

# Translate repository records
php symfony ai:translate --from=en --to=af --repository=5 --limit=50

# Install language package
php symfony ai:translate --from=en --to=af --install-package

Summarization Commands

# Summarize single record
php symfony ai:summarize --object=12345

# Summarize records with empty scope
php symfony ai:summarize --all-empty --limit=50

Spellcheck Commands

# Check single record
php symfony ai:spellcheck --object=12345

# Check all records
php symfony ai:spellcheck --all --limit=100

Description Suggestion Commands

# Generate suggestion for single record
php symfony ai:suggest-description --object=12345

# Process records with empty scope_and_content
php symfony ai:suggest-description --empty-only --limit=50

# Process records with OCR text
php symfony ai:suggest-description --with-ocr --limit=100

# Use specific template and LLM
php symfony ai:suggest-description --template=2 --llm-config=1 --limit=20

# Preview what would be processed
php symfony ai:suggest-description --empty-only --dry-run

NER Review Workflow

Complete Workflow

+------------------------------------------------------------------+
|                   NER REVIEW WORKFLOW                             |
+------------------------------------------------------------------+
|                                                                   |
|   1. EXTRACTION                                                   |
|      Run extraction via UI or CLI                                 |
|      -> Entities stored with 'pending' status                     |
|                                                                   |
|   2. REVIEW                                                       |
|      Open NER Review Dashboard                                    |
|      -> Select object to review                                   |
|      -> Edit entity values/types if needed                        |
|      -> Choose action for each entity                             |
|                                                                   |
|   3. SAVE DECISIONS                                               |
|      Click "Save All Decisions"                                   |
|      -> Entities processed in batches                             |
|      -> Access points created/linked                              |
|                                                                   |
|   4. TRAINING FEEDBACK                                            |
|      Corrections tracked for model improvement                    |
|      -> Run ai:ner-sync to export training data                   |
|                                                                   |
+------------------------------------------------------------------+

Entity Linking Results

+----------------------------------------------------------------+
|                 LINKING RESULTS BY TYPE                         |
+----------------------------------------------------------------+
|  Entity Type  | Creates/Links To                                |
+---------------+------------------------------------------------+
|  PERSON       | Actor (Name Access Point)                       |
|  ORG          | Actor - Corporate Body (Name Access Point)      |
|  GPE          | Place Term (Place Access Point)                 |
|  DATE         | Subject Term or Event                           |
+----------------------------------------------------------------+

Training Data Export

Correction Types Tracked

+----------------------------------------------------------------+
|                 CORRECTION TYPES                                |
+----------------------------------------------------------------+
|  Type         | Description                                     |
+---------------+------------------------------------------------+
|  value_edit   | Entity value was edited before saving           |
|  type_change  | Entity type was changed (e.g., PERSON -> ORG)   |
|  both         | Both value and type were changed                |
|  approved     | Entity approved as-is (no link)                 |
|  rejected     | Entity marked as incorrect                      |
+----------------------------------------------------------------+

Export Training Data

# View correction statistics
php symfony ai:ner-sync --stats

# Output:
# value_edit: 45 total, 20 exported, 25 pending
# type_change: 12 total, 8 exported, 4 pending
# rejected: 30 total, 25 exported, 5 pending

# Export to file
php symfony ai:ner-sync --export-file
# Creates: /tmp/ner_corrections_2026-01-30_143215.json

# Push to training server
php symfony ai:ner-sync

AHG Central Configuration

Training data sync requires AHG Central integration. Configure it at:

Admin > AHG Plugin Settings > AHG Central

Setting Description
Enable Integration Master switch for cloud sync
API URL AHG Central endpoint (default: https://train.theahg.co.za/api)
API Key Your authentication key (contact support@theahg.co.za)
Site ID Unique identifier for your AtoM instance

You can test the connection before saving to verify your credentials.


Best Practices

NER Best Practices

+------------------------------------------------------------+
|                    NER BEST PRACTICES                       |
+------------------------------------------------------------+
|  DO                           |  DON'T                      |
+-------------------------------+-----------------------------+
|  Review entities regularly    |  Auto-link without review   |
|  Fix entity values if wrong   |  Ignore fuzzy matches       |
|  Export training data         |  Delete all pending         |
|  Use batch processing         |  Process one at a time      |
|  Process PDFs for more data   |  Skip PDF extraction        |
+-------------------------------+-----------------------------+

Translation Best Practices

+------------------------------------------------------------+
|               TRANSLATION BEST PRACTICES                    |
+------------------------------------------------------------+
|  DO                           |  DON'T                      |
+-------------------------------+-----------------------------+
|  Install packages first       |  Translate without packages |
|  Use --dry-run to preview     |  Bulk translate blindly     |
|  Review translated content    |  Trust 100% accuracy        |
|  Process in batches           |  Translate entire database  |
+-------------------------------+-----------------------------+

Summarization Best Practices

+------------------------------------------------------------+
|             SUMMARIZATION BEST PRACTICES                    |
+------------------------------------------------------------+
|  DO                           |  DON'T                      |
+-------------------------------+-----------------------------+
|  Use for records with PDFs    |  Expect perfect summaries   |
|  Review generated summaries   |  Auto-publish without review|
|  Set appropriate min/max      |  Use default for all types  |
|  Process records in batches   |  Summarize entire archive   |
+-------------------------------+-----------------------------+

Troubleshooting

Common Issues

Issue Solution
NER returns no entities Check if record has text content in title/scope
Translation fails Install language package with --install-package
Summarization fails Ensure PDF has extractable text (not image-only)
Spellcheck errors Install aspell dictionary for language
HTR not working Ensure image file is accessible and valid format
LLM suggestion fails Check Ollama is running (ollama serve) or API keys are configured
"Ollama not available" Install and start Ollama, or configure OpenAI/Anthropic
Empty suggestion Record needs OCR text or substantial metadata for context
Slow LLM response Increase timeout or use smaller/faster model

Error Messages

+----------------------------------------------------------------+
|                    ERROR MESSAGES                               |
+----------------------------------------------------------------+
|  "No text content found"                                        |
|  -> Record has no title, scope, or extractable PDF text         |
|                                                                 |
|  "Language package not installed"                               |
|  -> Run: php symfony ai:translate --from=X --to=Y --install-package |
|                                                                 |
|  "Summarizer service not available"                             |
|  -> Check AI API is running and accessible                      |
|                                                                 |
|  "NER is disabled in settings"                                  |
|  -> Enable in ahg_ai_settings table (feature='ner', key='enabled') |
|                                                                 |
|  "LLM provider not available"                                   |
|  -> Check Ollama: curl http://localhost:11434/api/tags          |
|  -> Or configure OpenAI/Anthropic API keys                      |
|                                                                 |
|  "No LLM configuration found"                                   |
|  -> Run database migration to create default configs            |
|  -> Or add config in ahg_llm_config table                       |
|                                                                 |
|  "Suggestion requires review"                                   |
|  -> Suggestions are pending - approve via Review Dashboard      |
+----------------------------------------------------------------+

Checking Service Health

# Check AI API health
curl http://localhost:5004/ai/v1/health

# Expected response:
# {"status": "healthy", "services": {"ner": true, "summarizer": true, "translate": true}}

# Check Ollama health (for LLM suggestions)
curl http://localhost:11434/api/tags

# Expected response:
# {"models": [{"name": "llama3.1:8b", ...}]}

# Check LLM health via AtoM
curl https://your-site.com/ai/llm/health

Configuration

Settings Table

+----------------------------------------------------------------+
|                    AI SETTINGS                                  |
+----------------------------------------------------------------+
|  Feature     | Setting Key              | Default Value         |
+--------------|--------------------------|----------------------+
|  general     | api_url                  | http://192.168.0.112:5004/ai/v1 |
|  general     | api_key                  | ahg_ai_demo_internal_2026 |
|  general     | api_timeout              | 60                    |
|  ner         | enabled                  | 1                     |
|  ner         | confidence_threshold     | 0.85                  |
|  ner         | enabled_entity_types     | ["PERSON","ORG","GPE","DATE"] |
|  summarize   | enabled                  | 1                     |
|  summarize   | max_length               | 1000                  |
|  summarize   | min_length               | 100                   |
|  translate   | enabled                  | 1                     |
|  translate   | engine                   | argos                 |
|  spellcheck  | enabled                  | 1                     |
|  spellcheck  | language                 | en                    |
|  suggest     | enabled                  | 1                     |
|  suggest     | require_review           | 1                     |
|  suggest     | auto_expire_days         | 30                    |
|  suggest     | default_llm_config       | 1                     |
|  suggest     | default_template         | 1                     |
+----------------------------------------------------------------+

LLM Configurations

+----------------------------------------------------------------+
|                    LLM CONFIGURATIONS                           |
+----------------------------------------------------------------+
|  Provider  | Default Model           | Endpoint               |
+------------+-------------------------+------------------------+
|  ollama    | llama3.1:8b             | http://localhost:11434 |
|  openai    | gpt-4o-mini             | https://api.openai.com |
|  anthropic | claude-3-haiku-20240307 | https://api.anthropic.com |
+----------------------------------------------------------------+

Need Help?

Contact your system administrator if you experience issues.


Part of the AtoM AHG Framework