Skip to content

Semantic Search - Technical Manual

Overview

The Semantic Search system provides query expansion capabilities for AtoM archives using a thesaurus-based approach. It integrates with Elasticsearch for search optimization and supports multiple data sources including WordNet, Wikidata, and local synonym definitions.

Namespace: AtomFramework\Services\SemanticSearch Database: Laravel Query Builder (Illuminate\Database\Capsule\Manager) Location: /usr/share/nginx/archive/atom-framework/src/Services/SemanticSearch/


Architecture

System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              CLIENT LAYER                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   ┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐     │
│   │   Search Box     │    │   Admin UI       │    │   CLI Commands   │     │
│   │   (_box.php)     │    │(semanticSearch   │    │(ThesaurusCommand)│     │
│   │                  │    │    Admin)        │    │                  │     │
│   └────────┬─────────┘    └────────┬─────────┘    └────────┬─────────┘     │
│            │                       │                       │                │
└────────────┼───────────────────────┼───────────────────────┼────────────────┘
             │                       │                       │
             ▼                       ▼                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            SERVICE LAYER                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   ┌──────────────────────────────────────────────────────────────────────┐  │
│   │                     SemanticSearchService                             │  │
│   │   - expandSearchQuery()    - getExpansionInfo()                      │  │
│   │   - buildElasticsearchQuery()   - logSearch()                        │  │
│   └──────────────────────────────────────────────────────────────────────┘  │
│                           │                                                  │
│                           ▼                                                  │
│   ┌──────────────────────────────────────────────────────────────────────┐  │
│   │                      ThesaurusService                                 │  │
│   │   - addTerm()          - getSynonyms()      - expandQuery()          │  │
│   │   - importLocalSynonyms()   - exportToElasticsearch()                │  │
│   └──────────────────────────────────────────────────────────────────────┘  │
│              │                    │                    │                     │
│              ▼                    ▼                    ▼                     │
│   ┌────────────────┐   ┌────────────────┐   ┌────────────────┐             │
│   │ WordNetSync    │   │ WikidataSync   │   │ EmbeddingService│            │
│   │ Service        │   │ Service        │   │ (Ollama)       │             │
│   └────────────────┘   └────────────────┘   └────────────────┘             │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
             │                       │                       │
             ▼                       ▼                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            DATA LAYER                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐          │
│   │   MySQL DB      │   │  Elasticsearch  │   │  External APIs  │          │
│   │ (ahg_thesaurus  │   │  (synonyms.txt) │   │ - Datamuse      │          │
│   │   _* tables)    │   │                 │   │ - Wikidata      │          │
│   └─────────────────┘   └─────────────────┘   │ - Ollama        │          │
│                                               └─────────────────┘          │
└─────────────────────────────────────────────────────────────────────────────┘

Entity Relationship Diagram (ERD)

Database Schema

┌─────────────────────────────────────────────────────────────────────────────┐
│                          SEMANTIC SEARCH ERD                                 │
└─────────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────┐
│   ahg_thesaurus_term     │
├──────────────────────────┤
│ PK id            BIGINT  │
│    term          VARCHAR │◄────────────┐
│    source        VARCHAR │             │
│    domain        VARCHAR │             │
│    external_id   VARCHAR │             │
│    created_at    DATETIME│             │
│    updated_at    DATETIME│             │
└──────────────────────────┘             │
           │                             │
           │ 1                           │
           │                             │
           │                             │
           ▼ N                           │
┌──────────────────────────┐             │
│  ahg_thesaurus_synonym   │             │
├──────────────────────────┤             │
│ PK id            BIGINT  │             │
│ FK term_id       BIGINT  │─────────────┘
│    synonym       VARCHAR │
│    relationship  VARCHAR │  (exact, related, broader, narrower)
│    weight        DECIMAL │  (0.0 - 1.0)
│    source        VARCHAR │
│    created_at    DATETIME│
└──────────────────────────┘


┌──────────────────────────┐
│  ahg_thesaurus_embedding │
├──────────────────────────┤
│ PK id            BIGINT  │
│ FK term_id       BIGINT  │─────────────┐
│    model         VARCHAR │             │
│    embedding     BLOB    │             │
│    created_at    DATETIME│             │
└──────────────────────────┘             │
           ┌─────────────────────────────┘
┌──────────────────────────┐
│   ahg_thesaurus_term     │
│        (reference)       │
└──────────────────────────┘


┌──────────────────────────┐          ┌──────────────────────────┐
│  ahg_thesaurus_sync_log  │          │ahg_semantic_search_log   │
├──────────────────────────┤          ├──────────────────────────┤
│ PK id            BIGINT  │          │ PK id            BIGINT  │
│    source        VARCHAR │          │    original_query VARCHAR│
│    status        VARCHAR │          │    expanded_query TEXT   │
│    terms_synced  INT     │          │    was_expanded  BOOLEAN │
│    started_at    DATETIME│          │    terms_expanded INT    │
│    completed_at  DATETIME│          │    user_id       INT     │
│    error_message TEXT    │          │    ip_address    VARCHAR │
└──────────────────────────┘          │    created_at    DATETIME│
                                      └──────────────────────────┘


┌──────────────────────────┐
│      ahg_settings        │
├──────────────────────────┤
│ PK id            BIGINT  │
│    setting_key   VARCHAR │  (UNIQUE)
│    setting_value TEXT    │
│    setting_type  VARCHAR │  (string, boolean, integer, json)
│    setting_group VARCHAR │  (semantic_search)
│    updated_by    INT     │
│    created_at    DATETIME│
│    updated_at    DATETIME│
└──────────────────────────┘

Table Relationships

ahg_thesaurus_term (1) ──────< (N) ahg_thesaurus_synonym
                    (1) ──────< (N) ahg_thesaurus_embedding

ahg_settings ──── (filtered by setting_group = 'semantic_search')

Database Schema DDL

Migration File

Location: /usr/share/nginx/archive/atom-framework/database/migrations/2026_01_21_semantic_search_tables.sql

-- Thesaurus Terms
CREATE TABLE IF NOT EXISTS ahg_thesaurus_term (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    term VARCHAR(255) NOT NULL,
    source VARCHAR(50) NOT NULL DEFAULT 'local',
    domain VARCHAR(100) DEFAULT 'general',
    external_id VARCHAR(255) NULL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    UNIQUE KEY idx_term_source (term, source),
    INDEX idx_source (source),
    INDEX idx_domain (domain)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- Synonyms
CREATE TABLE IF NOT EXISTS ahg_thesaurus_synonym (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    term_id BIGINT UNSIGNED NOT NULL,
    synonym VARCHAR(255) NOT NULL,
    relationship_type VARCHAR(50) DEFAULT 'exact',
    weight DECIMAL(3,2) DEFAULT 0.80,
    source VARCHAR(50) DEFAULT 'local',
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (term_id) REFERENCES ahg_thesaurus_term(id) ON DELETE CASCADE,
    UNIQUE KEY idx_term_synonym (term_id, synonym),
    INDEX idx_synonym (synonym),
    INDEX idx_weight (weight)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- Sync Log
CREATE TABLE IF NOT EXISTS ahg_thesaurus_sync_log (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    source VARCHAR(50) NOT NULL,
    status VARCHAR(20) DEFAULT 'running',
    terms_synced INT DEFAULT 0,
    started_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    completed_at DATETIME NULL,
    error_message TEXT NULL,
    INDEX idx_source_status (source, status)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- Embeddings
CREATE TABLE IF NOT EXISTS ahg_thesaurus_embedding (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    term_id BIGINT UNSIGNED NOT NULL,
    model VARCHAR(100) NOT NULL,
    embedding BLOB NOT NULL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (term_id) REFERENCES ahg_thesaurus_term(id) ON DELETE CASCADE,
    UNIQUE KEY idx_term_model (term_id, model)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- Search Log
CREATE TABLE IF NOT EXISTS ahg_semantic_search_log (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    original_query VARCHAR(500) NOT NULL,
    expanded_query TEXT NULL,
    was_expanded TINYINT(1) DEFAULT 0,
    terms_expanded INT DEFAULT 0,
    user_id INT NULL,
    ip_address VARCHAR(45) NULL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_created (created_at),
    INDEX idx_query (original_query(100))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Service Classes

Class Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                           CLASS HIERARCHY                                    │
└─────────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────┐
│                          ThesaurusService                                 │
├──────────────────────────────────────────────────────────────────────────┤
│ - logger: Logger                                                          │
│ - synonymsPath: string                                                    │
├──────────────────────────────────────────────────────────────────────────┤
│ + addTerm(term, source, domain, externalId): int                         │
│ + addSynonym(termId, synonym, type, weight, source): int                 │
│ + getSynonyms(term, minWeight, limit): array                             │
│ + expandQuery(query, limit): array                                        │
│ + importLocalSynonyms(): array                                           │
│ + exportToElasticsearch(): string                                        │
│ + getTermByWord(word): ?object                                           │
│ + startSyncLog(source): int                                              │
│ + completeSyncLog(logId, count): void                                    │
│ + failSyncLog(logId, error): void                                        │
└──────────────────────────────────────────────────────────────────────────┘
                                    │ uses
┌──────────────────────────────────────────────────────────────────────────┐
│                         WordNetSyncService                                │
├──────────────────────────────────────────────────────────────────────────┤
│ - thesaurus: ThesaurusService                                            │
│ - apiUrl: string                                                          │
│ - ARCHIVAL_TERMS: array                                                   │
│ - LIBRARY_TERMS: array                                                    │
│ - MUSEUM_TERMS: array                                                     │
├──────────────────────────────────────────────────────────────────────────┤
│ + syncTerm(term, domain): array                                          │
│ + syncArchivalTerms(): array                                             │
│ + syncLibraryTerms(): array                                              │
│ + syncMuseumTerms(): array                                               │
│ - fetchFromDatamuse(word): array                                         │
│ - mapRelationshipType(datamuseType): string                              │
└──────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────┐
│                        WikidataSyncService                                │
├──────────────────────────────────────────────────────────────────────────┤
│ - thesaurus: ThesaurusService                                            │
│ - sparqlEndpoint: string                                                  │
│ - HERITAGE_CLASSES: array                                                 │
│ - SA_HERITAGE_ITEMS: array                                                │
├──────────────────────────────────────────────────────────────────────────┤
│ + syncHeritageTerms(): array                                             │
│ + syncSouthAfricanTerms(): array                                         │
│ - executeSparqlQuery(query): array                                       │
│ - buildHeritageQuery(): string                                           │
└──────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────┐
│                         EmbeddingService                                  │
├──────────────────────────────────────────────────────────────────────────┤
│ - ollamaEndpoint: string                                                  │
│ - model: string                                                           │
├──────────────────────────────────────────────────────────────────────────┤
│ + generateEmbedding(text): array                                         │
│ + storeEmbedding(termId, embedding): void                                │
│ + findSimilar(text, limit): array                                        │
│ + cosineSimilarity(vec1, vec2): float                                    │
│ - callOllama(prompt): array                                              │
└──────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────┐
│                       SemanticSearchService                               │
├──────────────────────────────────────────────────────────────────────────┤
│ - thesaurus: ThesaurusService                                            │
│ - settings: array                                                         │
├──────────────────────────────────────────────────────────────────────────┤
│ + expandSearchQuery(query): string                                       │
│ + getExpansionInfo(): array                                              │
│ + isEnabled(): bool                                                       │
│ + logSearch(original, expanded, userId): void                            │
└──────────────────────────────────────────────────────────────────────────┘

Query Expansion Flow

Detailed Flow Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                        QUERY EXPANSION FLOW                                  │
└─────────────────────────────────────────────────────────────────────────────┘

User Input: "old photographs township"
┌───────────────────────────────────────┐
│ 1. TOKENIZATION                       │
│    Split query into terms             │
│    ["old", "photographs", "township"] │
└───────────────────────────────────────┘
┌───────────────────────────────────────┐
│ 2. STOPWORD FILTERING                 │
│    Remove common words                │
│    ["photographs", "township"]        │
│    (Note: "old" may be kept)          │
└───────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────────┐
│ 3. SYNONYM LOOKUP (per term)                                               │
│                                                                            │
│    SELECT synonym, weight                                                  │
│    FROM ahg_thesaurus_synonym s                                           │
│    JOIN ahg_thesaurus_term t ON s.term_id = t.id                          │
│    WHERE t.term = 'photographs'                                           │
│    AND s.weight >= 0.6                                                    │
│    ORDER BY s.weight DESC                                                 │
│    LIMIT 5                                                                │
│                                                                            │
│    Results:                                                                │
│    ┌────────────────┬────────┐   ┌────────────────┬────────┐             │
│    │ photographs    │        │   │ township       │        │             │
│    ├────────────────┼────────┤   ├────────────────┼────────┤             │
│    │ photo          │ 0.95   │   │ location       │ 0.85   │             │
│    │ picture        │ 0.90   │   │ settlement     │ 0.80   │             │
│    │ image          │ 0.85   │   │ informal       │ 0.70   │             │
│    │ snapshot       │ 0.75   │   │ settlement     │        │             │
│    └────────────────┴────────┘   └────────────────┴────────┘             │
└───────────────────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────┐
│ 4. QUERY CONSTRUCTION                 │
│                                       │
│    (photographs OR photo OR picture   │
│     OR image OR snapshot)             │
│    AND                                │
│    (township OR location OR           │
│     settlement)                       │
└───────────────────────────────────────┘
┌───────────────────────────────────────┐
│ 5. ELASTICSEARCH QUERY                │
│                                       │
│    {                                  │
│      "query": {                       │
│        "bool": {                      │
│          "should": [                  │
│            {"match": {...}},          │
│            {"match": {...}}           │
│          ]                            │
│        }                              │
│      }                                │
│    }                                  │
└───────────────────────────────────────┘
┌───────────────────────────────────────┐
│ 6. RESULTS + EXPANSION INFO           │
│                                       │
│    Return search results with         │
│    expansion metadata for UI display  │
└───────────────────────────────────────┘

Data Sync Flow

WordNet Sync Process

┌─────────────────────────────────────────────────────────────────────────────┐
│                         WORDNET SYNC FLOW                                    │
└─────────────────────────────────────────────────────────────────────────────┘

┌────────────────────┐
│ Start Sync         │
│ (CLI or Admin UI)  │
└─────────┬──────────┘
┌────────────────────┐
│ Create sync log    │
│ status = 'running' │
└─────────┬──────────┘
┌────────────────────┐
│ Load term list     │
│ (ARCHIVAL_TERMS,   │
│  LIBRARY_TERMS,    │
│  MUSEUM_TERMS)     │
└─────────┬──────────┘
┌────────────────────────────────────────────────────────────────────────────┐
│ FOR EACH term:                                                              │
│                                                                             │
│   ┌────────────────────┐                                                   │
│   │ Call Datamuse API  │                                                   │
│   │ GET /words?rel_syn │                                                   │
│   │ =term&max=10       │                                                   │
│   └─────────┬──────────┘                                                   │
│             │                                                               │
│             ▼                                                               │
│   ┌────────────────────┐                                                   │
│   │ Rate limit: 100ms  │                                                   │
│   │ between requests   │                                                   │
│   └─────────┬──────────┘                                                   │
│             │                                                               │
│             ▼                                                               │
│   ┌────────────────────┐                                                   │
│   │ Parse response:    │                                                   │
│   │ [{"word":"...",    │                                                   │
│   │   "score":1000}]   │                                                   │
│   └─────────┬──────────┘                                                   │
│             │                                                               │
│             ▼                                                               │
│   ┌────────────────────┐                                                   │
│   │ Normalize score    │                                                   │
│   │ to weight (0-1)    │                                                   │
│   └─────────┬──────────┘                                                   │
│             │                                                               │
│             ▼                                                               │
│   ┌────────────────────┐                                                   │
│   │ Upsert term +      │                                                   │
│   │ synonyms to DB     │                                                   │
│   └─────────┬──────────┘                                                   │
│             │                                                               │
└─────────────┼───────────────────────────────────────────────────────────────┘
┌────────────────────┐
│ Update sync log    │
│ status='completed' │
│ terms_synced = N   │
└────────────────────┘

CLI Commands

Command Reference

Location: /usr/share/nginx/archive/atom-framework/src/Console/ThesaurusCommand.php

┌─────────────────────────────────────────────────────────────────────────────┐
│                          CLI COMMAND REFERENCE                               │
└─────────────────────────────────────────────────────────────────────────────┘

Usage: php bin/atom thesaurus:<command> [options] [arguments]

┌─────────────────────────┬───────────────────────────────────────────────────┐
│ Command                 │ Description                                        │
├─────────────────────────┼───────────────────────────────────────────────────┤
│ thesaurus:import-local  │ Import local JSON synonym files                    │
│ thesaurus:sync-wordnet  │ Sync from Datamuse API (WordNet)                   │
│ thesaurus:sync-wikidata │ Sync from Wikidata SPARQL                          │
│ thesaurus:export-es     │ Export synonyms to Elasticsearch file              │
│ thesaurus:stats         │ Display thesaurus statistics                       │
│ thesaurus:expand        │ Test query expansion                               │
│ thesaurus:embeddings    │ Generate vector embeddings (requires Ollama)       │
└─────────────────────────┴───────────────────────────────────────────────────┘

Examples:
  php bin/atom thesaurus:import-local
  php bin/atom thesaurus:sync-wordnet --archival
  php bin/atom thesaurus:expand "old photographs"
  php bin/atom thesaurus:export-es --path=/etc/elasticsearch/synonyms/

Settings Configuration

Settings Keys

┌────────────────────────────────┬──────────┬─────────────────────────────────┐
│ Key                            │ Type     │ Default                          │
├────────────────────────────────┼──────────┼─────────────────────────────────┤
│ semantic_search_enabled        │ boolean  │ true                             │
│ semantic_expansion_limit       │ integer  │ 5                                │
│ semantic_min_weight            │ float    │ 0.6                              │
│ semantic_show_expansion        │ boolean  │ true                             │
│ semantic_log_searches          │ boolean  │ true                             │
│ semantic_wordnet_enabled       │ boolean  │ true                             │
│ semantic_wikidata_enabled      │ boolean  │ false                            │
│ semantic_local_synonyms        │ boolean  │ true                             │
│ semantic_ollama_enabled        │ boolean  │ false                            │
│ semantic_ollama_endpoint       │ string   │ http://localhost:11434           │
│ semantic_ollama_model          │ string   │ nomic-embed-text                 │
│ semantic_es_synonyms_path      │ string   │ /etc/elasticsearch/synonyms/... │
└────────────────────────────────┴──────────┴─────────────────────────────────┘

Accessing Settings

use Illuminate\Database\Capsule\Manager as DB;

// Get single setting
$enabled = DB::table('ahg_settings')
    ->where('setting_key', 'semantic_search_enabled')
    ->value('setting_value');

// Get all semantic search settings
$settings = DB::table('ahg_settings')
    ->where('setting_group', 'semantic_search')
    ->pluck('setting_value', 'setting_key');

Elasticsearch Integration

Synonym File Format

Location: /etc/elasticsearch/synonyms/ahg_synonyms.txt

# Format: term => synonym1, synonym2, synonym3
# Generated by: php bin/atom thesaurus:export-es

archive => repository, depot, record office, holdings
photograph => photo, picture, image, snapshot
manuscript => document, text, codex
township => location, settlement, informal settlement
fonds => collection, papers, records

Elasticsearch Index Configuration

{
  "settings": {
    "analysis": {
      "filter": {
        "ahg_synonyms": {
          "type": "synonym",
          "synonyms_path": "synonyms/ahg_synonyms.txt",
          "updateable": true
        }
      },
      "analyzer": {
        "ahg_search_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ahg_synonyms"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "ahg_search_analyzer"
      }
    }
  }
}

Local Synonym Files

File Structure

Location: /usr/share/nginx/archive/atom-framework/data/synonyms/

data/synonyms/
├── archival.json      # Archival terminology
├── library.json       # Library terminology
├── museum.json        # Museum terminology
└── south_african.json # South African heritage terms

JSON Format

{
  "domain": "archival",
  "terms": [
    {
      "term": "archive",
      "synonyms": [
        {"word": "repository", "weight": 0.95, "type": "exact"},
        {"word": "record office", "weight": 0.90, "type": "exact"},
        {"word": "depot", "weight": 0.85, "type": "exact"},
        {"word": "holdings", "weight": 0.75, "type": "related"}
      ]
    },
    {
      "term": "fonds",
      "synonyms": [
        {"word": "collection", "weight": 0.90, "type": "related"},
        {"word": "papers", "weight": 0.85, "type": "related"},
        {"word": "records", "weight": 0.80, "type": "broader"}
      ]
    }
  ]
}

API Reference

ThesaurusService Methods

/**
 * Add a term to the thesaurus
 * @param string $term The term to add
 * @param string $source Source identifier (local, wordnet, wikidata)
 * @param string $domain Domain category
 * @param string|null $externalId External source ID
 * @return int The term ID
 */
public function addTerm(
    string $term,
    string $source = 'local',
    string $domain = 'general',
    ?string $externalId = null
): int

/**
 * Add a synonym to a term
 * @param int $termId The term ID
 * @param string $synonym The synonym word
 * @param string $relationshipType Relationship type
 * @param float $weight Relevance weight (0.0-1.0)
 * @param string $source Source identifier
 * @return int The synonym ID
 */
public function addSynonym(
    int $termId,
    string $synonym,
    string $relationshipType = 'exact',
    float $weight = 0.8,
    string $source = 'local'
): int

/**
 * Get synonyms for a term
 * @param string $term The term to look up
 * @param float $minWeight Minimum weight threshold
 * @param int $limit Maximum synonyms to return
 * @return array Array of synonym objects
 */
public function getSynonyms(
    string $term,
    float $minWeight = 0.6,
    int $limit = 5
): array

/**
 * Expand a search query with synonyms
 * @param string $query Original search query
 * @param int $limit Max synonyms per term
 * @return array Associative array [term => [synonyms]]
 */
public function expandQuery(string $query, int $limit = 5): array

/**
 * Import local JSON synonym files
 * @return array Stats: ['terms' => N, 'synonyms' => M]
 */
public function importLocalSynonyms(): array

/**
 * Export synonyms to Elasticsearch format
 * @param string|null $path Output file path
 * @return string Path to generated file
 */
public function exportToElasticsearch(?string $path = null): string

Admin Module

Module Structure

ahgThemeB5Plugin/modules/semanticSearchAdmin/
├── actions/
│   └── actions.class.php
├── config/
│   └── module.yml
└── templates/
    ├── indexSuccess.php       # Dashboard
    ├── configSuccess.php      # Settings form
    ├── termsSuccess.php       # Term browser
    ├── termAddSuccess.php     # Add term form
    ├── termViewSuccess.php    # View term details
    ├── syncLogsSuccess.php    # Sync history
    └── searchLogsSuccess.php  # Search log viewer

Actions

Action Method Description
index GET Dashboard with stats
config GET/POST Settings configuration
terms GET Browse thesaurus terms
termAdd GET/POST Add custom term
termView GET View term details
syncLogs GET View sync history
searchLogs GET View search logs
runSync POST Execute sync (AJAX)
testExpand GET Test query expansion (AJAX)

Logging

Log Configuration

Location: /usr/share/nginx/archive/logs/semantic_search.log

use Monolog\Logger;
use Monolog\Handler\RotatingFileHandler;

$logger = new Logger('semantic_search');
$logger->pushHandler(new RotatingFileHandler(
    '/usr/share/nginx/archive/logs/semantic_search.log',
    7,
    Logger::INFO
));

Log Levels

Level Usage
INFO Sync started/completed, terms imported
WARNING Rate limits hit, API timeouts
ERROR Sync failures, database errors
DEBUG Individual term processing (disabled in production)

Performance Considerations

Caching

// Static cache for repeated lookups within a request
private static array $synonymCache = [];

public function getSynonyms(string $term): array
{
    if (isset(self::$synonymCache[$term])) {
        return self::$synonymCache[$term];
    }

    // ... fetch from database ...

    self::$synonymCache[$term] = $results;
    return $results;
}

Indexing

Ensure proper indexes on: - ahg_thesaurus_term.term (for lookups) - ahg_thesaurus_synonym.synonym (for reverse lookups) - ahg_thesaurus_synonym.weight (for filtering)

Rate Limiting

External API calls are rate-limited: - Datamuse API: 100ms between requests - Wikidata SPARQL: 500ms between requests


Security

Access Control

All admin actions require administrator credentials:

public function preExecute()
{
    if (!$this->context->user->hasCredential('administrator')) {
        $this->forward('admin', 'secure');
    }
}

Input Validation

// Sanitize search input
$term = trim(strtolower($request->getParameter('term', '')));
$term = preg_replace('/[^a-z0-9\s\-]/', '', $term);

// Validate numeric parameters
$weight = max(0, min(1, (float)$request->getParameter('weight', 0.8)));
$limit = max(1, min(20, (int)$request->getParameter('limit', 5)));

Troubleshooting

Common Issues

Issue Cause Solution
No synonyms returned Table empty Run thesaurus:import-local
Sync fails API rate limit Wait and retry
Slow queries Missing indexes Check database indexes
ES synonyms not working File not reloaded Restart Elasticsearch

Diagnostic Queries

-- Check term count by source
SELECT source, COUNT(*) as count
FROM ahg_thesaurus_term
GROUP BY source;

-- Check synonym coverage
SELECT t.term, COUNT(s.id) as synonyms
FROM ahg_thesaurus_term t
LEFT JOIN ahg_thesaurus_synonym s ON t.id = s.term_id
GROUP BY t.id
ORDER BY synonyms DESC
LIMIT 20;

-- Check recent syncs
SELECT * FROM ahg_thesaurus_sync_log
ORDER BY started_at DESC
LIMIT 10;

Version History

Version Date Changes
1.0.0 Jan 2026 Initial release

Document Version: 1.0 Last Updated: January 2026 Author: The Archive and Heritage Group (Pty) Ltd