Fuzzy Search - Technical Documentation¶
Plugin: ahgDisplayPlugin Version: 3.2.23+ Last Updated: February 2026
1. Architecture Overview¶
┌─────────────────────────────────────────────────────────────────────────────────┐
│ FUZZY SEARCH ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────────────────────────────────────────────────────┐ │
│ │ LAYER 1: SPELL CORRECTION │ │
│ │ │ │
│ │ FuzzySearchService.php │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Levenshtein │ │ SOUNDEX │ │ Metaphone │ │ │
│ │ │ (edit dist) │ │ (phonetic) │ │ (phonetic) │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ │ Vocabulary Sources: │ │
│ │ • display_facet_cache (616 terms) │ │
│ │ • ahg_thesaurus_term (2,946 terms, try/catch) │ │
│ │ • term_i18n (taxonomy 35/42/78) │ │
│ │ • actor_i18n (creator names) │ │
│ └────────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────────────┐ │
│ │ LAYER 2: FULLTEXT SEARCH │ │
│ │ │ │
│ │ MySQL FULLTEXT indexes on i18n tables │ │
│ │ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ │
│ │ │ ft_ioi_title │ │ ft_ioi_scope │ │ ft_ai_name │ │ │
│ │ │ (title) │ │ (scope_and_ │ │ (authorized_ │ │ │
│ │ │ │ │ content) │ │ form_of_name) │ │ │
│ │ └──────────────────┘ └──────────────────┘ └──────────────────┘ │ │
│ │ ┌──────────────────┐ │ │
│ │ │ ft_ti_name │ Falls back to LIKE %term% if indexes missing │ │
│ │ │ (term name) │ │ │
│ │ └──────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────────────┐ │
│ │ LAYER 3: ELASTICSEARCH FUZZY FALLBACK │ │
│ │ │ │
│ │ Activated when SQL returns 0 results │ │
│ │ Uses multi_match with fuzziness: AUTO │ │
│ │ │ │
│ │ Fields searched: │ │
│ │ • i18n.en.title (boost: 3) │ │
│ │ • i18n.en.scopeAndContent (boost: 1) │ │
│ │ • display.creator (boost: 2) │ │
│ │ • autocomplete (boost: 2) │ │
│ │ │ │
│ │ Returns up to 200 matching IDs → rebuilds SQL query with whereIn() │ │
│ └────────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────┘
2. Files¶
New Files¶
| File | Purpose |
|---|---|
ahgDisplayPlugin/lib/Services/FuzzySearchService.php |
Core spell-correction service |
ahgDisplayPlugin/database/fulltext_indexes.sql |
FULLTEXT index definitions |
ahgDisplayPlugin/lib/task/ahgAddFulltextIndexesTask.class.php |
CLI task to create indexes |
Modified Files¶
| File | Changes |
|---|---|
ahgDisplayPlugin/modules/display/actions/actions.class.php |
Fuzzy correction in executeBrowse(), FULLTEXT in applyTextSearchFilter(), ES fallback, new helper methods |
ahgDisplayPlugin/modules/display/templates/browseSuccess.php |
"Did you mean?" and auto-correct alert banners |
3. FuzzySearchService¶
Class Structure¶
namespace AhgDisplay\Services;
class FuzzySearchService
{
private array $vocabulary = []; // normalized => original
private array $soundexIndex = []; // soundex_code => [terms]
private array $metaphoneIndex = []; // metaphone_code => [terms]
public function loadVocabulary(): void;
public function correctQuery(string $query): array;
private function findLevenshteinMatch(string $word): ?array;
private function findPhoneticMatch(string $word): ?array;
private function buildPhoneticIndexes(): void;
}
correctQuery() Return Value¶
[
'original' => string, // Original query
'corrected' => string|null, // Corrected query (null if no correction)
'suggestion' => string|null, // Same as corrected
'confidence' => float, // 0.0 - 1.0
'corrections' => array, // Per-word correction details
'method' => string|null // 'levenshtein', 'soundex', 'metaphone', or null
]
Vocabulary Sources¶
┌─────────────────────────────────────────────────────────────────────────────────┐
│ VOCABULARY LOADING (in order) │
├─────────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. display_facet_cache │
│ ~616 terms (subjects, places, genres, creators, levels) │
│ Always available on any installation │
│ │
│ 2. ahg_thesaurus_term (try/catch) │
│ ~2,946 terms from ahgSemanticSearchPlugin │
│ Only available if semantic search plugin is installed │
│ │
│ 3. term_i18n │
│ Terms from taxonomies 35 (subject), 42 (place), 78 (genre) │
│ Core AtoM terms │
│ │
│ 4. actor_i18n │
│ Creator/authority names │
│ Enables name correction │
│ │
│ Total: ~5,000 terms │
│ Performance: PHP levenshtein() is C-implemented → <5ms per word │
│ │
└─────────────────────────────────────────────────────────────────────────────────┘
Levenshtein Thresholds¶
| Word Length | Max Edit Distance | Example |
|---|---|---|
| 1-5 chars | 2 | "musem" → "museum" (distance 1) |
| 6+ chars | 3 | "archieves" → "archives" (distance 2) |
Confidence Thresholds¶
| Confidence | Action |
|---|---|
| >= 0.9 | Auto-correct (replace query, show "Showing results for X") |
| 0.5 - 0.89 | Suggest ("Did you mean: X?") |
| < 0.5 | No suggestion shown |
4. FULLTEXT Indexes¶
Index Definitions¶
CREATE FULLTEXT INDEX ft_ioi_title ON information_object_i18n(title);
CREATE FULLTEXT INDEX ft_ioi_scope ON information_object_i18n(scope_and_content);
CREATE FULLTEXT INDEX ft_ai_name ON actor_i18n(authorized_form_of_name);
CREATE FULLTEXT INDEX ft_ti_name ON term_i18n(name);
Installation¶
The task is idempotent - checks if indexes exist before creating.
Behavior¶
- With indexes: Uses
MATCH(column) AGAINST(? IN NATURAL LANGUAGE MODE)for relevance-ranked results - Without indexes: Falls back to
LIKE %term%(exact substring matching) - Detection is cached per request via
isFulltextAvailable()static property
5. Elasticsearch Fuzzy Fallback¶
Trigger Condition¶
ES fuzzy is called only when:
1. The primary SQL query returns 0 results
2. A search query exists
3. SearchEngineFactory class is available
Query Structure¶
{
"query": {
"multi_match": {
"query": "archieves",
"fields": [
"i18n.en.title^3",
"i18n.en.scopeAndContent",
"display.creator^2",
"autocomplete^2"
],
"fuzziness": "AUTO",
"type": "best_fields"
}
},
"size": 200,
"_source": false
}
Fuzziness AUTO Behavior¶
| Term Length | Max Edits Allowed |
|---|---|
| 1-2 chars | Exact match only |
| 3-5 chars | 1 edit |
| 6+ chars | 2 edits |
6. Request Flow¶
executeBrowse()
│
├─ 1. Read query from request: $this->queryFilter
│
├─ 2. FuzzySearchService.correctQuery($this->queryFilter)
│ │
│ ├─ loadVocabulary()
│ ├─ For each word:
│ │ ├─ findLevenshteinMatch()
│ │ └─ findPhoneticMatch() (if no Levenshtein match)
│ │
│ ├─ confidence >= 0.9 → auto-correct ($this->queryFilter = corrected)
│ └─ confidence < 0.9 → set $this->didYouMean
│
├─ 3. applyTextSearchFilter($this->queryFilter)
│ │
│ ├─ isFulltextAvailable()?
│ │ ├─ YES → MATCH ... AGAINST (natural language mode)
│ │ └─ NO → LIKE %term%
│ │
│ └─ Identifier field always uses LIKE
│
├─ 4. Execute query, get $this->total
│
├─ 5. If $this->total === 0 → tryElasticsearchFuzzy()
│ │
│ ├─ class_exists('SearchEngineFactory')?
│ │ ├─ YES → multi_match with fuzziness:AUTO → get IDs
│ │ │ Rebuild query with whereIn('io.id', $esIds)
│ │ └─ NO → skip (no ES available)
│ │
│ └─ All wrapped in try/catch (ES failure never breaks browse)
│
└─ 6. Template renders alerts:
├─ $this->didYouMean → "Did you mean: X?" info alert
├─ $this->correctedQuery → "Showing results for X" success alert
└─ $this->esAssistedSearch → "Fuzzy matches" warning alert
7. Template Alerts¶
Three alert types in browseSuccess.php (no <script> or <style> tags, no CSP nonce needed):
| Variable | Alert Type | Content |
|---|---|---|
$didYouMean |
alert-info |
"Did you mean: [link]?" |
$correctedQuery |
alert-success |
"Showing results for X. Search instead for: [link]" |
$esAssistedSearch |
alert-warning |
"No exact matches. Showing fuzzy matches from search index." |
The "Search instead for" link appends &noCorrect=1 to bypass correction.
8. Graceful Degradation¶
┌──────────────────────────────┬──────────────────────────────────────────────┐
│ Scenario │ Behavior │
├──────────────────────────────┼──────────────────────────────────────────────┤
│ All layers available │ Full correction + FULLTEXT + ES fallback │
│ ES not running │ Levenshtein + FULLTEXT still work │
│ No FULLTEXT indexes │ Falls back to LIKE, Levenshtein + ES work │
│ No ahgSemanticSearchPlugin │ Vocabulary from facet_cache + term_i18n │
│ All layers unavailable │ Behaves exactly like pre-fuzzy search │
│ │ (LIKE %term%) │
└──────────────────────────────┴──────────────────────────────────────────────┘
Every layer is wrapped in try/catch. A failure in any layer never breaks the browse page.
9. CLI Commands¶
Create FULLTEXT Indexes¶
Creates the four FULLTEXT indexes. Idempotent (checks existence first). Non-blocking on MySQL 8.0.12+.
10. Configuration¶
No configuration is needed. Fuzzy search activates automatically on the GLAM Browse page. The noCorrect URL parameter is the only user-facing control.
11. Dependencies¶
| Dependency | Required | Purpose |
|---|---|---|
| MySQL 8 InnoDB | Yes | FULLTEXT indexes, Levenshtein vocabulary queries |
| Elasticsearch/OpenSearch | No | Fuzzy fallback (graceful degradation) |
| ahgSemanticSearchPlugin | No | Adds thesaurus vocabulary (graceful degradation) |
PHP levenshtein() |
Yes | Built-in PHP function (C implementation) |
PHP soundex() |
Yes | Built-in PHP function |
PHP metaphone() |
Yes | Built-in PHP function |
Part of the AtoM AHG Framework - ahgDisplayPlugin