PageIndexService
PageIndex Service
Builds hierarchical JSON trees ("table of contents") from source documents and performs LLM-driven retrieval over them.
Three document types are supported:
- EAD finding aids: parsed from the information_object table and its related tables
- Uploaded PDFs: text extracted via the OCR service at 192.168.0.115:5006
- RiC-O metadata: queried from Fuseki via SPARQL
Table of Contents
Methods
- __construct() : mixed
- buildTree() : array<string|int, mixed>
- Build a PageIndex tree for a given object.
- getStatus() : array<string|int, mixed>|null
- Get the indexing status for an object.
- getTree() : array<string|int, mixed>|null
- Get the stored tree JSON for an object.
- query() : array<string|int, mixed>
- Query a single PageIndex tree with a natural language query.
- searchAll() : array<string|int, mixed>
- Search across all ready trees for a query.
Methods
__construct()
public
__construct([OllamaPageIndexClient|null $llmClient = null ]) : mixed
Parameters
- $llmClient : OllamaPageIndexClient|null = null
buildTree()
Build a PageIndex tree for a given object.
public
buildTree(int $objectId, string $objectType[, string $culture = 'en' ]) : array<string|int, mixed>
Sets the status to 'building', extracts the document content, calls the LLM to build the tree, then stores the result with status 'ready' on success or 'error' on failure.
Parameters
- $objectId : int
  The information_object.id or external doc ID
- $objectType : string
  One of: ead, pdf, rico
- $culture : string = 'en'
  Language culture for i18n fields (default: en)
Return values
array<string|int, mixed>: ['success' => bool, 'tree_id' => int|null, 'tree' => array|null, 'node_count' => int, 'model' => string, 'error' => string|null]
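The documented return shape suggests callers should branch on 'success' before reading 'tree_id' or 'tree', since both may be null. The following is a minimal sketch of that pattern; `describeBuildResult()` is a hypothetical helper, not part of PageIndexService, and the sample values are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of PageIndexService): turns a
 * buildTree() result array into a one-line log message.
 */
function describeBuildResult(array $result): string
{
    if (empty($result['success'])) {
        // 'error' is documented as string|null, so fall back to a generic text.
        return 'Build failed: ' . ($result['error'] ?? 'unknown error');
    }

    return sprintf(
        'Built tree #%d with %d nodes using %s',
        (int) ($result['tree_id'] ?? 0),
        (int) ($result['node_count'] ?? 0),
        (string) ($result['model'] ?? 'unknown model')
    );
}

// Sample data shaped like the documented return value; all values are invented.
$buildSample = [
    'success'    => true,
    'tree_id'    => 42,
    'tree'       => ['title' => 'Root', 'children' => []],
    'node_count' => 1,
    'model'      => 'example-model',
    'error'      => null,
];

echo describeBuildResult($buildSample), PHP_EOL;
```

In real use the array would come from a call such as `$service->buildTree($objectId, 'ead')`; the internal structure of 'tree' is not documented here, so the sample above should not be read as its actual schema.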
getStatus()
Get the indexing status for an object.
public
getStatus(int $objectId, string $objectType) : array<string|int, mixed>|null
Parameters
- $objectId : int
- $objectType : string
Return values
array<string|int, mixed>|null
getTree()
Get the stored tree JSON for an object.
public
getTree(int $objectId, string $objectType) : array<string|int, mixed>|null
Parameters
- $objectId : int
- $objectType : string
Return values
array<string|int, mixed>|null
query()
Query a single PageIndex tree with a natural language query.
public
query(int $treeId, string $query[, int|null $userId = null ]) : array<string|int, mixed>
Parameters
- $treeId : int
  The ahg_pageindex_tree.id
- $query : string
  The user's search query
- $userId : int|null = null
  Optional user ID for logging
Return values
array<string|int, mixed>: ['success' => bool, 'matches' => array, 'reasoning' => string, 'tree_path' => array, 'model' => string, 'error' => string|null]
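As an illustration of consuming this return value, the sketch below renders a query() result as a single line of text. `formatQueryResult()` is a hypothetical helper, and it assumes 'tree_path' is a flat list of node titles; the documentation only states that it is an array, so the real structure may differ. The shape of each entry in 'matches' is likewise undocumented and is only counted here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: renders a query() result as plain text.
 * Assumption: 'tree_path' is a flat list of node titles.
 */
function formatQueryResult(array $result): string
{
    if (empty($result['success'])) {
        return 'Query failed: ' . ($result['error'] ?? 'unknown error');
    }

    // Join the path segments into a breadcrumb such as "Fonds > Series 2".
    $path  = implode(' > ', array_map('strval', $result['tree_path'] ?? []));
    $count = count($result['matches'] ?? []);

    return sprintf('%d match(es) via [%s]: %s', $count, $path, $result['reasoning'] ?? '');
}

// Sample data shaped like the documented return value; all values are invented.
$querySample = [
    'success'   => true,
    'matches'   => [['node_id' => 'n3']],
    'reasoning' => 'Series 2 covers correspondence.',
    'tree_path' => ['Fonds', 'Series 2'],
    'model'     => 'example-model',
    'error'     => null,
];

echo formatQueryResult($querySample), PHP_EOL;
```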
searchAll()
Search across all ready trees for a query.
public
searchAll(string $query[, string|null $objectType = null ][, int $limit = 20 ][, int|null $userId = null ]) : array<string|int, mixed>
Parameters
- $query : string
  The user's search query
- $objectType : string|null = null
  Filter by object type (ead, pdf, rico) or null for all
- $limit : int = 20
  Max trees to search
- $userId : int|null = null
  Optional user ID for logging
Return values
array<string|int, mixed>: ['success' => bool, 'results' => array, 'total_matches' => int]
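Because $objectType accepts only ead, pdf, rico, or null, a caller may want to check arguments before invoking searchAll(). The sketch below shows one way to do that. `checkSearchArgs()` and the `PAGEINDEX_TYPES` constant are hypothetical names, and the rules for empty queries and non-positive limits are assumptions; the documentation does not say how the service itself treats those inputs.

```php
<?php
declare(strict_types=1);

/** Object types named in the parameter documentation. */
const PAGEINDEX_TYPES = ['ead', 'pdf', 'rico'];

/**
 * Hypothetical pre-flight check for searchAll() arguments.
 * Returns a list of problems; an empty list means the arguments look valid.
 */
function checkSearchArgs(string $query, ?string $objectType = null, int $limit = 20): array
{
    $problems = [];

    // Assumption: an empty or whitespace-only query is never useful.
    if (trim($query) === '') {
        $problems[] = 'query must not be empty';
    }

    // null means "search all types"; anything else must be a documented type.
    if ($objectType !== null && !in_array($objectType, PAGEINDEX_TYPES, true)) {
        $problems[] = "unknown object type: {$objectType}";
    }

    // Assumption: a limit below 1 would search no trees at all.
    if ($limit < 1) {
        $problems[] = 'limit must be at least 1';
    }

    return $problems;
}

foreach (checkSearchArgs('', 'docx', 0) as $problem) {
    echo $problem, PHP_EOL;
}
```

Returning a list of problems rather than throwing on the first one lets a form or CLI report every invalid argument in a single pass.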