Heratio Framework API Reference

PageIndexService

PageIndex Service

Builds hierarchical JSON trees ("table of contents") from source documents and performs LLM-driven retrieval over them.

Three document types are supported:

  1. EAD finding aids: parsed from information_object and related tables
  2. Uploaded PDFs: text extracted via the OCR service at 192.168.0.115:5006
  3. RiC-O metadata: queried from Fuseki via SPARQL
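Callers select one of these three types through the $objectType parameter. The following is a minimal sketch, assuming a caller wants to reject unsupported types before calling the service. SUPPORTED_OBJECT_TYPES and normaliseObjectType() are hypothetical helpers, not part of PageIndexService, and the service itself may not lower-case its input.

```php
<?php
declare(strict_types=1);

// Hypothetical list mirroring the three documented document types.
const SUPPORTED_OBJECT_TYPES = ['ead', 'pdf', 'rico'];

/**
 * Hypothetical helper: trim and lower-case a caller-supplied type and
 * reject anything outside the documented set.
 */
function normaliseObjectType(string $type): string
{
    $normalised = strtolower(trim($type));
    if (!in_array($normalised, SUPPORTED_OBJECT_TYPES, true)) {
        throw new InvalidArgumentException(sprintf(
            'Unsupported object type "%s"; expected one of: %s',
            $type,
            implode(', ', SUPPORTED_OBJECT_TYPES)
        ));
    }
    return $normalised;
}

echo normaliseObjectType(' PDF '), PHP_EOL; // prints "pdf"
```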
Tags
author

The Archive and Heritage Group

Table of Contents

Methods

__construct()  : mixed
buildTree()  : array<string|int, mixed>
Build a PageIndex tree for a given object.
getStatus()  : array<string|int, mixed>|null
Get the indexing status for an object.
getTree()  : array<string|int, mixed>|null
Get the stored tree JSON for an object.
query()  : array<string|int, mixed>
Query a single PageIndex tree with a natural language query.
searchAll()  : array<string|int, mixed>
Search across all ready trees for a query.

Methods

buildTree()

Build a PageIndex tree for a given object.

public buildTree(int $objectId, string $objectType[, string $culture = 'en' ]) : array<string|int, mixed>

Sets the status to 'building', extracts the document content, calls the LLM to build the tree, then stores the result with status 'ready' on success or 'error' on failure.
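The three status values above form a small lifecycle. A sketch of it as a PHP 8.1 backed enum follows. IndexStatus is a hypothetical type (the service may store plain strings), and the rule that a finished tree can return to 'building' is an assumption based on buildTree() setting that status at the start of every build.

```php
<?php
declare(strict_types=1);

// Hypothetical enum modelling the status values named in the buildTree()
// description; the service itself may store these as plain strings.
enum IndexStatus: string
{
    case Building = 'building';
    case Ready = 'ready';
    case Error = 'error';

    // A build only ever ends in 'ready' or 'error'.
    public function isTerminal(): bool
    {
        return $this !== self::Building;
    }

    // Assumed transitions: 'building' moves to 'ready' or 'error'; a finished
    // tree can be rebuilt, which sets it back to 'building'.
    public function canTransitionTo(self $next): bool
    {
        return match ($this) {
            self::Building => $next === self::Ready || $next === self::Error,
            self::Ready, self::Error => $next === self::Building,
        };
    }
}

var_dump(IndexStatus::from('building')->canTransitionTo(IndexStatus::Ready)); // bool(true)
```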

Parameters
$objectId : int

The information_object.id or an external document ID

$objectType : string

One of: ead, pdf, rico

$culture : string = 'en'

Language culture for i18n fields (default: en)

Return values
array<string|int, mixed>

['success' => bool, 'tree_id' => int|null, 'tree' => array|null, 'node_count' => int, 'model' => string, 'error' => string|null]
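A sketch of consuming this return array, using only the documented keys. describeBuildResult() is a hypothetical helper, and the sample values below are made up for illustration rather than taken from real service output.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn a buildTree()-shaped result array into a
 * one-line log message. Keys follow the documented return shape.
 */
function describeBuildResult(array $result): string
{
    if (!($result['success'] ?? false)) {
        return sprintf('Build failed: %s', $result['error'] ?? 'unknown error');
    }
    return sprintf(
        'Built tree #%d with %d nodes using %s',
        $result['tree_id'],
        $result['node_count'],
        $result['model']
    );
}

// Illustrative array only; the values are invented, not real service output.
echo describeBuildResult([
    'success' => true, 'tree_id' => 42, 'tree' => [], 'node_count' => 17,
    'model' => 'example-model', 'error' => null,
]), PHP_EOL; // prints "Built tree #42 with 17 nodes using example-model"
```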

getStatus()

Get the indexing status for an object.

public getStatus(int $objectId, string $objectType) : array<string|int, mixed>|null

Parameters
$objectId : int

The information_object.id or an external document ID

$objectType : string

One of: ead, pdf, rico

Return values
array<string|int, mixed>|null

getTree()

Get the stored tree JSON for an object.

public getTree(int $objectId, string $objectType) : array<string|int, mixed>|null

Parameters
$objectId : int

The information_object.id or an external document ID

$objectType : string

One of: ead, pdf, rico

Return values
array<string|int, mixed>|null
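The stored tree is a hierarchical "table of contents", so a recursive walk is a natural way to display it. This is a sketch under an explicit assumption: the node shape ('title' plus an optional 'nodes' list of children) is hypothetical and is not documented here, so check the actual JSON returned by getTree() before relying on it. flattenToc() is likewise a hypothetical helper.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical walker over a decoded tree. The node shape used here
 * ('title' plus an optional 'nodes' list of children) is an assumption.
 *
 * @return list<string> one indented line per node, in document order
 */
function flattenToc(array $nodes, int $depth = 0): array
{
    $lines = [];
    foreach ($nodes as $node) {
        $lines[] = str_repeat('  ', $depth) . ($node['title'] ?? '(untitled)');
        // Recurse into children, one indentation level deeper.
        $lines = array_merge($lines, flattenToc($node['nodes'] ?? [], $depth + 1));
    }
    return $lines;
}

$tree = [
    ['title' => 'Series A', 'nodes' => [
        ['title' => 'File 1'],
        ['title' => 'File 2'],
    ]],
    ['title' => 'Series B'],
];
echo implode(PHP_EOL, flattenToc($tree)), PHP_EOL;
```

Because getTree() is documented as returning array|null, a caller would guard against null (for example with `$stored ?? []`) before walking the result.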

query()

Query a single PageIndex tree with a natural language query.

public query(int $treeId, string $query[, int|null $userId = null ]) : array<string|int, mixed>

Parameters
$treeId : int

The ahg_pageindex_tree.id

$query : string

The user's search query

$userId : int|null = null

Optional user ID for logging

Return values
array<string|int, mixed>

['success' => bool, 'matches' => array, 'reasoning' => string, 'tree_path' => array, 'model' => string, 'error' => string|null]
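A sketch of reading this result with only the documented top-level keys. The element shapes of 'matches' and 'tree_path' are not documented, so the hypothetical summariseQueryResult() helper only counts them rather than assuming their structure.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: summarise a query()-shaped result. 'matches' and
 * 'tree_path' are only counted because their element shapes are undocumented.
 */
function summariseQueryResult(array $result): string
{
    if (!($result['success'] ?? false)) {
        return 'Query failed: ' . ($result['error'] ?? 'unknown error');
    }
    return sprintf(
        '%d match(es), path depth %d: %s',
        count($result['matches'] ?? []),
        count($result['tree_path'] ?? []),
        $result['reasoning'] ?? ''
    );
}

// Illustrative values only.
echo summariseQueryResult([
    'success' => true, 'matches' => [['a'], ['b']],
    'reasoning' => 'Series A covers the date range',
    'tree_path' => ['root', 'Series A'], 'model' => 'example-model', 'error' => null,
]), PHP_EOL;
```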

searchAll()

Search across all ready trees for a query.

public searchAll(string $query[, string|null $objectType = null ][, int $limit = 20 ][, int|null $userId = null ]) : array<string|int, mixed>

Parameters
$query : string

The user's search query

$objectType : string|null = null

Filter by object type (ead, pdf, rico), or null to search all types

$limit : int = 20

Maximum number of trees to search

$userId : int|null = null

Optional user ID for logging

Return values
array<string|int, mixed>

['success' => bool, 'results' => array, 'total_matches' => int]
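One way such a response could be assembled is to fan the query out over up to $limit ready trees and sum the matches. This is a hypothetical sketch, not the service's actual implementation: fanOutSearch() and the per-result shape ['tree_id', 'matches'] are assumptions, and the $queryTree callable stands in for query().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of fanning one query out over several trees and
 * assembling a searchAll()-shaped response. $queryTree stands in for
 * PageIndexService::query(); the real method may work differently.
 *
 * @param list<int> $treeIds IDs of trees assumed to have status 'ready'
 * @param callable(int): array $queryTree returns a query()-shaped result
 */
function fanOutSearch(array $treeIds, callable $queryTree, int $limit = 20): array
{
    $results = [];
    $total = 0;
    foreach (array_slice($treeIds, 0, $limit) as $treeId) {
        $r = $queryTree($treeId);
        if (!($r['success'] ?? false) || empty($r['matches'])) {
            continue; // skip failed queries and trees with no matches
        }
        $results[] = ['tree_id' => $treeId, 'matches' => $r['matches']];
        $total += count($r['matches']);
    }
    return ['success' => true, 'results' => $results, 'total_matches' => $total];
}

// Stub query: tree 2 has two matches, every other tree has none.
$stub = fn (int $id): array => [
    'success' => true,
    'matches' => $id === 2 ? ['m1', 'm2'] : [],
];
print_r(fanOutSearch([1, 2, 3], $stub));
```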


        