Metadata Extraction¶
User Guide¶
Automatically extract embedded metadata from uploaded digital objects and populate archival description fields.
Overview¶
+-------------------------------------------------------------+
| METADATA EXTRACTION |
+-------------------------------------------------------------+
| |
| UPLOAD EXTRACT MAP APPLY |
| | | | | |
| v v v v |
| File --> Read EXIF --> Match to --> Update |
| Added IPTC/XMP AtoM Fields Record |
| |
+-------------------------------------------------------------+
Supported File Types¶
+-------------------------------------------------------------+
| SUPPORTED FORMATS |
+-------------------------------------------------------------+
| |
| IMAGES - JPEG, PNG, TIFF, WebP, GIF, BMP |
| (EXIF, IPTC, XMP metadata) |
| |
| DOCUMENTS - PDF (title, author, keywords) |
| - DOCX, XLSX, PPTX (Open XML) |
| - DOC, XLS, PPT (Legacy Office) |
| |
| VIDEO - MP4, WebM, MKV, MOV, AVI, OGG |
| (duration, resolution, codec) |
| |
| AUDIO - MP3, WAV, FLAC, OGG, AAC, M4A |
| (ID3 tags, duration, bitrate) |
| |
+-------------------------------------------------------------+
How It Works¶
Automatic Extraction¶
When you upload a digital object, the plugin automatically:
- Detects the file type
- Extracts embedded metadata
- Maps metadata to AtoM fields
- Populates the archival description
Upload File
|
v
+-------------------+
| Detect File Type |
+-------------------+
|
+---> Image? ---> Extract EXIF/IPTC/XMP
|
+---> PDF? -----> Extract Document Info
|
+---> Office? --> Extract Open XML Properties
|
+---> Video? ---> Extract Media Info (FFprobe)
|
+---> Audio? ---> Extract ID3 Tags
|
v
+-------------------+
| Map to AtoM Fields|
+-------------------+
|
v
+-------------------+
| Update Record |
+-------------------+
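The detection step can be sketched as a MIME-type router. This is a minimal Python illustration, not the plugin's code. The function name `detect_branch`, the branch labels, and the use of broad `image/`, `video/` and `audio/` prefixes (rather than the exact format list under Supported File Types) are all assumptions.

```python
from typing import Optional

# Hypothetical branch labels, mirroring the flowchart above.
BRANCH_ACTIONS = {
    "image": "Extract EXIF/IPTC/XMP",
    "pdf": "Extract Document Info",
    "office": "Extract Open XML Properties",
    "video": "Extract Media Info (FFprobe)",
    "audio": "Extract ID3 Tags",
}

# Standard MIME types for the Open XML and legacy Office formats listed above.
OFFICE_MIME_TYPES = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "application/msword",
    "application/vnd.ms-excel",
    "application/vnd.ms-powerpoint",
}


def detect_branch(mime_type: str) -> Optional[str]:
    """Return the flowchart branch for a MIME type, or None to skip extraction."""
    mime = mime_type.strip().lower()
    if mime.startswith("image/"):
        return "image"
    if mime == "application/pdf":
        return "pdf"
    if mime in OFFICE_MIME_TYPES:
        return "office"
    if mime.startswith("video/"):
        return "video"
    if mime.startswith("audio/"):
        return "audio"
    return None
```

For example, `BRANCH_ACTIONS[detect_branch("image/jpeg")]` gives `"Extract EXIF/IPTC/XMP"`, while `detect_branch("text/plain")` returns `None`.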
What Gets Extracted¶
Images (EXIF/IPTC/XMP)¶
+-------------------------------------------------------------+
| IMAGE METADATA |
+-------------------------------------------------------------+
| |
| DESCRIPTIVE |
| - Title / Object Name |
| - Description / Caption |
| - Keywords / Tags |
| - Creator / Photographer |
| - Copyright Notice |
| - Date Taken |
| |
| LOCATION |
| - GPS Coordinates |
| - City, State, Country |
| - Altitude |
| |
| TECHNICAL |
| - Camera Make/Model |
| - Exposure Settings |
| - Image Dimensions |
| - Color Space |
| |
+-------------------------------------------------------------+
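EXIF stores GPS position as degrees, minutes and seconds (each a rational number) plus a hemisphere reference (`N`/`S`, `E`/`W`). The Physical Characteristics example further down shows signed decimal degrees instead. Below is a minimal sketch of that conversion. It assumes the rationals arrive as values `Fraction` accepts, such as `"33/1"`; that input shape is an assumption about what the extractor returns.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF (degrees, minutes, seconds) rationals to signed decimal degrees.

    `dms` holds three values accepted by Fraction, e.g. ("33/1", "55/1", "78996/10000").
    `ref` is the GPSLatitudeRef / GPSLongitudeRef letter.
    """
    degrees, minutes, seconds = (Fraction(v) for v in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative.
    if ref.strip().upper() in ("S", "W"):
        value = -value
    return round(float(value), 6)
```

Using exact fractions avoids accumulating floating-point error before the final rounding to six decimal places.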
PDF Documents¶
+-------------------------------------------------------------+
| PDF METADATA |
+-------------------------------------------------------------+
| |
| - Title |
| - Author |
| - Subject |
| - Keywords |
| - Creator Application |
| - Producer |
| - Creation Date |
| - Modification Date |
| - Page Count |
| |
+-------------------------------------------------------------+
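PDF creation and modification dates are stored as strings of the form `D:YYYYMMDDHHmmSSOHH'mm'`. Every component after the year is optional, and `O` is `+`, `-` or `Z`. Such a value has to be parsed before it can become an AtoM event date. The following is a minimal, lenient Python sketch; this guide does not document how the plugin actually parses these dates.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# D:YYYY[MM[DD[HH[mm[SS]]]]][(+|-|Z)[HH['mm']]]; a missing "D:" prefix is tolerated.
PDF_DATE = re.compile(
    r"^(?:D:)?(?P<y>\d{4})(?P<mo>\d{2})?(?P<d>\d{2})?"
    r"(?P<h>\d{2})?(?P<mi>\d{2})?(?P<s>\d{2})?"
    r"(?:(?P<tz>[Zz+\-])(?:(?P<tzh>\d{2})'?(?:(?P<tzm>\d{2})'?)?)?)?$"
)


def parse_pdf_date(value: str) -> Optional[datetime]:
    """Parse a PDF Info dictionary date; return None if it is malformed."""
    m = PDF_DATE.match(value.strip())
    if not m:
        return None
    g = m.groupdict()
    try:
        dt = datetime(
            int(g["y"]), int(g["mo"] or 1), int(g["d"] or 1),
            int(g["h"] or 0), int(g["mi"] or 0), int(g["s"] or 0),
        )
    except ValueError:  # e.g. month 13
        return None
    tz = g["tz"]
    if tz in ("+", "-"):
        offset = timedelta(hours=int(g["tzh"] or 0), minutes=int(g["tzm"] or 0))
        dt = dt.replace(tzinfo=timezone(-offset if tz == "-" else offset))
    elif tz:
        dt = dt.replace(tzinfo=timezone.utc)  # "Z" means UTC
    return dt
```

A date with no time-zone marker is returned as a naive `datetime`.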
Office Documents (DOCX, XLSX, PPTX)¶
+-------------------------------------------------------------+
| OFFICE METADATA |
+-------------------------------------------------------------+
| |
| CORE PROPERTIES |
| - Title |
| - Creator / Author |
| - Subject |
| - Description |
| - Keywords |
| - Category |
| |
| APPLICATION PROPERTIES |
| - Application Name & Version |
| - Company |
| - Manager |
| - Total Editing Time |
| - Page/Word/Character Counts |
| - Slide Count (PPTX) |
| |
| CUSTOM PROPERTIES |
| - Any custom metadata fields defined in the document |
| |
+-------------------------------------------------------------+
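In the Open XML formats, the core properties listed above live in a small XML part inside the ZIP package, conventionally named `docProps/core.xml`. Application properties sit in `docProps/app.xml` and custom ones in `docProps/custom.xml`. The minimal Python sketch below reads the core part with the standard library. It assumes the conventional part name rather than resolving it through `_rels/.rels`. It does not cover legacy DOC/XLS/PPT files, which use a different binary format.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output key -> element in docProps/core.xml
CORE_FIELDS = {
    "title": "dc:title",
    "creator": "dc:creator",
    "subject": "dc:subject",
    "description": "dc:description",
    "keywords": "cp:keywords",
    "category": "cp:category",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(data: bytes) -> dict:
    """Return the non-empty core properties of a DOCX/XLSX/PPTX file's bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        try:
            root = ET.fromstring(package.read("docProps/core.xml"))
        except KeyError:  # the package has no core properties part
            return {}
    props = {}
    for key, tag in CORE_FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text and element.text.strip():
            props[key] = element.text.strip()
    return props
```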
Video Files¶
+-------------------------------------------------------------+
| VIDEO METADATA |
+-------------------------------------------------------------+
| |
| GENERAL |
| - Title |
| - Artist/Creator |
| - Date Created |
| - Comment |
| |
| TECHNICAL |
| - Duration (HH:MM:SS) |
| - Resolution (width x height) |
| - Frame Rate (fps) |
| - Video Codec |
| - Audio Codec |
| - Bitrate |
| - Container Format |
| |
+-------------------------------------------------------------+
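FFprobe can emit its findings as JSON, for example with `ffprobe -v quiet -print_format json -show_format -show_streams video.mp4`. The sketch below turns that JSON into the technical fields listed above. It assumes the plugin invokes FFprobe this way, which this guide does not state. FFprobe reports `duration` and `bit_rate` as strings, and the frame rate as a fraction such as `30000/1001`.

```python
import json
from fractions import Fraction


def format_duration(seconds: float) -> str:
    """Format seconds as HH:MM:SS, dropping fractional seconds."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def summarize_ffprobe(raw_json: str) -> dict:
    """Pick the technical fields shown above out of ffprobe's JSON output."""
    data = json.loads(raw_json)
    fmt = data.get("format", {})
    streams = data.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    audio = next((s for s in streams if s.get("codec_type") == "audio"), None)

    summary = {"container": fmt.get("format_name")}
    if "duration" in fmt:
        summary["duration"] = format_duration(float(fmt["duration"]))
    if "bit_rate" in fmt:
        summary["bitrate_kbps"] = int(fmt["bit_rate"]) // 1000
    if video:
        if "width" in video and "height" in video:
            summary["resolution"] = f"{video['width']} x {video['height']}"
        summary["video_codec"] = video.get("codec_name")
        rate = video.get("r_frame_rate", "0/0")
        if not rate.endswith("/0"):  # "0/0" means the rate is unknown
            summary["fps"] = round(float(Fraction(rate)), 3)
    if audio:
        summary["audio_codec"] = audio.get("codec_name")
    return summary
```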
Audio Files¶
+-------------------------------------------------------------+
| AUDIO METADATA |
+-------------------------------------------------------------+
| |
| ID3 TAGS |
| - Title |
| - Artist |
| - Album |
| - Year |
| - Genre |
| - Track Number |
| - Composer |
| - Publisher |
| - Copyright |
| |
| TECHNICAL |
| - Duration |
| - Bitrate |
| - Sample Rate |
| - Channels |
| - Audio Codec |
| |
+-------------------------------------------------------------+
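ID3 comes in two versions. The older ID3v1 tag is a fixed 128-byte block at the end of an MP3 file. It only carries title, artist, album, year, comment and genre, plus a track number in ID3v1.1. Fields such as Composer, Publisher and Copyright come from ID3v2 frames, which need a full tag parser. The simpler case can be decoded directly, as the minimal illustration below shows; the plugin does not necessarily read tags this way.

```python
from typing import Optional


def parse_id3v1(data: bytes) -> Optional[dict]:
    """Decode an ID3v1/ID3v1.1 tag from the last 128 bytes of an MP3 file."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are NUL- or space-padded; ID3v1 defines no encoding,
        # so Latin-1 is the usual assumption.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    comment = tag[97:127]
    result = {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "genre_id": tag[127],  # index into the ID3v1 genre list; 255 = none
    }
    # ID3v1.1: a zero byte at comment[28] means comment[29] is the track number.
    if comment[28] == 0 and comment[29] != 0:
        result["comment"] = text(comment[:28])
        result["track"] = comment[29]
    else:
        result["comment"] = text(comment)
    return result
```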
AtoM Field Mapping¶
Extracted metadata is mapped to AtoM's archival description fields:
+----------------------+--------------------------------+
| Extracted Field | AtoM Field |
+----------------------+--------------------------------+
| Title | Title (if empty) |
| Description/Caption | Scope and Content |
| Creator/Artist | Name Access Point (Creator) |
| Keywords | Subject Access Points |
| Date Created | Event Date (Creation) |
| Copyright | Access Conditions |
| GPS Coordinates | Scope and Content (appended) |
| Technical Summary | Physical Characteristics |
+----------------------+--------------------------------+
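The table can be read as a set of rules applied to the record. The Python sketch below is one plausible reading, not the plugin's implementation. Several details are assumptions: the `Record` class; the metadata keys (`title`, `description`, `creator`, `keywords`, `date_created`, `copyright`, `gps`); and appending the caption to an existing Scope and Content, which the Viewing Extracted Metadata example suggests. The keyword arguments correspond to settings described under Configuration Settings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """Hypothetical stand-in for an AtoM archival description."""
    title: str = ""
    scope_and_content: str = ""
    access_conditions: str = ""
    creators: List[str] = field(default_factory=list)        # name access points
    subjects: List[str] = field(default_factory=list)        # subject access points
    creation_dates: List[str] = field(default_factory=list)  # creation events


def _append(existing: str, addition: str) -> str:
    return f"{existing}\n\n{addition}" if existing else addition


def apply_mapping(record, meta, overwrite_title=False, overwrite_description=False,
                  auto_keywords=True, extract_gps=True):
    """Apply extracted metadata to a record following the mapping table."""
    if meta.get("title") and (overwrite_title or not record.title):
        record.title = meta["title"]
    if meta.get("description"):
        if overwrite_description:
            record.scope_and_content = meta["description"]
        else:
            record.scope_and_content = _append(record.scope_and_content, meta["description"])
    if meta.get("creator") and meta["creator"] not in record.creators:
        record.creators.append(meta["creator"])
    if auto_keywords:
        for keyword in meta.get("keywords", []):
            if keyword not in record.subjects:  # avoid duplicate access points
                record.subjects.append(keyword)
    if meta.get("date_created"):
        record.creation_dates.append(meta["date_created"])
    if meta.get("copyright"):
        record.access_conditions = _append(record.access_conditions, meta["copyright"])
    if extract_gps and meta.get("gps"):
        lat, lon = meta["gps"]
        record.scope_and_content = _append(record.scope_and_content, f"GPS: {lat:.6f}, {lon:.6f}")
    return record
```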
Configuration Settings¶
How to Access Settings¶
Main Menu
|
v
Admin
|
v
AHG Settings
|
v
Metadata Extraction --------------------------------+
| |
+---> Enable/Disable Extraction |
| |
+---> Select Metadata Types |
| (EXIF, IPTC, XMP) |
| |
+---> Field Mapping Options |
| |
+---> Technical Metadata Location |
|
+------------------------------------------------------+
Available Settings¶
+-------------------------------------------------------------+
| EXTRACTION SETTINGS |
+-------------------------------------------------------------+
| |
| [x] Enable metadata extraction |
| |
| METADATA TYPES |
| [x] Extract EXIF data |
| [x] Extract IPTC data |
| [x] Extract XMP data |
| |
| FIELD BEHAVIOR |
| [ ] Overwrite existing title |
| [ ] Overwrite existing description |
| [x] Auto-generate keywords from metadata |
| [x] Extract GPS coordinates |
| |
| TECHNICAL METADATA |
| [x] Add technical metadata summary |
| Target field: [Physical Characteristics v] |
| |
| [Save Settings] |
+-------------------------------------------------------------+
Setting Descriptions¶
| Setting | Default | Description |
|---|---|---|
| Enable extraction | On | Master switch for metadata extraction |
| Extract EXIF | On | Extract camera/technical data from images |
| Extract IPTC | On | Extract editorial metadata from images |
| Extract XMP | On | Extract Adobe XMP metadata |
| Overwrite title | Off | Replace existing title with extracted title |
| Overwrite description | Off | Replace existing description |
| Auto-generate keywords | On | Create subject access points from keywords |
| Extract GPS | On | Extract and store GPS coordinates |
| Add technical metadata | On | Add technical summary to record |
| Target field | Physical Characteristics | Where to store technical metadata |
Viewing Extracted Metadata¶
After upload, extracted metadata appears in the record:
Title and Description¶
+-------------------------------------------------------------+
| ARCHIVAL DESCRIPTION |
+-------------------------------------------------------------+
| |
| Title: [Extracted from metadata if record was untitled] |
| |
| Scope and Content: |
| [Original description] |
| |
| [Extracted description/caption if available] |
| |
+-------------------------------------------------------------+
Physical Characteristics (Technical Metadata)¶
+-------------------------------------------------------------+
| PHYSICAL CHARACTERISTICS |
+-------------------------------------------------------------+
| |
| === FILE INFO === |
| Name: photograph_001.jpg |
| Size: 2.4 MB |
| Type: image/jpeg |
| |
| === IMAGE === |
| Dimensions: 4032 x 3024 pixels |
| Camera: Canon EOS 5D Mark IV |
| Date: 2025-06-15 |
| |
| === GPS === |
| Coordinates: -33.918861, 18.423300 |
| Altitude: 42m |
| |
+-------------------------------------------------------------+
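A summary in this layout can be assembled from nested label/value pairs. The sketch below is an assumption about the formatting, not the plugin's code. In particular, it uses 1024-based units, under which 2,516,582 bytes displays as `2.4 MB`.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units (B, KB, MB, GB)."""
    if num_bytes < 1024:
        return f"{num_bytes} B"
    size = float(num_bytes)
    for unit in ("KB", "MB", "GB"):
        size /= 1024
        if size < 1024 or unit == "GB":
            return f"{size:.1f} {unit}"


def technical_summary(sections: dict) -> str:
    """Render {"File info": {"Name": ...}, ...} as '=== FILE INFO ===' blocks."""
    lines = []
    for heading, fields in sections.items():
        if not fields:
            continue  # omit sections with nothing extracted
        if lines:
            lines.append("")  # blank line between sections
        lines.append(f"=== {heading.upper()} ===")
        lines.extend(f"{label}: {value}" for label, value in fields.items())
    return "\n".join(lines)
```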
Access Points¶
+-------------------------------------------------------------+
| ACCESS POINTS |
+-------------------------------------------------------------+
| |
| Name Access Points: |
| - John Smith (Creator) |
| |
| Subject Access Points: |
| - Architecture |
| - Historical Buildings |
| - Cape Town |
| |
+-------------------------------------------------------------+
Metadata Extraction Dashboard¶
Accessing the Dashboard¶
Navigate to /metadataExtraction to access the metadata extraction management interface.
┌──────────────────────────────────────────────────────────────┐
│ METADATA EXTRACTION DASHBOARD │
├──────────────────────────────────────────────────────────────┤
│ │
│ [Status] [Batch Extract] │
│ │
│ FILTERS │
│ MIME Type: [All types v] │
│ Has Metadata: [All v] [Filter] [Clear] │
│ │
├──────────────────────────────────────────────────────────────┤
│ ID │ File Name │ MIME Type │ Size │ Record │ Actions │
│───────┼─────────────┼────────────┼──────┼─────────┼─────────│
│ 1 │ photo.jpg │ image/jpeg │ 2MB │ Title │ [View] │
│ 2 │ doc.pdf │ app/pdf │ 1MB │ Title │ [View] │
│ │
└──────────────────────────────────────────────────────────────┘
Dashboard Features¶
| Feature | Description |
|---|---|
| MIME Type Filter | Filter by file type (image, PDF, video, etc.) |
| Has Metadata Filter | Show only objects with, or without, extracted metadata |
| Batch Extract | Process up to 50 unextracted objects at once |
| Status Page | View ExifTool status and extraction statistics |
| View Metadata | See all extracted fields for any object |
Status Page¶
The status page (/metadataExtraction/status) shows:
┌──────────────────────────────────────────────────────────────┐
│ SYSTEM STATUS │
├──────────────────────────────────────────────────────────────┤
│ │
│ ExifTool Status │
│ ├── Installed: [✓ Yes] │
│ └── Version: 12.70 │
│ │
│ Extraction Statistics │
│ ├── Total Digital Objects: 1,234 │
│ ├── Objects with Metadata: 890 (72%) │
│ ├── Total Metadata Fields: 45,678 │
│ └── Average Fields per Object: 51 │
│ │
│ MIME Type Breakdown │
│ ├── image/jpeg: 800 [Supported ✓] │
│ ├── application/pdf: 300 [Supported ✓] │
│ └── video/mp4: 134 [Supported ✓] │
│ │
└──────────────────────────────────────────────────────────────┘
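The statistics above appear consistent if the average is taken over objects that have metadata rather than over all digital objects. On that reading, 890 / 1,234 ≈ 72% and 45,678 / 890 ≈ 51; dividing by all 1,234 objects would give about 37. The arithmetic is sketched below with a hypothetical function name.

```python
def extraction_stats(total_objects: int, with_metadata: int, total_fields: int) -> dict:
    """Coverage percentage and average fields per object that has metadata."""
    coverage = round(100 * with_metadata / total_objects) if total_objects else 0
    average = round(total_fields / with_metadata) if with_metadata else 0
    return {"coverage_percent": coverage, "avg_fields_per_object": average}
```

With the example figures, `extraction_stats(1234, 890, 45678)` returns `{"coverage_percent": 72, "avg_fields_per_object": 51}`.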
Manual Extraction¶
From the Dashboard¶
- Navigate to /metadataExtraction
- Find the digital object in the list
- Click the Extract button (download icon)
- Metadata is extracted and saved automatically
From the Detail View¶
- Navigate to /metadataExtraction/view/:id
- Click the Extract Metadata button
- View extracted metadata grouped by category (EXIF, IPTC, XMP, etc.)
Batch Extraction¶
- Navigate to /metadataExtraction
- Click the Batch Extract button
- Up to 50 unextracted objects are processed per run
- Click Batch Extract again until no unextracted objects remain
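Repeating Batch Extract until nothing is left amounts to processing the backlog in chunks of 50. In the minimal sketch below, `extract_one` is a hypothetical callback standing in for a single extraction. The function slices a fixed list of IDs up front, so an object that fails extraction is not retried endlessly. A naive "fetch the next 50 unextracted objects" loop could retry such an object forever.

```python
from typing import Callable, Iterable

BATCH_SIZE = 50  # the per-run limit described above


def run_batches(pending_ids: Iterable[int], extract_one: Callable[[int], None],
                batch_size: int = BATCH_SIZE) -> int:
    """Extract every pending object, batch_size at a time; return the batch count."""
    remaining = list(pending_ids)
    batches = 0
    while remaining:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for object_id in batch:
            extract_one(object_id)
        batches += 1
    return batches
```

For example, 120 pending objects take three runs (50 + 50 + 20).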
Viewing Extracted Metadata in the Dashboard¶
Metadata Detail View¶
Navigate to /metadataExtraction/view/:id to see all extracted metadata:
┌──────────────────────────────────────────────────────────────┐
│ EXTRACTED METADATA │
├──────────────────────────────────────────────────────────────┤
│ │
│ Digital Object Details │
│ ├── ID: 123 │
│ ├── File Name: photograph_001.jpg │
│ ├── MIME Type: image/jpeg │
│ ├── Size: 2.4 MB │
│ └── Parent Record: Smith Family Papers │
│ │
│ [Extract Metadata] [Delete Metadata] │
│ │
├──────────────────────────────────────────────────────────────┤
│ ▼ EXIF (32 fields) │
│ ├── Make: Canon │
│ ├── Model: EOS 5D Mark IV │
│ ├── ExposureTime: 1/250 │
│ └── ... │
│ │
│ ▶ IPTC (8 fields) │
│ ▶ XMP (12 fields) │
│ ▶ File (5 fields) │
│ │
└──────────────────────────────────────────────────────────────┘
Metadata is organized into collapsible sections by source (EXIF, IPTC, XMP, File, etc.).
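The grouping by source matches what ExifTool produces with `-j` (JSON output) and `-G` (prefix each tag with its group name). For example, `exiftool -j -G photograph_001.jpg` yields keys like `EXIF:Make` or `IPTC:Keywords`. This guide does not say how the plugin stores metadata. Assuming the stored data has that shape, the sections and their field counts can be derived as follows:

```python
import json
from collections import defaultdict


def group_by_source(exiftool_json: str) -> dict:
    """Split 'Group:Tag' keys from `exiftool -j -G` output into per-group dicts."""
    records = json.loads(exiftool_json)  # ExifTool emits one object per input file
    if not records:
        return {}
    grouped = defaultdict(dict)
    for key, value in records[0].items():
        group, sep, tag = key.partition(":")
        if sep:  # skip ungrouped keys such as "SourceFile"
            grouped[group][tag] = value
    return dict(grouped)


def section_headings(grouped: dict) -> list:
    """Headings such as 'EXIF (2 fields)' for the collapsible sections."""
    headings = []
    for group, tags in grouped.items():
        n = len(tags)
        headings.append(f"{group} ({n} field{'' if n == 1 else 's'})")
    return headings
```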
Troubleshooting¶
Common Issues¶
+--------------------+----------------------------------------+
| Issue | Solution |
+--------------------+----------------------------------------+
| No metadata | Check if file contains embedded |
| extracted | metadata. Not all files have metadata. |
+--------------------+----------------------------------------+
| Missing EXIF | Ensure PHP EXIF extension is enabled. |
| | Check with: php -m | grep exif |
+--------------------+----------------------------------------+
| Video metadata | Install FFprobe: |
| not extracting | sudo apt install ffmpeg |
+--------------------+----------------------------------------+
| ExifTool errors | Install ExifTool: |
| | sudo apt install libimage-exiftool-perl|
+--------------------+----------------------------------------+
| GPS not appearing | Check if GPS extraction is enabled |
| | in settings. |
+--------------------+----------------------------------------+
| Keywords not | Ensure auto-generate keywords is |
| created | enabled in settings. |
+--------------------+----------------------------------------+
Checking System Requirements¶
# Check PHP EXIF extension
php -m | grep exif
# Check ExifTool
which exiftool
exiftool -ver
# Check FFprobe (for video/audio)
which ffprobe
ffprobe -version
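The same checks can be scripted, for example as part of a deployment health check. The minimal Python sketch below uses only the standard library and simply runs the commands shown above. The helper names are hypothetical.

```python
import shutil
import subprocess


def _run(args):
    """Run a command and return its stdout ('' if it cannot be run)."""
    try:
        result = subprocess.run(args, capture_output=True, text=True,
                                timeout=10, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout


def tool_version(name: str, version_flag: str):
    """First line of `<name> <version_flag>`, or None if the tool is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    lines = _run([path, version_flag]).strip().splitlines()
    return lines[0] if lines else ""


def php_has_exif() -> bool:
    """Equivalent of `php -m | grep exif`, matched against whole module names."""
    php = shutil.which("php")
    if php is None:
        return False
    return "exif" in {line.strip().lower() for line in _run([php, "-m"]).splitlines()}


def check_requirements() -> dict:
    return {
        "php_exif": php_has_exif(),
        "exiftool": tool_version("exiftool", "-ver"),
        "ffprobe": tool_version("ffprobe", "-version"),
    }
```

A `None` value in the result of `check_requirements()` means that tool was not found on the `PATH`.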
Best Practices¶
Before Upload¶
+--------------------------------+--------------------------------+
| DO | DON'T |
+--------------------------------+--------------------------------+
| Embed metadata in files | Strip metadata before upload |
| before uploading | |
+--------------------------------+--------------------------------+
| Use standard metadata | Use proprietary metadata |
| formats (EXIF, IPTC, XMP) | formats |
+--------------------------------+--------------------------------+
| Include descriptive titles | Leave metadata fields empty |
| and keywords | |
+--------------------------------+--------------------------------+
| Add copyright information | Assume copyright will be |
| | added later |
+--------------------------------+--------------------------------+
Setting Up¶
- Configure extraction settings before bulk uploads
- Test with sample files to verify field mapping
- Decide whether to overwrite existing fields
Regular Use¶
- Review extracted metadata after upload
- Supplement with additional description as needed
- Check GPS coordinates for sensitive locations
Privacy Considerations¶
GPS Data¶
+-------------------------------------------------------------+
| GPS PRIVACY WARNING |
+-------------------------------------------------------------+
| |
| Photos from mobile devices often contain GPS coordinates. |
| This can reveal: |
| - Home addresses |
| - Work locations |
| - Travel patterns |
| |
| RECOMMENDATION: Review GPS data before publishing records. |
| Disable GPS extraction if location data is sensitive. |
| |
+-------------------------------------------------------------+
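Location data can be kept internally but withheld from publication at full precision by stripping or coarsening it before a record goes public. The sketch below shows a hypothetical pre-publication step for this, not a plugin setting; the documented control is disabling GPS extraction. The key names and decimal-degree values are assumptions. One degree of latitude spans about 111 km, so two decimal places is roughly 1.1 km of latitude.

```python
def redact_gps(metadata: dict, mode: str = "strip", decimals: int = 2) -> dict:
    """Return a copy of `metadata` with GPS fields removed or coarsened.

    mode="strip" drops every key starting with "GPS"; mode="coarsen" keeps
    GPSLatitude/GPSLongitude rounded to `decimals` places and drops the rest
    (e.g. GPSAltitude).
    """
    if mode not in ("strip", "coarsen"):
        raise ValueError(f"unknown mode: {mode}")
    cleaned = {k: v for k, v in metadata.items() if not k.upper().startswith("GPS")}
    if mode == "coarsen":
        for key in ("GPSLatitude", "GPSLongitude"):
            if key in metadata:
                cleaned[key] = round(metadata[key], decimals)
    return cleaned
```

This only affects the extracted values. The uploaded file itself still carries its embedded GPS data unless that is stripped separately.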
Personal Information¶
- Creator names may identify individuals
- Keywords may contain sensitive terms
- Document properties may reveal organization details
Use Cases¶
Photograph Collections¶
- Automatically capture photographer name
- Extract camera and exposure information
- Map GPS coordinates to locate image subjects
- Populate keywords from IPTC tags
Document Archives¶
- Extract author and title information
- Capture creation and modification dates
- Identify creating application
Audio/Visual Archives¶
- Record duration and technical specifications
- Extract title and artist information
- Document codec and quality settings
Need Help?¶
Contact your system administrator if you experience issues with metadata extraction.
Part of the AtoM AHG Framework