Expanded Document Fields
When data is extracted from an ID document through OCR, data is returned within a document fields object as detailed here.
In addition to document fields, an expanded set of fields can be returned, providing a full list of each extracted field, the source of the extraction, and any corresponding transliterated values.
The following transliterations are currently supported:
- Cyrillic and Arabic families of languages
- Chinese
- Korean
- Vietnamese
- Turkish languages
Configure Expanded Document Fields
Expanded fields are not returned by default. They must be enabled on the ID Document Text Extraction task.
For non-Latin documents, you must enable them separately; see here for guidance.
new RequestedTextExtractionTaskBuilder()
.withCreateExpandedDocumentFields(true) // default is false
.build()
Retrieving Expanded Document Fields
The expanded document fields object will not always be present. Expanded fields will only be available within the session when a document has been captured and the OCR (Optical character recognition) has been successful.
Data that is keyed in manually will not be returned in the expanded fields section. It will continue to be returned only within the standard document fields payload.
// Retrieve the Expanded Document Fields media ID for each ID document
idvClient.getSession(sessionId)
  .then((session) => {
    // Returns all resources in the session
    const resources = session.getResources();
    // Returns a collection of ID Documents
    const idDocuments = resources.getIdDocuments();
    idDocuments.forEach((idDocument) => {
      const expandedFields = idDocument.getExpandedDocumentFields();
      // The expanded document fields object will not always be present
      if (!expandedFields) return;
      const expandedFieldsMediaId = expandedFields.getMedia().getId();
      // Retrieve data while the media ID is still in scope
      idvClient.getMediaContent(sessionId, expandedFieldsMediaId)
        .then((media) => {
          const buffer = media.getContent();
          const jsonData = JSON.parse(buffer);
          // handle jsonData here
        })
        .catch((error) => {
          // handle error
        });
    });
  });
Response
A JSON object will be returned when retrieving the expanded document fields media. Some documents will have multiple sources, e.g. a VIZ (visual inspection zone) and a barcode. Fields that appear in more than one source will be returned as separate entries, each with its source identified in the response.
{
  "fields": [
    {
      "name": "date_of_birth", // the name of the field
      "value": "1970-01-01", // the contents of the field
      "locale": "la", // the language/script the field is in on the document
      "source": "VIZ", // where this field was extracted from: VIZ, MRZ or BARCODE
      "is_non_latin": true, // if the field includes non-Latin characters
      "is_transliteration": true // if the field is transliterated into Latin script
    },
    {
      "name": "full_name",
      "value": "MELISSA PETERSON",
      "locale": "la",
      "source": "MRZ"
    }
  ]
}
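Because the same field can be returned once per source, it can help to group the entries by name before comparing them. The following is a minimal sketch, assuming the parsed JSON has the shape shown above. `groupFieldsByName` is a hypothetical helper and not part of the SDK, and the sample values are illustrative only.

```javascript
// Hypothetical helper (not part of the SDK): groups expanded fields by name
// so that values from different sources (VIZ, MRZ, BARCODE) can be compared.
function groupFieldsByName(jsonData) {
  // Null-prototype object avoids clashes with inherited keys such as "constructor"
  const grouped = Object.create(null);
  for (const field of jsonData.fields || []) {
    if (!grouped[field.name]) grouped[field.name] = [];
    grouped[field.name].push(field);
  }
  return grouped;
}

// Illustrative data in the same shape as the response above
const sample = {
  fields: [
    { name: 'full_name', value: 'MELISSA PETERSON', locale: 'la', source: 'VIZ' },
    { name: 'full_name', value: 'MELISSA PETERSON', locale: 'la', source: 'MRZ' },
    { name: 'date_of_birth', value: '1970-01-01', locale: 'la', source: 'VIZ' },
  ],
};

const byName = groupFieldsByName(sample);
// Lists the sources that reported a full_name value
console.log(byName.full_name.map((f) => f.source));
```

Grouping first makes it straightforward to check whether the sources agree on a value before relying on it.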
Values
Field | Description | Always present |
---|---|---|
name | The name of the field. A full list is available here | ✅ |
value | The data from the document field | ✅ |
locale | The locale of the field. For any Latin script field this will be "la"; for non-Latin script it will be the detailed locale, e.g. "ja-JP" for Japan | ✅ |
source | Where on the document this field was extracted from. This can be VIZ, MRZ or BARCODE | ✅ |
is_non_latin | Present and true if the field is in a non-Latin script | ❌ |
is_transliteration | Present and true if the value has been transliterated into Latin script. Only some non-Latin fields can be transliterated | ❌ |
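Since `is_non_latin` and `is_transliteration` are optional, code reading the response should treat a missing flag as false. The sketch below selects the Latin-script values for a given field name. It assumes that an entry flagged `is_transliteration` carries a Latin-script value and that such an entry reports the locale "la"; `latinValues` is a hypothetical helper, and the sample data is illustrative rather than real output.

```javascript
// Hypothetical helper (not part of the SDK): returns the Latin-script values
// found for a field name, together with the source each came from.
function latinValues(jsonData, name) {
  return (jsonData.fields || [])
    .filter((f) => f.name === name)
    // Optional flags are absent when not applicable, so compare against true
    .filter((f) => f.is_transliteration === true || f.is_non_latin !== true)
    .map((f) => ({ value: f.value, source: f.source }));
}

// Illustrative data only; the locale of the transliterated entry is an assumption
const expanded = {
  fields: [
    { name: 'full_name', value: 'ИВАН ИВАНОВ', locale: 'ru-RU', source: 'VIZ', is_non_latin: true },
    { name: 'full_name', value: 'IVAN IVANOV', locale: 'la', source: 'VIZ', is_transliteration: true },
    { name: 'full_name', value: 'IVAN IVANOV', locale: 'la', source: 'MRZ' },
  ],
};

const names = latinValues(expanded, 'full_name');
console.log(names);
```

The original non-Latin entry is filtered out, leaving the transliterated VIZ value and the MRZ value for comparison.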
Example Document Extractions
See the code blocks below for examples of the Expanded Document Fields. The Document Fields JSON is also available at the bottom for comparison.
expandedDocumentFields
{ }
documentFields:
{ }