Cambrion API (1.0)
The official Cambrion API specification. To receive a free API key, contact info@cambrion.de with a brief description of your use case.
Creates an execution
Creates an execution, optionally from a given ID. If an execution ID is supplied, that ID is used; otherwise a new one is generated. If the execution already exists, the request is ignored and 204 is returned. If a new execution is created, 200 is returned with the execution ID in the body.
Authorizations:
Request Body schema: application/json
Execution is the context which holds data related to a specific execution
executionId | string ID of the execution |
tag | string Tag to identify the execution |
createdAt | string Creation time |
metaData | object |
Responses
Request samples
- Payload
{
  "executionId": "string",
  "tag": "string",
  "createdAt": "string",
  "metaData": { }
}
Response samples
- 200
- 400
- 401
{
  "executionId": "string",
  "tag": "string",
  "createdAt": "string",
  "metaData": { }
}
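The 200/204 behaviour described above can be sketched in Python as follows. This is only an illustration: the base URL, the endpoint path, the HTTP method, and the name of the API key header are not given in this section and are placeholders, and the 200 body is assumed to be the JSON object shown in the response sample.

```python
import json
import urllib.request

# Hypothetical values: this section states neither the base URL, the endpoint
# path, the HTTP method, nor the name of the authorization header.
BASE_URL = "https://api.example.invalid"
EXECUTIONS_PATH = "/executions"
API_KEY_HEADER = "X-Api-Key"


def build_create_execution_request(api_key, execution_id=None, tag=None):
    """Build (but do not send) a request that creates an execution."""
    payload = {}
    if execution_id is not None:
        payload["executionId"] = execution_id
    if tag is not None:
        payload["tag"] = tag
    return urllib.request.Request(
        BASE_URL + EXECUTIONS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


def resolve_execution_id(status, body, requested_id=None):
    """Map the documented status codes to the execution ID to use afterwards."""
    if status == 200:
        # A new execution was created; assume the body is the JSON object
        # from the response sample, which carries the executionId.
        return json.loads(body)["executionId"]
    if status == 204:
        # The execution already existed and the request was ignored.
        return requested_id
    raise RuntimeError(f"unexpected status {status}")
```

A caller would send the request with `urllib.request.urlopen` and pass the returned status and body to `resolve_execution_id`.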
Merge a raw observation into the current observation
The raw observation is merged into the current observation context.
Authorizations:
path Parameters
executionId required | string ID of an execution |
Request Body schema: application/json
Observation request
executionId required | string |
mediaContents | Array of objects (Image Content) |
Responses
Request samples
- Payload
{- "executionId": "execution-1",
- "mediaContents": [
- {
- "id": "media-1",
- "mediaId": "media-1",
- "documentPages": [
- {
- "page": 1,
- "document": {
- "id": "string",
- "tables": [
- {
- "id": "string",
- "entity": {
- "id": "string",
- "block": {
- "text": null,
- "geometry": null
}, - "confidence": 0,
- "label": "string",
- "type": "STRING",
- "data": {
- "documentId": null,
- "textValue": null,
- "quantityValue": null,
- "numberValue": null,
- "unitValue": null,
- "dateValue": null,
- "textData": null,
- "quantityData": null,
- "numberData": null,
- "unitData": null,
- "dateData": null,
- "field": null,
- "score": null,
- "sourceIndex": null
}, - "embedding": [
- null
], - "similarity": {
- "type": null,
- "cosineSimilarity": null,
- "amountDiff": null,
- "numberDiff": null,
- "same": null
}, - "layoutType": "WORD"
}, - "tag": "string",
- "headers": [
- {
- "id": null,
- "block": null,
- "confidence": null,
- "label": null,
- "type": null,
- "data": null,
- "embedding": [ ],
- "similarity": null,
- "layoutType": null
}
], - "rows": [
- {
- "id": null,
- "tag": null,
- "pairs": [ ],
- "entity": null
}
]
}
], - "entities": [
- {
- "id": "string",
- "block": {
- "text": "string",
- "geometry": {
- "polygon": null,
- "boundingBox": null
}
}, - "confidence": 0,
- "label": "string",
- "type": "STRING",
- "data": {
- "documentId": "string",
- "textValue": "string",
- "quantityValue": 0,
- "numberValue": 0,
- "unitValue": "string",
- "dateValue": "2019-08-24",
- "textData": "string",
- "quantityData": 0,
- "numberData": 0,
- "unitData": "string",
- "dateData": "2019-08-24",
- "field": "string",
- "score": 0,
- "sourceIndex": "string"
}, - "embedding": [
- 0
], - "similarity": {
- "type": "TEXT_SIM",
- "cosineSimilarity": 0,
- "amountDiff": 0,
- "numberDiff": 0,
- "same": true
}, - "layoutType": "WORD"
}
], - "keyValueSet": {
- "id": "string",
- "tag": "string",
- "pairs": [
- {
- "key": {
- "id": null,
- "block": null,
- "confidence": null,
- "label": null,
- "type": null,
- "data": null,
- "embedding": [ ],
- "similarity": null,
- "layoutType": null
}, - "entityValue": {
- "id": null,
- "block": null,
- "confidence": null,
- "label": null,
- "type": null,
- "data": null,
- "embedding": [ ],
- "similarity": null,
- "layoutType": null
}, - "keyValueSetValue": { },
- "tableValue": {
- "id": null,
- "entity": null,
- "tag": null,
- "headers": [ ],
- "rows": [ ]
}, - "tag": "string"
}
], - "entity": {
- "id": "string",
- "block": {
- "text": "string",
- "geometry": {
- "polygon": null,
- "boundingBox": null
}
}, - "confidence": 0,
- "label": "string",
- "type": "STRING",
- "data": {
- "documentId": "string",
- "textValue": "string",
- "quantityValue": 0,
- "numberValue": 0,
- "unitValue": "string",
- "dateValue": "2019-08-24",
- "textData": "string",
- "quantityData": 0,
- "numberData": 0,
- "unitData": "string",
- "dateData": "2019-08-24",
- "field": "string",
- "score": 0,
- "sourceIndex": "string"
}, - "embedding": [
- 0
], - "similarity": {
- "type": "TEXT_SIM",
- "cosineSimilarity": 0,
- "amountDiff": 0,
- "numberDiff": 0,
- "same": true
}, - "layoutType": "WORD"
}
}
}
}
], - "imageHash": "string",
- "codes": [
- {
- "id": "string",
- "entity": {
- "id": "string",
- "block": {
- "text": "string",
- "geometry": {
- "polygon": {
- "points": [
- null
]
}, - "boundingBox": {
- "width": 0,
- "height": 0,
- "left": 0,
- "top": 0
}
}
}, - "confidence": 0,
- "label": "string",
- "type": "STRING",
- "data": {
- "documentId": "string",
- "textValue": "string",
- "quantityValue": 0,
- "numberValue": 0,
- "unitValue": "string",
- "dateValue": "2019-08-24",
- "textData": "string",
- "quantityData": 0,
- "numberData": 0,
- "unitData": "string",
- "dateData": "2019-08-24",
- "field": "string",
- "score": 0,
- "sourceIndex": "string"
}, - "embedding": [
- 0
], - "similarity": {
- "type": "TEXT_SIM",
- "cosineSimilarity": 0,
- "amountDiff": 0,
- "numberDiff": 0,
- "same": true
}, - "layoutType": "WORD"
}, - "tag": "string",
- "payload": "string",
- "type": "UPC_A"
}
], - "metaData": {
- "width": 0,
- "height": 0
}, - "label": {
- "index": 0,
- "name": "string",
- "confidence": 0
}, - "rawText": "string"
}
]
}
Response samples
- 400
- 401
- 404
{
  "code": "string",
  "message": "string"
}
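The error body shown for the 400, 401, and 404 responses has the same `code` and `message` fields across endpoints, so a client can handle it in one place. A minimal sketch, in which the exception class and function name are hypothetical and not part of the API:

```python
import json


class CambrionApiError(Exception):
    """Hypothetical exception type wrapping the documented error body."""

    def __init__(self, status, code, message):
        super().__init__(f"HTTP {status}: {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def error_from_response(status, body):
    """Turn an error response body into an exception, tolerating non-JSON bodies."""
    try:
        parsed = json.loads(body)
    except (TypeError, ValueError):
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    return CambrionApiError(status, parsed.get("code"), parsed.get("message"))
```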
Get observation
Get a full observation of the execution.
Authorizations:
path Parameters
executionId required | string ID of an execution |
Responses
Response samples
- 200
- 400
- 401
- 404
{- "executionId": "execution-1",
- "mediaContents": [
- {
- "id": "media-1",
- "mediaId": "media-1",
- "documentPages": [
- {
- "page": 1,
- "document": {
- "id": "string",
- "tables": [
- {
- "id": "string",
- "entity": {
- "id": "string",
- "block": {
- "text": null,
- "geometry": null
}, - "confidence": 0,
- "label": "string",
- "type": "STRING",
- "data": {
- "documentId": null,
- "textValue": null,
- "quantityValue": null,
- "numberValue": null,
- "unitValue": null,
- "dateValue": null,
- "textData": null,
- "quantityData": null,
- "numberData": null,
- "unitData": null,
- "dateData": null,
- "field": null,
- "score": null,
- "sourceIndex": null
}, - "embedding": [
- null
], - "similarity": {
- "type": null,
- "cosineSimilarity": null,
- "amountDiff": null,
- "numberDiff": null,
- "same": null
}, - "layoutType": "WORD"
}, - "tag": "string",
- "headers": [
- {
- "id": null,
- "block": null,
- "confidence": null,
- "label": null,
- "type": null,
- "data": null,
- "embedding": [ ],
- "similarity": null,
- "layoutType": null
}
], - "rows": [
- {
- "id": null,
- "tag": null,
- "pairs": [ ],
- "entity": null
}
]
}
], - "entities": [
- {
- "id": "string",
- "block": {
- "text": "string",
- "geometry": {
- "polygon": null,
- "boundingBox": null
}
}, - "confidence": 0,
- "label": "string",
- "type": "STRING",
- "data": {
- "documentId": "string",
- "textValue": "string",
- "quantityValue": 0,
- "numberValue": 0,
- "unitValue": "string",
- "dateValue": "2019-08-24",
- "textData": "string",
- "quantityData": 0,
- "numberData": 0,
- "unitData": "string",
- "dateData": "2019-08-24",
- "field": "string",
- "score": 0,
- "sourceIndex": "string"
}, - "embedding": [
- 0
], - "similarity": {
- "type": "TEXT_SIM",
- "cosineSimilarity": 0,
- "amountDiff": 0,
- "numberDiff": 0,
- "same": true
}, - "layoutType": "WORD"
}
], - "keyValueSet": {
- "id": "string",
- "tag": "string",
- "pairs": [
- {
- "key": {
- "id": null,
- "block": null,
- "confidence": null,
- "label": null,
- "type": null,
- "data": null,
- "embedding": [ ],
- "similarity": null,
- "layoutType": null
}, - "entityValue": {
- "id": null,
- "block": null,
- "confidence": null,
- "label": null,
- "type": null,
- "data": null,
- "embedding": [ ],
- "similarity": null,
- "layoutType": null
}, - "keyValueSetValue": { },
- "tableValue": {
- "id": null,
- "entity": null,
- "tag": null,
- "headers": [ ],
- "rows": [ ]
}, - "tag": "string"
}
], - "entity": {
- "id": "string",
- "block": {
- "text": "string",
- "geometry": {
- "polygon": null,
- "boundingBox": null
}
}, - "confidence": 0,
- "label": "string",
- "type": "STRING",
- "data": {
- "documentId": "string",
- "textValue": "string",
- "quantityValue": 0,
- "numberValue": 0,
- "unitValue": "string",
- "dateValue": "2019-08-24",
- "textData": "string",
- "quantityData": 0,
- "numberData": 0,
- "unitData": "string",
- "dateData": "2019-08-24",
- "field": "string",
- "score": 0,
- "sourceIndex": "string"
}, - "embedding": [
- 0
], - "similarity": {
- "type": "TEXT_SIM",
- "cosineSimilarity": 0,
- "amountDiff": 0,
- "numberDiff": 0,
- "same": true
}, - "layoutType": "WORD"
}
}
}
}
], - "imageHash": "string",
- "codes": [
- {
- "id": "string",
- "entity": {
- "id": "string",
- "block": {
- "text": "string",
- "geometry": {
- "polygon": {
- "points": [
- null
]
}, - "boundingBox": {
- "width": 0,
- "height": 0,
- "left": 0,
- "top": 0
}
}
}, - "confidence": 0,
- "label": "string",
- "type": "STRING",
- "data": {
- "documentId": "string",
- "textValue": "string",
- "quantityValue": 0,
- "numberValue": 0,
- "unitValue": "string",
- "dateValue": "2019-08-24",
- "textData": "string",
- "quantityData": 0,
- "numberData": 0,
- "unitData": "string",
- "dateData": "2019-08-24",
- "field": "string",
- "score": 0,
- "sourceIndex": "string"
}, - "embedding": [
- 0
], - "similarity": {
- "type": "TEXT_SIM",
- "cosineSimilarity": 0,
- "amountDiff": 0,
- "numberDiff": 0,
- "same": true
}, - "layoutType": "WORD"
}, - "tag": "string",
- "payload": "string",
- "type": "UPC_A"
}
], - "metaData": {
- "width": 0,
- "height": 0
}, - "label": {
- "index": 0,
- "name": "string",
- "confidence": 0
}, - "rawText": "string"
}
]
}
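The observation is deeply nested, so a small helper that walks it can be useful. The sketch below assumes only the field names visible in the sample above (`mediaContents`, `documentPages`, `document`, `entities`, `block.text`) and treats missing or null parts as empty.

```python
def iter_entities(observation):
    """Yield (page, label, text) for every entity found in an observation dict."""
    for media in observation.get("mediaContents") or []:
        for page in media.get("documentPages") or []:
            document = page.get("document") or {}
            for entity in document.get("entities") or []:
                block = entity.get("block") or {}
                yield page.get("page"), entity.get("label"), block.get("text")
```

Entities nested inside `tables` or `keyValueSet` are not visited here; the same pattern could be extended to them.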
Transform an observation
Transform a raw observation into an object using a JSONata statement. JSONata is a transformation language for JSON data. For more information, see http://docs.jsonata.org/overview.html
Authorizations:
path Parameters
executionId required | string ID of an execution |
Request Body schema: text/plain
Responses
Response samples
- 400
- 401
- 404
{
  "code": "string",
  "message": "string"
}
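Because the request body is `text/plain`, the JSONata statement is sent as-is rather than wrapped in JSON. The sketch below builds such a request without sending it. The URL, HTTP method, and API key header are placeholders, and the expression assumes that the statement is evaluated against the observation root as shown in the Get observation sample.

```python
import urllib.request

# Example JSONata statement: one {"label", "text"} object per entity.
# It assumes the observation root is the evaluation context.
EXPRESSION = (
    'mediaContents.documentPages.document.entities.'
    '{"label": label, "text": block.text}'
)


def build_transform_request(url, expression, api_key):
    """Build (but do not send) a text/plain request carrying a JSONata statement."""
    return urllib.request.Request(
        url,  # hypothetical: the endpoint path is not given in this section
        data=expression.encode("utf-8"),
        headers={"Content-Type": "text/plain", "X-Api-Key": api_key},
        method="POST",  # assumption
    )
```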
Transform an observation into JSON
Transform a raw observation into the corresponding JSON object. The values in the JSON object correspond to the data values in the observation. If data values are not available, the raw text is used.
Authorizations:
path Parameters
executionId required | string ID of an execution |
Responses
Response samples
- 200
- 400
- 401
- 404
{ }
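The fallback rule described above (use the data value, otherwise the raw text) can be pictured with a short client-side sketch. This is not the server's implementation: the order in which data fields are tried and the use of `label` as the output key are assumptions.

```python
# Order in which data fields are tried; this ordering is an assumption.
DATA_FIELDS = ("textValue", "numberValue", "quantityValue", "dateValue")


def entity_value(entity):
    """Return the first available data value, falling back to the raw block text."""
    data = entity.get("data") or {}
    for field in DATA_FIELDS:
        if data.get(field) is not None:
            return data[field]
    return (entity.get("block") or {}).get("text")


def entities_to_json(entities):
    """Map each labelled entity to its value (keying by label is an assumption)."""
    return {e["label"]: entity_value(e) for e in entities if e.get("label")}
```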
Link results of an execution
Links the contents of an observation to documents in an index.
Authorizations:
path Parameters
executionId required | string ID of an execution |
Request Body schema: application/json
Linker request
group | Array of objects (Match Group) |
document | object |
topk | Array of objects (Top K Index Filter) |
Responses
Request samples
- Payload
{
  "group": [
    {
      "tag": "string",
      "fields": [
        {
          "fieldName": "string",
          "clause": "MUST",
          "fuzziness": 0,
          "auto": "string",
          "filter": {
            "tag": "string",
            "label": "string",
            "regExp": "string",
            "hasData": true,
            "hasValue": true,
            "layoutType": "WORD"
          },
          "collection": {
            "source": "ENTITY_TEXT"
          },
          "threshold": 0,
          "num_results": 0,
          "mode": "SEARCH",
          "dimension": "EMPTY",
          "analyzer": "string"
        }
      ],
      "index": "string"
    }
  ],
  "document": { },
  "topk": [
    {
      "index": "string",
      "topk": 0
    }
  ]
}
Response samples
- 200
- 400
- 401
- 404
[
  {
    "document": { },
    "fields": [
      {
        "documentId": "string",
        "index": "string",
        "score": 0,
        "fieldName": "string",
        "textValue": "string",
        "quantityValue": 0,
        "numberValue": 0,
        "unitValue": "string",
        "dateValue": "2019-08-24T14:15:22Z",
        "textData": "string",
        "quantityData": 0,
        "numberData": 0,
        "unitData": "string",
        "dateData": "2019-08-24T14:15:22Z",
        "cosineSimilarity": 0,
        "quantityDiff": 0,
        "numberDiff": 0,
        "same": true,
        "entityId": "string"
      }
    ],
    "tag": "string"
  }
]
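A client usually wants the strongest link for a given field from the response above. The following sketch picks it by `score`, assuming that a higher score means a better match; the specification excerpt does not state how scores are ordered.

```python
def best_match(link_results, field_name):
    """Return the highest-scoring linked field with the given fieldName, or None."""
    candidates = [
        field
        for result in link_results
        for field in result.get("fields") or []
        if field.get("fieldName") == field_name and field.get("score") is not None
    ]
    # Assumption: larger scores indicate better matches.
    return max(candidates, key=lambda f: f["score"], default=None)
```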
Create new pipeline
Authorizations:
Request Body schema: application/json
Pipeline request
pipelineId | string |
name | string |
deploy | boolean Default: true Whether to deploy the pipeline when creating/updating it |
description | string |
version | integer |
pipelineDefinition | object (PipelineDefinition) |
Responses
Request samples
- Payload
{- "pipelineId": "receipt-pipeline",
- "name": "receipt-pipeline",
- "deploy": true,
- "description": "A pipeline to extract contents from a receipt",
- "version": 1,
- "pipelineDefinition": {
- "pipelineDefinitionId": "receipt-pipeline-definition",
- "nodes": [
- {
- "modelId": "ocr_recognizer",
- "modelName": "ocr_recognizer",
- "modelVersion": 1,
- "modelParameters": {
- "param1": 1,
- "param2": 2
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 250
}
}, - "inputs": {
- "inputName": "info_array_ocr_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_ocr_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}, - {
- "modelId": "static_layout_recognizer",
- "modelName": "static_layout_recognizer",
- "modelVersion": 1,
- "modelParameters": {
- "targetModel": "some_model",
- "labels": [
- "date",
- "name",
- "amount"
]
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 500
}
}, - "inputs": {
- "inputName": "info_array_static_layout_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_static_layout_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}, - {
- "modelId": "entity_parser",
- "modelName": "entity_parser",
- "modelVersion": 1,
- "modelParameters": {
- "date": "DATE",
- "name": "STRING",
- "amount": "NUMBER"
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 750
}
}, - "inputs": {
- "inputName": "info_array_parser_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_parser_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}, - {
- "modelId": "entity_deduplicator",
- "modelName": "entity_deduplicator",
- "modelVersion": 1,
- "modelParameters": {
- "keys": [
- "date",
- "name",
- "amount"
]
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 1000
}
}, - "inputs": {
- "inputName": "info_array_deduplicator_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_deduplicator_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}
], - "edges": [
- {
- "id": "edge-1",
- "dataHandle": "ocr_result",
- "source": "ocr_recognizer",
- "target": "layout_recognizer",
- "sourceHandle": "info_array_ocr_output",
- "targetHandle": "info_array_static_layout_input"
}, - {
- "id": "edge-2",
- "dataHandle": "recognizer_result",
- "source": "layout_recognizer",
- "target": "entity_parser",
- "sourceHandle": "info_array_static_layout_output",
- "targetHandle": "info_array_parser_input"
}, - {
- "id": "edge-3",
- "dataHandle": "parser_result",
- "source": "entity_parser",
- "target": "entity_deduplicator",
- "sourceHandle": "info_array_parser_output",
- "targetHandle": "info_array_deduplicator_input"
}
]
}
}
Response samples
- 200
- 400
- 401
- 404
{
  "pipelineId": "string"
}
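Before submitting a pipeline, it can help to check that every edge refers to handles that some node declares. The sketch below is a client-side sanity check only. It assumes, based on the request sample, that `sourceHandle` should equal the `inputName` of some node's `outputs` and `targetHandle` the `inputName` of some node's `inputs`; the API itself may validate differently.

```python
def unknown_handles(definition):
    """Return (edge id, handle) pairs whose handle is not declared by any node."""
    nodes = definition.get("nodes", [])
    # In the sample, both "inputs" and "outputs" name their handle "inputName".
    outputs = {node["outputs"]["inputName"] for node in nodes}
    inputs = {node["inputs"]["inputName"] for node in nodes}
    problems = []
    for edge in definition.get("edges", []):
        if edge["sourceHandle"] not in outputs:
            problems.append((edge["id"], edge["sourceHandle"]))
        if edge["targetHandle"] not in inputs:
            problems.append((edge["id"], edge["targetHandle"]))
    return problems
```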
Get a specific pipeline
Authorizations:
path Parameters
pipelineId required | string ID of the pipeline to retrieve |
Responses
Response samples
- 200
- 400
- 401
- 404
{- "pipeline": {
- "pipelineId": "string",
- "name": "string",
- "description": "string",
- "tag": "string",
- "status": "string",
- "version": 0
}, - "pipelineDefinition": {
- "pipelineDefinitionId": "receipt-pipeline-definition",
- "nodes": [
- {
- "modelId": "ocr_recognizer",
- "modelName": "ocr_recognizer",
- "modelVersion": 1,
- "modelParameters": {
- "param1": 1,
- "param2": 2
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 250
}
}, - "inputs": {
- "inputName": "info_array_ocr_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_ocr_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}, - {
- "modelId": "static_layout_recognizer",
- "modelName": "static_layout_recognizer",
- "modelVersion": 1,
- "modelParameters": {
- "targetModel": "some_model",
- "labels": [
- "date",
- "name",
- "amount"
]
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 500
}
}, - "inputs": {
- "inputName": "info_array_static_layout_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_static_layout_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}, - {
- "modelId": "entity_parser",
- "modelName": "entity_parser",
- "modelVersion": 1,
- "modelParameters": {
- "date": "DATE",
- "name": "STRING",
- "amount": "NUMBER"
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 750
}
}, - "inputs": {
- "inputName": "info_array_parser_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_parser_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}, - {
- "modelId": "entity_deduplicator",
- "modelName": "entity_deduplicator",
- "modelVersion": 1,
- "modelParameters": {
- "keys": [
- "date",
- "name",
- "amount"
]
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 1000
}
}, - "inputs": {
- "inputName": "info_array_deduplicator_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_deduplicator_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}
], - "edges": [
- {
- "id": "edge-1",
- "dataHandle": "ocr_result",
- "source": "ocr_recognizer",
- "target": "layout_recognizer",
- "sourceHandle": "info_array_ocr_output",
- "targetHandle": "info_array_static_layout_input"
}, - {
- "id": "edge-2",
- "dataHandle": "recognizer_result",
- "source": "layout_recognizer",
- "target": "entity_parser",
- "sourceHandle": "info_array_static_layout_output",
- "targetHandle": "info_array_parser_input"
}, - {
- "id": "edge-3",
- "dataHandle": "parser_result",
- "source": "entity_parser",
- "target": "entity_deduplicator",
- "sourceHandle": "info_array_parser_output",
- "targetHandle": "info_array_deduplicator_input"
}
]
}
}
Update an existing pipeline
Authorizations:
path Parameters
pipelineId required | string ID of the pipeline to update |
Request Body schema: application/json
Pipeline request
pipelineId | string |
name | string |
deploy | boolean Default: true Whether to deploy the pipeline when creating/updating it |
description | string |
version | integer |
pipelineDefinition | object (PipelineDefinition) |
Responses
Request samples
- Payload
{- "pipelineId": "receipt-pipeline",
- "name": "receipt-pipeline",
- "deploy": true,
- "description": "A pipeline to extract contents from a receipt",
- "version": 1,
- "pipelineDefinition": {
- "pipelineDefinitionId": "receipt-pipeline-definition",
- "nodes": [
- {
- "modelId": "ocr_recognizer",
- "modelName": "ocr_recognizer",
- "modelVersion": 1,
- "modelParameters": {
- "param1": 1,
- "param2": 2
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 250
}
}, - "inputs": {
- "inputName": "info_array_ocr_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_ocr_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}, - {
- "modelId": "static_layout_recognizer",
- "modelName": "static_layout_recognizer",
- "modelVersion": 1,
- "modelParameters": {
- "targetModel": "some_model",
- "labels": [
- "date",
- "name",
- "amount"
]
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 500
}
}, - "inputs": {
- "inputName": "info_array_static_layout_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_static_layout_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}, - {
- "modelId": "entity_parser",
- "modelName": "entity_parser",
- "modelVersion": 1,
- "modelParameters": {
- "date": "DATE",
- "name": "STRING",
- "amount": "NUMBER"
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 750
}
}, - "inputs": {
- "inputName": "info_array_parser_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_parser_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}, - {
- "modelId": "entity_deduplicator",
- "modelName": "entity_deduplicator",
- "modelVersion": 1,
- "modelParameters": {
- "keys": [
- "date",
- "name",
- "amount"
]
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 1000
}
}, - "inputs": {
- "inputName": "info_array_deduplicator_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_deduplicator_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}
], - "edges": [
- {
- "id": "edge-1",
- "dataHandle": "ocr_result",
- "source": "ocr_recognizer",
- "target": "layout_recognizer",
- "sourceHandle": "info_array_ocr_output",
- "targetHandle": "info_array_static_layout_input"
}, - {
- "id": "edge-2",
- "dataHandle": "recognizer_result",
- "source": "layout_recognizer",
- "target": "entity_parser",
- "sourceHandle": "info_array_static_layout_output",
- "targetHandle": "info_array_parser_input"
}, - {
- "id": "edge-3",
- "dataHandle": "parser_result",
- "source": "entity_parser",
- "target": "entity_deduplicator",
- "sourceHandle": "info_array_parser_output",
- "targetHandle": "info_array_deduplicator_input"
}
]
}
}
Response samples
- 200
- 400
- 401
- 404
{- "pipeline": {
- "pipelineId": "string",
- "name": "string",
- "description": "string",
- "tag": "string",
- "status": "string",
- "version": 0
}, - "pipelineDefinition": {
- "pipelineDefinitionId": "receipt-pipeline-definition",
- "nodes": [
- {
- "modelId": "ocr_recognizer",
- "modelName": "ocr_recognizer",
- "modelVersion": 1,
- "modelParameters": {
- "param1": 1,
- "param2": 2
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 250
}
}, - "inputs": {
- "inputName": "info_array_ocr_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_ocr_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}, - {
- "modelId": "static_layout_recognizer",
- "modelName": "static_layout_recognizer",
- "modelVersion": 1,
- "modelParameters": {
- "targetModel": "some_model",
- "labels": [
- "date",
- "name",
- "amount"
]
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 500
}
}, - "inputs": {
- "inputName": "info_array_static_layout_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_static_layout_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}, - {
- "modelId": "entity_parser",
- "modelName": "entity_parser",
- "modelVersion": 1,
- "modelParameters": {
- "date": "DATE",
- "name": "STRING",
- "amount": "NUMBER"
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 750
}
}, - "inputs": {
- "inputName": "info_array_parser_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_parser_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}, - {
- "modelId": "entity_deduplicator",
- "modelName": "entity_deduplicator",
- "modelVersion": 1,
- "modelParameters": {
- "keys": [
- "date",
- "name",
- "amount"
]
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 1000
}
}, - "inputs": {
- "inputName": "info_array_deduplicator_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_deduplicator_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}
], - "edges": [
- {
- "id": "edge-1",
- "dataHandle": "ocr_result",
- "source": "ocr_recognizer",
- "target": "layout_recognizer",
- "sourceHandle": "info_array_ocr_output",
- "targetHandle": "info_array_static_layout_input"
}, - {
- "id": "edge-2",
- "dataHandle": "recognizer_result",
- "source": "layout_recognizer",
- "target": "entity_parser",
- "sourceHandle": "info_array_static_layout_output",
- "targetHandle": "info_array_parser_input"
}, - {
- "id": "edge-3",
- "dataHandle": "parser_result",
- "source": "entity_parser",
- "target": "entity_deduplicator",
- "sourceHandle": "info_array_parser_output",
- "targetHandle": "info_array_deduplicator_input"
}
]
}
}
Get graph representation (definition) of a pipeline
Authorizations:
path Parameters
pipelineId required | string ID of the pipeline whose definition is returned |
Responses
Response samples
- 200
- 400
- 401
- 404
{- "pipelineDefinitionId": "receipt-pipeline-definition",
- "nodes": [
- {
- "modelId": "ocr_recognizer",
- "modelName": "ocr_recognizer",
- "modelVersion": 1,
- "modelParameters": {
- "param1": 1,
- "param2": 2
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 250
}
}, - "inputs": {
- "inputName": "info_array_ocr_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_ocr_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}, - {
- "modelId": "static_layout_recognizer",
- "modelName": "static_layout_recognizer",
- "modelVersion": 1,
- "modelParameters": {
- "targetModel": "some_model",
- "labels": [
- "date",
- "name",
- "amount"
]
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 500
}
}, - "inputs": {
- "inputName": "info_array_static_layout_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_static_layout_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}, - {
- "modelId": "entity_parser",
- "modelName": "entity_parser",
- "modelVersion": 1,
- "modelParameters": {
- "date": "DATE",
- "name": "STRING",
- "amount": "NUMBER"
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 750
}
}, - "inputs": {
- "inputName": "info_array_parser_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_parser_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}, - {
- "modelId": "entity_deduplicator",
- "modelName": "entity_deduplicator",
- "modelVersion": 1,
- "modelParameters": {
- "keys": [
- "date",
- "name",
- "amount"
]
}, - "canvas": {
- "position": {
- "x": 0,
- "y": 1000
}
}, - "inputs": {
- "inputName": "info_array_deduplicator_input",
- "inputShape": [
- 1
], - "inputType": "STRING"
}, - "outputs": {
- "inputName": "info_array_deduplicator_output",
- "inputShape": [
- 1
], - "inputType": "STRING"
}
}
], - "edges": [
- {
- "id": "edge-1",
- "dataHandle": "ocr_result",
- "source": "ocr_recognizer",
- "target": "layout_recognizer",
- "sourceHandle": "info_array_ocr_output",
- "targetHandle": "info_array_static_layout_input"
}, - {
- "id": "edge-2",
- "dataHandle": "recognizer_result",
- "source": "layout_recognizer",
- "target": "entity_parser",
- "sourceHandle": "info_array_static_layout_output",
- "targetHandle": "info_array_parser_input"
}, - {
- "id": "edge-3",
- "dataHandle": "parser_result",
- "source": "entity_parser",
- "target": "entity_deduplicator",
- "sourceHandle": "info_array_parser_output",
- "targetHandle": "info_array_deduplicator_input"
}
]
}
Execute pipeline synchronously
Execute a pipeline synchronously and return the corresponding observation.
Authorizations:
path Parameters
pipelineId required | string ID of the pipeline to execute |
Request Body schema: application/json
Execution request for a pipeline
executionId | string ID of the execution |
tag | string Tag used to identify the resulting execution. Ignored if transient is true. |
transient | boolean Whether to delete all execution data after pipeline completion |
transform | string JSONata instruction to transform the result observation into a desired object. JSONata is a transformation language for JSON data. For more information, see http://docs.jsonata.org/overview.html |
tryImageConversion | boolean Default: true Tries to convert the provided content to an image (e.g. PDF) |
trySimpleText | boolean Default: false Tries to extract readable text from the input media (e.g. a Word document). A number of file formats are supported. Apache Tika is used internally for text extraction; the full list of supported file formats is available at https://tika.apache.org/2.9.1/formats.html |
idempotent | boolean Default: false Whether to update the existing observation with the results of the pipeline run (always true if executionId is null) |
media | Array of strings <base64> Array of Base64-encoded media files. The content type is detected automatically. PDF, DOCX, and PPTX files are rendered as images, which can then be processed within a pipeline. |
text | string Raw text that can be used as input in pipelines |
Responses
Request samples
- Payload
{
  "executionId": "string",
  "tag": "string",
  "transient": true,
  "transform": "string",
  "tryImageConversion": true,
  "trySimpleText": false,
  "idempotent": false,
  "media": [
    "string"
  ],
  "text": "string"
}
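Since `media` expects Base64-encoded file contents, the payload has to be built from raw bytes. A minimal sketch using only the field names from the schema above; the helper name is hypothetical, and which optional fields to set is a design choice of this example:

```python
import base64


def build_execution_payload(media_files=(), text=None, transient=False, transform=None):
    """Build the JSON-serializable body for a synchronous pipeline execution."""
    payload = {
        "transient": transient,
        # Each media file is sent as a Base64 string; the content type is
        # detected by the server, so no MIME type is attached here.
        "media": [base64.b64encode(content).decode("ascii") for content in media_files],
    }
    if text is not None:
        payload["text"] = text
    if transform is not None:
        payload["transform"] = transform
    return payload
```

For example, `build_execution_payload([open("receipt.pdf", "rb").read()], transient=True)` would produce a body for a one-off run whose execution data is deleted afterwards; the file name is illustrative.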
Response samples
- 200
- 400
- 401
- 404
{- "executionId": "string",
- "observation": {
- "executionId": "execution-1",
- "mediaContents": [
- {
- "id": "media-1",
- "mediaId": "media-1",
- "documentPages": [
- {
- "page": 1,
- "document": {
- "id": "string",
- "tables": [
- {
- "id": "string",
- "entity": {
- "id": null,
- "block": null,
- "confidence": null,
- "label": null,
- "type": null,
- "data": null,
- "embedding": [ ],
- "similarity": null,
- "layoutType": null
}, - "tag": "string",
- "headers": [
- null
], - "rows": [
- null
]
}
], - "entities": [
- {
- "id": "string",
- "block": {
- "text": null,
- "geometry": null
}, - "confidence": 0,
- "label": "string",
- "type": "STRING",
- "data": {
- "documentId": null,
- "textValue": null,
- "quantityValue": null,
- "numberValue": null,
- "unitValue": null,
- "dateValue": null,
- "textData": null,
- "quantityData": null,
- "numberData": null,
- "unitData": null,
- "dateData": null,
- "field": null,
- "score": null,
- "sourceIndex": null
}, - "embedding": [
- null
], - "similarity": {
- "type": null,
- "cosineSimilarity": null,
- "amountDiff": null,
- "numberDiff": null,
- "same": null
}, - "layoutType": "WORD"
}
], - "keyValueSet": {
- "id": "string",
- "tag": "string",
- "pairs": [
- {
- "key": null,
- "entityValue": null,
- "keyValueSetValue": null,
- "tableValue": null,
- "tag": null
}
], - "entity": {
- "id": "string",
- "block": {
- "text": null,
- "geometry": null
}, - "confidence": 0,
- "label": "string",
- "type": "STRING",
- "data": {
- "documentId": null,
- "textValue": null,
- "quantityValue": null,
- "numberValue": null,
- "unitValue": null,
- "dateValue": null,
- "textData": null,
- "quantityData": null,
- "numberData": null,
- "unitData": null,
- "dateData": null,
- "field": null,
- "score": null,
- "sourceIndex": null
}, - "embedding": [
- null
], - "similarity": {
- "type": null,
- "cosineSimilarity": null,
- "amountDiff": null,
- "numberDiff": null,
- "same": null
}, - "layoutType": "WORD"
}
}
}
}
], - "imageHash": "string",
- "codes": [
- {
- "id": "string",
- "entity": {
- "id": "string",
- "block": {
- "text": "string",
- "geometry": {
- "polygon": {
- "points": [ ]
}, - "boundingBox": {
- "width": null,
- "height": null,
- "left": null,
- "top": null
}
}
}, - "confidence": 0,
- "label": "string",
- "type": "STRING",
- "data": {
- "documentId": "string",
- "textValue": "string",
- "quantityValue": 0,
- "numberValue": 0,
- "unitValue": "string",
- "dateValue": "2019-08-24",
- "textData": "string",
- "quantityData": 0,
- "numberData": 0,
- "unitData": "string",
- "dateData": "2019-08-24",
- "field": "string",
- "score": 0,
- "sourceIndex": "string"
}, - "embedding": [
- 0
], - "similarity": {
- "type": "TEXT_SIM",
- "cosineSimilarity": 0,
- "amountDiff": 0,
- "numberDiff": 0,
- "same": true
}, - "layoutType": "WORD"
}, - "tag": "string",
- "payload": "string",
- "type": "UPC_A"
}
], - "metaData": {
- "width": 0,
- "height": 0
}, - "label": {
- "index": 0,
- "name": "string",
- "confidence": 0
}, - "rawText": "string"
}
]
}
}
Transform an observation
Execute a pipeline synchronously and return the transformed observation.
Authorizations:
path Parameters
pipelineId required | string ID of the pipeline to execute |
Request Body schema: application/json
Execution request for a pipeline
executionId | string ID of the execution |
tag | string Tag used to identify the resulting execution. Ignored if transient is true. |
transient | boolean Whether to delete all execution data after pipeline completion |
transform | string JSONata instruction to transform the result observation into a desired object. JSONata is a query and transformation language for JSON data. For more information see http://docs.jsonata.org/overview.html |
tryImageConversion | boolean Default: true Tries to convert the provided content to an image (e.g. PDF) |
trySimpleText | boolean Default: false Tries to extract readable text from input media (e.g. Word doc). A number of different file formats are supported. Internally, Apache Tika is used for text extraction. A full list of supported file formats can be found here: https://tika.apache.org/2.9.1/formats.html |
idempotent | boolean Default: false Whether to update the existing observation with the results from the pipeline run (always true if executionId is null) |
media | Array of strings <base64> Array of Base64-encoded media files. The content type is detected automatically. PDF, DOCX, and PPTX files are rendered as images, which can then be processed within a pipeline. |
text | string Raw text that can be used as input in pipelines |
Responses
Request samples
- Payload
{
  "executionId": "string",
  "tag": "string",
  "transient": true,
  "transform": "string",
  "tryImageConversion": true,
  "trySimpleText": false,
  "idempotent": false,
  "media": [
    "string"
  ],
  "text": "string"
}
Response samples
- 400
- 401
- 404
{
  "code": "string",
  "message": "string"
}
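A minimal sketch of assembling the request body described above. The helper name `build_execution_request` and the sample values are hypothetical; only the field names come from the schema. No HTTP call is made, because the endpoint URL and the authorization scheme are not stated in this section.

```python
import base64
import json


def build_execution_request(media_files, tag=None, transient=False,
                            try_simple_text=False, text=None):
    """Build the JSON body for a pipeline execution request.

    media_files is an iterable of raw bytes objects (e.g. PDF or image data).
    """
    body = {
        "transient": transient,
        "tryImageConversion": True,  # documented default
        "trySimpleText": try_simple_text,
        # Each media file is sent as a Base64 string.
        "media": [base64.b64encode(data).decode("ascii") for data in media_files],
    }
    # The tag is documented as ignored for transient executions,
    # so it is only included otherwise.
    if tag is not None and not transient:
        body["tag"] = tag
    if text is not None:
        body["text"] = text
    return body


request_body = build_execution_request([b"%PDF-1.4 example"], tag="invoice-42")
print(json.dumps(request_body, indent=2))
```

Leaving `executionId` out of the body is a choice made in this sketch; the schema lists it as optional.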
Transform an observation
Execute a pipeline synchronously and return the corresponding JSON object.
Authorizations:
path Parameters
pipelineId required | string ID of the pipeline to execute |
Request Body schema: application/json
Execution request for a pipeline
executionId | string ID of the execution |
tag | string Tag used to identify the resulting execution. Ignored if transient is true. |
transient | boolean Whether to delete all execution data after pipeline completion |
transform | string JSONata instruction to transform the result observation into a desired object. JSONata is a query and transformation language for JSON data. For more information see http://docs.jsonata.org/overview.html |
tryImageConversion | boolean Default: true Tries to convert the provided content to an image (e.g. PDF) |
trySimpleText | boolean Default: false Tries to extract readable text from input media (e.g. Word doc). A number of different file formats are supported. Internally, Apache Tika is used for text extraction. A full list of supported file formats can be found here: https://tika.apache.org/2.9.1/formats.html |
idempotent | boolean Default: false Whether to update the existing observation with the results from the pipeline run (always true if executionId is null) |
media | Array of strings <base64> Array of Base64-encoded media files. The content type is detected automatically. PDF, DOCX, and PPTX files are rendered as images, which can then be processed within a pipeline. |
text | string Raw text that can be used as input in pipelines |
Responses
Request samples
- Payload
{
  "executionId": "string",
  "tag": "string",
  "transient": true,
  "transform": "string",
  "tryImageConversion": true,
  "trySimpleText": false,
  "idempotent": false,
  "media": [
    "string"
  ],
  "text": "string"
}
Response samples
- 200
- 400
- 401
- 404
{ }
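A sketch of setting the `transform` field with a JSONata expression. The expression is illustrative only: it assumes the result observation has a top-level `executionId` and a `mediaContents` array, as in the observation samples elsewhere in this specification. The helper name `with_transform` is hypothetical.

```python
import json

# Illustrative JSONata expression (an assumption about the observation shape):
# build an object with the execution ID and the number of media contents.
JSONATA_TRANSFORM = '{"id": executionId, "mediaCount": $count(mediaContents)}'


def with_transform(request_body, expression):
    """Return a copy of request_body with the JSONata transform set."""
    updated = dict(request_body)
    updated["transform"] = expression
    return updated


payload = with_transform({"text": "Order 4711: 3 x widget"}, JSONATA_TRANSFORM)
print(json.dumps(payload))
```

The expression travels as a plain JSON string, so quotes inside it are escaped by `json.dumps` and need no manual handling.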
Execute pipeline asynchronously
Authorizations:
path Parameters
pipelineId required | string ID of the pipeline to execute |
Request Body schema: application/json
Execution request for a pipeline
executionId | string ID of the execution |
tag | string Tag used to identify the resulting execution. Ignored if transient is true. |
transient | boolean Whether to delete all execution data after pipeline completion |
transform | string JSONata instruction to transform the result observation into a desired object. JSONata is a query and transformation language for JSON data. For more information see http://docs.jsonata.org/overview.html |
tryImageConversion | boolean Default: true Tries to convert the provided content to an image (e.g. PDF) |
trySimpleText | boolean Default: false Tries to extract readable text from input media (e.g. Word doc). A number of different file formats are supported. Internally, Apache Tika is used for text extraction. A full list of supported file formats can be found here: https://tika.apache.org/2.9.1/formats.html |
idempotent | boolean Default: false Whether to update the existing observation with the results from the pipeline run (always true if executionId is null) |
media | Array of strings <base64> Array of Base64-encoded media files. The content type is detected automatically. PDF, DOCX, and PPTX files are rendered as images, which can then be processed within a pipeline. |
text | string Raw text that can be used as input in pipelines |
Responses
Request samples
- Payload
{
  "executionId": "string",
  "tag": "string",
  "transient": true,
  "transform": "string",
  "tryImageConversion": true,
  "trySimpleText": false,
  "idempotent": false,
  "media": [
    "string"
  ],
  "text": "string"
}
Response samples
- 200
{
  "executionId": "string"
}
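The asynchronous endpoint returns only the execution ID. A small sketch of reading it from the 200 response body follows; the function name `extract_execution_id` is hypothetical, and how the ID is used afterwards (for example to fetch results later) is not covered in this section.

```python
import json


def extract_execution_id(response_text):
    """Parse the 200 response of the asynchronous endpoint and return the ID."""
    data = json.loads(response_text)
    execution_id = data.get("executionId")
    if not isinstance(execution_id, str) or not execution_id:
        raise ValueError("response does not contain an executionId")
    return execution_id


sample_response = '{"executionId": "execution-1"}'
execution_id = extract_execution_id(sample_response)
print(execution_id)
```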
Get all indices
Get a list of all indices.
Authorizations:
query Parameters
limit | integer Limits the number of indices on a page |
offset | integer Specifies the page number of the indices to be displayed |
Responses
Response samples
- 200
[
  {
    "indexId": "string",
    "semanticSearchField": "string",
    "indexSchema": {}
  }
]
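A sketch of building the `limit` and `offset` query parameters. Following the parameter descriptions, `offset` is treated as a page number rather than an item offset; whether page numbering starts at 0 or 1 is not stated here, so the sketch only rejects negative values. The helper name `page_query` is hypothetical.

```python
from urllib.parse import urlencode


def page_query(page, page_size):
    """Return the query string for one page of results."""
    if page < 0:
        raise ValueError("page must not be negative")
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # limit = items per page, offset = page number (per the descriptions above)
    return urlencode({"limit": page_size, "offset": page})


query = page_query(page=2, page_size=25)
print(query)  # limit=25&offset=2
```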
Create index
Create a new index with an optional schema.
Authorizations:
Request Body schema: application/json
Index creation request
indexId | string Unique ID of the index. |
semanticSearchField | string This field is used to encode semantic information |
indexSchema | object An object that describes the data types of the document fields. This is equivalent to the Elasticsearch mappings object. For more information see https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html |
Responses
Request samples
- Payload
{
  "indexId": "string",
  "semanticSearchField": "string",
  "indexSchema": {}
}
Response samples
- 400
- 401
{
  "code": "string",
  "message": "string"
}
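A sketch of an index creation body with a schema in the Elasticsearch mappings style, where field types sit under a `properties` key. The document fields `description`, `sku`, and `quantity` are hypothetical, and the sketch assumes that `semanticSearchField` names one of the document fields. `Warehouse-Index` is the example ID used elsewhere in this specification.

```python
import json


def build_index_request(index_id, semantic_field, properties):
    """Build the JSON body for creating an index with a schema."""
    return {
        "indexId": index_id,
        # Assumption: this names the document field used for semantic search.
        "semanticSearchField": semantic_field,
        # Elasticsearch mappings keep field definitions under "properties".
        "indexSchema": {"properties": properties},
    }


index_request = build_index_request(
    "Warehouse-Index",
    "description",
    {
        "description": {"type": "text"},
        "sku": {"type": "keyword"},
        "quantity": {"type": "integer"},
    },
)
print(json.dumps(index_request, indent=2))
```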
Get index description
Get the description of an index, including the data model if present.
Authorizations:
path Parameters
indexId required | string Example: Warehouse-Index ID of the index |
Responses
Response samples
- 200
- 400
- 401
- 404
{
  "indexId": "string",
  "semanticSearchField": "string",
  "indexSchema": {}
}
Query an index with a search string
Query an index with a search string
Authorizations:
path Parameters
indexId required | string Example: Warehouse-Index ID of the index |
Request Body schema: application/json
Query request
searchString | string The string that is used for the search |
k | integer The number of results to return that are similar to the search string |
Responses
Request samples
- Payload
{
  "searchString": "string",
  "k": 0
}
Response samples
- 200
- 400
- 401
- 404
[
  {
    "indexId": "string",
    "source": {}
  }
]
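A sketch of building a query body and reading the `source` objects from a 200 response. The helper names, the search string, and the sample response content are hypothetical; only the field names `searchString`, `k`, `indexId`, and `source` come from the schema above.

```python
import json


def build_query(search_string, k=5):
    """Build the JSON body for querying an index."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return {"searchString": search_string, "k": k}


def sources(results):
    """Return the stored documents from a list of query results."""
    return [hit["source"] for hit in results]


query_body = build_query("blue pallet jack", k=3)

# Hypothetical response body in the documented shape.
sample_results = json.loads(
    '[{"indexId": "Warehouse-Index", "source": {"sku": "PJ-1"}}]'
)
hits = sources(sample_results)
print(query_body, hits)
```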
Get all documents
Authorizations:
path Parameters
indexId required | string Example: Warehouse-Index ID of the index |
query Parameters
limit | integer Limits the number of documents on a page |
offset | integer Specifies the page number of the documents to be displayed |
Responses
Response samples
- 200
[
  {
    "indexId": "string",
    "source": {}
  }
]
Create document
Create a JSON document in an index. If the index does not exist it will be created automatically.
Authorizations:
path Parameters
indexId required | string Example: Warehouse-Index ID of the index |
Request Body schema: application/json
Document request
indexId | string |
source | object |
Responses
Request samples
- Payload
{
  "indexId": "string",
  "source": {}
}
Response samples
- 400
- 401
- 404
{
  "code": "string",
  "message": "string"
}
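A sketch of a document creation body. The helper name `build_document_request` and the document fields are hypothetical. The sketch checks locally that `source` is a JSON-serializable object before anything is sent.

```python
import json


def build_document_request(index_id, source):
    """Build the JSON body for creating a document in an index."""
    if not isinstance(source, dict):
        raise TypeError("source must be a JSON object (dict)")
    # Fail early if the document contains values JSON cannot represent.
    json.dumps(source)
    return {"indexId": index_id, "source": source}


document_request = build_document_request(
    "Warehouse-Index",
    {"sku": "PJ-1", "description": "Blue pallet jack", "quantity": 4},
)
print(json.dumps(document_request, indent=2))
```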
Register an uploaded model (currently internal only)
Authorizations:
Request Body schema: application/json
Model request
name | string |
description | string |
version | integer |
resourceTag | string |
Responses
Request samples
- Payload
{
  "name": "string",
  "description": "string",
  "version": 0,
  "resourceTag": "string"
}
Response samples
- 401
- 404
{
  "code": "string",
  "message": "string"
}
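The error samples in this section share the shape `code` plus `message`. A sketch of turning such a body into a Python exception follows; the class `ApiError`, the function `raise_for_error`, and the code value `NOT_FOUND` are hypothetical, since the actual code values are not listed here.

```python
import json


class ApiError(Exception):
    """Error response with the documented code and message fields."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def raise_for_error(status, body_text):
    """Raise ApiError for 4xx/5xx responses; return None otherwise."""
    if status < 400:
        return
    try:
        data = json.loads(body_text)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Fall back to the raw body when the fields are missing.
    raise ApiError(status, data.get("code", "unknown"),
                   data.get("message", body_text))


try:
    raise_for_error(404, '{"code": "NOT_FOUND", "message": "index missing"}')
except ApiError as error:
    print(error)
```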