Cambrion API (1.0)


The official Cambrion API specification. To receive a free API key, reach out at hello@cambrion.ai with a brief description of your use case.

Executions

Execution environments that store results.

Find out more

Creates an execution

Create an execution, optionally from a given ID. If an execution ID is given, that ID is used; otherwise a new one is generated. If the execution already exists, the request is ignored and 204 is returned. If a new execution is created, 200 is returned with the execution ID as the body.

Authorizations:

ApiKeyAuth

Request Body schema: application/json

Execution is the context which holds data related to a specific execution

executionId	string ID of the execution
tag	string Tag to identify the execution
createdAt	string Creation time
metaData	object

Responses

Request samples

Payload

Content type

application/json

{"executionId": "string",
"tag": "string",
"createdAt": "string",
"metaData": { }
}

Response samples

200
400
401

Content type

application/json

{"executionId": "string",
"tag": "string",
"createdAt": "string",
"metaData": { }
}
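
The status rule described for this operation (200 for a newly created execution, 204 when the execution already exists) could be handled on the client roughly as follows. This is a minimal sketch, not part of the specification: the function name `interpret_create_response` is hypothetical, and because the description mentions "the execution ID as body" while the sample shows a full Execution object, the sketch accepts either form.

```python
import json
from typing import Optional, Tuple


def interpret_create_response(status: int, body: bytes) -> Tuple[str, Optional[str]]:
    """Map a create-execution response to (outcome, execution_id)."""
    if status == 204:
        # The execution already existed; the request was ignored.
        return ("exists", None)
    if status == 200:
        parsed = json.loads(body.decode("utf-8"))
        # Assumption: the body is either an Execution object or a bare ID string.
        if isinstance(parsed, dict):
            return ("created", parsed.get("executionId"))
        return ("created", str(parsed))
    # 400 and 401 are the documented error statuses for this operation.
    raise ValueError(f"unexpected status {status}")
```

Treating 204 as a normal outcome rather than an error keeps repeated create calls with the same ID harmless for the caller.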

Gets all executions

Authorizations:

ApiKeyAuth

query Parameters

tag	string Filter executions by tag

Responses

Response samples

200
400
401

Content type

application/json

[{"executionId": "string",
"tag": "string",
"createdAt": "string",
"metaData": { }
}
]
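
The optional `tag` query parameter can be encoded with the standard library. The sketch below only builds the query string; the endpoint path is not shown in this document, so it is left to the caller, and the helper name `executions_query` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def executions_query(tag: Optional[str] = None) -> str:
    """Return the query string for listing executions, optionally filtered by tag."""
    if tag is None:
        # No filter: request all executions.
        return ""
    # urlencode escapes characters that are not safe in a query string.
    return "?" + urlencode({"tag": tag})
```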

Gets execution

Authorizations:

ApiKeyAuth

path Parameters

executionId

required

string

ID of an execution

Responses

Response samples

200
400
401

Content type

application/json

{"executionId": "string",
"tag": "string",
"createdAt": "string",
"metaData": { }
}

Add media to an observation

Authorizations:

ApiKeyAuth

path Parameters

executionId

required

string

ID of an execution

Request Body schema:
image/jpeg

string <base64>

Responses

Response samples

200
400
401
404

Content type

application/json

{"mediaId": "string"
}
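
The request body schema above is annotated as `string <base64>`, and a successful response carries a `mediaId`. A small sketch of both sides, assuming the image should be Base64-encoded before sending (the document does not say whether raw JPEG bytes are also accepted); the helper names are hypothetical.

```python
import base64
import json


def encode_media(raw: bytes) -> str:
    """Base64-encode raw image bytes as an ASCII string."""
    return base64.b64encode(raw).decode("ascii")


def media_id_from_response(body: bytes) -> str:
    """Extract the mediaId field from a 200 response body."""
    return json.loads(body.decode("utf-8"))["mediaId"]
```

The returned `mediaId` is the value that the "Retrieve a specific media" operation takes as a path parameter.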

Retrieve a specific media

Authorizations:

ApiKeyAuth

path Parameters

executionId required	string ID of an execution
mediaId required	string ID of uploaded media

Responses

Merge a raw observation into the current observation

The raw observation is merged into the current observation context.

Authorizations:

ApiKeyAuth

path Parameters

executionId

required

string

ID of an execution

Request Body schema: application/json

Observation request

executionId required	string
mediaContents	Array of objects (Image Content)
documents	Array of objects (Linked Document)

Responses

Request samples

Payload

Content type

application/json

{"executionId": "execution-1",
"mediaContents": [{"id": "media-1",
"mediaId": "media-1",
"documentPages": [{"page": 1,
"document": {"id": "string",
"tables": [{"id": "string",
"entity": {"id": "string",
"block": {"text": null,
"geometry": null
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": null,
"textValue": null,
"quantityValue": null,
"numberValue": null,
"unitValue": null,
"dateValue": null,
"textData": null,
"quantityData": null,
"numberData": null,
"unitData": null,
"dateData": null,
"field": null,
"score": null,
"sourceIndex": null
},
"embedding": [null
],
"similarity": {"type": null,
"cosineSimilarity": null,
"amountDiff": null,
"numberDiff": null,
"same": null
},
"layoutType": "WORD"
},
"tag": "string",
"headers": [{"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
}
],
"rows": [{"id": null,
"tag": null,
"pairs": [ ],
"entity": null
}
]
}
],
"entities": [{"id": "string",
"block": {"text": "string",
"geometry": {"polygon": null,
"boundingBox": null
}
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24",
"field": "string",
"score": 0,
"sourceIndex": "string"
},
"embedding": [0
],
"similarity": {"type": "TEXT_SIM",
"cosineSimilarity": 0,
"amountDiff": 0,
"numberDiff": 0,
"same": true
},
"layoutType": "WORD"
}
],
"keyValueSet": {"id": "string",
"tag": "string",
"pairs": [{"key": {"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
},
"entityValue": {"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
},
"keyValueSetValue": { },
"tableValue": {"id": null,
"entity": null,
"tag": null,
"headers": [ ],
"rows": [ ]
},
"tag": "string"
}
],
"entity": {"id": "string",
"block": {"text": "string",
"geometry": {"polygon": null,
"boundingBox": null
}
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24",
"field": "string",
"score": 0,
"sourceIndex": "string"
},
"embedding": [0
],
"similarity": {"type": "TEXT_SIM",
"cosineSimilarity": 0,
"amountDiff": 0,
"numberDiff": 0,
"same": true
},
"layoutType": "WORD"
}
}
}
}
],
"imageHash": "string",
"codes": [{"id": "string",
"entity": {"id": "string",
"block": {"text": "string",
"geometry": {"polygon": {"points": [null
]
},
"boundingBox": {"width": 0,
"height": 0,
"left": 0,
"top": 0
}
}
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24",
"field": "string",
"score": 0,
"sourceIndex": "string"
},
"embedding": [0
],
"similarity": {"type": "TEXT_SIM",
"cosineSimilarity": 0,
"amountDiff": 0,
"numberDiff": 0,
"same": true
},
"layoutType": "WORD"
},
"tag": "string",
"payload": "string",
"type": "UPC_A"
}
],
"metaData": {"width": 0,
"height": 0
},
"label": {"index": 0,
"name": "string",
"confidence": 0
},
"rawText": "string"
}
],
"documents": [{"document": { },
"fields": [{"documentId": "string",
"index": "string",
"score": 0,
"fieldName": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24T14:15:22Z",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24T14:15:22Z",
"cosineSimilarity": 0,
"quantityDiff": 0,
"numberDiff": 0,
"same": true,
"entityId": "string"
}
],
"tag": "string",
"score": 0
}
]
}

Response samples

400
401
404

Content type

application/json

{"code": "string",
"message": "string"
}

Get observation

Get a full observation of the execution.

Authorizations:

ApiKeyAuth

path Parameters

executionId

required

string

ID of an execution

Responses

Response samples

200
400
401
404

Content type

application/json

{"executionId": "execution-1",
"mediaContents": [{"id": "media-1",
"mediaId": "media-1",
"documentPages": [{"page": 1,
"document": {"id": "string",
"tables": [{"id": "string",
"entity": {"id": "string",
"block": {"text": null,
"geometry": null
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": null,
"textValue": null,
"quantityValue": null,
"numberValue": null,
"unitValue": null,
"dateValue": null,
"textData": null,
"quantityData": null,
"numberData": null,
"unitData": null,
"dateData": null,
"field": null,
"score": null,
"sourceIndex": null
},
"embedding": [null
],
"similarity": {"type": null,
"cosineSimilarity": null,
"amountDiff": null,
"numberDiff": null,
"same": null
},
"layoutType": "WORD"
},
"tag": "string",
"headers": [{"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
}
],
"rows": [{"id": null,
"tag": null,
"pairs": [ ],
"entity": null
}
]
}
],
"entities": [{"id": "string",
"block": {"text": "string",
"geometry": {"polygon": null,
"boundingBox": null
}
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24",
"field": "string",
"score": 0,
"sourceIndex": "string"
},
"embedding": [0
],
"similarity": {"type": "TEXT_SIM",
"cosineSimilarity": 0,
"amountDiff": 0,
"numberDiff": 0,
"same": true
},
"layoutType": "WORD"
}
],
"keyValueSet": {"id": "string",
"tag": "string",
"pairs": [{"key": {"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
},
"entityValue": {"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
},
"keyValueSetValue": { },
"tableValue": {"id": null,
"entity": null,
"tag": null,
"headers": [ ],
"rows": [ ]
},
"tag": "string"
}
],
"entity": {"id": "string",
"block": {"text": "string",
"geometry": {"polygon": null,
"boundingBox": null
}
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24",
"field": "string",
"score": 0,
"sourceIndex": "string"
},
"embedding": [0
],
"similarity": {"type": "TEXT_SIM",
"cosineSimilarity": 0,
"amountDiff": 0,
"numberDiff": 0,
"same": true
},
"layoutType": "WORD"
}
}
}
}
],
"imageHash": "string",
"codes": [{"id": "string",
"entity": {"id": "string",
"block": {"text": "string",
"geometry": {"polygon": {"points": [null
]
},
"boundingBox": {"width": 0,
"height": 0,
"left": 0,
"top": 0
}
}
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24",
"field": "string",
"score": 0,
"sourceIndex": "string"
},
"embedding": [0
],
"similarity": {"type": "TEXT_SIM",
"cosineSimilarity": 0,
"amountDiff": 0,
"numberDiff": 0,
"same": true
},
"layoutType": "WORD"
},
"tag": "string",
"payload": "string",
"type": "UPC_A"
}
],
"metaData": {"width": 0,
"height": 0
},
"label": {"index": 0,
"name": "string",
"confidence": 0
},
"rawText": "string"
}
],
"documents": [{"document": { },
"fields": [{"documentId": "string",
"index": "string",
"score": 0,
"fieldName": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24T14:15:22Z",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24T14:15:22Z",
"cosineSimilarity": 0,
"quantityDiff": 0,
"numberDiff": 0,
"same": true,
"entityId": "string"
}
],
"tag": "string",
"score": 0
}
]
}
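
The nesting in the sample above (`mediaContents`, then `documentPages`, then `document`, then `entities`) can be walked with a short generator. This is a sketch based only on the field names visible in the sample; the function name `iter_entities` is hypothetical, and missing or null levels are skipped rather than treated as errors.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_entities(observation: Dict[str, Any]) -> Iterator[Tuple[Any, Any, Any]]:
    """Yield (page, label, text) for every page-level entity in an observation."""
    for media in observation.get("mediaContents") or []:
        for page in media.get("documentPages") or []:
            document = page.get("document") or {}
            for entity in document.get("entities") or []:
                # "block" may be null in a response, so guard before reading text.
                block = entity.get("block") or {}
                yield (page.get("page"), entity.get("label"), block.get("text"))
```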

Transform an observation

Transform a raw observation into an object using a JSONata statement. JSONata is a transformation language for JSON data. For more information see http://docs.jsonata.org/overview.html

Authorizations:

ApiKeyAuth

path Parameters

executionId

required

string

ID of an execution

Request Body schema: text/plain

string

Responses

Response samples

400
401
404

Content type

application/json

{"code": "string",
"message": "string"
}
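
Since the request body is `text/plain`, the JSONata statement is sent as a plain string rather than wrapped in JSON. The sketch below prepares such a body. The expression is only an illustration written against the field names in the observation sample and has not been validated against the service; the helper names are hypothetical.

```python
# Illustrative JSONata statement: one {label, text} object per page-level entity.
JSONATA_EXPRESSION = (
    "mediaContents.documentPages.document.entities."
    '{"label": label, "text": block.text}'
)


def transform_body(expression: str) -> bytes:
    """Encode a JSONata statement as a text/plain request body."""
    return expression.strip().encode("utf-8")


def transform_headers() -> dict:
    """Headers matching the text/plain request body schema."""
    return {"Content-Type": "text/plain; charset=utf-8"}
```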

Transform an observation into JSON

Transform a raw observation into the corresponding JSON object. The values in the JSON object correspond to the data values in the observation. If data values are not available, the raw text is used.

Authorizations:

ApiKeyAuth

path Parameters

executionId

required

string

ID of an execution

Responses

Response samples

200
400
401
404

Content type

application/json

{ }
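
The fallback rule described for this operation (use the data value when present, otherwise the raw text) can be mirrored on the client when working with a full observation. This is a sketch of the idea only, not the server's implementation. It assumes that "data values" means the `*Value` fields under `data`, that "raw text" means the entity's `block.text`, and that the order of preference in `VALUE_KEYS` is acceptable; all three are guesses.

```python
from typing import Any, Dict

# Assumption: candidate data-value fields, checked in this order.
VALUE_KEYS = ("textValue", "numberValue", "quantityValue", "dateValue")


def entity_value(entity: Dict[str, Any]) -> Any:
    """Return the first available data value, falling back to the raw text."""
    data = entity.get("data") or {}
    for key in VALUE_KEYS:
        # Compare against None so that legitimate values such as 0 are kept.
        if data.get(key) is not None:
            return data[key]
    return (entity.get("block") or {}).get("text")
```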

Link results of an execution

Link the contents of an observation to documents in an index.

Authorizations:

ApiKeyAuth

path Parameters

executionId

required

string

ID of an execution

Request Body schema: application/json

Linker request

group	Array of objects (Match Group)
document	object
topk	Array of objects (Top K Index Filter)

Responses

Request samples

Payload

Content type

application/json

{"group": [{"tag": "string",
"fields": [{"fieldName": "string",
"clause": "MUST",
"fuzziness": 0,
"auto": "string",
"filter": {"tag": "string",
"label": "string",
"regExp": "string",
"hasData": true,
"hasValue": true,
"layoutType": "WORD"
},
"collection": {"source": "ENTITY_TEXT"
},
"threshold": 0,
"num_results": 0,
"mode": "SEARCH",
"dimension": "EMPTY",
"analyzer": "string"
}
],
"index": "string"
}
],
"document": { },
"topk": [{"index": "string",
"topk": 0
}
]
}

Response samples

200
400
401
404

Content type

application/json

[{"document": { },
"fields": [{"documentId": "string",
"index": "string",
"score": 0,
"fieldName": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24T14:15:22Z",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24T14:15:22Z",
"cosineSimilarity": 0,
"quantityDiff": 0,
"numberDiff": 0,
"same": true,
"entityId": "string"
}
],
"tag": "string",
"score": 0
}
]
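
A Linker request with a single match group could be assembled as shown below. This is a minimal sketch: the helper names are hypothetical, the enum values (`MUST`, `SEARCH`, `ENTITY_TEXT`) are copied from the request sample, and which fields are actually required is not stated in this document, so the selection here is an assumption.

```python
import json
from typing import Any, Dict, List


def match_field(field_name: str, clause: str = "MUST", fuzziness: int = 0) -> Dict[str, Any]:
    """Build one entry of a match group's "fields" array."""
    return {
        "fieldName": field_name,
        "clause": clause,
        "fuzziness": fuzziness,
        "mode": "SEARCH",
        "collection": {"source": "ENTITY_TEXT"},
    }


def linker_request(index: str, tag: str, fields: List[Dict[str, Any]], top_k: int) -> str:
    """Serialize a Linker request with one group and one top-k filter."""
    body = {
        "group": [{"tag": tag, "fields": fields, "index": index}],
        "topk": [{"index": index, "topk": top_k}],
    }
    return json.dumps(body)
```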

Pipelines

Machine learning pipelines.

Find out more in the documentation

Get all deployed pipelines

Authorizations:

ApiKeyAuth

Responses

Response samples

200
401

Content type

application/json

[{"pipelineId": "string",
"name": "string",
"description": "string",
"tag": "string",
"status": "string",
"version": 0
}
]

Create new pipeline

Authorizations:

ApiKeyAuth

Request Body schema: application/json

Pipeline request

pipelineId	string
name	string
deploy	boolean Default: true Whether to deploy the pipeline when creating/updating it
description	string
tag	string
version	integer
pipelineDefinition	object (PipelineDefinition)

Responses

Request samples

Payload

Content type

application/json

{"pipelineId": "receipt-pipeline",
"name": "receipt-pipeline",
"deploy": true,
"description": "A pipeline to extract contents from a receipt",
"tag": "string",
"version": 1,
"pipelineDefinition": {"pipelineDefinitionId": "receipt-pipeline-definition",
"nodes": [{"modelId": "ocr_recognizer",
"modelName": "ocr_recognizer",
"modelVersion": 1,
"modelParameters": {"param1": 1,
"param2": 2
},
"canvas": {"position": {"x": 0,
"y": 250
}
},
"inputs": {"inputName": "info_array_ocr_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_ocr_output",
"inputShape": [1
],
"inputType": "STRING"
}
},
{"modelId": "static_layout_recognizer",
"modelName": "static_layout_recognizer",
"modelVersion": 1,
"modelParameters": {"targetModel": "some_model",
"labels": ["date",
"name",
"amount"
]
},
"canvas": {"position": {"x": 0,
"y": 500
}
},
"inputs": {"inputName": "info_array_static_layout_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_static_layout_output",
"inputShape": [1
],
"inputType": "STRING"
}
},
{"modelId": "entity_parser",
"modelName": "entity_parser",
"modelVersion": 1,
"modelParameters": {"date": "DATE",
"name": "STRING",
"amount": "NUMBER"
},
"canvas": {"position": {"x": 0,
"y": 750
}
},
"inputs": {"inputName": "info_array_parser_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_parser_output",
"inputShape": [1
],
"inputType": "STRING"
}
},
{"modelId": "entity_deduplicator",
"modelName": "entity_deduplicator",
"modelVersion": 1,
"modelParameters": {"keys": ["date",
"name",
"amount"
]
},
"canvas": {"position": {"x": 0,
"y": 1000
}
},
"inputs": {"inputName": "info_array_deduplicator_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_deduplicator_output",
"inputShape": [1
],
"inputType": "STRING"
}
}
],
"edges": [{"id": "edge-1",
"dataHandle": "ocr_result",
"source": "ocr_recognizer",
"target": "layout_recognizer",
"sourceHandle": "info_array_ocr_output",
"targetHandle": "info_array_static_layout_input"
},
{"id": "edge-2",
"dataHandle": "recognizer_result",
"source": "layout_recognizer",
"target": "entity_parser",
"sourceHandle": "info_array_static_layout_output",
"targetHandle": "info_array_parser_input"
},
{"id": "edge-3",
"dataHandle": "parser_result",
"source": "entity_parser",
"target": "entity_deduplicator",
"sourceHandle": "info_array_parser_output",
"targetHandle": "info_array_deduplicator_input"
}
]
}
}
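
Before submitting a pipeline definition, it can be useful to check that the graph is internally consistent. The sketch below assumes that each edge's `source` and `target` are meant to name a node's `modelId`; this document does not state that rule, so treat it as a hypothesis. The function name `dangling_edges` is hypothetical.

```python
from typing import Any, Dict, List


def dangling_edges(definition: Dict[str, Any]) -> List[str]:
    """Return IDs of edges whose source or target is not a known node modelId."""
    node_ids = {node["modelId"] for node in definition.get("nodes", [])}
    return [
        edge["id"]
        for edge in definition.get("edges", [])
        if edge["source"] not in node_ids or edge["target"] not in node_ids
    ]
```

An empty result means every edge connects two declared nodes under the stated assumption.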

Response samples

200
400
401
404

Content type

application/json

{"pipelineId": "string"
}

Get a specific pipeline

Authorizations:

ApiKeyAuth

path Parameters

pipelineId

required

string

ID of the pipeline to execute

Responses

Response samples

200
400
401
404

Content type

application/json

{"pipeline": {"pipelineId": "string",
"name": "string",
"description": "string",
"tag": "string",
"status": "string",
"version": 0
},
"pipelineDefinition": {"pipelineDefinitionId": "receipt-pipeline-definition",
"nodes": [{"modelId": "ocr_recognizer",
"modelName": "ocr_recognizer",
"modelVersion": 1,
"modelParameters": {"param1": 1,
"param2": 2
},
"canvas": {"position": {"x": 0,
"y": 250
}
},
"inputs": {"inputName": "info_array_ocr_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_ocr_output",
"inputShape": [1
],
"inputType": "STRING"
}
},
{"modelId": "static_layout_recognizer",
"modelName": "static_layout_recognizer",
"modelVersion": 1,
"modelParameters": {"targetModel": "some_model",
"labels": ["date",
"name",
"amount"
]
},
"canvas": {"position": {"x": 0,
"y": 500
}
},
"inputs": {"inputName": "info_array_static_layout_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_static_layout_output",
"inputShape": [1
],
"inputType": "STRING"
}
},
{"modelId": "entity_parser",
"modelName": "entity_parser",
"modelVersion": 1,
"modelParameters": {"date": "DATE",
"name": "STRING",
"amount": "NUMBER"
},
"canvas": {"position": {"x": 0,
"y": 750
}
},
"inputs": {"inputName": "info_array_parser_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_parser_output",
"inputShape": [1
],
"inputType": "STRING"
}
},
{"modelId": "entity_deduplicator",
"modelName": "entity_deduplicator",
"modelVersion": 1,
"modelParameters": {"keys": ["date",
"name",
"amount"
]
},
"canvas": {"position": {"x": 0,
"y": 1000
}
},
"inputs": {"inputName": "info_array_deduplicator_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_deduplicator_output",
"inputShape": [1
],
"inputType": "STRING"
}
}
],
"edges": [{"id": "edge-1",
"dataHandle": "ocr_result",
"source": "ocr_recognizer",
"target": "layout_recognizer",
"sourceHandle": "info_array_ocr_output",
"targetHandle": "info_array_static_layout_input"
},
{"id": "edge-2",
"dataHandle": "recognizer_result",
"source": "layout_recognizer",
"target": "entity_parser",
"sourceHandle": "info_array_static_layout_output",
"targetHandle": "info_array_parser_input"
},
{"id": "edge-3",
"dataHandle": "parser_result",
"source": "entity_parser",
"target": "entity_deduplicator",
"sourceHandle": "info_array_parser_output",
"targetHandle": "info_array_deduplicator_input"
}
]
}
}

Update an existing pipeline

Authorizations:

ApiKeyAuth

path Parameters

pipelineId

required

string

ID of the pipeline to execute

Request Body schema: application/json

Pipeline request

pipelineId	string
name	string
deploy	boolean Default: true Whether to deploy the pipeline when creating/updating it
description	string
tag	string
version	integer
pipelineDefinition	object (PipelineDefinition)

Responses

Request samples

Payload

Content type

application/json

{"pipelineId": "receipt-pipeline",
"name": "receipt-pipeline",
"deploy": true,
"description": "A pipeline to extract contents from a receipt",
"tag": "string",
"version": 1,
"pipelineDefinition": {"pipelineDefinitionId": "receipt-pipeline-definition",
"nodes": [{"modelId": "ocr_recognizer",
"modelName": "ocr_recognizer",
"modelVersion": 1,
"modelParameters": {"param1": 1,
"param2": 2
},
"canvas": {"position": {"x": 0,
"y": 250
}
},
"inputs": {"inputName": "info_array_ocr_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_ocr_output",
"inputShape": [1
],
"inputType": "STRING"
}
},
{"modelId": "static_layout_recognizer",
"modelName": "static_layout_recognizer",
"modelVersion": 1,
"modelParameters": {"targetModel": "some_model",
"labels": ["date",
"name",
"amount"
]
},
"canvas": {"position": {"x": 0,
"y": 500
}
},
"inputs": {"inputName": "info_array_static_layout_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_static_layout_output",
"inputShape": [1
],
"inputType": "STRING"
}
},
{"modelId": "entity_parser",
"modelName": "entity_parser",
"modelVersion": 1,
"modelParameters": {"date": "DATE",
"name": "STRING",
"amount": "NUMBER"
},
"canvas": {"position": {"x": 0,
"y": 750
}
},
"inputs": {"inputName": "info_array_parser_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_parser_output",
"inputShape": [1
],
"inputType": "STRING"
}
},
{"modelId": "entity_deduplicator",
"modelName": "entity_deduplicator",
"modelVersion": 1,
"modelParameters": {"keys": ["date",
"name",
"amount"
]
},
"canvas": {"position": {"x": 0,
"y": 1000
}
},
"inputs": {"inputName": "info_array_deduplicator_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_deduplicator_output",
"inputShape": [1
],
"inputType": "STRING"
}
}
],
"edges": [{"id": "edge-1",
"dataHandle": "ocr_result",
"source": "ocr_recognizer",
"target": "layout_recognizer",
"sourceHandle": "info_array_ocr_output",
"targetHandle": "info_array_static_layout_input"
},
{"id": "edge-2",
"dataHandle": "recognizer_result",
"source": "layout_recognizer",
"target": "entity_parser",
"sourceHandle": "info_array_static_layout_output",
"targetHandle": "info_array_parser_input"
},
{"id": "edge-3",
"dataHandle": "parser_result",
"source": "entity_parser",
"target": "entity_deduplicator",
"sourceHandle": "info_array_parser_output",
"targetHandle": "info_array_deduplicator_input"
}
]
}
}

Response samples

200
400
401
404

Content type

application/json

{"pipeline": {"pipelineId": "string",
"name": "string",
"description": "string",
"tag": "string",
"status": "string",
"version": 0
},
"pipelineDefinition": {"pipelineDefinitionId": "receipt-pipeline-definition",
"nodes": [{"modelId": "ocr_recognizer",
"modelName": "ocr_recognizer",
"modelVersion": 1,
"modelParameters": {"param1": 1,
"param2": 2
},
"canvas": {"position": {"x": 0,
"y": 250
}
},
"inputs": {"inputName": "info_array_ocr_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_ocr_output",
"inputShape": [1
],
"inputType": "STRING"
}
},
{"modelId": "static_layout_recognizer",
"modelName": "static_layout_recognizer",
"modelVersion": 1,
"modelParameters": {"targetModel": "some_model",
"labels": ["date",
"name",
"amount"
]
},
"canvas": {"position": {"x": 0,
"y": 500
}
},
"inputs": {"inputName": "info_array_static_layout_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_static_layout_output",
"inputShape": [1
],
"inputType": "STRING"
}
},
{"modelId": "entity_parser",
"modelName": "entity_parser",
"modelVersion": 1,
"modelParameters": {"date": "DATE",
"name": "STRING",
"amount": "NUMBER"
},
"canvas": {"position": {"x": 0,
"y": 750
}
},
"inputs": {"inputName": "info_array_parser_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_parser_output",
"inputShape": [1
],
"inputType": "STRING"
}
},
{"modelId": "entity_deduplicator",
"modelName": "entity_deduplicator",
"modelVersion": 1,
"modelParameters": {"keys": ["date",
"name",
"amount"
]
},
"canvas": {"position": {"x": 0,
"y": 1000
}
},
"inputs": {"inputName": "info_array_deduplicator_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_deduplicator_output",
"inputShape": [1
],
"inputType": "STRING"
}
}
],
"edges": [{"id": "edge-1",
"dataHandle": "ocr_result",
"source": "ocr_recognizer",
"target": "layout_recognizer",
"sourceHandle": "info_array_ocr_output",
"targetHandle": "info_array_static_layout_input"
},
{"id": "edge-2",
"dataHandle": "recognizer_result",
"source": "layout_recognizer",
"target": "entity_parser",
"sourceHandle": "info_array_static_layout_output",
"targetHandle": "info_array_parser_input"
},
{"id": "edge-3",
"dataHandle": "parser_result",
"source": "entity_parser",
"target": "entity_deduplicator",
"sourceHandle": "info_array_parser_output",
"targetHandle": "info_array_deduplicator_input"
}
]
}
}

Delete a specific pipeline

Authorizations:

ApiKeyAuth

path Parameters

pipelineId

required

string

ID of the pipeline to execute

Responses

Response samples

401
404

Content type

application/json

{"code": "string",
"message": "string"
}

Get graph representation (definition) of a pipeline

Authorizations:

ApiKeyAuth

path Parameters

pipelineId

required

string

ID of the pipeline to execute

Responses

Response samples

200
400
401
404

Content type

application/json

{"pipelineDefinitionId": "receipt-pipeline-definition",
"nodes": [{"modelId": "ocr_recognizer",
"modelName": "ocr_recognizer",
"modelVersion": 1,
"modelParameters": {"param1": 1,
"param2": 2
},
"canvas": {"position": {"x": 0,
"y": 250
}
},
"inputs": {"inputName": "info_array_ocr_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_ocr_output",
"inputShape": [1
],
"inputType": "STRING"
}
},
{"modelId": "static_layout_recognizer",
"modelName": "static_layout_recognizer",
"modelVersion": 1,
"modelParameters": {"targetModel": "some_model",
"labels": ["date",
"name",
"amount"
]
},
"canvas": {"position": {"x": 0,
"y": 500
}
},
"inputs": {"inputName": "info_array_static_layout_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_static_layout_output",
"inputShape": [1
],
"inputType": "STRING"
}
},
{"modelId": "entity_parser",
"modelName": "entity_parser",
"modelVersion": 1,
"modelParameters": {"date": "DATE",
"name": "STRING",
"amount": "NUMBER"
},
"canvas": {"position": {"x": 0,
"y": 750
}
},
"inputs": {"inputName": "info_array_parser_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_parser_output",
"inputShape": [1
],
"inputType": "STRING"
}
},
{"modelId": "entity_deduplicator",
"modelName": "entity_deduplicator",
"modelVersion": 1,
"modelParameters": {"keys": ["date",
"name",
"amount"
]
},
"canvas": {"position": {"x": 0,
"y": 1000
}
},
"inputs": {"inputName": "info_array_deduplicator_input",
"inputShape": [1
],
"inputType": "STRING"
},
"outputs": {"inputName": "info_array_deduplicator_output",
"inputShape": [1
],
"inputType": "STRING"
}
}
],
"edges": [{"id": "edge-1",
"dataHandle": "ocr_result",
"source": "ocr_recognizer",
"target": "layout_recognizer",
"sourceHandle": "info_array_ocr_output",
"targetHandle": "info_array_static_layout_input"
},
{"id": "edge-2",
"dataHandle": "recognizer_result",
"source": "layout_recognizer",
"target": "entity_parser",
"sourceHandle": "info_array_static_layout_output",
"targetHandle": "info_array_parser_input"
},
{"id": "edge-3",
"dataHandle": "parser_result",
"source": "entity_parser",
"target": "entity_deduplicator",
"sourceHandle": "info_array_parser_output",
"targetHandle": "info_array_deduplicator_input"
}
]
}

Execute pipeline synchronously

Execute a pipeline synchronously and return the corresponding observation. The timeout is 30 seconds. If the computation takes longer than the timeout period a timeout error will be returned.

Authorizations:

ApiKeyAuth

path Parameters

pipelineId

required

string

ID of the pipeline to execute

Request Body schema: application/json

Execution request for a pipeline

executionId	string ID of the execution
tag	string Tag used to identify the resulting execution. Ignored if transient is true.
transient	boolean Whether to delete all execution data after pipeline completion
transform	string JSONata instruction to transform the result observation into a desired object. JSONata is a transformation language for JSON data. For more information see http://docs.jsonata.org/overview.html
tryImageConversion	boolean Default: false Tries to convert the provided content to an image (e.g. PDF)
trySimpleText	boolean Default: false Tries to extract readable text from input media (e.g. Word doc). A number of different file formats are supported. Internally, Apache Tika is used for text extraction. A full list of supported file formats can be found here: https://tika.apache.org/2.9.1/formats.html
idempotent	boolean Default: false Whether to update the existing observation with the results from the pipeline run (always true if executionId is null)
media	Array of strings <base64> Array of Base64-encoded media files. The content type is detected automatically. PDF, DOCX, and PPTX files are rendered as images, which can then be processed within a pipeline.
runtimeParameters	object Not active yet!
observation	object (Execution Observation) The structured content of a set of media files.
text	string Raw text that can be used as input in pipelines

Responses

Request samples

Payload

Content type

application/json

{"executionId": "string",
"tag": "string",
"transient": true,
"transform": "string",
"tryImageConversion": false,
"trySimpleText": false,
"idempotent": false,
"media": ["string"
],
"runtimeParameters": { },
"observation": {"executionId": "execution-1",
"mediaContents": [{"id": "media-1",
"mediaId": "media-1",
"documentPages": [{"page": 1,
"document": {"id": "string",
"tables": [{"id": "string",
"entity": {"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
},
"tag": "string",
"headers": [null
],
"rows": [null
]
}
],
"entities": [{"id": "string",
"block": {"text": null,
"geometry": null
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": null,
"textValue": null,
"quantityValue": null,
"numberValue": null,
"unitValue": null,
"dateValue": null,
"textData": null,
"quantityData": null,
"numberData": null,
"unitData": null,
"dateData": null,
"field": null,
"score": null,
"sourceIndex": null
},
"embedding": [null
],
"similarity": {"type": null,
"cosineSimilarity": null,
"amountDiff": null,
"numberDiff": null,
"same": null
},
"layoutType": "WORD"
}
],
"keyValueSet": {"id": "string",
"tag": "string",
"pairs": [{"key": null,
"entityValue": null,
"keyValueSetValue": null,
"tableValue": null,
"tag": null
}
],
"entity": {"id": "string",
"block": {"text": null,
"geometry": null
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": null,
"textValue": null,
"quantityValue": null,
"numberValue": null,
"unitValue": null,
"dateValue": null,
"textData": null,
"quantityData": null,
"numberData": null,
"unitData": null,
"dateData": null,
"field": null,
"score": null,
"sourceIndex": null
},
"embedding": [null
],
"similarity": {"type": null,
"cosineSimilarity": null,
"amountDiff": null,
"numberDiff": null,
"same": null
},
"layoutType": "WORD"
}
}
}
}
],
"imageHash": "string",
"codes": [{"id": "string",
"entity": {"id": "string",
"block": {"text": "string",
"geometry": {"polygon": {"points": [ ]
},
"boundingBox": {"width": null,
"height": null,
"left": null,
"top": null
}
}
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24",
"field": "string",
"score": 0,
"sourceIndex": "string"
},
"embedding": [0
],
"similarity": {"type": "TEXT_SIM",
"cosineSimilarity": 0,
"amountDiff": 0,
"numberDiff": 0,
"same": true
},
"layoutType": "WORD"
},
"tag": "string",
"payload": "string",
"type": "UPC_A"
}
],
"metaData": {"width": 0,
"height": 0
},
"label": {"index": 0,
"name": "string",
"confidence": 0
},
"rawText": "string"
}
],
"documents": [{"document": { },
"fields": [{"documentId": "string",
"index": "string",
"score": 0,
"fieldName": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24T14:15:22Z",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24T14:15:22Z",
"cosineSimilarity": 0,
"quantityDiff": 0,
"numberDiff": 0,
"same": true,
"entityId": "string"
}
],
"tag": "string",
"score": 0
}
]
},
"text": "string"
}

Response samples

200
400
401
404

Content type

application/json

{"executionId": "string",
"observation": {"executionId": "execution-1",
"mediaContents": [{"id": "media-1",
"mediaId": "media-1",
"documentPages": [{"page": 1,
"document": {"id": "string",
"tables": [{"id": "string",
"entity": {"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
},
"tag": "string",
"headers": [null
],
"rows": [null
]
}
],
"entities": [{"id": "string",
"block": {"text": null,
"geometry": null
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": null,
"textValue": null,
"quantityValue": null,
"numberValue": null,
"unitValue": null,
"dateValue": null,
"textData": null,
"quantityData": null,
"numberData": null,
"unitData": null,
"dateData": null,
"field": null,
"score": null,
"sourceIndex": null
},
"embedding": [null
],
"similarity": {"type": null,
"cosineSimilarity": null,
"amountDiff": null,
"numberDiff": null,
"same": null
},
"layoutType": "WORD"
}
],
"keyValueSet": {"id": "string",
"tag": "string",
"pairs": [{"key": null,
"entityValue": null,
"keyValueSetValue": null,
"tableValue": null,
"tag": null
}
],
"entity": {"id": "string",
"block": {"text": null,
"geometry": null
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": null,
"textValue": null,
"quantityValue": null,
"numberValue": null,
"unitValue": null,
"dateValue": null,
"textData": null,
"quantityData": null,
"numberData": null,
"unitData": null,
"dateData": null,
"field": null,
"score": null,
"sourceIndex": null
},
"embedding": [null
],
"similarity": {"type": null,
"cosineSimilarity": null,
"amountDiff": null,
"numberDiff": null,
"same": null
},
"layoutType": "WORD"
}
}
}
}
],
"imageHash": "string",
"codes": [{"id": "string",
"entity": {"id": "string",
"block": {"text": "string",
"geometry": {"polygon": {"points": [ ]
},
"boundingBox": {"width": null,
"height": null,
"left": null,
"top": null
}
}
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24",
"field": "string",
"score": 0,
"sourceIndex": "string"
},
"embedding": [0
],
"similarity": {"type": "TEXT_SIM",
"cosineSimilarity": 0,
"amountDiff": 0,
"numberDiff": 0,
"same": true
},
"layoutType": "WORD"
},
"tag": "string",
"payload": "string",
"type": "UPC_A"
}
],
"metaData": {"width": 0,
"height": 0
},
"label": {"index": 0,
"name": "string",
"confidence": 0
},
"rawText": "string"
}
],
"documents": [{"document": { },
"fields": [{"documentId": "string",
"index": "string",
"score": 0,
"fieldName": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24T14:15:22Z",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24T14:15:22Z",
"cosineSimilarity": 0,
"quantityDiff": 0,
"numberDiff": 0,
"same": true,
"entityId": "string"
}
],
"tag": "string",
"score": 0
}
]
}
}

Transform an observation

Execute a pipeline synchronously and return the transformed observation

Authorizations:

ApiKeyAuth

path Parameters

pipelineId

required

string

ID of the pipeline to execute

Request Body schema: application/json

Execution request for a pipeline

executionId	string ID of the execution
tag	string Tag used to identify the resulting execution. Ignored if transient is true.
transient	boolean Whether to delete all execution data after pipeline completion
transform	string JSONata instruction to transform the result observation into a desired object. JSONata is a transformation language for JSON data. For more information see http://docs.jsonata.org/overview.html
tryImageConversion	boolean Default: false Tries to convert the provided content to an image (e.g. PDF)
trySimpleText	boolean Default: false Tries to extract readable text from input media (e.g. Word doc). A number of different file formats are supported. Apache Tika is used internally for text extraction. A full list of supported file formats can be found here: https://tika.apache.org/2.9.1/formats.html
idempotent	boolean Default: false Whether to update the existing observation with the results from pipeline run (always true if executionId is null)
media	Array of strings <base64> Array of Base64-encoded media files. The content type is detected automatically. PDF, DOCX, and PPTX files are rendered as images, which can then be processed within a pipeline.
runtimeParameters	object Not active yet!
observation	object (Execution Observation) The structured content of a set of media files.
text	string Raw text that can be used as input in pipelines
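
As an illustration of the request body described above, the following minimal sketch assembles a payload from raw media bytes. The helper name `build_execution_request`, the tag value, and the sample bytes are assumptions made for this example; no endpoint URL is shown because this section does not state one.

```python
import base64
import json


def build_execution_request(media_files, tag=None, transient=False, try_image_conversion=False):
    """Build the JSON body for a pipeline execution from raw media bytes (hypothetical helper)."""
    body = {
        "transient": transient,
        "tryImageConversion": try_image_conversion,
        # Each media file is sent as a Base64 string, as the media field requires.
        "media": [base64.b64encode(data).decode("ascii") for data in media_files],
    }
    # The tag is ignored for transient executions, so only send it when it has an effect.
    if tag is not None and not transient:
        body["tag"] = tag
    return body


# Placeholder bytes standing in for a real PDF file.
request_body = build_execution_request(
    [b"%PDF-1.7 example bytes"], tag="invoice-batch", try_image_conversion=True
)
print(json.dumps(request_body, indent=2))
```

Leaving out executionId here relies on the documented behaviour that an ID is generated when none is supplied.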

Responses

Request samples

Payload

Content type

application/json

{"executionId": "string",
"tag": "string",
"transient": true,
"transform": "string",
"tryImageConversion": false,
"trySimpleText": false,
"idempotent": false,
"media": ["string"
],
"runtimeParameters": { },
"observation": {"executionId": "execution-1",
"mediaContents": [{"id": "media-1",
"mediaId": "media-1",
"documentPages": [{"page": 1,
"document": {"id": "string",
"tables": [{"id": "string",
"entity": {"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
},
"tag": "string",
"headers": [null
],
"rows": [null
]
}
],
"entities": [{"id": "string",
"block": {"text": null,
"geometry": null
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": null,
"textValue": null,
"quantityValue": null,
"numberValue": null,
"unitValue": null,
"dateValue": null,
"textData": null,
"quantityData": null,
"numberData": null,
"unitData": null,
"dateData": null,
"field": null,
"score": null,
"sourceIndex": null
},
"embedding": [null
],
"similarity": {"type": null,
"cosineSimilarity": null,
"amountDiff": null,
"numberDiff": null,
"same": null
},
"layoutType": "WORD"
}
],
"keyValueSet": {"id": "string",
"tag": "string",
"pairs": [{"key": null,
"entityValue": null,
"keyValueSetValue": null,
"tableValue": null,
"tag": null
}
],
"entity": {"id": "string",
"block": {"text": null,
"geometry": null
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": null,
"textValue": null,
"quantityValue": null,
"numberValue": null,
"unitValue": null,
"dateValue": null,
"textData": null,
"quantityData": null,
"numberData": null,
"unitData": null,
"dateData": null,
"field": null,
"score": null,
"sourceIndex": null
},
"embedding": [null
],
"similarity": {"type": null,
"cosineSimilarity": null,
"amountDiff": null,
"numberDiff": null,
"same": null
},
"layoutType": "WORD"
}
}
}
}
],
"imageHash": "string",
"codes": [{"id": "string",
"entity": {"id": "string",
"block": {"text": "string",
"geometry": {"polygon": {"points": [ ]
},
"boundingBox": {"width": null,
"height": null,
"left": null,
"top": null
}
}
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24",
"field": "string",
"score": 0,
"sourceIndex": "string"
},
"embedding": [0
],
"similarity": {"type": "TEXT_SIM",
"cosineSimilarity": 0,
"amountDiff": 0,
"numberDiff": 0,
"same": true
},
"layoutType": "WORD"
},
"tag": "string",
"payload": "string",
"type": "UPC_A"
}
],
"metaData": {"width": 0,
"height": 0
},
"label": {"index": 0,
"name": "string",
"confidence": 0
},
"rawText": "string"
}
],
"documents": [{"document": { },
"fields": [{"documentId": "string",
"index": "string",
"score": 0,
"fieldName": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24T14:15:22Z",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24T14:15:22Z",
"cosineSimilarity": 0,
"quantityDiff": 0,
"numberDiff": 0,
"same": true,
"entityId": "string"
}
],
"tag": "string",
"score": 0
}
]
},
"text": "string"
}

Response samples

400
401
404

Content type

application/json

{"code": "string",
"message": "string"
}
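
Error responses share the code/message shape shown above. One way a client might surface them is sketched below; the exception class `CambrionApiError`, the helper `parse_error`, and the sample code and message values are hypothetical and not part of the specification.

```python
import json


class CambrionApiError(Exception):
    """Hypothetical exception wrapping the documented error body."""

    def __init__(self, status, code, message):
        super().__init__(f"HTTP {status} [{code}]: {message}")
        self.status = status
        self.code = code
        self.message = message


def parse_error(status, raw_body):
    """Turn an error response body into an exception, tolerating non-JSON bodies."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    # Fall back to the raw body when the expected fields are missing.
    return CambrionApiError(
        status, payload.get("code", "unknown"), payload.get("message", raw_body)
    )


# Illustrative values only; real codes and messages are defined by the service.
err = parse_error(404, '{"code": "NOT_FOUND", "message": "Pipeline does not exist"}')
print(err)
```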

Transform an observation

Execute a pipeline synchronously and return the corresponding JSON object.

Authorizations:

ApiKeyAuth

path Parameters

pipelineId

required

string

ID of the pipeline to execute

Request Body schema: application/json

Execution request for a pipeline

executionId	string ID of the execution
tag	string Tag used to identify the resulting execution. Ignored if transient is true.
transient	boolean Whether to delete all execution data after pipeline completion
transform	string JSONata instruction to transform the result observation into a desired object. JSONata is a transformation language for JSON data. For more information see http://docs.jsonata.org/overview.html
tryImageConversion	boolean Default: false Tries to convert the provided content to an image (e.g. PDF)
trySimpleText	boolean Default: false Tries to extract readable text from input media (e.g. Word doc). A number of different file formats are supported. Apache Tika is used internally for text extraction. A full list of supported file formats can be found here: https://tika.apache.org/2.9.1/formats.html
idempotent	boolean Default: false Whether to update the existing observation with the results from pipeline run (always true if executionId is null)
media	Array of strings <base64> Array of Base64-encoded media files. The content type is detected automatically. PDF, DOCX, and PPTX files are rendered as images, which can then be processed within a pipeline.
runtimeParameters	object Not active yet!
observation	object (Execution Observation) The structured content of a set of media files.
text	string Raw text that can be used as input in pipelines
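
Because this endpoint returns a JSON object, the transform field is the natural place to shape the result. The sketch below pairs an existing execution ID with a JSONata object constructor. It assumes the expression is evaluated with the result observation as its root, which the description suggests but does not state outright. The helper name and the output keys `executionId` and `pageTexts` are made up for this example.

```python
import json

# JSONata object constructor; the field paths follow the observation sample
# (executionId, mediaContents[].rawText). Evaluation against the observation
# root is an assumption.
TRANSFORM = '{"executionId": executionId, "pageTexts": mediaContents.rawText}'


def build_transform_request(execution_id, transform):
    """Reuse media already stored in an execution and ask for a reshaped result."""
    return {"executionId": execution_id, "transform": transform}


body = build_transform_request("execution-1", TRANSFORM)
print(json.dumps(body))
```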

Responses

Request samples

Payload

Content type

application/json

{"executionId": "string",
"tag": "string",
"transient": true,
"transform": "string",
"tryImageConversion": false,
"trySimpleText": false,
"idempotent": false,
"media": ["string"
],
"runtimeParameters": { },
"observation": {"executionId": "execution-1",
"mediaContents": [{"id": "media-1",
"mediaId": "media-1",
"documentPages": [{"page": 1,
"document": {"id": "string",
"tables": [{"id": "string",
"entity": {"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
},
"tag": "string",
"headers": [null
],
"rows": [null
]
}
],
"entities": [{"id": "string",
"block": {"text": null,
"geometry": null
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": null,
"textValue": null,
"quantityValue": null,
"numberValue": null,
"unitValue": null,
"dateValue": null,
"textData": null,
"quantityData": null,
"numberData": null,
"unitData": null,
"dateData": null,
"field": null,
"score": null,
"sourceIndex": null
},
"embedding": [null
],
"similarity": {"type": null,
"cosineSimilarity": null,
"amountDiff": null,
"numberDiff": null,
"same": null
},
"layoutType": "WORD"
}
],
"keyValueSet": {"id": "string",
"tag": "string",
"pairs": [{"key": null,
"entityValue": null,
"keyValueSetValue": null,
"tableValue": null,
"tag": null
}
],
"entity": {"id": "string",
"block": {"text": null,
"geometry": null
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": null,
"textValue": null,
"quantityValue": null,
"numberValue": null,
"unitValue": null,
"dateValue": null,
"textData": null,
"quantityData": null,
"numberData": null,
"unitData": null,
"dateData": null,
"field": null,
"score": null,
"sourceIndex": null
},
"embedding": [null
],
"similarity": {"type": null,
"cosineSimilarity": null,
"amountDiff": null,
"numberDiff": null,
"same": null
},
"layoutType": "WORD"
}
}
}
}
],
"imageHash": "string",
"codes": [{"id": "string",
"entity": {"id": "string",
"block": {"text": "string",
"geometry": {"polygon": {"points": [ ]
},
"boundingBox": {"width": null,
"height": null,
"left": null,
"top": null
}
}
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24",
"field": "string",
"score": 0,
"sourceIndex": "string"
},
"embedding": [0
],
"similarity": {"type": "TEXT_SIM",
"cosineSimilarity": 0,
"amountDiff": 0,
"numberDiff": 0,
"same": true
},
"layoutType": "WORD"
},
"tag": "string",
"payload": "string",
"type": "UPC_A"
}
],
"metaData": {"width": 0,
"height": 0
},
"label": {"index": 0,
"name": "string",
"confidence": 0
},
"rawText": "string"
}
],
"documents": [{"document": { },
"fields": [{"documentId": "string",
"index": "string",
"score": 0,
"fieldName": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24T14:15:22Z",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24T14:15:22Z",
"cosineSimilarity": 0,
"quantityDiff": 0,
"numberDiff": 0,
"same": true,
"entityId": "string"
}
],
"tag": "string",
"score": 0
}
]
},
"text": "string"
}

Response samples

200
400
401
404

Content type

application/json

{ }

Execute pipeline asynchronously

Authorizations:

ApiKeyAuth

path Parameters

pipelineId

required

string

ID of the pipeline to execute

Request Body schema: application/json

Execution request for a pipeline

executionId	string ID of the execution
tag	string Tag used to identify the resulting execution. Ignored if transient is true.
transient	boolean Whether to delete all execution data after pipeline completion
transform	string JSONata instruction to transform the result observation into a desired object. JSONata is a transformation language for JSON data. For more information see http://docs.jsonata.org/overview.html
tryImageConversion	boolean Default: false Tries to convert the provided content to an image (e.g. PDF)
trySimpleText	boolean Default: false Tries to extract readable text from input media (e.g. Word doc). A number of different file formats are supported. Apache Tika is used internally for text extraction. A full list of supported file formats can be found here: https://tika.apache.org/2.9.1/formats.html
idempotent	boolean Default: false Whether to update the existing observation with the results from pipeline run (always true if executionId is null)
media	Array of strings <base64> Array of Base64-encoded media files. The content type is detected automatically. PDF, DOCX, and PPTX files are rendered as images, which can then be processed within a pipeline.
runtimeParameters	object Not active yet!
observation	object (Execution Observation) The structured content of a set of media files.
text	string Raw text that can be used as input in pipelines

Responses

Request samples

Payload

Content type

application/json

{"executionId": "string",
"tag": "string",
"transient": true,
"transform": "string",
"tryImageConversion": false,
"trySimpleText": false,
"idempotent": false,
"media": ["string"
],
"runtimeParameters": { },
"observation": {"executionId": "execution-1",
"mediaContents": [{"id": "media-1",
"mediaId": "media-1",
"documentPages": [{"page": 1,
"document": {"id": "string",
"tables": [{"id": "string",
"entity": {"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
},
"tag": "string",
"headers": [null
],
"rows": [null
]
}
],
"entities": [{"id": "string",
"block": {"text": null,
"geometry": null
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": null,
"textValue": null,
"quantityValue": null,
"numberValue": null,
"unitValue": null,
"dateValue": null,
"textData": null,
"quantityData": null,
"numberData": null,
"unitData": null,
"dateData": null,
"field": null,
"score": null,
"sourceIndex": null
},
"embedding": [null
],
"similarity": {"type": null,
"cosineSimilarity": null,
"amountDiff": null,
"numberDiff": null,
"same": null
},
"layoutType": "WORD"
}
],
"keyValueSet": {"id": "string",
"tag": "string",
"pairs": [{"key": null,
"entityValue": null,
"keyValueSetValue": null,
"tableValue": null,
"tag": null
}
],
"entity": {"id": "string",
"block": {"text": null,
"geometry": null
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": null,
"textValue": null,
"quantityValue": null,
"numberValue": null,
"unitValue": null,
"dateValue": null,
"textData": null,
"quantityData": null,
"numberData": null,
"unitData": null,
"dateData": null,
"field": null,
"score": null,
"sourceIndex": null
},
"embedding": [null
],
"similarity": {"type": null,
"cosineSimilarity": null,
"amountDiff": null,
"numberDiff": null,
"same": null
},
"layoutType": "WORD"
}
}
}
}
],
"imageHash": "string",
"codes": [{"id": "string",
"entity": {"id": "string",
"block": {"text": "string",
"geometry": {"polygon": {"points": [ ]
},
"boundingBox": {"width": null,
"height": null,
"left": null,
"top": null
}
}
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24",
"field": "string",
"score": 0,
"sourceIndex": "string"
},
"embedding": [0
],
"similarity": {"type": "TEXT_SIM",
"cosineSimilarity": 0,
"amountDiff": 0,
"numberDiff": 0,
"same": true
},
"layoutType": "WORD"
},
"tag": "string",
"payload": "string",
"type": "UPC_A"
}
],
"metaData": {"width": 0,
"height": 0
},
"label": {"index": 0,
"name": "string",
"confidence": 0
},
"rawText": "string"
}
],
"documents": [{"document": { },
"fields": [{"documentId": "string",
"index": "string",
"score": 0,
"fieldName": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24T14:15:22Z",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24T14:15:22Z",
"cosineSimilarity": 0,
"quantityDiff": 0,
"numberDiff": 0,
"same": true,
"entityId": "string"
}
],
"tag": "string",
"score": 0
}
]
},
"text": "string"
}

Response samples

200

Content type

application/json

{"executionId": "string"
}

Extractions

Extraction definitions.

Find out more in the documentation

Get all extractions

Authorizations:

ApiKeyAuth

Responses

Response samples

200

Content type

application/json

[{"extractionId": "string",
"description": "string",
"state": "string",
"compact": false,
"highPrecision": false,
"size": 1300,
"parallelProcessing": true,
"intelligentBatching": true,
"generationInstruct": { }
}
]

Create extraction

Create a document extraction. This automatically creates a pipeline that corresponds to the instructions in the extraction.

Authorizations:

ApiKeyAuth

Request Body schema: application/json

Extraction request

description	string The description of the extraction
compact	boolean Default: false Faster response but no confidences
highPrecision	boolean Default: false Used for higher precision but slower response
size	number Default: 1300 Image resolution in px. Higher leads to better precision but slower response.
parallelProcessing	boolean Default: true Pages will be processed in parallel. This leads to lower latency but context between pages will be lost.
intelligentBatching	boolean Default: true The AI will batch as many pages together as possible. This allows understanding content across pages. Will be ignored if parallelProcessing is true.
generationInstruct	object The instruct JSON expression that is used to describe the extraction.
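
The interaction between parallelProcessing and intelligentBatching is easy to miss, so here is a small sketch that builds an extraction request with the documented defaults and checks whether batching would actually take effect. The helper names and the contents of the instruct object are hypothetical; this section does not describe the instruct schema.

```python
import json


def build_extraction_request(description, generation_instruct, *, compact=False,
                             high_precision=False, size=1300,
                             parallel_processing=True, intelligent_batching=True):
    """Assemble the JSON body using the documented default values."""
    return {
        "description": description,
        "compact": compact,
        "highPrecision": high_precision,
        "size": size,
        "parallelProcessing": parallel_processing,
        "intelligentBatching": intelligent_batching,
        "generationInstruct": generation_instruct,
    }


def batching_takes_effect(request):
    """intelligentBatching is ignored whenever parallelProcessing is true."""
    return request["intelligentBatching"] and not request["parallelProcessing"]


# Hypothetical instruct object, used only to make the example concrete.
instruct = {"invoiceNumber": "The invoice number printed in the header"}

fast = build_extraction_request("Invoice header fields", instruct)
cross_page = build_extraction_request("Invoice header fields", instruct,
                                      parallel_processing=False)
print(json.dumps(cross_page, indent=2))
```

With the defaults, pages are processed in parallel and batching has no effect; turning parallelProcessing off is what lets content be understood across pages.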

Responses

Request samples

Payload

Content type

application/json

{"description": "string",
"compact": false,
"highPrecision": false,
"size": 1300,
"parallelProcessing": true,
"intelligentBatching": true,
"generationInstruct": { }
}

Response samples

200
400
401
404

Content type

application/json

{"pipelineId": "string"
}

Update extraction

Update an existing extraction and the corresponding pipeline.

Authorizations:

ApiKeyAuth

path Parameters

extractionId

required

string

ID of an extraction

Request Body schema: application/json

Extraction request

description	string The description of the extraction
compact	boolean Default: false Faster response but no confidences
highPrecision	boolean Default: false Used for higher precision but slower response
size	number Default: 1300 Image resolution in px. Higher leads to better precision but slower response.
parallelProcessing	boolean Default: true Pages will be processed in parallel. This leads to lower latency but context between pages will be lost.
intelligentBatching	boolean Default: true The AI will batch as many pages together as possible. This allows understanding content across pages. Will be ignored if parallelProcessing is true.
generationInstruct	object The instruct JSON expression that is used to describe the extraction.

Responses

Request samples

Payload

Content type

application/json

{"description": "string",
"compact": false,
"highPrecision": false,
"size": 1300,
"parallelProcessing": true,
"intelligentBatching": true,
"generationInstruct": { }
}

Response samples

200
400
401
404

Content type

application/json

{"pipelineId": "string"
}

Delete extraction

Authorizations:

ApiKeyAuth

path Parameters

extractionId

required

string

ID of an extraction

Responses

Response samples

400
401
404

Content type

application/json

{"code": "string",
"message": "string"
}

Indices

Find out more in the documentation

Get all indices

Get a list of all indices.

Authorizations:

ApiKeyAuth

query Parameters

limit	integer Limits the number of indices on a page
offset	integer Specifies the page number of the indices to be displayed
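
Since offset is described as a page number rather than an item count, a client could walk all pages as sketched below. The callable `fetch_page` is a hypothetical stand-in for the actual HTTP request, and starting the page numbering at 0 is an assumption that this section does not confirm.

```python
def iterate_items(fetch_page, limit=50, first_page=0):
    """Yield every item by requesting consecutive pages until a short page arrives."""
    offset = first_page  # assumption: pages are numbered from 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        yield from page
        # A page shorter than the limit means there is nothing left to fetch.
        if len(page) < limit:
            break
        offset += 1


# In-memory stand-in for the service, used only to demonstrate the loop.
ALL_INDICES = [{"indexId": f"index-{i}", "semanticSearchFields": []} for i in range(5)]


def fake_fetch_page(limit, offset):
    return ALL_INDICES[offset * limit:(offset + 1) * limit]


index_ids = [item["indexId"] for item in iterate_items(fake_fetch_page, limit=2)]
print(index_ids)
```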

Responses

Response samples

200

Content type

application/json

[{"indexId": "string",
"semanticSearchFields": ["string"
]
}
]

Create index

Create a new index with an optional schema.

Authorizations:

ApiKeyAuth

Request Body schema: application/json

Index creation request

indexId	string Unique ID of the index.
semanticSearchFields	Array of strings List of document fields whose text is semantically embedded.

Responses

Request samples

Payload

Content type

application/json

{"indexId": "string",
"semanticSearchFields": ["string"
]
}

Response samples

400
401

Content type

application/json

{"code": "string",
"message": "string"
}

Get index description

Get the description of an index, including the data model if present.

Authorizations:

ApiKeyAuth

path Parameters

indexId

required

string

Example: Warehouse-Index

ID of the index

Responses

Response samples

200
400
401
404

Content type

application/json

{"indexId": "string",
"semanticSearchFields": ["string"
]
}

Delete index

Delete an index.

Authorizations:

ApiKeyAuth

path Parameters

indexId

required

string

Example: Warehouse-Index

ID of the index

Responses

Response samples

400
401
404

Content type

application/json

{"code": "string",
"message": "string"
}

Query an index with a search string

Authorizations:

ApiKeyAuth

path Parameters

indexId

required

string

Example: Warehouse-Index

ID of the index

Request Body schema: application/json

Query request

fullText	object Fields used for full text search
semanticSearch	object A map that maps search strings to fields.
k	integer The number of results to return that are similar to the search string
searchPipelineId	string ID of the pipeline to be used for searching. If no pipeline is given, the default pipeline according to the number of search fields is selected.
pipelineParameters	object (PipelineExecutionObject) The execution is a stateful environment in which media (such as images or PDF files) can be stored and used as inputs for pipelines. The ID in the request body is optional (generated if empty) and must be unique. Nothing will be persisted if transient is true. To trigger the pipeline, provide either an execution ID containing valid media or Base64-encoded media under the media property.
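
A query body can combine both search maps. The sketch below is one possible way to assemble it. The direction of the two maps is not spelled out here, so the example assumes field name to search string, mirroring the property/value shape of the request sample. The helper name, the field names `title` and `description`, and the search strings are hypothetical.

```python
import json


def build_query(full_text=None, semantic_search=None, k=10, search_pipeline_id=None):
    """Assemble a query body, omitting optional parts that were not supplied."""
    body = {"k": k}
    if full_text:
        body["fullText"] = dict(full_text)
    if semantic_search:
        body["semanticSearch"] = dict(semantic_search)
    # Without an explicit pipeline, the service picks a default search pipeline.
    if search_pipeline_id is not None:
        body["searchPipelineId"] = search_pipeline_id
    return body


# Assumed orientation: keys are document fields, values are search strings.
query = build_query(
    full_text={"title": "pallet jack"},
    semantic_search={"description": "equipment for moving heavy pallets"},
    k=5,
)
print(json.dumps(query, indent=2))
```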

Responses

Request samples

Payload

Content type

application/json

{"fullText": {"property1": "string",
"property2": "string"
},
"semanticSearch": {"property1": "string",
"property2": "string"
},
"k": 0,
"searchPipelineId": "string",
"pipelineParameters": {"executionId": "string",
"tag": "string",
"transient": true,
"transform": "string",
"tryImageConversion": false,
"trySimpleText": false,
"idempotent": false,
"media": ["string"
],
"runtimeParameters": { },
"observation": {"executionId": "execution-1",
"mediaContents": [{"id": "media-1",
"mediaId": "media-1",
"documentPages": [{"page": 1,
"document": {"id": "string",
"tables": [{"id": null,
"entity": null,
"tag": null,
"headers": [ ],
"rows": [ ]
}
],
"entities": [{"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
}
],
"keyValueSet": {"id": "string",
"tag": "string",
"pairs": [null
],
"entity": {"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
}
}
}
}
],
"imageHash": "string",
"codes": [{"id": "string",
"entity": {"id": "string",
"block": {"text": "string",
"geometry": {"polygon": null,
"boundingBox": null
}
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24",
"field": "string",
"score": 0,
"sourceIndex": "string"
},
"embedding": [0
],
"similarity": {"type": "TEXT_SIM",
"cosineSimilarity": 0,
"amountDiff": 0,
"numberDiff": 0,
"same": true
},
"layoutType": "WORD"
},
"tag": "string",
"payload": "string",
"type": "UPC_A"
}
],
"metaData": {"width": 0,
"height": 0
},
"label": {"index": 0,
"name": "string",
"confidence": 0
},
"rawText": "string"
}
],
"documents": [{"document": { },
"fields": [{"documentId": "string",
"index": "string",
"score": 0,
"fieldName": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24T14:15:22Z",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24T14:15:22Z",
"cosineSimilarity": 0,
"quantityDiff": 0,
"numberDiff": 0,
"same": true,
"entityId": "string"
}
],
"tag": "string",
"score": 0
}
]
},
"text": "string"
}
}

Response samples

200
400
401
404

Content type

application/json

[{"indexId": "string",
"source": { },
"score": 0
}
]

Get all documents

Authorizations:

ApiKeyAuth

path Parameters

indexId

required

string

Example: Warehouse-Index

ID of the index

query Parameters

limit	integer Limits the number of documents on a page
offset	integer Specifies the page number of the documents to be displayed

Responses

Response samples

200

Content type

application/json

[{"indexId": "string",
"source": { },
"score": 0
}
]

Create document

Create a JSON document in an index. If the index does not exist it will be created automatically.

Authorizations:

ApiKeyAuth

path Parameters

indexId

required

string

Example: Warehouse-Index

ID of the index

Request Body schema: application/json

Document request

indexId	string
source	object
score	number

Responses

Request samples

Payload

Content type

application/json

{"indexId": "string",
"source": { },
"score": 0
}

Response samples

400
401
404

Content type

application/json

{"code": "string",
"message": "string"
}

Create document

Create a JSON document in an index. If the index does not exist it will be created automatically.

Authorizations:

ApiKeyAuth

path Parameters

indexId

required

string

Example: Warehouse-Index

ID of the index

Request Body schema: application/json

Document batch request

Array

indexId	string
source	object
score	number
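
For larger imports, the array form of the request lends itself to chunked uploads. The following sketch groups source objects into request-sized batches. The helper name, the batch size, and the sample records are assumptions made for illustration, and score is omitted on the assumption that it is only meaningful in query results.

```python
def to_batches(index_id, sources, batch_size=100):
    """Group source objects into lists shaped like the batch request body."""
    batch = []
    for source in sources:
        batch.append({"indexId": index_id, "source": source})
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final, possibly shorter, batch.
    if batch:
        yield batch


# Hypothetical records; "Warehouse-Index" is the example index ID used in this reference.
records = [{"sku": f"SKU-{i}", "name": f"Item {i}"} for i in range(5)]
batches = list(to_batches("Warehouse-Index", records, batch_size=2))
print([len(b) for b in batches])
```

Each yielded list can be serialized as-is and sent as one batch request body.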

Responses

Request samples

Payload

Content type

application/json

[{"indexId": "string",
"source": { },
"score": 0
}
]

Response samples

400
401
404

Content type

application/json

{"code": "string",
"message": "string"
}

Get document

Authorizations:

ApiKeyAuth

path Parameters

indexId required	string Example: Warehouse-Index ID of the index
documentId required	string ID of a document

Responses

Response samples

200
400
401
404

Content type

application/json

{"indexId": "string",
"source": { },
"score": 0
}

Delete document

Authorizations:

ApiKeyAuth

path Parameters

indexId required	string Example: Warehouse-Index ID of the index
documentId required	string ID of a document

Responses

Response samples

400
401
404

Content type

application/json

{"code": "string",
"message": "string"
}

Create an example

Create an extraction example used to

Authorizations:

ApiKeyAuth

path Parameters

extractionId

required

string

ID of an extraction

Request Body schema: application/json

Observation request

executionId required	string
mediaContents	Array of objects (Image Content)
documents	Array of objects (Linked Document)

Responses

Request samples

Payload

Content type

application/json

{"executionId": "execution-1",
"mediaContents": [{"id": "media-1",
"mediaId": "media-1",
"documentPages": [{"page": 1,
"document": {"id": "string",
"tables": [{"id": "string",
"entity": {"id": "string",
"block": {"text": null,
"geometry": null
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": null,
"textValue": null,
"quantityValue": null,
"numberValue": null,
"unitValue": null,
"dateValue": null,
"textData": null,
"quantityData": null,
"numberData": null,
"unitData": null,
"dateData": null,
"field": null,
"score": null,
"sourceIndex": null
},
"embedding": [null
],
"similarity": {"type": null,
"cosineSimilarity": null,
"amountDiff": null,
"numberDiff": null,
"same": null
},
"layoutType": "WORD"
},
"tag": "string",
"headers": [{"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
}
],
"rows": [{"id": null,
"tag": null,
"pairs": [ ],
"entity": null
}
]
}
],
"entities": [{"id": "string",
"block": {"text": "string",
"geometry": {"polygon": null,
"boundingBox": null
}
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24",
"field": "string",
"score": 0,
"sourceIndex": "string"
},
"embedding": [0
],
"similarity": {"type": "TEXT_SIM",
"cosineSimilarity": 0,
"amountDiff": 0,
"numberDiff": 0,
"same": true
},
"layoutType": "WORD"
}
],
"keyValueSet": {"id": "string",
"tag": "string",
"pairs": [{"key": {"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
},
"entityValue": {"id": null,
"block": null,
"confidence": null,
"label": null,
"type": null,
"data": null,
"embedding": [ ],
"similarity": null,
"layoutType": null
},
"keyValueSetValue": { },
"tableValue": {"id": null,
"entity": null,
"tag": null,
"headers": [ ],
"rows": [ ]
},
"tag": "string"
}
],
"entity": {"id": "string",
"block": {"text": "string",
"geometry": {"polygon": null,
"boundingBox": null
}
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24",
"field": "string",
"score": 0,
"sourceIndex": "string"
},
"embedding": [0
],
"similarity": {"type": "TEXT_SIM",
"cosineSimilarity": 0,
"amountDiff": 0,
"numberDiff": 0,
"same": true
},
"layoutType": "WORD"
}
}
}
}
],
"imageHash": "string",
"codes": [{"id": "string",
"entity": {"id": "string",
"block": {"text": "string",
"geometry": {"polygon": {"points": [null
]
},
"boundingBox": {"width": 0,
"height": 0,
"left": 0,
"top": 0
}
}
},
"confidence": 0,
"label": "string",
"type": "STRING",
"data": {"documentId": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24",
"field": "string",
"score": 0,
"sourceIndex": "string"
},
"embedding": [0
],
"similarity": {"type": "TEXT_SIM",
"cosineSimilarity": 0,
"amountDiff": 0,
"numberDiff": 0,
"same": true
},
"layoutType": "WORD"
},
"tag": "string",
"payload": "string",
"type": "UPC_A"
}
],
"metaData": {"width": 0,
"height": 0
},
"label": {"index": 0,
"name": "string",
"confidence": 0
},
"rawText": "string"
}
],
"documents": [{"document": { },
"fields": [{"documentId": "string",
"index": "string",
"score": 0,
"fieldName": "string",
"textValue": "string",
"quantityValue": 0,
"numberValue": 0,
"unitValue": "string",
"dateValue": "2019-08-24T14:15:22Z",
"textData": "string",
"quantityData": 0,
"numberData": 0,
"unitData": "string",
"dateData": "2019-08-24T14:15:22Z",
"cosineSimilarity": 0,
"quantityDiff": 0,
"numberDiff": 0,
"same": true,
"entityId": "string"
}
],
"tag": "string",
"score": 0
}
]
}
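The sample above lists every possible field, but the schema marks only executionId as required. The following Python sketch builds a minimal request body and prepares the HTTP request without sending it. The base URL, the route, the POST method, and the API-key header name are assumptions for illustration only; none of them is stated in this section.

```python
import json
import urllib.request

# Hypothetical values: the base URL, route, and API-key header name are not
# given in this section and must be replaced with the real ones.
BASE_URL = "https://api.example.invalid"
API_KEY_HEADER = "X-API-Key"


def build_example_payload(execution_id, media_ids=()):
    """Return a minimal body: executionId is required, both arrays are optional."""
    if not execution_id:
        raise ValueError("executionId is required")
    return {
        "executionId": execution_id,
        # Mirrors the sample, where "id" and "mediaId" carry the same value.
        "mediaContents": [{"id": m, "mediaId": m} for m in media_ids],
        "documents": [],
    }


def build_example_request(extraction_id, payload, api_key):
    """Prepare (but do not send) a request carrying the JSON payload."""
    url = f"{BASE_URL}/extractions/{extraction_id}/examples"  # assumed route
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",  # assumed method
    )


payload = build_example_payload("execution-1", ["media-1"])
request = build_example_request("extraction-1", payload, api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
```

Passing the prepared request to urllib.request.urlopen would send it once the placeholders are replaced with real values.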

Response samples

400
401
404

Content type

application/json

{"code": "string",
"message": "string"
}
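The 400, 401, and 404 samples share one error shape with a code and a message. A client might parse it as sketched below in Python. The class and function names are hypothetical, and the code value in the usage line is invented for illustration; the section does not list the actual error codes.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiError:
    """Error body as shown in the 400/401/404 response samples."""
    code: str
    message: str


def parse_error(body: str) -> ApiError:
    """Parse an error response body; missing fields fall back to empty strings."""
    data = json.loads(body)
    return ApiError(
        code=str(data.get("code", "")),
        message=str(data.get("message", "")),
    )


# Illustrative values only; real codes and messages are defined by the API.
err = parse_error('{"code": "NOT_FOUND", "message": "extraction not found"}')
print(f"{err.code}: {err.message}")
```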

Models

Get all registered models

Authorizations:

ApiKeyAuth

Responses

Response samples

200

Content type

application/json

[{"name": "receipt-pipeline",
"state": "string",
"description": {"name": "string",
"description": "string",
"version": 0,
"resourceTag": "string"
},
"config": { }
}
]
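The response is a JSON array of model objects, each with a nested description. A minimal Python sketch for reading it into typed records follows. The class names and the fallback defaults are assumptions; the field names are taken from the sample above.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelDescription:
    name: str = ""
    description: str = ""
    version: int = 0
    resourceTag: str = ""


@dataclass
class Model:
    name: str
    state: str = ""
    description: ModelDescription = field(default_factory=ModelDescription)
    config: dict = field(default_factory=dict)


def parse_models(body: str) -> List[Model]:
    """Convert the JSON array from the response sample into Model records."""
    models = []
    for item in json.loads(body):
        desc = item.get("description") or {}
        models.append(Model(
            name=item["name"],
            state=item.get("state", ""),
            description=ModelDescription(
                name=desc.get("name", ""),
                description=desc.get("description", ""),
                version=desc.get("version", 0),
                resourceTag=desc.get("resourceTag", ""),
            ),
            config=item.get("config") or {},
        ))
    return models


sample = (
    '[{"name": "receipt-pipeline", "state": "string", '
    '"description": {"name": "string", "description": "string", '
    '"version": 0, "resourceTag": "string"}, "config": {}}]'
)
for model in parse_models(sample):
    print(model.name, model.description.version)
```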

Register an uploaded model (currently internal only)

Authorizations:

ApiKeyAuth

Request Body schema: application/json

Model request

name	string
description	object (ModelDescription)
config	object

Responses

Request samples

Payload

Content type

application/json

{"name": "receipt-pipeline",
"description": {"name": "string",
"description": "string",
"version": 0,
"resourceTag": "string"
},
"config": { }
}

Response samples

401
404

Content type

application/json

{"code": "string",
"message": "string"
}

Copy a registered model

Authorizations:

ApiKeyAuth

path Parameters

modelName required	string Name of the model

Request Body schema: application/json

Model Copy request

newName	string
newDescription	object (ModelDescription)
newConfig	object

Responses

Request samples

Payload

Content type

application/json

{"newName": "receipt-pipeline",
"newDescription": {"name": "string",
"description": "string",
"version": 0,
"resourceTag": "string"
},
"newConfig": { }
}
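The copy request uses the same three parts as a registered model, each prefixed with "new". The Python sketch below maps an existing model object onto those fields. The helper name is hypothetical, and whether the server fills in omitted fields from the source model is not stated here, so the sketch always sends all three.

```python
import copy
import json


def build_copy_payload(model, new_name, new_config=None):
    """Map a model object (name/description/config) onto the copy-request fields."""
    description = copy.deepcopy(model.get("description", {}))
    if new_config is None:
        # Reuse the source configuration unless the caller overrides it.
        new_config = copy.deepcopy(model.get("config", {}))
    return {
        "newName": new_name,
        "newDescription": description,
        "newConfig": new_config,
    }


source = {
    "name": "receipt-pipeline",
    "description": {
        "name": "string",
        "description": "string",
        "version": 0,
        "resourceTag": "string",
    },
    "config": {},
}
print(json.dumps(build_copy_payload(source, "receipt-pipeline-copy"), indent=2))
```

The deep copies keep later edits to the payload from changing the source object.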

Response samples

401
404

Content type

application/json

{"code": "string",
"message": "string"
}
