Cambrion API (1.0)

The official Cambrion API specification. To receive a free API key, reach out to hello@cambrion.ai with a brief description of your use case.

Executions

Execution environments that store results.

Creates an execution

Create an execution, optionally with a given ID. If an execution ID is given, that ID is used; otherwise, a new one is generated. If an execution with that ID already exists, the request is ignored and 204 is returned. If a new execution is created, 200 is returned with the execution ID in the body.

Authorizations:
ApiKeyAuth
Request Body schema: application/json

An execution is the context that holds data related to a specific execution.

executionId
string

ID of the execution

tag
string

Tag to identify the execution

createdAt
string

Creation time

metaData
object

Responses

Request samples

Content type
application/json
{
  "executionId": "string",
  "tag": "string",
  "createdAt": "string",
  "metaData": { }
}

Response samples

Content type
application/json
{
  "executionId": "string",
  "tag": "string",
  "createdAt": "string",
  "metaData": { }
}
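The status-code behaviour described above (optional ID, 200 for a newly created execution, 204 when it already exists) can be sketched with Python's standard library. This is a minimal sketch, not an official client: the base URL, the `/executions` path, the `POST` method, and the `X-API-Key` header name used for ApiKeyAuth are all hypothetical, because this reference does not state them. The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical values: this reference does not state the host, the path,
# or the header that carries the API key.
BASE_URL = "https://api.example.invalid"
API_KEY_HEADER = "X-API-Key"


def build_create_execution_request(api_key, execution_id=None, tag=None):
    """Build (but do not send) a request that creates an execution."""
    body = {}
    if execution_id is not None:
        body["executionId"] = execution_id  # optional; if omitted, the server generates an ID
    if tag is not None:
        body["tag"] = tag
    return urllib.request.Request(
        url=f"{BASE_URL}/executions",  # hypothetical path
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",  # assumed method
    )


def interpret_create_execution(status, body_text):
    """Map the documented status codes to a (created, payload) pair."""
    if status == 204:
        # The execution already existed, so the request was ignored.
        return False, None
    if status == 200:
        try:
            return True, json.loads(body_text)
        except json.JSONDecodeError:
            # Fall back to the raw body in case the ID is returned as plain text.
            return True, body_text
    raise RuntimeError(f"unexpected status {status}: {body_text}")
```

Because the 200 body is described as the execution ID while the sample shows a JSON object, the helper accepts either form.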

Gets all executions

Authorizations:
ApiKeyAuth
query Parameters
tag
string

Filter executions by tag

Responses

Response samples

Content type
application/json
[
  { }
]

Gets execution

Authorizations:
ApiKeyAuth
path Parameters
executionId
required
string

ID of an execution

Responses

Response samples

Content type
application/json
{
  "executionId": "string",
  "tag": "string",
  "createdAt": "string",
  "metaData": { }
}

Add media to an observation

Authorizations:
ApiKeyAuth
path Parameters
executionId
required
string

ID of an execution

Request Body schema:
string <base64>

Responses

Response samples

Content type
application/json
{
  "mediaId": "string"
}
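Media is uploaded as a Base64-encoded string. The sketch below shows the encoding step and a request built around it; the path, the `POST` method, the `text/plain` body content type, and the `X-API-Key` header are assumptions, not taken from this reference.

```python
import base64
import urllib.request
from urllib.parse import quote

# Hypothetical values: host, path, header name, and body content type are
# not specified in this reference.
BASE_URL = "https://api.example.invalid"
API_KEY_HEADER = "X-API-Key"


def encode_media(raw: bytes) -> str:
    """Return the Base64 text form of raw file bytes."""
    return base64.b64encode(raw).decode("ascii")


def build_add_media_request(api_key: str, execution_id: str, raw: bytes) -> urllib.request.Request:
    """Build (but do not send) a request that uploads one media file."""
    path = f"/executions/{quote(execution_id, safe='')}/media"  # hypothetical path
    return urllib.request.Request(
        url=BASE_URL + path,
        data=encode_media(raw).encode("ascii"),
        headers={"Content-Type": "text/plain", API_KEY_HEADER: api_key},  # assumed content type
        method="POST",  # assumed method
    )
```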

Retrieve a specific media

Authorizations:
ApiKeyAuth
path Parameters
executionId
required
string

ID of an execution

mediaId
required
string

ID of uploaded media

Responses

Merge a raw observation into the current observation

The raw observation is merged into the current observation context.

Authorizations:
ApiKeyAuth
path Parameters
executionId
required
string

ID of an execution

Request Body schema: application/json

Observation request

executionId
required
string
mediaContents
Array of objects (Image Content)
documents
Array of objects (Linked Document)

Responses

Request samples

Content type
application/json
{
  "executionId": "execution-1",
  "mediaContents": [ ],
  "documents": [ ]
}

Response samples

Content type
application/json
{
  "code": "string",
  "message": "string"
}

Get observation

Get a full observation of the execution.

Authorizations:
ApiKeyAuth
path Parameters
executionId
required
string

ID of an execution

Responses

Response samples

Content type
application/json
{
  "executionId": "execution-1",
  "mediaContents": [ ],
  "documents": [ ]
}

Transform an observation

Transform a raw observation into an object using a JSONata statement. JSONata is a transformation language for JSON data. For more information, see http://docs.jsonata.org/overview.html

Authorizations:
ApiKeyAuth
path Parameters
executionId
required
string

ID of an execution

Request Body schema: text/plain
string

Responses

Response samples

Content type
application/json
{
  "code": "string",
  "message": "string"
}
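The transform endpoint takes a JSONata expression as a `text/plain` body. The sketch below sends an illustrative expression that counts the `mediaContents` and `documents` entries of an observation (the fields shown in the observation samples above); the expression is an example only, and the path, method, and `X-API-Key` header are hypothetical.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical values; not given in this reference.
BASE_URL = "https://api.example.invalid"
API_KEY_HEADER = "X-API-Key"

# An illustrative JSONata expression evaluated against the observation.
# $count is a built-in JSONata function; the field names come from the samples.
COUNT_EXPRESSION = '{"mediaCount": $count(mediaContents), "documentCount": $count(documents)}'


def build_transform_request(api_key: str, execution_id: str, expression: str) -> urllib.request.Request:
    """Build (but do not send) a text/plain request carrying a JSONata expression."""
    path = f"/executions/{quote(execution_id, safe='')}/transform"  # hypothetical path
    return urllib.request.Request(
        url=BASE_URL + path,
        data=expression.encode("utf-8"),
        headers={"Content-Type": "text/plain", API_KEY_HEADER: api_key},
        method="POST",  # assumed method
    )
```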

Transform an observation into JSON

Transform a raw observation into the corresponding JSON object. The values in the JSON object correspond to the data values in the observation. If data values are not available, the raw text is used.

Authorizations:
ApiKeyAuth
path Parameters
executionId
required
string

ID of an execution

Responses

Response samples

Content type
application/json
{ }

Pipelines

Machine learning pipelines.

Get all deployed pipelines

Authorizations:
ApiKeyAuth

Responses

Response samples

Content type
application/json
[
  { }
]

Create new pipeline

Authorizations:
ApiKeyAuth
Request Body schema: application/json

Pipeline request

pipelineId
string
name
string
deploy
boolean
Default: true

Whether to deploy the pipeline when creating/updating it

description
string
tag
string
version
integer
pipelineDefinition
object (PipelineDefinition)

Responses

Request samples

Content type
application/json
{
  "pipelineId": "receipt-pipeline",
  "name": "receipt-pipeline",
  "deploy": true,
  "description": "A pipeline to extract contents from a receipt",
  "tag": "string",
  "version": 1,
  "pipelineDefinition": { }
}

Response samples

Content type
application/json
{
  "pipelineId": "string"
}
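Create and update accept the same pipeline request body, with `deploy` defaulting to true. A small Python sketch of that shared body follows; `route_for` and the `POST`/`PUT` routes it returns are hypothetical, since this reference does not list HTTP methods or paths, and the contents of `pipelineDefinition` are left to the caller because they are not documented here.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class PipelineRequest:
    # Field names mirror the JSON schema above (camelCase on purpose).
    pipelineId: str
    name: str
    pipelineDefinition: dict = field(default_factory=dict)  # structure not documented here
    deploy: bool = True  # documented default
    description: Optional[str] = None
    tag: Optional[str] = None
    version: Optional[int] = None

    def to_body(self) -> dict:
        """Return the JSON body, dropping unset optional fields."""
        return {k: v for k, v in asdict(self).items() if v is not None}


def route_for(request: PipelineRequest, update: bool) -> tuple:
    """Hypothetical method/path pair; the real routes are not listed here."""
    if update:
        return "PUT", f"/pipelines/{request.pipelineId}"
    return "POST", "/pipelines"
```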

Get a specific pipeline

Authorizations:
ApiKeyAuth
path Parameters
pipelineId
required
string

ID of the pipeline

Responses

Response samples

Content type
application/json
{
  "pipeline": { },
  "pipelineDefinition": { }
}

Update an existing pipeline

Authorizations:
ApiKeyAuth
path Parameters
pipelineId
required
string

ID of the pipeline

Request Body schema: application/json

Pipeline request

pipelineId
string
name
string
deploy
boolean
Default: true

Whether to deploy the pipeline when creating/updating it

description
string
tag
string
version
integer
pipelineDefinition
object (PipelineDefinition)

Responses

Request samples

Content type
application/json
{
  "pipelineId": "receipt-pipeline",
  "name": "receipt-pipeline",
  "deploy": true,
  "description": "A pipeline to extract contents from a receipt",
  "tag": "string",
  "version": 1,
  "pipelineDefinition": { }
}

Response samples

Content type
application/json
{
  "pipeline": { },
  "pipelineDefinition": { }
}

Delete a specific pipeline

Authorizations:
ApiKeyAuth
path Parameters
pipelineId
required
string

ID of the pipeline

Responses

Response samples

Content type
application/json
{
  "code": "string",
  "message": "string"
}

Get graph representation (definition) of a pipeline

Authorizations:
ApiKeyAuth
path Parameters
pipelineId
required
string

ID of the pipeline

Responses

Response samples

Content type
application/json
{
  "pipelineDefinitionId": "receipt-pipeline-definition",
  "nodes": [ ],
  "edges": [ ]
}

Execute pipeline synchronously

Execute a pipeline synchronously and return the corresponding observation. The timeout is 30 seconds; if the computation takes longer, a timeout error is returned.

Authorizations:
ApiKeyAuth
path Parameters
pipelineId
required
string

ID of the pipeline to execute

Request Body schema: application/json

Execution request for a pipeline

executionId
string

ID of the execution

tag
string

Tag used to identify the resulting execution. Ignored if transient is true.

transient
boolean

Whether to delete all execution data after pipeline completion

transform
string

JSONata instruction to transform the result observation into a desired object. JSONata is a transformation language for JSON data. For more information, see http://docs.jsonata.org/overview.html

tryImageConversion
boolean
Default: false

Try to convert the provided content (e.g. a PDF) to an image.

trySimpleText
boolean
Default: false

Try to extract readable text from the input media (e.g. a Word document). A number of file formats are supported; text extraction uses Apache Tika internally. The full list of supported file formats is available at https://tika.apache.org/2.9.1/formats.html

idempotent
boolean
Default: false

Whether to update the existing observation with the results of the pipeline run (always true if executionId is null).

media
Array of strings <base64>

Array of Base64-encoded media files. The content type is detected automatically. PDF, DOCX, and PPTX files are rendered as images, which can then be processed within a pipeline.

runtimeParameters
object

Not active yet!

observation
object (Execution Observation)

The structured content of a set of media files.

text
string

Raw text that can be used as input in pipelines

Responses

Request samples

Content type
application/json
{
  "executionId": "string",
  "tag": "string",
  "transient": true,
  "transform": "string",
  "tryImageConversion": false,
  "trySimpleText": false,
  "idempotent": false,
  "media": [ ],
  "runtimeParameters": { },
  "observation": { },
  "text": "string"
}

Response samples

Content type
application/json
{
  "executionId": "string",
  "observation": { }
}
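A synchronous execution needs some input: Base64-encoded `media`, raw `text`, or the ID of an execution that already holds media. The sketch below assembles such a body and sends it with a client timeout slightly above the documented 30-second server limit. The URL path, method, and `X-API-Key` header are assumptions, and the input check reflects this reference's description rather than confirmed server-side validation.

```python
import base64
import json
import urllib.request
from typing import Optional, Sequence
from urllib.parse import quote

# Hypothetical values; this reference does not list them.
BASE_URL = "https://api.example.invalid"
API_KEY_HEADER = "X-API-Key"
SERVER_TIMEOUT_S = 30  # documented timeout for synchronous execution


def build_execution_body(
    media: Sequence[bytes] = (),
    execution_id: Optional[str] = None,
    text: Optional[str] = None,
    transient: Optional[bool] = None,
    transform: Optional[str] = None,
) -> dict:
    """Assemble an execution request; only fields that are set are sent."""
    if not media and execution_id is None and text is None:
        raise ValueError("provide media, text, or the ID of an execution that holds media")
    body: dict = {}
    if media:
        body["media"] = [base64.b64encode(m).decode("ascii") for m in media]
    if execution_id is not None:
        body["executionId"] = execution_id
    if text is not None:
        body["text"] = text
    if transient is not None:
        body["transient"] = transient
    if transform is not None:
        body["transform"] = transform  # optional JSONata expression
    return body


def execute_sync(api_key: str, pipeline_id: str, body: dict) -> dict:
    """Send the request; the path and method are assumptions."""
    req = urllib.request.Request(
        url=f"{BASE_URL}/pipelines/{quote(pipeline_id, safe='')}/execute",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )
    # Wait slightly longer than the server, so the server's own timeout
    # error is reported rather than a client-side socket timeout.
    with urllib.request.urlopen(req, timeout=SERVER_TIMEOUT_S + 5) as resp:
        return json.loads(resp.read())
```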

Execute pipeline synchronously and transform the observation

Execute a pipeline synchronously and return the transformed observation

Authorizations:
ApiKeyAuth
path Parameters
pipelineId
required
string

ID of the pipeline to execute

Request Body schema: application/json

Execution request for a pipeline

executionId
string

ID of the execution

tag
string

Tag used to identify the resulting execution. Ignored if transient is true.

transient
boolean

Whether to delete all execution data after pipeline completion

transform
string

JSONata instruction to transform the result observation into a desired object. JSONata is a transformation language for JSON data. For more information, see http://docs.jsonata.org/overview.html

tryImageConversion
boolean
Default: false

Try to convert the provided content (e.g. a PDF) to an image.

trySimpleText
boolean
Default: false

Try to extract readable text from the input media (e.g. a Word document). A number of file formats are supported; text extraction uses Apache Tika internally. The full list of supported file formats is available at https://tika.apache.org/2.9.1/formats.html

idempotent
boolean
Default: false

Whether to update the existing observation with the results of the pipeline run (always true if executionId is null).

media
Array of strings <base64>

Array of Base64-encoded media files. The content type is detected automatically. PDF, DOCX, and PPTX files are rendered as images, which can then be processed within a pipeline.

runtimeParameters
object

Not active yet!

observation
object (Execution Observation)

The structured content of a set of media files.

text
string

Raw text that can be used as input in pipelines

Responses

Request samples

Content type
application/json
{
  "executionId": "string",
  "tag": "string",
  "transient": true,
  "transform": "string",
  "tryImageConversion": false,
  "trySimpleText": false,
  "idempotent": false,
  "media": [ ],
  "runtimeParameters": { },
  "observation": { },
  "text": "string"
}

Response samples

Content type
application/json
{
  "code": "string",
  "message": "string"
}

Execute pipeline synchronously and return JSON

Execute a pipeline synchronously and return the corresponding JSON object.

Authorizations:
ApiKeyAuth
path Parameters
pipelineId
required
string

ID of the pipeline to execute

Request Body schema: application/json

Execution request for a pipeline

executionId
string

ID of the execution

tag
string

Tag used to identify the resulting execution. Ignored if transient is true.

transient
boolean

Whether to delete all execution data after pipeline completion

transform
string

JSONata instruction to transform the result observation into a desired object. JSONata is a transformation language for JSON data. For more information, see http://docs.jsonata.org/overview.html

tryImageConversion
boolean
Default: false

Try to convert the provided content (e.g. a PDF) to an image.

trySimpleText
boolean
Default: false

Try to extract readable text from the input media (e.g. a Word document). A number of file formats are supported; text extraction uses Apache Tika internally. The full list of supported file formats is available at https://tika.apache.org/2.9.1/formats.html

idempotent
boolean
Default: false

Whether to update the existing observation with the results of the pipeline run (always true if executionId is null).

media
Array of strings <base64>

Array of Base64-encoded media files. The content type is detected automatically. PDF, DOCX, and PPTX files are rendered as images, which can then be processed within a pipeline.

runtimeParameters
object

Not active yet!

observation
object (Execution Observation)

The structured content of a set of media files.

text
string

Raw text that can be used as input in pipelines

Responses

Request samples

Content type
application/json
{
  "executionId": "string",
  "tag": "string",
  "transient": true,
  "transform": "string",
  "tryImageConversion": false,
  "trySimpleText": false,
  "idempotent": false,
  "media": [ ],
  "runtimeParameters": { },
  "observation": { },
  "text": "string"
}

Response samples

Content type
application/json
{ }

Execute pipeline asynchronously

Authorizations:
ApiKeyAuth
path Parameters
pipelineId
required
string

ID of the pipeline to execute

Request Body schema: application/json

Execution request for a pipeline

executionId
string

ID of the execution

tag
string

Tag used to identify the resulting execution. Ignored if transient is true.

transient
boolean

Whether to delete all execution data after pipeline completion

transform
string

JSONata instruction to transform the result observation into a desired object. JSONata is a transformation language for JSON data. For more information, see http://docs.jsonata.org/overview.html

tryImageConversion
boolean
Default: false

Try to convert the provided content (e.g. a PDF) to an image.

trySimpleText
boolean
Default: false

Try to extract readable text from the input media (e.g. a Word document). A number of file formats are supported; text extraction uses Apache Tika internally. The full list of supported file formats is available at https://tika.apache.org/2.9.1/formats.html

idempotent
boolean
Default: false

Whether to update the existing observation with the results of the pipeline run (always true if executionId is null).

media
Array of strings <base64>

Array of Base64-encoded media files. The content type is detected automatically. PDF, DOCX, and PPTX files are rendered as images, which can then be processed within a pipeline.

runtimeParameters
object

Not active yet!

observation
object (Execution Observation)

The structured content of a set of media files.

text
string

Raw text that can be used as input in pipelines

Responses

Request samples

Content type
application/json
{
  "executionId": "string",
  "tag": "string",
  "transient": true,
  "transform": "string",
  "tryImageConversion": false,
  "trySimpleText": false,
  "idempotent": false,
  "media": [ ],
  "runtimeParameters": { },
  "observation": { },
  "text": "string"
}

Response samples

Content type
application/json
{
  "executionId": "string"
}
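The asynchronous variant returns only an `executionId`, so a client has to fetch the result later, for example through "Get observation". This reference does not say how completion is signalled, so the sketch below leaves that decision to a caller-supplied `fetch` function (assumed to return `None` while the result is not ready) and only implements polling with exponential backoff. The timeout and interval values are arbitrary choices.

```python
import time
from typing import Callable, Optional


def wait_for_observation(
    fetch: Callable[[str], Optional[dict]],
    execution_id: str,
    timeout_s: float = 120.0,
    interval_s: float = 1.0,
    max_interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch(execution_id) with exponential backoff until it returns a result.

    fetch is caller-supplied: it should call "Get observation" and return None
    while the result is not ready (how readiness is signalled is an assumption).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        observation = fetch(execution_id)
        if observation is not None:
            return observation
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no observation for {execution_id} after {timeout_s} s")
        sleep(interval_s)
        # Double the wait each round, capped at max_interval_s.
        interval_s = min(interval_s * 2, max_interval_s)
```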

Extractions

Extraction definitions.

Get all extractions

Authorizations:
ApiKeyAuth

Responses

Response samples

Content type
application/json
[
  { }
]

Create extraction

Create a document extraction. This automatically creates a pipeline that corresponds to the instructions in the extraction.

Authorizations:
ApiKeyAuth
Request Body schema: application/json

Extraction request

description
string

The description of the extraction

compact
boolean
Default: false

Faster response, but no confidence values are returned.

highPrecision
boolean
Default: false

Higher precision, but slower response.

size
number
Default: 1300

Image resolution in px. Higher values give better precision but slower responses.

parallelProcessing
boolean
Default: true

Process pages in parallel. This lowers latency, but context between pages is lost.

intelligentBatching
boolean
Default: true

Batch as many pages together as possible so that content spanning multiple pages can be understood. Ignored if parallelProcessing is true.

generationInstruct
object

The instruct JSON expression that is used to describe the extraction.

Responses

Request samples

Content type
application/json
{
  "description": "string",
  "compact": false,
  "highPrecision": false,
  "size": 1300,
  "parallelProcessing": true,
  "intelligentBatching": true,
  "generationInstruct": { }
}

Response samples

Content type
application/json
{
  "pipelineId": "string"
}
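The `parallelProcessing` and `intelligentBatching` flags interact: batching is ignored whenever parallel processing is on. The sketch below hides that by exposing two modes that match the documented trade-off (lower latency versus cross-page context). `PageMode` and the helper are hypothetical names, and `generationInstruct` is passed through unchanged because its structure is not documented here.

```python
from enum import Enum


class PageMode(Enum):
    PARALLEL = "parallel"  # lower latency; context between pages is lost
    BATCHED = "batched"    # pages batched together; content across pages is understood


def build_extraction_request(
    description: str,
    generation_instruct: dict,
    mode: PageMode = PageMode.PARALLEL,
    compact: bool = False,
    high_precision: bool = False,
    size: int = 1300,
) -> dict:
    """Build an extraction request body; defaults mirror the documented ones."""
    parallel = mode is PageMode.PARALLEL
    return {
        "description": description,
        "compact": compact,               # faster, but no confidences
        "highPrecision": high_precision,  # more precise, but slower
        "size": size,                     # image resolution in px
        "parallelProcessing": parallel,
        # intelligentBatching is ignored when parallelProcessing is true,
        # so it is only switched on for the batched mode.
        "intelligentBatching": not parallel,
        "generationInstruct": generation_instruct,  # structure not documented here
    }
```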

Update extraction

Update an existing extraction and the corresponding pipeline.

Authorizations:
ApiKeyAuth
path Parameters
extractionId
required
string

ID of an extraction

Request Body schema: application/json

Extraction request

description
string

The description of the extraction

compact
boolean
Default: false

Faster response, but no confidence values are returned.

highPrecision
boolean
Default: false

Higher precision, but slower response.

size
number
Default: 1300

Image resolution in px. Higher values give better precision but slower responses.

parallelProcessing
boolean
Default: true

Process pages in parallel. This lowers latency, but context between pages is lost.

intelligentBatching
boolean
Default: true

Batch as many pages together as possible so that content spanning multiple pages can be understood. Ignored if parallelProcessing is true.

generationInstruct
object

The instruct JSON expression that is used to describe the extraction.

Responses

Request samples

Content type
application/json
{
  "description": "string",
  "compact": false,
  "highPrecision": false,
  "size": 1300,
  "parallelProcessing": true,
  "intelligentBatching": true,
  "generationInstruct": { }
}

Response samples

Content type
application/json
{
  "pipelineId": "string"
}

Delete extraction

Authorizations:
ApiKeyAuth
path Parameters
extractionId
required
string

ID of an extraction

Responses

Response samples

Content type
application/json
{
  "code": "string",
  "message": "string"
}

Get all indices

Get a list of all indices.

Authorizations:
ApiKeyAuth
query Parameters
limit
integer

Maximum number of indices per page

offset
integer

Page number of the indices to return

Responses

Response samples

Content type
application/json
[
  { }
]
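The `limit` and `offset` parameters are described as a page size and a page number (not an item offset). Under that reading, all indices can be collected with a loop like the one below; the same pattern should apply to "Get all documents". The zero-based first page and the stop condition (a page shorter than `limit`) are assumptions, and `fetch_page` stands in for the actual HTTP call.

```python
from typing import Callable, Iterator, List


def iterate_pages(fetch_page: Callable[[int, int], List[dict]], limit: int = 50) -> Iterator[dict]:
    """Yield every item, requesting pages via fetch_page(limit, offset)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    page = 0  # assumption: the first page is numbered 0
    while True:
        items = fetch_page(limit, page)
        yield from items
        if len(items) < limit:
            # A short (or empty) page is taken to mean there is nothing more.
            return
        page += 1
```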

Create index

Create a new index with an optional schema.

Authorizations:
ApiKeyAuth
Request Body schema: application/json

Index creation request

indexId
string

Unique ID of the index.

semanticSearchFields
Array of strings

A list of document fields whose text is embedded for semantic search.

Responses

Request samples

Content type
application/json
{
  "indexId": "string",
  "semanticSearchFields": [ ]
}

Response samples

Content type
application/json
{
  "code": "string",
  "message": "string"
}

Get index description

Get the description of an index, including its data model if present.

Authorizations:
ApiKeyAuth
path Parameters
indexId
required
string
Example: Warehouse-Index

ID of the index

Responses

Response samples

Content type
application/json
{
  "indexId": "string",
  "semanticSearchFields": [ ]
}

Delete index

Delete an index.

Authorizations:
ApiKeyAuth
path Parameters
indexId
required
string
Example: Warehouse-Index

ID of the index

Responses

Response samples

Content type
application/json
{
  "code": "string",
  "message": "string"
}

Query an index with a search string

Authorizations:
ApiKeyAuth
path Parameters
indexId
required
string
Example: Warehouse-Index

ID of the index

Request Body schema: application/json

Query request

fullText
object

Fields used for full-text search

semanticSearch
object

A map of search strings to fields.

k
integer

The number of most similar results to return

searchPipelineId
string

ID of the pipeline used for searching. If no pipeline is given, a default pipeline is selected based on the number of search fields.

pipelineParameters
object (PipelineExecutionObject)

The execution is a stateful environment in which media (such as images or PDF files) can be stored and used as inputs for pipelines. The ID in the request body is optional (generated if empty) and must be unique. Nothing is persisted if transient is true. To trigger the pipeline, either an execution ID referring to an execution with valid media or Base64-encoded media under the media property must be provided.

Responses

Request samples

Content type
application/json
{
  "fullText": { },
  "semanticSearch": { },
  "k": 0,
  "searchPipelineId": "string",
  "pipelineParameters": { }
}

Response samples

Content type
application/json
[
  { }
]
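A query body combines optional `fullText` and `semanticSearch` maps with `k`. The sketch below builds such a body and rejects queries with neither map. It assumes both maps are keyed by field name with search strings as values (the reference's wording leaves the direction open), and the default `k=10` is an arbitrary choice.

```python
from typing import Dict, Optional


def build_query_request(
    full_text: Optional[Dict[str, str]] = None,
    semantic_search: Optional[Dict[str, str]] = None,
    k: int = 10,
    search_pipeline_id: Optional[str] = None,
) -> dict:
    """Build a query body. Maps are assumed to be field name -> search string."""
    if not full_text and not semantic_search:
        raise ValueError("provide fullText and/or semanticSearch")
    if k <= 0:
        raise ValueError("k must be positive")
    body: dict = {"k": k}
    if full_text:
        body["fullText"] = dict(full_text)
    if semantic_search:
        body["semanticSearch"] = dict(semantic_search)
    if search_pipeline_id is not None:
        # When omitted, the server picks a default pipeline by number of search fields.
        body["searchPipelineId"] = search_pipeline_id
    return body
```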

Get all documents

Authorizations:
ApiKeyAuth
path Parameters
indexId
required
string
Example: Warehouse-Index

ID of the index

query Parameters
limit
integer

Maximum number of documents per page

offset
integer

Page number of the documents to return

Responses

Response samples

Content type
application/json
[
  { }
]

Create document

Create a JSON document in an index. If the index does not exist, it is created automatically.

Authorizations:
ApiKeyAuth
path Parameters
indexId
required
string
Example: Warehouse-Index

ID of the index

Request Body schema: application/json

Document request

indexId
string
source
object
score
number

Responses

Request samples

Content type
application/json
{
  "indexId": "string",
  "source": { },
  "score": 0
}

Response samples

Content type
application/json
{
  "code": "string",
  "message": "string"
}

Create documents in batch

Create multiple JSON documents in an index in a single request. If the index does not exist, it is created automatically.

Authorizations:
ApiKeyAuth
path Parameters
indexId
required
string
Example: Warehouse-Index

ID of the index

Request Body schema: application/json

Document batch request

Array
indexId
string
source
object
score
number

Responses

Request samples

Content type
application/json
[
  { }
]

Response samples

Content type
application/json
{
  "code": "string",
  "message": "string"
}
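When many documents need to be indexed, the batch endpoint can be fed in chunks. The generator below splits a stream of document sources into batch request bodies; the batch size of 100 is an arbitrary assumption, as this reference states no limit, and `score` is omitted because its role on creation is not described.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def document_batches(index_id: str, sources: Iterable[dict], batch_size: int = 100) -> Iterator[List[dict]]:
    """Yield batch request bodies of at most batch_size documents each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(sources)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        # Each element follows the document request shape: indexId plus source.
        yield [{"indexId": index_id, "source": source} for source in chunk]
```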

Get document

Authorizations:
ApiKeyAuth
path Parameters
indexId
required
string
Example: Warehouse-Index

ID of the index

documentId
required
string

ID of a document

Responses

Response samples

Content type
application/json
{
  "indexId": "string",
  "source": { },
  "score": 0
}

Delete document

Authorizations:
ApiKeyAuth
path Parameters
indexId
required
string
Example: Warehouse-Index

ID of the index

documentId
required
string

ID of a document

Responses

Response samples

Content type
application/json
{
  "code": "string",
  "message": "string"
}

Create an example

Create an extraction example used to

Authorizations:
ApiKeyAuth
path Parameters
extractionId
required
string

ID of an extraction

Request Body schema: application/json

Observation request

executionId
required
string
mediaContents
Array of objects (Image Content)
documents
Array of objects (Linked Document)

Responses

Request samples

Content type
application/json
{
  "executionId": "execution-1",
  "mediaContents": [ ],
  "documents": [ ]
}

Response samples

Content type
application/json
{
  "code": "string",
  "message": "string"
}

Models

Get all registered models

Authorizations:
ApiKeyAuth

Responses

Response samples

Content type
application/json
[
  { }
]

Register an uploaded model (currently internal only)

Authorizations:
ApiKeyAuth
Request Body schema: application/json

Model request

name
string
description
object (ModelDescription)
config
object

Responses

Request samples

Content type
application/json
{
  "name": "receipt-pipeline",
  "description": { },
  "config": { }
}

Response samples

Content type
application/json
{
  "code": "string",
  "message": "string"
}

Copy a registered model

Authorizations:
ApiKeyAuth
path Parameters
modelName
required
string

Name of the model

Request Body schema: application/json

Model Copy request

newName
string
newDescription
object (ModelDescription)
newConfig
object

Responses

Request samples

Content type
application/json
{
  "newName": "receipt-pipeline",
  "newDescription": { },
  "newConfig": { }
}

Response samples

Content type
application/json
{
  "code": "string",
  "message": "string"
}