
Processing documents

To process documents with the Cambrion API, call a pipeline directly. Sending data to the pipeline endpoint triggers the pipeline to run on that data. Once the pipeline has finished processing, the results are delivered in the response body of the API call in one of three forms: an observation, a JSON object derived from the observation, or a JSONata-transformed observation.

Here is a simple example that calls a pipeline with a PNG image as input media and returns the observation as the result:

import base64
import json
from io import BytesIO

import requests
from PIL import Image

CAMBRION_API_KEY = "INSERT_API_KEY"
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'X-API-Key': CAMBRION_API_KEY
}

url = "https://api.cambrion.io/v1/pipelines/my-pipeline/executeSync"

# Load the image and re-encode it as PNG into an in-memory buffer
image = Image.open('./path/to/my/image/image.png')

buffered = BytesIO()
image.save(buffered, format="PNG")
# Base64-encode the PNG bytes so they can be embedded in the JSON body
img_str = base64.b64encode(buffered.getvalue()).decode()

payload = json.dumps({
    "media": [img_str],
})

# The call blocks until the pipeline has finished
response = requests.request("POST", url, headers=headers, data=payload)

observation = response.json()

The response contains the observation as JSON produced by the pipeline with ID my-pipeline.
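If the images are already stored as PNG files, the base64 strings for the `media` list can also be produced from the file bytes without decoding them first. The following is a minimal sketch using only the standard library. It assumes the pipeline accepts the unmodified file bytes in the same way as the re-encoded image above, which this page does not confirm. The helper names `encode_media` and `build_media_payload` are illustrative and not part of the Cambrion API.

```python
import base64
import json
from pathlib import Path


def encode_media(path):
    """Return the bytes of the file at `path` as a base64-encoded ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def build_media_payload(paths):
    """Build the JSON request body with one base64 string per media file.

    Assumption: raw file bytes are acceptable input for the pipeline.
    """
    return json.dumps({"media": [encode_media(p) for p in paths]})
```

The returned string can be passed as the `data` argument of the request in the same way as `payload` in the example above.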

List pipelines

To get a list of available pipelines, perform a GET request on the pipelines endpoint:

import requests

CAMBRION_API_KEY = "INSERT_API_KEY"
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'X-API-Key': CAMBRION_API_KEY
}

url = "https://api.cambrion.io/v1/pipelines"

response = requests.request("GET", url, headers=headers)

pipelines = response.json()
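The examples on this page repeat the same headers and the same base URL. As a sketch, they could be factored into two small helpers. The URL patterns are taken from the examples on this page; the names `API_BASE`, `auth_headers`, and `pipeline_url` are illustrative and not part of the Cambrion API.

```python
from urllib.parse import quote

# Base URL as it appears in the examples on this page
API_BASE = "https://api.cambrion.io/v1"


def auth_headers(api_key):
    """Return the request headers used by all examples on this page."""
    return {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "X-API-Key": api_key,
    }


def pipeline_url(pipeline_id=None, action=None):
    """Return the pipeline list URL, or the URL of an action on one pipeline."""
    if pipeline_id is None:
        return f"{API_BASE}/pipelines"
    # Only the two actions shown on this page are allowed in this sketch
    if action not in ("executeSync", "executeAsync"):
        raise ValueError(f"unsupported action: {action!r}")
    # Percent-encode the ID so it is safe to use as a single path segment
    return f"{API_BASE}/pipelines/{quote(pipeline_id, safe='')}/{action}"
```

With these helpers, the request above could be written as `requests.get(pipeline_url(), headers=auth_headers(CAMBRION_API_KEY))`.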

Asynchronously calling a pipeline

To process documents in the background, execute a pipeline asynchronously by using the executeAsync endpoint:

import base64
import json
from io import BytesIO

import requests
from PIL import Image

CAMBRION_API_KEY = "INSERT_API_KEY"
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'X-API-Key': CAMBRION_API_KEY
}

url = "https://api.cambrion.io/v1/pipelines/my-pipeline/executeAsync"

# Load the image and re-encode it as PNG into an in-memory buffer
image = Image.open('./path/to/my/image/image.png')

buffered = BytesIO()
image.save(buffered, format="PNG")
# Base64-encode the PNG bytes so they can be embedded in the JSON body
img_str = base64.b64encode(buffered.getvalue()).decode()

payload = json.dumps({
    "media": [img_str],
})

# The call returns before the pipeline has finished
response = requests.request("POST", url, headers=headers, data=payload)

execution = response.json()

The request above responds immediately. The response object contains the ID of the execution, which can be used to retrieve the resulting observation once the pipeline has finished. The execution is created automatically if no execution ID is given.
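Because the asynchronous call returns before the pipeline is done, a client typically has to check back later for the result. This page does not describe the endpoint for retrieving an execution or the field that signals completion, so the sketch below leaves both to the caller: `fetch` and `is_done` are placeholder callables, and `wait_for_result` is an illustrative name, not part of the Cambrion API.

```python
import time


def wait_for_result(fetch, is_done, interval=2.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call `fetch()` repeatedly until `is_done(result)` is true.

    `fetch` is a caller-supplied function that retrieves the current state
    of the execution (hypothetical; the endpoint is not shown on this page).
    `is_done` is a caller-supplied predicate on that state.
    Raises TimeoutError if the result is not ready within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if clock() >= deadline:
            raise TimeoutError("execution did not finish in time")
        # Wait before asking again to avoid flooding the API
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without real waiting.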

Adding an executionId to the payload executes the pipeline on the media files contained in that execution, so no media files need to be provided in the pipeline call:

payload = json.dumps({
    "executionId": "my-execution-id",
})
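The two payload shapes on this page, one carrying `media` and one carrying `executionId`, can be produced by a single helper. The sketch below treats the two fields as mutually exclusive, which is a simplifying assumption; this page only states that media files are not needed when an execution ID is given. `build_payload` is an illustrative name.

```python
import json


def build_payload(media=None, execution_id=None):
    """Return the JSON request body for a pipeline call.

    Assumption: exactly one of `media` (a list of base64 strings) or
    `execution_id` is supplied.
    """
    if (media is None) == (execution_id is None):
        raise ValueError("provide exactly one of media or execution_id")
    if execution_id is not None:
        return json.dumps({"executionId": execution_id})
    return json.dumps({"media": list(media)})
```

For example, `build_payload(execution_id="my-execution-id")` produces the same body as the snippet above.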

For how to create an execution, see Creating an execution.

Idempotence of a pipeline

Executing a pipeline multiple times on an execution is not idempotent: the results of each additional call are appended to the observation. Calling a pipeline directly with the same data, however, returns the same result on every call.
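The difference can be illustrated with a toy in-memory model. This is not the Cambrion implementation: the observation is modeled as a plain list, the pipeline as a pure function, and all names (`ToyExecution`, `run_on_execution`, `run_directly`) are hypothetical.

```python
class ToyExecution:
    """Stand-in for an execution: stored media plus an accumulated observation."""

    def __init__(self, media):
        self.media = list(media)
        self.observation = []


def run_on_execution(execution, pipeline):
    """Run the pipeline on the stored media and append to the observation."""
    execution.observation.extend(pipeline(execution.media))
    return execution.observation


def run_directly(media, pipeline):
    """Run the pipeline on the given media; nothing is stored between calls."""
    return pipeline(media)


def toy_pipeline(media):
    """A deterministic stand-in for a pipeline: one result per media item."""
    return [f"result-for-{m}" for m in media]


execution = ToyExecution(["page-1"])
run_on_execution(execution, toy_pipeline)
run_on_execution(execution, toy_pipeline)
# The observation now holds two entries, one per call
repeated = len(execution.observation)

# Direct calls with the same input give equal results each time
same = run_directly(["page-1"], toy_pipeline) == run_directly(["page-1"], toy_pipeline)
```

In this model, `repeated` is 2 after two calls on the same execution, while `same` is true for the direct calls.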

Info: The uploaded media objects are not persisted when calling the pipeline directly!

Underlying concept

Pipelines are based on NVIDIA Triton Ensemble Models.