
Python SDK

docpeel is the official Python client for the DocPeel API. It ships with type-annotated dataclasses, so no separate type stubs are needed. Requires Python 3.8 or newer.

Install

pip install docpeel
# or with poetry
poetry add docpeel

Authentication

Pass your API key directly, or set the DOCPEEL_API_KEY environment variable. Generate keys on the API keys page.

from docpeel import DocPeel

# Explicit key
client = DocPeel(api_key="dpk_live_...")

# Or read from env (DOCPEEL_API_KEY)
client = DocPeel()
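If you want a missing key to fail loudly at startup instead of on the first request, you can make the lookup order explicit in your own code. The sketch below is a hypothetical helper, not part of docpeel, and it assumes an explicit key takes precedence over DOCPEEL_API_KEY, which the text above implies but does not state.

```python
import os
from typing import Optional

ENV_VAR = "DOCPEEL_API_KEY"


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: explicit key first, then the environment variable."""
    key = api_key or os.environ.get(ENV_VAR)
    if not key:
        raise RuntimeError(f"Pass api_key=... or set {ENV_VAR}")
    return key
```

You would then construct the client with DocPeel(api_key=resolve_api_key()).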

Quickstart

import base64
from docpeel import DocPeel

client = DocPeel(api_key="dpk_live_...")

# The DocPeel API takes documents as base64 in a JSON body.
with open("invoice.pdf", "rb") as f:
    file_b64 = base64.b64encode(f.read()).decode("ascii")

extraction = client.extractions.create(
    file_b64=file_b64,
    file_name="invoice.pdf",
    template_id="tpl_invoice",  # optional
)

for fld in extraction.fields:
    print(f"{fld.field:<14} = {fld.value:<12} ({fld.confidence:.0f}% confidence)")

# Invoice Number = INV-2024-081 (98% confidence)
# Total          = $1,240.00    (97% confidence)

Methods

client.ping()

Verifies that the API key is valid and returns metadata about the key and its workspace.

me = client.ping()
# {'api_key': {'id': ..., 'name': ..., 'scopes': [...], 'workspace_id': ...},
#  'request_id': ...}

client.extractions.create()

The DocPeel API is JSON-only: documents are sent as a base64 string in the request body. The SDK accepts either a pre-encoded base64 string or a common file representation (a path, raw bytes, or a binary file-like), which it base64-encodes for you.

Recommended: base64 string

import base64
from docpeel import DocPeel

client = DocPeel(api_key="dpk_live_...")

# You bring the base64 — the SDK passes it through untouched.
with open("invoice.pdf", "rb") as f:
    file_b64 = base64.b64encode(f.read()).decode("ascii")

ext = client.extractions.create(
    file_b64=file_b64,
    file_name="invoice.pdf",
    content_type="application/pdf",   # optional, inferred from file_name
    template_id="tpl_invoice",        # optional
)

Convenience: let the SDK encode for you

from pathlib import Path

# 1. From a path (str or os.PathLike)
ext = client.extractions.create("receipt.jpg")

# 2. From a binary file-like
with open("contract.pdf", "rb") as f:
    ext = client.extractions.create(f, file_name="contract.pdf")

# 3. From raw bytes (file_name required)
pdf_bytes = Path("invoice.pdf").read_bytes()  # or bytes from storage, an upload, etc.
ext = client.extractions.create(
    pdf_bytes,
    file_name="invoice.pdf",
    content_type="application/pdf",
)

You can pass the document via either of two parameters:

  • file_b64 — a pre-encoded base64 string (a data: URI prefix is allowed and stripped). Requires file_name.
  • file — a path (str / os.PathLike), bytes / bytearray / memoryview, or any binary file-like (open(p, "rb"), io.BytesIO). The SDK reads and base64-encodes it for you.
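To make the two input styles concrete, here is a rough sketch of the normalization the list describes: read a path, bytes-like object, or binary file-like and base64-encode it, and strip a data: URI prefix from a pre-encoded string. The names to_base64 and strip_data_uri are hypothetical; the SDK's internals are not documented here and may differ.

```python
import base64
import os
from typing import BinaryIO, Union

# Hypothetical stand-ins for the input handling described above.
FileInput = Union[str, os.PathLike, bytes, bytearray, memoryview, BinaryIO]


def strip_data_uri(file_b64: str) -> str:
    """Drop a leading 'data:<type>;base64,' prefix, if present."""
    if file_b64.startswith("data:") and "," in file_b64:
        return file_b64.split(",", 1)[1]
    return file_b64


def to_base64(file: FileInput) -> str:
    """Read a path, bytes-like object, or binary file-like and base64-encode it."""
    if isinstance(file, (bytes, bytearray, memoryview)):
        raw = bytes(file)
    elif isinstance(file, (str, os.PathLike)):
        with open(file, "rb") as fh:
            raw = fh.read()
    else:
        # Binary file-like, e.g. open(p, "rb") or io.BytesIO(...)
        raw = file.read()
    return base64.b64encode(raw).decode("ascii")
```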

Wire format. Every extractions.create() call POSTs a JSON body to /v1/extractions with the shape {"file": ..., "file_name": ..., "content_type": ..., "template_id": ...}. No multipart upload is ever sent, so the SDK works in serverless functions and behind strict egress proxies. Each file may be at most 20 MB after base64 decoding.
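For debugging what goes over the wire, the body can be assembled by hand. build_extraction_body is a hypothetical helper, and three details are assumptions: that "20 MB" means 20 * 1024 * 1024 decoded bytes, that content_type is guessed from the file extension with Python's mimetypes module, and that unset optional keys are omitted rather than sent as null.

```python
import base64
import mimetypes
from typing import Any, Dict, Optional

# Assumption: "20 MB" read as 20 MiB of decoded bytes.
MAX_DECODED_BYTES = 20 * 1024 * 1024


def build_extraction_body(
    file_b64: str,
    file_name: str,
    content_type: Optional[str] = None,
    template_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the JSON body for POST /v1/extractions."""
    # validate=True rejects non-base64 characters, so strip any data: prefix first.
    decoded_size = len(base64.b64decode(file_b64, validate=True))
    if decoded_size > MAX_DECODED_BYTES:
        raise ValueError(f"{file_name}: {decoded_size} bytes exceeds the 20 MB limit")
    if content_type is None:
        # Guess from the extension, e.g. "invoice.pdf" -> "application/pdf".
        content_type, _ = mimetypes.guess_type(file_name)
    body = {
        "file": file_b64,
        "file_name": file_name,
        "content_type": content_type,
        "template_id": template_id,
    }
    # Assumption: optional keys are omitted when unset rather than sent as null.
    return {key: value for key, value in body.items() if value is not None}
```

Serialize the result with json.dumps(body) before sending it.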

client.extractions.retrieve(id)

Fetches an existing extraction by its ID.

ext = client.extractions.retrieve("ext_01HZX...")
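Because an Extraction's status can be "processing", you may need to wait until it reaches "completed" or "failed". A minimal polling sketch, assuming retrieve() returns an object with a .status attribute as in the Extraction data model; wait_for_extraction, the 2-second interval, and the 120-second timeout are illustrative choices, not SDK defaults.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_extraction(
    fetch: Callable[[str], Any],
    extraction_id: str,
    interval: float = 2.0,
    timeout: float = 120.0,
) -> Any:
    """Poll fetch(extraction_id) until .status is terminal or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        ext = fetch(extraction_id)
        if ext.status in TERMINAL_STATUSES:
            return ext
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{extraction_id} still {ext.status!r} after {timeout}s")
        time.sleep(interval)
```

In application code you would pass the SDK method as fetch, for example wait_for_extraction(client.extractions.retrieve, ext.id). Taking the fetch function as a parameter keeps the helper testable without network access.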

Data models

Extraction results are returned as typed @dataclass objects:

from dataclasses import dataclass
from typing import Any, Dict, List, Literal, Optional

# Defined first so that Extraction's List[ExtractionField] annotation resolves.
@dataclass
class ExtractionField:
    id: int
    field: str
    value: str
    confidence: float
    explanation: str

@dataclass
class Extraction:
    id: str
    status: Literal["processing", "completed", "failed"]
    file_name: str
    file_type: str
    template_id: Optional[str]
    confidence: Optional[float]
    fields: List[ExtractionField]
    data: Dict[str, Any]
    source: Optional[str]
    credits_used: Optional[int]
    error_message: Optional[str]
    created_at: Optional[str]
    completed_at: Optional[str]
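Since each ExtractionField carries a confidence score, one common use is routing uncertain fields to human review. A minimal sketch: needs_review is a hypothetical helper, the 90.0 threshold is arbitrary, and the 0-100 scale is inferred from the percentages in the Quickstart output. The dataclass is redeclared locally so the snippet runs without the SDK installed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractionField:  # local copy of the model above, for a standalone example
    id: int
    field: str
    value: str
    confidence: float
    explanation: str


def needs_review(fields: List[ExtractionField], threshold: float = 90.0) -> List[str]:
    """Return names of fields scored below threshold (0-100 scale assumed)."""
    return [f.field for f in fields if f.confidence < threshold]
```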

Errors

All HTTP errors raise DocPeelError with .status, .code, .message, and .request_id.

import time

from docpeel import DocPeel, DocPeelError

client = DocPeel()

try:
    ext = client.extractions.create("invoice.pdf")
except DocPeelError as err:
    if err.code == "rate_limited":
        time.sleep(1)  # back off briefly, then retry once
        ext = client.extractions.create("invoice.pdf")
    elif err.status == 401:
        raise SystemExit("Invalid API key")
    else:
        raise
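A fixed one-second pause handles an occasional rate limit; for sustained traffic, exponential backoff with jitter spreads retries out so parallel workers do not retry in lockstep. A sketch under stated assumptions: with_backoff is a hypothetical helper, not part of docpeel, and the attempt count and delays are illustrative.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    attempts: int = 4,
    base_delay: float = 1.0,
) -> T:
    """Run call(), retrying retryable errors with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception as err:
            if attempt == attempts - 1 or not is_retryable(err):
                raise
            # Waits roughly base_delay * 1, 2, 4, ... seconds, each scaled by a
            # random factor between 0.5 and 1.0.
            time.sleep(base_delay * 2 ** attempt * random.uniform(0.5, 1.0))
    raise RuntimeError("unreachable: the loop always returns or raises")
```

With the SDK, the predicate would check the documented error code: with_backoff(lambda: client.extractions.create("invoice.pdf"), lambda e: isinstance(e, DocPeelError) and e.code == "rate_limited").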

Configuration

client = DocPeel(
    api_key="dpk_live_...",
    base_url="https://api.docpeel.com",  # override for staging
    timeout=60,                          # seconds, default 120
)

For retries, custom adapters, or proxies, supply your own requests.Session:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

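session = requests.Session()
# By default urllib3's Retry will not retry a POST (such as extractions.create())
# after a read error or on an error status code; see its allowed_methods parameter.
session.mount(
    "https://",
    HTTPAdapter(max_retries=Retry(total=3, backoff_factor=0.5)),
)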

client = DocPeel(api_key="dpk_live_...", session=session)

Concurrency

The SDK is synchronous, and a single client is thread-safe, so it can be shared across threads. To run extractions in parallel, use a thread pool:

from concurrent.futures import ThreadPoolExecutor
from docpeel import DocPeel

client = DocPeel()
files  = ["inv1.pdf", "inv2.pdf", "inv3.pdf"]

with ThreadPoolExecutor(max_workers=8) as pool:
    extractions = list(pool.map(client.extractions.create, files))
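Note that list(pool.map(...)) re-raises the first exception it meets in input order, so one failing file discards the results of all the others. The sketch below collects successes and failures per file instead; extract_all is a hypothetical helper that takes the extraction function as a parameter, so in practice you would pass client.extractions.create.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, List, Tuple


def extract_all(
    extract: Callable[[str], Any],
    paths: List[str],
    max_workers: int = 8,
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Run extract(path) concurrently; return (results, errors) keyed by path."""
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as err:  # record the failure and keep going
                errors[path] = err
    return results, errors
```

Calling ok, failed = extract_all(client.extractions.create, files) leaves you with a failed dict you can inspect or resubmit without redoing the files that succeeded.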

See also