Python SDK
docpeel is the official Python client for the DocPeel API. Type-annotated dataclasses ship with the package, so no extra typing stubs are needed. Requires Python 3.8 or newer.
Install
pip install docpeel
# or with poetry
poetry add docpeel
Authentication
Pass your API key directly, or set the DOCPEEL_API_KEY environment variable. Generate keys on the API keys page.
from docpeel import DocPeel
# Explicit key
client = DocPeel(api_key="dpk_live_...")
# Or read from env (DOCPEEL_API_KEY)
client = DocPeel()
Quickstart
import base64
from docpeel import DocPeel
client = DocPeel(api_key="dpk_live_...")
# The DocPeel API takes documents as base64 in a JSON body.
with open("invoice.pdf", "rb") as f:
    file_b64 = base64.b64encode(f.read()).decode("ascii")
extraction = client.extractions.create(
    file_b64=file_b64,
    file_name="invoice.pdf",
    template_id="tpl_invoice",  # optional
)
for f in extraction.fields:
    print(f.field, "=", f.value, f"({f.confidence}% confidence)")
# Invoice Number = INV-2024-081 (98% confidence)
# Total = $1,240.00 (97% confidence)
Methods
client.ping()
Verify the key is valid. Returns metadata about the key and workspace.
me = client.ping()
# {'api_key': {'id': ..., 'name': ..., 'scopes': [...], 'workspace_id': ...},
#  'request_id': ...}
client.extractions.create()
The DocPeel API is JSON-only — documents are sent as a base64 string in the request body. The SDK accepts a pre-encoded base64 string, or any common file representation (which it base64-encodes for you).
Recommended: base64 string
import base64
from docpeel import DocPeel
client = DocPeel(api_key="dpk_live_...")
# You bring the base64 — the SDK passes it through untouched.
with open("invoice.pdf", "rb") as f:
    file_b64 = base64.b64encode(f.read()).decode("ascii")
ext = client.extractions.create(
    file_b64=file_b64,
    file_name="invoice.pdf",
    content_type="application/pdf",  # optional, inferred from file_name
    template_id="tpl_invoice",  # optional
)
Convenience: let the SDK encode for you
# 1. From a path
ext = client.extractions.create("receipt.jpg")
# 2. From a binary file-like
with open("contract.pdf", "rb") as f:
    ext = client.extractions.create(f, file_name="contract.pdf")
# 3. From raw bytes (file_name required)
ext = client.extractions.create(
    pdf_bytes,  # a bytes object already held in memory
    file_name="invoice.pdf",
    content_type="application/pdf",
)
You can pass the document via either of two parameters:
- file_b64: a pre-encoded base64 string (a data: URI prefix is allowed and stripped). Requires file_name.
- file: a path (str / os.PathLike), bytes / bytearray / memoryview, or any binary file-like (open(p, "rb"), io.BytesIO). The SDK reads and base64-encodes it for you.
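To make the two input styles concrete, here is a minimal sketch of how the accepted file representations could be normalized to a base64 string before sending. It is not the SDK's actual code: the helper names to_base64 and strip_data_uri are hypothetical and exist only for illustration.

```python
import base64
import io
import os


def to_base64(file):
    """Hypothetical helper: turn a path, bytes-like, or binary file-like into base64 text."""
    if isinstance(file, (bytes, bytearray, memoryview)):
        raw = bytes(file)
    elif isinstance(file, (str, os.PathLike)):
        # Treat strings and PathLike objects as filesystem paths.
        with open(file, "rb") as fh:
            raw = fh.read()
    elif hasattr(file, "read"):
        # Any binary file-like object, e.g. open(p, "rb") or io.BytesIO.
        raw = file.read()
    else:
        raise TypeError(f"unsupported file type: {type(file).__name__}")
    return base64.b64encode(raw).decode("ascii")


def strip_data_uri(file_b64):
    """Hypothetical helper: drop a leading 'data:<mime>;base64,' prefix if present."""
    if file_b64.startswith("data:"):
        _, _, file_b64 = file_b64.partition(",")
    return file_b64


print(to_base64(io.BytesIO(b"hi")))
print(strip_data_uri("data:application/pdf;base64,aGk="))
```

Both calls print aGk=, the base64 encoding of the two bytes "hi". In practice you would let client.extractions.create do this work; the sketch only illustrates what "the SDK encodes it for you" plausibly involves.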
Wire format. Every call POSTs a JSON body to /v1/extractions with the shape {"file": ..., "file_name": ..., "content_type": ..., "template_id": ...}. No multipart upload is ever sent, so the SDK works in serverless functions and behind strict egress proxies. Max 20 MB per file (decoded).
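The body shape described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the SDK's implementation: build_body is a hypothetical name, "20 MB" is read here as 20 * 1024 * 1024 bytes, and omitting unset optional keys (rather than sending null) is a guess.

```python
import base64
import json

# Assumption: the documented 20 MB limit is interpreted as 20 MiB of decoded bytes.
MAX_DECODED_BYTES = 20 * 1024 * 1024


def build_body(file_b64, file_name, content_type=None, template_id=None,
               max_decoded_bytes=MAX_DECODED_BYTES):
    """Hypothetical helper: build the JSON body for POST /v1/extractions."""
    # Decode once to validate the base64 and measure the decoded size.
    decoded_size = len(base64.b64decode(file_b64, validate=True))
    if decoded_size > max_decoded_bytes:
        raise ValueError(
            f"file is {decoded_size} bytes decoded; limit is {max_decoded_bytes}"
        )
    body = {"file": file_b64, "file_name": file_name}
    # Assumption: optional keys are left out when not provided.
    if content_type is not None:
        body["content_type"] = content_type
    if template_id is not None:
        body["template_id"] = template_id
    return json.dumps(body)


print(build_body("aGk=", "invoice.pdf", template_id="tpl_invoice"))
```

Checking the decoded size locally avoids uploading a payload the API would reject; keep in mind that base64 inflates the request body by roughly one third relative to the decoded file.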
client.extractions.retrieve(id)
ext = client.extractions.retrieve("ext_01HZX...")
Data models
All responses are returned as typed @dataclass objects:
from dataclasses import dataclass
from typing import Any, Dict, List, Literal, Optional
# ExtractionField is listed first because Extraction refers to it.
@dataclass
class ExtractionField:
    id: int
    field: str
    value: str
    confidence: float
    explanation: str
@dataclass
class Extraction:
    id: str
    status: Literal["processing", "completed", "failed"]
    file_name: str
    file_type: str
    template_id: Optional[str]
    confidence: Optional[float]
    fields: List[ExtractionField]
    data: Dict[str, Any]
    source: Optional[str]
    credits_used: Optional[int]
    error_message: Optional[str]
    created_at: Optional[str]
    completed_at: Optional[str]
Errors
All HTTP errors raise DocPeelError with .status, .code, .message, and .request_id.
from docpeel import DocPeel, DocPeelError
import time
client = DocPeel()
try:
    client.extractions.create("invoice.pdf")
except DocPeelError as err:
    if err.code == "rate_limited":
        time.sleep(1)
    elif err.status == 401:
        raise SystemExit("Invalid API key")
    else:
        raise
Configuration
client = DocPeel(
    api_key="dpk_live_...",
    base_url="https://api.docpeel.com",  # override for staging
    timeout=60,  # seconds, default 120
)
For retries, custom adapters, or proxies, supply your own requests.Session:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
session = requests.Session()
session.mount(
    "https://",
    HTTPAdapter(max_retries=Retry(total=3, backoff_factor=0.5)),
)
client = DocPeel(api_key="dpk_live_...", session=session)
Concurrency
The SDK is synchronous and thread-safe per client. To run extractions in parallel, use a thread pool:
from concurrent.futures import ThreadPoolExecutor
from docpeel import DocPeel
client = DocPeel()
files = ["inv1.pdf", "inv2.pdf", "inv3.pdf"]
with ThreadPoolExecutor(max_workers=8) as pool:
    extractions = list(pool.map(client.extractions.create, files))
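With pool.map, the first exception is re-raised when its result is reached, so results for later files never reach the caller. If you would rather collect successes and failures separately, something like the following sketch may help. The helper name run_all is hypothetical and not part of the SDK, and a stand-in function is used so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(fn, items, max_workers=8):
    """Hypothetical helper: call fn(item) in a thread pool, keeping results and errors apart."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the item that produced it.
        futures = {pool.submit(fn, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[item] = exc
    return results, errors


def fake_create(path):
    """Stand-in for client.extractions.create, used only for this demo."""
    if path == "bad.pdf":
        raise ValueError("simulated failure")
    return path.upper()


results, errors = run_all(fake_create, ["inv1.pdf", "bad.pdf", "inv2.pdf"])
print(sorted(results), sorted(errors))
```

With a real client you would pass client.extractions.create as fn and then inspect errors, for example to retry only the files that failed.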