
Email-to-JSON extraction: a practical workflow guide

Learn how to normalize forwarded emails and attachments into structured payloads for CRM, finance, and support automations.

7 min read · Updated April 21, 2026

Why email is the hardest data source to automate

Email is the universal business inbox: orders, leads, supplier confirmations, and support requests all arrive as free-form prose with no consistent schema. Regex-based parsers break the moment a sender changes their template. Zapier email parsers need per-sender training and still miss edge cases.

The problem is compounded by attachments. An order confirmation email might include the invoice as an attached PDF, the delivery details in the email body, and the tracking number in an HTML table inside the email. These are three separate parsing problems if you treat the email as a text file — but a single extraction job if you treat the email holistically.

Start with normalized inputs

Before extraction logic matters, incoming emails must arrive in a reliable format. Set up a dedicated inbound address (every DocPeel workspace has one) or a forwarding rule from your existing inbox. This normalizes the ingestion path regardless of whether emails are forwarded manually, routed by a filter, or sent by an automated system.

MIME parsing — handling plain text, HTML, multipart, and attachment types — is managed by DocPeel automatically. You do not need to pre-process emails before submitting them.

Route by confidence, not by hope

High-confidence fields can move straight into downstream systems. Lower-confidence fields should create a review path instead of silently producing bad data. This is the most important architectural decision in any email extraction workflow.

DocPeel returns a confidence score for every extracted field. A common pattern is to auto-commit extractions where all required fields score above 0.85, and to create a review task for anything below that threshold. This captures around 90% of emails without human intervention while ensuring edge cases are caught before they reach the CRM or ERP.
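The threshold pattern above can be sketched in a few lines. This is a minimal illustration, not DocPeel's API: the field shape (a mapping of field name to `value` and `confidence`) and the names `route_extraction` and `AUTO_COMMIT_THRESHOLD` are assumptions made for the example.

```python
from typing import Any, Mapping, Sequence

# Threshold from the pattern described above; tune it per workflow.
AUTO_COMMIT_THRESHOLD = 0.85


def route_extraction(
    fields: Mapping[str, Mapping[str, Any]],
    required: Sequence[str],
) -> str:
    """Return "commit" if every required field scores above the threshold,
    otherwise "review".

    Assumed (hypothetical) field shape:
        {"order_number": {"value": "A-1", "confidence": 0.97}, ...}
    """
    for name in required:
        field = fields.get(name)
        # A missing required field is treated like a low-confidence one.
        if field is None:
            return "review"
        # Scores exactly at the threshold go to review, a conservative choice.
        if field.get("confidence", 0.0) <= AUTO_COMMIT_THRESHOLD:
            return "review"
    return "commit"
```

Only the required fields gate the decision here, so a low score on an optional field does not hold up an otherwise clean extraction.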

Handling attachments in the same job

A significant share of business emails arrive with a PDF, spreadsheet, or image attached. DocPeel processes the email body and its attachments together in a single extraction job, returning a unified JSON payload.

This means an order confirmation email that includes an attached invoice produces one structured result with all fields — body fields and attachment fields merged — rather than two separate jobs to correlate manually. The attachment type (PDF, PNG, DOCX) is identified automatically and the appropriate extraction path is applied.

Common email-to-JSON workflows

Order confirmations: extract order number, customer name, line items, shipping address, and total. Post the result to an order management system or Google Sheet the moment the email arrives.

Lead capture: pull contact name, company, phone, country, and message from any inbound enquiry email. Create a CRM contact automatically — no per-sender Zap training required.

Supplier confirmations: parse purchase order acknowledgements for PO number, confirmed quantities, delivery dates, and any line-item deviations. Alert the procurement team via Slack if a deviation is detected.

Support triage: extract issue type, product mentioned, urgency signals, and customer contact info. Create a helpdesk ticket with pre-filled fields and route it to the correct queue.
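For the supplier confirmation workflow, deviation detection comes down to comparing the extracted acknowledgement against the purchase order you already hold. The sketch below assumes both sides have been reduced to simple line records keyed by SKU; the `Line` type and `find_deviations` function are hypothetical names, and sending the alert (to Slack, for example) is left to the caller.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    sku: str
    quantity: int
    delivery_date: str  # ISO date string, e.g. "2026-05-01"


def find_deviations(ordered: List[Line], confirmed: List[Line]) -> List[str]:
    """Compare confirmed lines against ordered lines and describe differences."""
    confirmed_by_sku = {line.sku: line for line in confirmed}
    deviations: List[str] = []
    for line in ordered:
        match = confirmed_by_sku.get(line.sku)
        if match is None:
            # The supplier did not acknowledge this line at all.
            deviations.append(f"{line.sku}: not confirmed")
            continue
        if match.quantity != line.quantity:
            deviations.append(
                f"{line.sku}: quantity {line.quantity} -> {match.quantity}"
            )
        if match.delivery_date != line.delivery_date:
            deviations.append(
                f"{line.sku}: delivery {line.delivery_date} -> {match.delivery_date}"
            )
    return deviations
```

An empty result means the acknowledgement matches the order; any non-empty list is what you would forward to the procurement team.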

Delivering results to your stack in real time

DocPeel delivers results via webhook the moment extraction completes. Your endpoint receives the full JSON payload — extracted fields, confidence scores, job metadata, and the original file reference — within seconds of the email arriving.

For CRM integrations, the webhook handler can check whether a contact already exists by email address and either update the record or create a new one. For finance workflows, it can post directly to a QuickBooks or Xero ingestion endpoint. For support, it can call the Zendesk or Intercom API to create a ticket.
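The CRM case (update if the contact exists, otherwise create) might look like the sketch below. The payload shape, with a top-level `fields` object holding `value` and `confidence` per field, is an assumption for illustration and not the documented webhook format. An in-memory dictionary stands in for the CRM; a real handler would call your CRM's API instead.

```python
import json
from typing import Any, Dict


def upsert_contact(raw_body: bytes, crm: Dict[str, Dict[str, Any]]) -> str:
    """Create or update a contact keyed by email address.

    `crm` is a stand-in for a real CRM client: {email: {field: value, ...}}.
    Returns "created" or "updated".
    """
    payload = json.loads(raw_body)
    # Assumed shape: {"fields": {"email": {"value": ..., "confidence": ...}, ...}}
    values = {name: field["value"] for name, field in payload["fields"].items()}

    # Normalize the key so "Ana@Example.com " and "ana@example.com" match.
    email = values["email"].strip().lower()
    values["email"] = email

    if email in crm:
        crm[email].update(values)
        return "updated"
    crm[email] = values
    return "created"
```

Normalizing the email address before the lookup is the important design choice: without it, differences in case or stray whitespace create duplicate contacts.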

If your workflow requires review before committing, the webhook can push to an internal staging endpoint or a Google Sheet for human sign-off, with a separate automation triggered once the row is approved.

Frequently asked questions

Does DocPeel read HTML email bodies correctly?

Yes. The parser handles plain text, HTML, and multipart MIME emails. Tables in HTML bodies are extracted as structured arrays, and inline images are processed alongside text fields.

Can I forward emails directly to DocPeel for automatic processing?

Yes. Every workspace has a dedicated inbound email address. Forwarding or CC-ing that address triggers extraction automatically, with results available via webhook or the dashboard.

What happens when an email has multiple attachments of different types?

Attachments are extracted in parallel and results are merged into a single job payload. A PDF invoice, a PNG signature, and a CSV spreadsheet attached to the same email are all processed together.

How do I handle emails where the same field appears in both the body and an attachment?

DocPeel returns all extracted values. If the same field appears in both the body and an attachment, both values are returned with individual confidence scores. Your downstream logic can choose the higher-confidence value or flag the discrepancy for review.
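As a rough sketch of that downstream logic, the function below picks the highest-confidence candidate and reports whether the candidates disagree. The candidate shape (`value`, `confidence`, and an optional `source`) and the name `resolve_field` are assumptions for the example, not DocPeel's response format.

```python
from typing import Any, Dict, List, Tuple


def resolve_field(candidates: List[Dict[str, Any]]) -> Tuple[Any, bool]:
    """Return (best_value, conflict).

    best_value is the candidate with the highest confidence score.
    conflict is True when the candidates do not all carry the same value,
    which is the signal to flag the field for review.
    """
    best = max(candidates, key=lambda c: c["confidence"])
    # Compare as trimmed strings so trivial whitespace differences
    # are not reported as conflicts.
    distinct = {str(c["value"]).strip() for c in candidates}
    return best["value"], len(distinct) > 1
```

For example, a total of "1200.00" from the body at 0.91 and "1250.00" from the attached invoice at 0.97 resolves to "1250.00" with the conflict flag set, so the record can still be routed to review.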

Can DocPeel extract from emails in languages other than English?

Yes. DocPeel supports 60+ languages. German supplier confirmations, French lead enquiries, and Spanish order confirmations are all processed without any language-specific configuration.

Put the workflow into production

DocPeel gives teams a direct path from incoming documents to clean JSON, with export options for spreadsheets, webhooks, and downstream APIs.