How Customs Brokers Use OCR to Process Import Invoices and Packing Lists

February 23, 2026

A customs brokerage processing 3,000 entries a month doesn’t have a technology problem. They have a tedium problem. Every entry requires pulling roughly 50 fields from combined packing list and invoice PDFs—product descriptions, batch numbers, part numbers, net weights, country of origin, reference numbers—and keying them into the customs entry system. The documents arrive as massive PDFs from European suppliers, sometimes 80 pages for a single packet, sometimes 2,000 pages. Every customer uses a different layout. Nobody’s using the same standard format. And the person doing the data entry isn’t confused by the work. They know exactly what they’re looking at. It’s just that finding and typing 50 fields across 80 pages of invoices takes three hours when it should take three minutes.

As one customs broker put it after watching an OCR demo:

“What you just did in three minutes would have taken me three hours.”

This is the reality for customs brokerages, freight forwarders, and import/export operations teams processing commercial invoices and packing lists from international suppliers. The documents are predictable enough that a human can read them. But they’re inconsistent enough that standard PDF-to-Excel tools choke on them—especially European invoice formats with their comma-separated decimals, multi-language headers, and non-standard table structures. OCR-based extraction built for this kind of document variability is what finally makes the math work.

Lido is an AI-powered document extraction platform that processes packing lists, commercial invoices, and customs entry documents without templates or per-supplier configuration. Upload a combined packing list and invoice PDF—whether it’s 80 pages or 2,000—and Lido extracts product descriptions, batch numbers, net weights, country of origin, and every other field customs entry requires into structured spreadsheet data. One customs brokerage processing 3,000+ entries per month cut data entry time from hours to minutes per document packet—no templates built, no supplier-specific setup needed.

The document processing challenge in customs brokerage

  1. Customs entry requires pulling an extraordinary number of fields from source documents. A single entry might need 50 discrete data points extracted from a combined packing list and invoice PDF: product descriptions, batch numbers, part numbers, net weights, gross weights, country of origin, purchase order references, and more. Multiply that by 3,000 entries a month and you’re looking at 150,000 individual field extractions, each one typed by hand from a PDF on a second monitor.
  2. Document volume is staggering. A busy customs brokerage handling imports from manufacturing suppliers might process 5,000 to 10,000 pages of invoices and packing lists every month. A single packet from a European industrial supplier like Danieli can run to 2,000 pages. These aren’t neat one-page invoices with a vendor name at the top and a total at the bottom. They’re dense, multi-page documents with hundreds of line items, each one containing the fields that need to be extracted for the customs entry.
  3. Every customer uses a different format. This is the detail that kills any simple automation approach. The brokerage doesn’t control the invoice layout—the foreign supplier does. And every supplier, every division within a supplier, formats their documents differently. Column orders change. Field labels vary. Some invoices put the country of origin in a column; others embed it in the product description. Some packing lists include net weights; others don’t. A solution that requires building a template for every document layout would drown in configuration before it processes its first real entry.
  4. European invoice formats are particularly problematic for standard tools. Commas as decimal separators, periods as thousands separators, multi-language column headers, and non-standard table structures all conspire to break consumer-grade PDF-to-Excel converters. What looks like a simple table to a human reader produces garbled output when fed through a tool that expects American-format invoices. Customs brokers have learned to distrust automated conversion for exactly this reason—cleaning up bad OCR output takes longer than just typing the data manually.
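
To make the number-format problem concrete, here is a minimal Python sketch of converting a European-style amount into a plain decimal. The function name `parse_european_number` is hypothetical, and the sketch assumes the caller already knows the document uses a comma as the decimal separator and a period as the thousands separator. It is an illustration of the conversion, not how any particular tool does it.

```python
from decimal import Decimal, InvalidOperation


def parse_european_number(text: str) -> Decimal:
    """Parse a number that uses '.' for thousands and ',' for decimals.

    Assumption: the European convention is already known to apply.
    A bare "1.234" is treated as one thousand two hundred thirty-four.
    """
    # Drop ordinary and non-breaking spaces sometimes used as group separators.
    cleaned = text.strip().replace("\u00a0", "").replace(" ", "")
    # Remove thousands separators first, then turn the decimal comma into a dot.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"not a European-format number: {text!r}") from exc


print(parse_european_number("1.234,56"))  # 1234.56
print(parse_european_number("12,5"))      # 12.5
```

A tool that expects American formatting would read "1.234,56" as two separate values or as 1.234, which is the kind of garbled output described above.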

How OCR extraction works for combined packing list and invoice PDFs

  1. The key challenge is that packing lists and invoices are often combined into a single PDF. A supplier ships goods and includes both the commercial invoice and the packing list in one document. Sometimes they’re clearly separated with cover pages. Sometimes the packing list follows immediately after the invoice with only a header change to indicate the transition. Sometimes the two are interleaved. The extraction system needs to handle all of these cases and pull the right fields from the right document type.
  2. AI-powered OCR reads these documents the way a human would. Instead of relying on fixed templates that break when a column moves two inches to the right, modern extraction uses language models that understand document structure contextually. The system identifies that a table column labeled “Beschreibung” on a German invoice contains product descriptions, just as it would recognize “Description” or “Désignation” on an English or French one. It finds batch numbers whether they’re in a dedicated column, a sub-header row, or embedded in the product description text.
  3. The output is structured data ready for customs entry. Instead of scrolling through an 80-page PDF and manually typing each field, the broker gets a clean spreadsheet with one row per line item and columns for every field needed: product description, batch number, part number, net weight, country of origin, reference number, and whatever other fields the specific customs entry requires. This is the core of automated invoice processing—turning unstructured document pages into structured, actionable data. For a document packet that used to take several hours of manual data entry, the extraction runs in minutes.
  4. Accuracy matters more than speed for customs work. A mistyped part number on a customs entry might not surface for two years, until an audit catches the discrepancy. As one broker noted: “Even for one line item ones I want to do it... if I mistype a number that won’t maybe appear for two years from now.” OCR extraction removes the manual keying step where transcription errors creep in, because values are read directly from the source document, and the output can still be verified against the original PDF before submission.
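
The multilingual header example above can be pictured as mapping many labels onto one canonical field. The article describes language models doing this contextually. The Python sketch below swaps in a plain lookup table purely to show the target result, and every name in it (`CANONICAL_FIELDS`, `normalize_label`, `map_headers`) plus the alias list is a hypothetical illustration.

```python
import unicodedata


def normalize_label(label: str) -> str:
    """Lowercase a header and strip accents and punctuation for comparison."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch for ch in no_accents.lower() if ch.isalnum() or ch == " ")
    return kept.strip()


# Illustrative aliases only; a real supplier set would be far larger.
CANONICAL_FIELDS = {
    "description": ["Description", "Beschreibung", "Désignation"],
    "net_weight": ["Net weight", "Nettogewicht", "Poids net"],
    "country_of_origin": ["Country of origin", "Ursprungsland", "Pays d'origine"],
}

LOOKUP = {
    normalize_label(alias): field
    for field, aliases in CANONICAL_FIELDS.items()
    for alias in aliases
}


def map_headers(headers):
    """Return {original header: canonical field name, or None if unrecognized}."""
    return {header: LOOKUP.get(normalize_label(header)) for header in headers}


print(map_headers(["Beschreibung", "Désignation:", "Nettogewicht", "Pos."]))
```

Unrecognized headers map to `None` rather than being guessed. A fixed table like this breaks as soon as a supplier uses a label that is not listed, which is the limitation the contextual approach is meant to avoid.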

Matching packing lists to invoices with batch numbers

  1. Customs entry often requires data from both the packing list and the invoice for the same line item. The invoice contains pricing, product descriptions, and reference numbers. The packing list contains weights, dimensions, country of origin, and package counts. To build a complete customs entry, you need to match line items across both documents—and the key to that match is usually the batch number.
  2. Batch numbers serve as the linking field between documents. A batch number that appears on line 47 of the invoice corresponds to the same batch number on the packing list, which provides the net weight and country of origin for that specific item. Without this match, the broker would have to manually cross-reference between the two documents—flipping back and forth in a 2,000-page PDF to find the packing list entry that corresponds to each invoice line item.
  3. Automated extraction makes this matching practical at scale. Once both the invoice and packing list data are extracted into structured tables, matching on batch number becomes a simple lookup operation—similar to how companies match purchase orders to invoices automatically. The system extracts batch numbers from both documents, aligns the records, and produces a consolidated output where each row contains both invoice fields (description, price, reference number) and packing list fields (net weight, country of origin, package count). What used to require hours of manual cross-referencing happens automatically.
  4. Edge cases still need human judgment. Sometimes batch numbers don’t match exactly—a packing list might use a truncated batch number, or a single invoice line item might span multiple packing list entries. The right approach is to automate the straightforward matches (which account for the vast majority of line items) and flag the exceptions for human review. The broker’s expertise goes where it’s actually needed instead of being spent on routine matching that a machine can do faster and more reliably.
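
The matching steps above can be sketched in Python as a lookup on batch number with an exception list. This is a minimal sketch under stated assumptions: both documents are already extracted into lists of dictionaries, the field names (`batch`, `net_weight_kg`, and so on) are hypothetical, and the sample rows are made up for illustration.

```python
from collections import defaultdict


def match_by_batch(invoice_lines, packing_lines):
    """Join invoice and packing list rows on batch number.

    Returns (matched_rows, exceptions). Only exact one-to-one matches are
    merged automatically; everything else is flagged for human review.
    """
    packing_by_batch = defaultdict(list)
    for entry in packing_lines:
        packing_by_batch[entry["batch"].strip().upper()].append(entry)

    matched, exceptions = [], []
    for line in invoice_lines:
        key = line["batch"].strip().upper()
        candidates = packing_by_batch.get(key, [])
        if len(candidates) == 1:
            matched.append({**line, **candidates[0]})
            continue
        if len(candidates) > 1:
            reason = "multiple packing list entries"
        else:
            # A shared prefix hints at a truncated batch number; never auto-merge it.
            candidates = [
                entry
                for other, entries in packing_by_batch.items()
                if key and other and (key.startswith(other) or other.startswith(key))
                for entry in entries
            ]
            reason = "possible truncated batch number" if candidates else "no packing list entry"
        exceptions.append({"invoice_line": line, "reason": reason, "candidates": candidates})
    return matched, exceptions


invoice_lines = [
    {"batch": "4521", "description": "Steel roller", "price": "1200.00"},
    {"batch": "B-7788-01", "description": "Bearing housing", "price": "310.00"},
    {"batch": "9001", "description": "Guide rail", "price": "95.50"},
]
packing_lines = [
    {"batch": "4521", "net_weight_kg": "310.5", "country_of_origin": "DE"},
    {"batch": "B-7788", "net_weight_kg": "42.0", "country_of_origin": "IT"},
    {"batch": "9001", "net_weight_kg": "10.0", "country_of_origin": "DE"},
    {"batch": "9001", "net_weight_kg": "12.0", "country_of_origin": "DE"},
]

matched, exceptions = match_by_batch(invoice_lines, packing_lines)
print(len(matched), [item["reason"] for item in exceptions])
```

The design choice is deliberate: a near match is surfaced with its candidates but never merged, so the broker decides whether a truncated batch number really refers to the same goods.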

Data normalization: country codes, missing fields, and tariff classification

  1. Raw extracted data isn’t ready for customs entry without normalization. The same information appears in different formats across different documents, and customs systems expect specific formats. Country of origin is the most common example: an invoice might list “Federal Republic of Germany” while the packing list uses “DE.” The customs entry system needs a consistent two-letter ISO country code. Without normalization, the broker has to manually convert every country name to its code—hundreds of times per day.
  2. Missing data needs to be flagged, not skipped. When a packing list doesn’t include net weight for certain line items, or when country of origin is absent from a particular section, the extraction system needs to mark those fields as “NA” rather than leaving them blank. A blank field in the customs entry might slide through unnoticed and cause problems during an audit. An explicit “NA” flag tells the broker exactly which items need manual follow-up with the supplier or additional research.
  3. Tariff classification sits outside the source documents entirely. Harmonized tariff codes (HTS codes) aren’t printed on the invoice or packing list—the broker assigns them from a separate tariff schedule based on the product description. This is where human expertise remains essential. OCR extraction can provide a clean, accurate product description that makes tariff lookup faster, but the classification decision itself requires trade compliance knowledge that can’t be fully automated. The goal is to give the broker a clean spreadsheet with all extractable fields populated, so they can focus their time on the classification work that actually requires their expertise.
  4. Computed columns handle the transformation automatically. Once extraction is configured, normalization rules run on every document: country names convert to ISO codes, weight units standardize to kilograms, date formats normalize to the customs system’s expected format. These transformations happen in the same workflow as the extraction, so the broker gets customs-ready data without a separate cleanup step. The transformation rules persist across documents, so a rule that converts “Federal Republic of Germany” to “DE” works on every future invoice from that supplier without additional configuration.
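
As a rough illustration of the normalization rules described above, the Python sketch below converts country names to ISO codes, standardizes weights to kilograms, and writes an explicit "NA" for missing values. The table contents, the field names, and the function names are all assumptions made for the example. It also assumes weights have already been converted to dot-decimal strings.

```python
from decimal import Decimal

MISSING = "NA"

# Illustrative subset; a full table would cover every ISO 3166-1 alpha-2 code.
COUNTRY_TO_ISO = {
    "federal republic of germany": "DE",
    "germany": "DE",
    "de": "DE",
    "italy": "IT",
    "it": "IT",
}

# The pound-to-kilogram factor is the exact international definition.
KG_PER_UNIT = {"kg": Decimal("1"), "g": Decimal("0.001"), "lb": Decimal("0.45359237")}


def normalize_country(value):
    """Return a two-letter code, 'NA' if missing, or the raw text if unrecognized."""
    if value is None or not value.strip():
        return MISSING
    cleaned = value.strip()
    return COUNTRY_TO_ISO.get(cleaned.lower(), cleaned)


def weight_in_kg(value, unit):
    """Convert a weight to kilograms; missing values or unknown units become 'NA'."""
    if value is None or not str(value).strip():
        return MISSING
    factor = KG_PER_UNIT.get((unit or "kg").strip().lower())
    if factor is None:
        return MISSING
    return Decimal(str(value).strip()) * factor


def normalize_row(row):
    """Return a copy of an extracted row with normalized country and weight."""
    return {
        **row,
        "country_of_origin": normalize_country(row.get("country_of_origin")),
        "net_weight_kg": weight_in_kg(row.get("net_weight"), row.get("weight_unit")),
    }


print(normalize_row({"country_of_origin": "Federal Republic of Germany",
                     "net_weight": "2500", "weight_unit": "g"}))
print(normalize_row({"country_of_origin": "", "net_weight": None}))
```

Unrecognized country text is passed through unchanged rather than replaced with "NA", so a reviewer can see what the document actually said and extend the table.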

How Lido fits the customs brokerage workflow

  1. Lido is built for exactly this kind of high-volume, variable-format document processing. Upload the combined packing list and invoice PDF—whether it’s 80 pages or 2,000—and Lido extracts the fields you need into a structured spreadsheet. No templates to build per supplier. No rules to configure per document layout. You tell Lido what fields to extract (product description, batch number, net weight, country of origin, part number, reference number) and it finds them across whatever format the supplier uses.
  2. The results speak for themselves. A customs brokerage processing 3,000 entries a month saw data entry time on an 80-page invoice packet drop from several hours to minutes. Across their full monthly volume of 5,000 to 10,000 pages, the time savings compound dramatically. As one broker described it: “If I have to spend an hour as opposed to six, we’re way ahead of the game.”
  3. The value isn’t just speed—it’s accuracy. Manual data entry across thousands of pages inevitably introduces transcription errors. A mistyped batch number, a transposed digit in a net weight, a wrong country code—these errors might not surface until a customs audit months or years later. OCR extraction reads directly from the source document every time, eliminating the typos and transposition errors that come with manual keying. As one broker put it: “It’s not difficult or complicated. It’s just tedious. This takes away the tedious.”
  4. Lido handles the normalization and matching that customs work requires. Country names convert to ISO codes automatically. Batch number matching between packing lists and invoices happens in the same workflow. Missing fields get flagged as NA so nothing slips through. The broker gets a customs-entry-ready spreadsheet that they can review, add tariff codes to, and upload—instead of spending hours on the mechanical extraction work before they can even start the classification work that requires their expertise.
  5. Security and compliance are built in. Lido is SOC 2 compliant with HIPAA-grade data protection controls. Commercial invoices and packing lists contain sensitive trade data—pricing, supplier relationships, import volumes—that requires proper handling. Documents are encrypted in transit and at rest, access is logged and auditable, and data retention policies are configurable to meet your compliance requirements.

Ready to automate customs document processing?

Lido turns stacks of import invoices and packing lists into structured, customs-entry-ready data—automatically. Extract product descriptions, batch numbers, net weights, country codes, and every other field you need from any supplier format, in minutes instead of hours. Start with a free trial—50 pages, no credit card required.

Frequently asked questions

Can OCR handle European invoice formats with different number and date conventions?

Yes. AI-powered OCR systems like Lido understand document structure contextually rather than relying on rigid format expectations. European invoices that use commas as decimal separators, periods as thousands separators, and non-English column headers are processed accurately because the system interprets fields based on context and position, not just pattern matching. This is a critical capability for customs brokerages that receive invoices from suppliers across multiple countries, each with their own formatting conventions.

How does OCR match line items between a packing list and a commercial invoice?

The system extracts data from both documents into structured tables, then uses a shared identifier—typically the batch number or lot number—to match corresponding line items. An invoice line item with batch number 4521 gets linked to the packing list entry with the same batch number, consolidating invoice fields like pricing and descriptions with packing list fields like net weight and country of origin. Where batch numbers don’t match exactly or a single invoice line spans multiple packing list entries, those exceptions are flagged for human review.

What happens when required data is missing from the source documents?

When a field like net weight or country of origin is absent from the source document, the extraction system flags it as “NA” rather than leaving the field blank or skipping the line item entirely. This is important for customs compliance because a blank field might pass unnoticed during entry but cause problems during an audit. An explicit NA flag tells the broker exactly which items need manual follow-up—whether that means contacting the supplier for missing information or researching the data from other sources.

Does OCR automatically assign tariff codes to extracted line items?

No. Harmonized tariff codes (HTS codes) are not printed on commercial invoices or packing lists—they are assigned by the customs broker based on product descriptions and a separate tariff schedule. OCR extraction provides clean, accurate product descriptions that make tariff lookup faster and more reliable, but the classification decision itself requires trade compliance expertise. The goal is to automate the mechanical extraction work so brokers can spend their time on the classification and compliance work that actually requires their knowledge.

How does the system handle country of origin when it appears in different formats?

Country of origin appears in many formats across international trade documents—full country names like “Federal Republic of Germany,” abbreviated names like “Germany,” or two-letter ISO codes like “DE.” The extraction system normalizes all of these to a consistent format (typically the two-letter ISO code) that customs entry systems expect. This normalization happens automatically as part of the extraction workflow, so the broker doesn’t need to manually convert country names for every line item across thousands of pages of documents each month.