Invoice data extraction works until you add your twentieth vendor. The first few templates are manageable. But when you're processing invoices from 200 or 300 different suppliers — each with its own layout, line item structure, and date format — you're no longer doing automation. You're maintaining a system that requires constant attention just to keep up with the documents flowing in.
This is the problem most AP and operations teams hit somewhere between 50 and 500 vendors: the extraction tool needs to know what the invoice looks like before it can read it. And when every vendor's invoice looks different, that requirement becomes the bottleneck. Lido is the strongest option for teams processing invoices from hundreds of different vendors. It handles any vendor format without templates or prior training — but most teams don't discover this approach until they've already burned through template-based and model-trained tools.
Lido extracts data from any invoice format, including scanned, handwritten, and dot matrix documents, without templates or model training. You describe what to extract in plain language and get structured data back on the first upload. Companies processing 20,000+ invoices monthly from hundreds or even thousands of vendors, like Erewhon and Esprigas, use it to eliminate per-vendor template maintenance entirely.
The phrase "different vendor formats" understates the problem. It's not just that Vendor A puts the invoice number in the top right and Vendor B puts it in the center. The differences run deeper than field placement.
One vendor sends a single-page PDF with five line items. Another sends an eight-page scanned document printed on a dot matrix printer with perforated edges. A third nests rental charges inside category groupings that need to be broken apart into individual line items with calculated pricing. A fourth sends invoices whose charges have to be computed by multiplying daily rates by the number of days in the billing period.
Erewhon, a grocery chain with 10 stores, processes roughly 20,000 invoices per month from thousands of vendors. Their formats range from clean digital PDFs to scanned dot matrix printouts. As their CEO put it about one particular vendor's invoice: "That thing's ugly. I can't believe they actually still print on a frickin' dot matrix." Erewhon tested Lido on those dot matrix scanned invoices and saw accurate results on the first pass — no templates, no training data.
Esprigas, a gas distribution company, handles 27,000 documents per month across hundreds of suppliers. Their invoices include nested rent tables — category lines that need to be split into individual product lines with calculated pricing. Their operations lead described these as "the hardest thing" to extract accurately.
A consumer products company processing 800 invoices per month asked a question that comes up on nearly every evaluation call: "Will the system be able to go through different formats of different vendors and extract everything into a standardized template with the same columns?"
The answer they're hoping for is yes. The reality with most tools is "yes, but you'll need to configure each one."
Here's the math that makes template-based invoice extraction unsustainable at scale. If you have 200 vendors and each vendor averages 1.5 format variations (because the same vendor's US entity invoices look different from their international ones, or their regular invoices differ from credit memos), that's 300 templates to build, test, and maintain.
Each template takes time to configure. Draw the zones, map the fields, test on sample documents, handle edge cases. Then a vendor updates their billing system and the template breaks. You rebuild it. Another vendor merges with a subsidiary and their invoice layout changes overnight. You rebuild that one too.
This is exactly the path Esprigas traveled. They started on Docparser, a first-generation template tool. When template maintenance became untenable at their volume, they migrated to Nanonets, a model-trained extraction platform. The model-training approach promised to handle format variance without rigid templates.
It didn't solve the problem. It changed the shape of it.
Model-trained extraction tools like Nanonets take a different approach: feed the system sample documents, annotate the fields you want extracted, and let it learn the patterns. In theory, this handles more variance than templates. In practice, it creates a different maintenance burden.
The initial setup takes weeks. You collect sample documents for each format, label the fields, train the model, validate the output, retrain on errors, and repeat. Esprigas built two separate Nanonets models — one with intentional mapping using 50 sample pages, one without. They still ended up with a manual approval process on every single extraction. Not because they wanted to review the business logic, but because they couldn't trust the accuracy. Esprigas is now evaluating Lido to replace Nanonets for their 27,000 documents per month.
"We spend a ton of time retraining the models," their operations lead told us.
When a vendor changes their invoice format, the model needs retraining. When you onboard a new supplier, you need new training data. When document quality degrades — scans, faxes, handwritten notes — the model's accuracy drops and the retraining cycle starts again. At 27,000 documents per month, this becomes a significant operational burden.
A government agency paid $30,000 for a Nanonets contract expecting plug-and-play extraction. One of their team members described the experience bluntly: "It's great for a quick and easy but it is absolutely one of the worst." They were charged for every extraction attempt, including the ones that failed.
The pattern is consistent: companies migrate from template tools to model-trained tools expecting a fundamentally different experience, and find themselves on the same treadmill with different mechanics.
Nested tables, multi-line item groupings, and calculated fields are where most extraction tools break down entirely. Simple invoices with a header and a flat line item table are the easy case. The hard case is an invoice where line items are grouped under categories, with subtotals, taxes, and adjustments scattered across the page.
Esprigas deals with this daily. Their rent invoices contain category lines (like "RNT" for rental equipment) that group multiple products underneath. The extraction tool needs to understand that the category line isn't a line item — it's a header for the items below it. Then it needs to split each sub-item into its own row and calculate pricing based on daily rates multiplied by the billing period length.
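To make the shape of that transformation concrete, here is a minimal sketch in Python of splitting a nested rent table into flat rows. It is an illustration only, not how Lido or any other tool does it internally. The row layout (`is_category`, `code`, `description`, `quantity`, `daily_rate`), the `quantity` multiplier, and the inclusive day count are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class LineItem:
    category: str
    description: str
    quantity: int
    daily_rate: Decimal
    days: int
    amount: Decimal


def flatten_rent_table(rows, period_start, period_end):
    """Turn category-grouped rows into one flat LineItem per product.

    `rows` is a hypothetical intermediate format: category header rows
    carry {"is_category": True, "code": ...}; item rows carry
    description, quantity, and daily_rate.
    """
    # Assumption: the billing period is inclusive of both end dates.
    days = (period_end - period_start).days + 1
    items = []
    current_category = None
    for row in rows:
        if row.get("is_category"):
            # A category line is a header for the rows below it,
            # not a line item in its own right.
            current_category = row["code"]
            continue
        rate = Decimal(row["daily_rate"])
        qty = int(row["quantity"])
        items.append(
            LineItem(
                category=current_category,
                description=row["description"],
                quantity=qty,
                daily_rate=rate,
                days=days,
                amount=qty * rate * days,
            )
        )
    return items


rows = [
    {"is_category": True, "code": "RNT"},
    {"description": "Cylinder A", "quantity": 2, "daily_rate": "0.50"},
    {"description": "Cylinder B", "quantity": 1, "daily_rate": "1.25"},
]
flat = flatten_rent_table(rows, date(2026, 1, 1), date(2026, 1, 31))
```

The logic itself is short. The hard part in practice is getting from a scanned page to rows like these, and recognizing which lines are headers when every vendor groups them differently.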
Their operations lead was direct: "Those nested rent tables, that's the hardest thing."
Template-based tools can't handle this without custom configuration for each nested structure. Model-trained tools need extensive training data showing the pattern. And when the nesting structure changes — a different vendor uses a different grouping logic — both approaches require rework.
Some businesses can at least predict what formats they'll receive. A company with 50 stable vendors knows what's coming. CPA firms, auditors, and compliance teams don't have that luxury.
Legacy CPA processes 3,500 compliance audits per year. They receive payroll documents from hundreds of different employers, each using different payroll systems configured in different ways. "Even if 18 employers use the same payroll system, the way they utilize it is different," they explained. Legacy CPA chose Lido specifically because they needed a tool that could handle formats it had never seen before — no templates to build, no models to train.
Their assessment of template-based approaches was straightforward: "Template-based thoughts are really not what we're going for."
The same dynamic plays out in high-variance AP environments. Erewhon's thousands of vendors include large national distributors with clean digital invoices and small local suppliers printing on equipment from the 1990s. A fashion company processing 1,000 sales orders per month receives PDFs from different retailers, half of them handwritten. A restaurant group gets invoices from local vendors handwritten in Vietnamese.
In all of these cases, the common thread is the same: you can't pre-configure the system for documents you haven't seen yet. And the documents you haven't seen yet are a constant stream.
Turning hundreds of different invoice formats into clean, normalized data requires solving several problems simultaneously.
Field location varies. The invoice number might be labeled "Invoice #," "Inv No," "Bill Number," or not labeled at all. The tool needs to find it regardless of where it sits on the page or what it's called.
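Date formats differ. One vendor writes 02/10/2026. Another writes 10-Feb-2026. A third writes 2026.02.10. All three can mean February 10, 2026, but the first is ambiguous on its own: read day-first, it means October 2. Extraction needs to resolve that ambiguity and normalize every date into one consistent format for your accounting system.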
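As a rough illustration of the normalization step, the sketch below tries a short list of known formats and returns an ISO 8601 date. The format list is an assumption for this example; in particular it reads 02/10/2026 month-first (US style), which would be wrong for a day-first vendor. Note also that `%b` matches month abbreviations in the current locale, so "Feb" assumes an English locale.

```python
from datetime import datetime

# Assumed formats, tried in order. "%m/%d/%Y" treats slash dates as US-style.
KNOWN_FORMATS = ("%m/%d/%Y", "%d-%b-%Y", "%Y.%m.%d")


def normalize_date(raw, formats=KNOWN_FORMATS):
    """Return the date as YYYY-MM-DD, or raise ValueError if no format fits."""
    text = raw.strip()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"Unrecognized date format: {raw!r}")


normalized = [normalize_date(d) for d in ("02/10/2026", "10-Feb-2026", "2026.02.10")]
```

A fixed format list like this only covers layouts you have already seen, which is the same limitation the rest of this article describes.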
Number formats conflict. Many European vendors use a comma as the decimal separator and a period to group thousands, the reverse of the US convention. So "1,000" on one invoice and "1.000" on another can both mean one thousand, or one of them can mean exactly one, depending on the vendor's convention. Getting this wrong means your GL entries are off by orders of magnitude.
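A small sketch shows why the convention has to be known before the digits can be trusted. The `decimal_sep` parameter is a hypothetical input here; where it comes from (a vendor record, the invoice's country, or inference from the document) is outside the scope of the example.

```python
from decimal import Decimal


def parse_amount(raw, decimal_sep="."):
    """Parse a numeric string given which character marks the decimal point.

    Assumes the other of "." and "," is used only as a thousands separator.
    """
    thousands_sep = "," if decimal_sep == "." else "."
    cleaned = raw.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


us_thousand = parse_amount("1,000")                    # US convention
eu_thousand = parse_amount("1.000", decimal_sep=",")   # European convention
us_one = parse_amount("1.000")                         # same string, US reading
```

The same string "1.000" parses to one thousand or to one depending on that single parameter, which is exactly the factor-of-1,000 error described above.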
Vendor name variations multiply. The same supplier might appear as "ABC Corp," "ABC Corporation," "A.B.C. Corp.," or "ABC Corp Inc." across different invoices. Reference file matching — comparing extracted vendor names against a master list — is the only reliable way to standardize.
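One way to sketch reference file matching is with Python's standard `difflib`. This is an assumption-laden illustration rather than any vendor's actual method: the suffix list, the normalization rules, and the 0.85 cutoff are all choices made for the example.

```python
import difflib
import re

# Assumed list of legal suffixes to ignore when comparing names.
_SUFFIXES = r"\b(incorporated|inc|corporation|corp|llc|ltd|co)\b"


def _key(name):
    """Reduce a vendor name to a comparison key."""
    name = name.lower()
    name = re.sub(r"[^a-z0-9 ]", "", name)  # drop punctuation such as periods
    name = re.sub(_SUFFIXES, "", name)      # drop legal suffixes
    return " ".join(name.split())           # collapse leftover whitespace


def match_vendor(extracted, master_list, cutoff=0.85):
    """Return the master-list name closest to `extracted`, or None."""
    keys = {_key(m): m for m in master_list}
    hits = difflib.get_close_matches(_key(extracted), list(keys), n=1, cutoff=cutoff)
    return keys[hits[0]] if hits else None


master = ["ABC Corporation", "Acme Supply LLC"]
variants = ["ABC Corp", "A.B.C. Corp.", "ABC Corp Inc"]
matched = [match_vendor(v, master) for v in variants]
```

Returning `None` below the cutoff matters as much as the match itself: an unmatched name should go to a person for review rather than be silently assigned to the wrong vendor.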
Line item structures range from flat tables to nested groupings to free-form descriptions with inline quantities and prices. Normalizing all of these into consistent rows and columns is the core challenge.
These aren't cosmetic differences. Each one is a potential data quality issue that, without proper handling, requires a human to catch and correct downstream.
If you're processing invoices from more than 50 vendors and your current tool requires per-format configuration, the problem will only grow as you add suppliers. Here's what to prioritize in an evaluation:
First, test with your hardest documents — not your cleanest ones. Scanned invoices, dot matrix printouts, multi-page documents with nested tables. If the tool only works on clean digital PDFs, it won't survive your actual workflow.
Second, ask what happens when a vendor changes their invoice layout. If the answer involves retraining, rebuilding templates, or contacting support, that's a recurring cost that won't show up on the pricing page.
Third, check whether the tool can normalize data across formats without per-vendor configuration. Same output columns regardless of input layout. Same date format. Vendor name standardization against a reference file. Tools like Lido take this approach — you upload a document, describe what to extract, and get structured data back on the first pass.
Fourth, find out what iteration costs. Extraction isn't always perfect on the first pass, especially with documents you've never seen. Tools that charge per attempt — including failed attempts — penalize you for their own limitations. Lido offers free reprocessing for 24 hours, so you can adjust extraction instructions and re-run without additional cost.
Fifth, ask about time to first result. If setup takes six to twelve weeks of model training before you can test your own documents, you'll be committed before you know if it works.
Lido uses a custom blend of AI vision models, OCR, and LLMs to extract data from any invoice format — without templates, model training, or per-vendor configuration. You upload a document, describe what to extract in plain language, and get structured data back. When a vendor changes their layout, nothing breaks. When you onboard a new supplier, there's nothing to configure.
If you're spending more time maintaining your extraction tool than benefiting from it, the problem isn't your invoices — it's the approach. Try Lido free today and test it on your own documents.