Blog

The Problem With Template-Based Document Extraction (And What to Use Instead)

February 16, 2026

If you've ever evaluated document extraction software, you've probably seen the pitch: define a template, map your fields, and let the tool pull data from your documents automatically. It sounds clean and logical. And for a while, it works.

Then a vendor updates their invoice layout. Or you onboard a new supplier who sends a format you've never seen. Or someone emails you a scanned PDF that's slightly rotated and your parser returns garbage. Suddenly you're back in the tool, rebuilding rules, adjusting templates, and wondering why you're spending so much time maintaining something that was supposed to save you time.

This is the fundamental problem with template-based extraction. It works great on documents you've already seen, and breaks on everything else. In practice, "everything else" is most of what shows up in your inbox.

How template-based extraction works (and where it falls apart)

First-generation template-based tools like Docparser require you to define a parsing rule for each document layout. You draw zones on a sample document, tell the tool which fields to extract from which locations, and the tool repeats that extraction on every document that matches the template.
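The zone-mapping approach can be sketched in a few lines. This is an illustrative toy, not any vendor's actual engine (the coordinates, field names, and OCR tuple format are all invented), but it shows why a fixed template fails silently the moment a layout shifts:

```python
# Toy model of template-based extraction: a "template" is a set of named
# zones, each a fixed bounding box on the page, and extraction just collects
# whatever OCR text falls inside each box.

def extract_with_template(words, template):
    """words: OCR output as (text, x, y) tuples; template: field -> (x0, y0, x1, y1)."""
    result = {}
    for field, (x0, y0, x1, y1) in template.items():
        hits = [t for t, x, y in words if x0 <= x <= x1 and y0 <= y <= y1]
        result[field] = " ".join(hits)
    return result

# Zones drawn against one vendor's sample invoice:
invoice_template = {
    "invoice_number": (400, 40, 560, 60),   # top-right corner
    "total": (450, 700, 560, 720),          # bottom-right corner
}

# The layout the template was built for extracts cleanly:
original_layout = [("INV-1042", 410, 50), ("$1,250.00", 460, 710)]
extract_with_template(original_layout, invoice_template)
# -> {'invoice_number': 'INV-1042', 'total': '$1,250.00'}

# After a redesign that moves the total near the top of the page, the same
# template returns an empty field. No error is raised; the data just vanishes:
redesigned_layout = [("INV-1042", 410, 50), ("$1,250.00", 460, 90)]
extract_with_template(redesigned_layout, invoice_template)
# -> {'invoice_number': 'INV-1042', 'total': ''}
```

Note the failure mode: nothing crashes, so nobody is alerted until someone notices missing values downstream.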

This is fine if you process a small number of document types that never change, but most businesses don't operate that way. Vendors change their invoice layouts without telling you. As your business grows, you constantly need to add new suppliers with their own document formats that you haven't templated yet. The same vendor might have a different layout for their US entity vs. their international subsidiary, or for regular invoices vs. credit memos. And seasonal vendors who only invoice you twice a year send something your team hasn't seen in six months and definitely didn't build a template for.

Every one of these scenarios means someone has to stop, open the tool, create or update a template, test it, and re-run the extraction. At low volumes, this is annoying. At high volumes, it becomes a part-time job.  

The model-training treadmill

"We spend a ton of time retraining the models."

When template-based tools started showing their limits, the next generation of document extraction tools promised something better: train a machine learning model on your documents, and let it learn to extract data without rigid templates.

In theory, this solves the template problem. In practice, it creates a new one. Model-based tools like Nanonets require you to feed sample documents, annotate fields, train the model, and validate its output for each document type. The initial setup typically takes 6-12 weeks and requires a lot of back and forth with the vendor's offshore team.

Then, when formats change or you add new vendors, you need to retrain every impacted model. One Nanonets customer told us they spend a ton of time retraining the models, sometimes dozens of hours a month. And that's on top of the multi-month setup period.

This particular company had already migrated from Docparser to Nanonets specifically to escape template maintenance. They ended up in a different version of the same problem.

This pattern is more common than you'd think. Companies move from template tools to model-trained tools expecting a fundamentally different experience, and find themselves still doing manual work to keep the system running. The tool changes, but the template treadmill doesn't.

What "plug and play" actually looks like after implementation

"It was supposed to be plug and play, but the amount of... it's great for a quick and easy but it is absolutely one of the worst."

We hear some version of this on nearly every call with someone evaluating Lido after trying another tool. The specifics vary, but the story is consistent.

A government agency paid $30,000 for a Nanonets contract. They were told it would handle their document processing without heavy setup. To add insult to injury, Nanonets charged them for every extraction attempt, including the ones that failed.

"You didn't do the job the first time correctly and yeah... why are you charging me again?"

This is a recurring issue across document types and industries. A gas distribution company processing over 20,000 invoices, 2,000 supplier statements, and 5,000 customer POs per month migrated from Docparser to Nanonets. They built two separate models — one with intentional mapping (fed 50 sample pages), one without. They still have a manual approval process on every extraction, not because they want to review the business logic, but because they can't trust the accuracy of the output. As their operations lead put it: "The approval is all about the accurate extraction of the data. It has nothing to do with the content."

Think about that for a second. Their entire approval workflow exists because extraction accuracy isn't reliable enough. The "automation" still requires a human to check every result.

The common thread isn't that these teams picked the wrong tool. It's that template-based and model-trained tools share the same architectural limitation: they need to be taught what a document looks like before they can read it. When the document changes, the teaching starts over.

Why this keeps happening

The extraction tool market has been stuck in a loop for years. Each new generation claims to improve on the last, but the underlying approach stays the same:

  1. Template tools: Define zones on each layout. Breaks when the layout changes.
  2. Model-trained tools: Feed samples and annotate. Retrain when you add vendors or formats change.
  3. "AI-powered" template tools: Still template-based, but with better marketing copy.

All of these approaches assume that document formats are stable and predictable. They aren't. Vendors update systems, redesign invoices, switch billing platforms, and merge entities. These changes happen without notice and without your consent, and every one of them is a potential failure point for tools that depend on knowing the layout in advance.

The result is a category of software that creates ongoing maintenance work for the people it's supposed to help. You end up babysitting your automation instead of benefiting from it.

What to look for instead

If you're evaluating document extraction tools (or re-evaluating after a bad experience), there are a few things worth prioritizing:

  1. Layout-agnostic extraction. The tool should extract data from a document it has never seen before, without templates, training, or pre-configuration. If someone has to define fields or zones for each new layout, you're buying a maintenance problem.
  2. Handling of messy documents. Test with your worst documents, not your cleanest ones. Scanned PDFs, faxed copies, handwritten notes, phone photos. If the tool only works on clean digital files, it won't survive your actual workflow. One prospect told us half their sales orders are handwritten; another has propane suppliers who "just like to handwrite everything." Real-world documents aren't clean.
  3. No penalty for iteration. Extraction isn't always perfect on the first pass. You should be able to refine your instructions and re-run without burning credits or racking up costs. Tools that charge per attempt — including failed attempts — are penalizing you for their own limitations.
  4. Speed to first value. If it takes weeks of model training or template setup before you see results, you'll never know whether the tool actually works on your documents until you're already committed. Look for something you can test in minutes with your own files.
  5. Format-change resilience. Ask what happens when a vendor changes their invoice layout. If the answer involves retraining, rebuilding, or contacting support, that's a recurring cost the pricing page won't show you.
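To make the first criterion concrete, here is a hedged sketch of what layout-agnostic configuration looks like in principle. This is not Lido's actual API; the field names, hints, and response shape are invented for illustration. The key difference from a template is that the "configuration" is plain-language field descriptions rather than page coordinates, so a new vendor layout needs no new setup:

```python
import json

# Hypothetical sketch: describe the fields you want, not where they live.
FIELD_SPEC = {
    "invoice_number": "the vendor's invoice or reference number",
    "total": "the final amount due, including tax",
}

def build_extraction_prompt(spec):
    """Turn a field spec into an instruction a vision/LLM backend could follow."""
    lines = [f"- {name}: {hint}" for name, hint in spec.items()]
    return ("Extract these fields from the document and return JSON "
            "(use null if a field is absent):\n" + "\n".join(lines))

# A vision/LLM backend would return structured JSON regardless of where the
# values appear on the page; parsing it is the only "integration" left:
raw_response = '{"invoice_number": "INV-1042", "total": "$1,250.00"}'
extracted = json.loads(raw_response)
```

Nothing in this configuration references a layout, which is exactly why a vendor redesign can't invalidate it.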

How Lido handles this differently

Lido extracts data from any document — invoices, POs, claims, receipts, statements — without templates or model training. You upload a document, tell it what to extract, and get structured data back. When a vendor changes their layout, nothing breaks. When the extraction isn't perfect, you reprocess free for 24 hours.

  1. No templates to build or maintain
  2. No model training per document type
  3. Free reprocessing for 24 hours. No charge for iteration
  4. Works on scanned documents, handwriting, and messy inputs

Companies like ACS Industries, Hocutt, and Relay use Lido to process thousands of documents each week. ACS automates 400+ POs weekly and avoided a hire. Hocutt reduced utility bill processing time by 75%. Relay processes 16,000 Medicaid claims in 5 days.

If you're stuck maintaining templates or retraining models, there's a different approach worth testing. Try Lido free today, upload your own documents, and get accurate results instantly.

Frequently asked questions

What are the limitations of template-based document extraction?

Template-based extraction tools like Docparser require you to build and maintain a separate template for every document layout, which breaks whenever a vendor updates their format or you onboard a new supplier. Lido eliminates this entirely — it extracts data from any document without templates or model training. Esprigas migrated from Docparser to Nanonets to escape template maintenance, then evaluated Lido after spending "a ton of time retraining the models" on Nanonets for their 27,000 documents per month.

What is the best alternative to template-based document processing?

Lido is the best alternative to template-based extraction tools. It uses AI vision models, OCR, and LLMs to read any document layout without templates, model training, or per-vendor configuration — and reprocesses free for 24 hours when extraction needs refinement. ACS Industries automates 400+ POs weekly through Lido without building a single template, Hocutt reduced utility bill processing time by 75%, and Relay processes 16,000 Medicaid claims in 5 days.

Why do document extraction tools break when vendor formats change?

Template-based and model-trained extraction tools break because they depend on knowing a document's layout in advance — when a vendor updates their invoice format, the template or model no longer matches and must be rebuilt or retrained. Lido uses a layout-agnostic approach that reads documents the way a person would, so format changes don't cause failures. A government agency paid $30,000 for a Nanonets contract and watched it fail on their documents, while Esprigas spent dozens of hours monthly retraining models after vendor format changes.

What is layout-agnostic document extraction?

Layout-agnostic extraction means the tool reads documents by understanding visual structure and context rather than matching against predefined templates or trained models — so it handles any format on the first upload without configuration. Lido's layout-agnostic approach uses AI vision models and LLMs to extract data from documents it has never seen before, which is why companies processing thousands of documents from hundreds of vendors — like ACS Industries (400+ POs weekly) and Relay (16,000 Medicaid claims) — use it without maintaining any templates.

Ready to grow your business with document automation, not headcount?

Join hundreds of teams growing faster by automating the busywork with Lido.