If you've ever evaluated document extraction software, you've probably seen the pitch: define a template, map your fields, and let the tool pull data from your documents automatically. It sounds clean and logical. And for a while, it works.
Then a vendor updates their invoice layout. Or you onboard a new supplier who sends a format you've never seen. Or someone emails you a scanned PDF that's slightly rotated and your parser returns garbage. Suddenly you're back in the tool, rebuilding rules, adjusting templates, and wondering why you're spending so much time maintaining something that was supposed to save you time.
This is the fundamental problem with template-based extraction. It works great on documents you've already seen, and breaks on everything else. In practice, "everything else" is most of what shows up in your inbox.
First-generation template-based tools like Docparser require you to define a parsing rule for each document layout. You draw zones on a sample document, tell the tool which fields to extract from which locations, and it repeats that extraction on every document that matches the template.
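To make the failure mode concrete, here is a minimal sketch of zone-based extraction. It is not how Docparser or any specific product is implemented. The `Zone` class, the `extract` function, the template, and the sample invoices are all hypothetical, and a "page" is simplified to a list of text lines so that a zone is just a row plus a column range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A fixed region on the page: one row and a column range (hypothetical)."""
    row: int
    col_start: int
    col_end: int


def extract(page: list[str], template: dict[str, Zone]) -> dict[str, str]:
    """Read each field from its fixed zone, whatever text happens to be there."""
    result = {}
    for field, zone in template.items():
        line = page[zone.row] if zone.row < len(page) else ""
        result[field] = line[zone.col_start:zone.col_end].strip()
    return result


# Hypothetical template "drawn" on one sample invoice.
TEMPLATE = {
    "invoice_number": Zone(row=1, col_start=12, col_end=24),
    "total": Zone(row=3, col_start=12, col_end=24),
}

# The layout the template was built from.
original = [
    "ACME SUPPLY CO.",
    "Invoice No: INV-1001",
    "Date:       2024-01-15",
    "Total:      $1,250.00",
]

# Same vendor, same data, but a new address line pushes every row down by one.
updated = [
    "ACME SUPPLY CO.",
    "100 Main Street",
    "Invoice No: INV-1001",
    "Date:       2024-01-15",
    "Total:      $1,250.00",
]

good = extract(original, TEMPLATE)
bad = extract(updated, TEMPLATE)
```

On the original layout, `good` holds the invoice number `INV-1001` and the total `$1,250.00`. On the updated layout, `bad` holds a fragment of the street address as the invoice number and the invoice date as the total. Nothing raises an error. The template simply reads the wrong part of the page, which is why these failures tend to surface downstream rather than in the tool itself.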
This is fine if you process a small number of document types that never change, but most businesses don't operate that way. Vendors change their invoice layouts without telling you. As your business grows, you constantly need to add new suppliers with their own document formats that you haven't templated yet. The same vendor might have a different layout for their US entity vs. their international subsidiary, or for regular invoices vs. credit memos. And seasonal vendors who only invoice you twice a year send something your team hasn't seen in six months and definitely didn't build a template for.
Every one of these scenarios means someone has to stop, open the tool, create or update a template, test it, and re-run the extraction. At low volumes, this is annoying. At high volumes, it becomes a part-time job.
"We spend a ton of time retraining the models."
When template-based tools started showing their limits, the next generation of document extraction tools promised something better: train a machine learning model on your documents, and let it learn to extract data without rigid templates.
In theory, this solves the template problem. In practice, it creates a new one. Model-based tools like Nanonets require you to feed sample documents, annotate fields, train the model, and validate its output for each document type. The initial setup typically takes 6-12 weeks and requires a lot of back and forth with the vendor's offshore team.
Then, when formats change or you add new vendors, you need to retrain every affected model. One Nanonets customer told us that they spend a ton of time retraining the models, sometimes dozens of hours a month. And remember, this comes after an already time-consuming, multi-month setup period!
This particular company had actually already migrated from Docparser to Nanonets specifically to escape template maintenance. Unfortunately, they ended up in a different version of the same problem.
This pattern is more common than you'd think. Companies move from template tools to model-trained tools expecting a fundamentally different experience, and find themselves still doing manual work to keep the system running. The tool changes, but the template treadmill doesn't.
"It was supposed to be plug and play, but the amount of... it's great for a quick and easy but it is absolutely one of the worst."
We hear some version of this on nearly every call with someone evaluating Lido after trying another tool. The specifics vary, but the story is consistent.
A government agency paid $30,000 for a Nanonets contract. They were told it would handle their document processing without heavy setup. To add insult to injury, Nanonets charged them for every extraction attempt, including the ones that failed.
"You didn't do the job the first time correctly and yeah... why are you charging me again?"
This is a recurring issue across document types and industries. A gas distribution company processing over 20,000 invoices, 2,000 supplier statements, and 5,000 customer POs per month migrated from Docparser to Nanonets. They built two separate models: one with intentional mapping (fed 50 sample pages) and one without. They still have a manual approval process on every extraction, not because they want to review the business logic, but because they can't trust the accuracy of the output. As their operations lead put it: "The approval is all about the accurate extraction of the data. It has nothing to do with the content."
Think about that for a second. Their entire approval workflow exists because extraction accuracy isn't reliable enough. The "automation" still requires a human to check every result.
The common thread isn't that these teams picked the wrong tool. It's that template-based and model-trained tools share the same structural limitation: they need to be taught what a document looks like before they can read it. When the document changes, the teaching starts over.
The extraction tool market has been stuck in a loop for years. Each new generation claims to improve on the last, but the underlying approach stays the same.
All of these approaches assume that document formats are stable and predictable. They aren't. Vendors update systems, redesign invoices, switch billing platforms, and merge entities. These changes happen without notice and without your consent, and every one of them is a potential failure point for tools that depend on knowing the layout in advance.
The result is a category of software that creates ongoing maintenance work for the people it's supposed to help. You end up babysitting your automation instead of benefiting from it.
If you're evaluating document extraction tools (or re-evaluating after a bad experience), there are a few things worth prioritizing:
Lido extracts data from any document (invoices, POs, claims, receipts, statements) without templates or model training. You upload a document, tell it what to extract, and get structured data back. When a vendor changes their layout, nothing breaks. When the extraction isn't perfect, you can reprocess for free for 24 hours.
Companies like ACS Industries, Hocutt, and Relay use Lido to process thousands of documents each week. ACS automates 400+ POs weekly and avoided a hire. Hocutt reduced utility bill processing time by 75%. Relay processes 16,000 Medicaid claims in 5 days.
If you're stuck maintaining templates or retraining models, there's a different approach worth testing. Try Lido free today, upload your own documents, and get accurate results instantly.