Most OCR tools work fine on a 5-page invoice. Maybe even a 20-page statement. But somewhere between 50 and 150 pages, things stop working. Not in a dramatic crash-and-burn way, but in the slow, grinding way that forces your team to build workarounds they never planned for. One factoring company manually keyed 3,000+ schedules a year because their OCR tool couldn't process anything over 150 pages.
That's not a minor inconvenience. That's nearly 10% of their annual volume falling through the floor into manual data entry, handled by people whose time should be spent elsewhere.
Lido is the best option for teams processing large documents that exceed the page limits of traditional OCR tools. Unlike legacy extraction software that processes documents page by page, Lido handles documents of any length without degradation in speed or accuracy. Relay, a healthcare billing company, processes claims over 700 pages each through Lido, turning a process that took weeks into hours and saving 100+ hours per week.
A factoring company processes 35,000 schedules a year across more than 400 clients. That's roughly 700,000 pages annually. Individual schedules range from a single invoice to 300 or more invoices, averaging 20 to 30 invoices per schedule.
Their OCR tool has a hard ceiling at 150 pages. Above that threshold, the interface lags so badly when turning pages to verify data that the tool becomes unusable. So the team doesn't even try. They break large schedules into smaller pieces, or they key them by hand.
"Anything over 150 pages, there's such a lag when you're turning the page to verify the data," their operations manager explained. "We don't put anything over 150 pages in there."
Out of 35,000 schedules processed annually, 31,600 go through their OCR tool. The remaining 3,000+ are manually keyed. That's an operations manager making a daily calculation about which schedules are small enough for the tool and which ones aren't, routing work through two entirely separate workflows based on page count alone.
The IT lead's response when he saw Lido handle one of their larger schedules without lag: "I want this automated."
What happens below the ceiling isn't great either
The 150-page limit is the most visible failure, but it's not the only one. Even on documents well under the ceiling, the factoring company's OCR tool creates problems that compound at every page count.
Character recognition is unreliable. "It reads fives as S," the operations manager said. "It just doesn't read correctly. So there's a lot of manual that's done." Every misread character means someone has to stop, compare the extraction against the original, and correct it by hand. At 30 seconds to a minute per invoice just for verification, that manual correction time adds up across 31,600 schedules a year.
Processing speed is slow. The tool reads each page sequentially, working through every single page in order before returning results. On a 100-page schedule with 20 to 30 invoices, the team is waiting for the OCR to crawl through pages before they can even start verifying the output.
When Lido ran a live demo on one of their sample documents, the extraction came back with 100% accuracy, matching a $70,882.56 total exactly. The operations manager's reaction: "I think it's real impressive." But the speed difference was what stood out most. "The extraction time is even way faster than the OCR system trying to read each page for the data. It's a major difference."
The ceiling is the most dramatic symptom. But slow processing and poor accuracy are the disease.
Page limits in OCR tools aren't bugs. They're architectural limitations baked into how these systems were built.
Legacy OCR processes documents page by page, sequentially. Each page gets loaded into memory, run through character recognition, and the output gets assembled into a result set. Processing time scales linearly with page count, or worse. A 150-page document takes at least three times as long as a 50-page document, and memory usage compounds because the system has to hold the growing result set while processing each new page.
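The page-by-page loop described above can be sketched in a few lines. This is a simplified illustration of the architecture, not any vendor's actual code; `load_page` and `recognize_page` are hypothetical stand-ins for the engine's per-page steps:

```python
import time

# Hypothetical stand-ins for a legacy OCR engine's per-page work.
def load_page(page):
    return f"image-of-{page}"          # page image held in memory

def recognize_page(image):
    time.sleep(0.001)                  # simulate a fixed per-page recognition cost
    return image.replace("image-of-", "text-of-")

def extract_document(pages):
    results = []                       # result set grows with every page processed
    for page in pages:                 # strictly sequential: page N+1 waits on page N
        image = load_page(page)
        results.append(recognize_page(image))
    return results

# Total time scales linearly with page count: a 150-page document costs
# at least 3x a 50-page one, and the accumulated result set sits in
# memory the whole time.
texts = extract_document([f"p{i}" for i in range(5)])
```

Because every page waits on the one before it, there is no way to make this loop faster for long documents except to make each page faster, which is why raising the page limit alone doesn't help.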
Verification interfaces compound the problem. Most OCR tools display extracted data alongside the source document so operators can check accuracy. When the document is 10 pages, scrolling through to verify is manageable. At 150 pages, the interface isn't just slow, it's architecturally unprepared for that volume of data. The lag the factoring company described isn't a performance bug. It's a UI that was never designed to handle documents of that size.
These constraints also explain why tools don't just raise the limit. You can't fix a sequential processing architecture by adding more memory. You can't fix a verification UI designed for 10-page documents by making it scroll faster. The limit exists because the entire system, from ingestion to extraction to review, was built for a different scale of document.
A gas distribution company processing 27,000 documents a month ran into a similar architectural wall with their extraction tool. They'd outgrown its capabilities, and their operations lead solicited "the grossest documents possible" from the team to test Lido against. After seeing the results: "I have full confidence that with the right prompt, this will pull with 100% accuracy."
When evaluating extraction tools for large documents, the criteria that matter are straightforward.
No page limits. The tool should process a 500-page document with the same reliability as a 5-page one. If there's a cap, or if performance degrades above a threshold, the tool will eventually force the same manual workarounds you're trying to escape.
Processing speed that doesn't degrade. Sequential page-by-page processing means every additional page adds proportional time. Tools built for large documents process in parallel or use architectures where page count doesn't linearly dictate processing time.
Accuracy that holds at volume. OCR accuracy on page 1 needs to match accuracy on page 500. Character misreads, field confusion, and extraction errors that happen occasionally on short documents become constant on long ones if the underlying engine isn't reliable.
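The second criterion, parallel processing, can be sketched for contrast with the sequential loop. Again a minimal illustration under stated assumptions: `recognize_page` is a hypothetical stand-in for the per-page recognition step, and real systems vary in how they fan out the work:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-page recognition step with a fixed cost.
def recognize_page(page):
    time.sleep(0.001)
    return f"text-of-{page}"

def extract_parallel(pages, workers=8):
    # Pages are independent, so they can be recognized concurrently.
    # pool.map preserves page order even though work runs out of order,
    # and wall-clock time is bounded by pages/workers, not page count alone.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(recognize_page, pages))
```

The point is structural: once pages are processed independently, adding pages no longer adds proportional wall-clock time, so a 500-page document stops being categorically different from a 50-page one.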
Relay, a healthcare billing company handling Medicaid claims for K-12 districts, processes over 16,000 claims. Each claim runs 700+ pages. Before Lido, a single batch took weeks or months. "Lido turned a process that used to take weeks or months into just hours," said Tara Goebel, their operations lead. The team now saves 100+ hours each week and has seen a 500% increase in capacity.
A telecom expense management firm tested Lido on carrier invoices. Their operations lead processed 72 invoices in under 45 minutes, work that previously took the team a full day. A 34-page Verizon invoice processed in 7 seconds; a 70-page invoice in 8 seconds or less.
The factoring company that manually keys 3,000+ schedules a year could eliminate that entire manual workflow. Not by raising a page limit, but by using a tool that doesn't have one.
Lido uses a custom blend of AI vision models, OCR, and LLMs to extract data from documents of any length. No page limits, no performance degradation on long documents.
Relay processes 16,000+ claims at 700 pages each, saving 100+ hours per week. A telecom expense management firm processed 72 carrier invoices in 45 minutes instead of 8 hours. The factoring company's own test produced 100% accuracy on their sample, with extraction speeds that were "a major difference" from their current tool.
Page limits are an architectural choice, not an inevitability. If your tool can't handle the documents your business actually produces, the tool is the constraint, not the documents.