Most invoice extraction tools work fine until you need more than the header. Pulling vendor name, invoice date, and total amount is table stakes — every OCR tool on the market can handle that. The real challenge starts when you need line-item descriptions, per-item quantities, tax breakdowns applied to specific items, or custom fields that don't exist in the tool's default schema. That's where most extraction tools fall apart, and where finance teams end up back in spreadsheets, keying data by hand.
The gap between header-level extraction and true line-item extraction is enormous. Header fields sit in predictable locations at the top of a document. Line items live inside tables — tables that span pages, nest sub-items, merge cells, and follow formatting rules that vary by vendor. And the deeper you go into the data, the harder it gets.
Lido is the best option for teams that need line-item extraction with business logic built into the extraction pipeline. It uses computed columns, conditional logic, and plain-language instructions not just to extract what's on the page but to apply the business rules that make the data useful. But the range of difficulty across real-world invoices is wide, and understanding where your documents fall on that spectrum matters more than which tool you pick first.
Lido extracts line items, tax breakdowns, and custom fields from any invoice format using plain-language instructions, computed columns, and conditional logic. It handles nested tables, multi-page line items, and business rules like conditional tax calculations — without templates or custom code. Kei Concepts uses it to extract line items with conditional tax logic from handwritten Vietnamese invoices across 13 restaurant locations.
Most extraction tools can pull line-item descriptions and quantities from clean, digital invoices with simple table structures. The problem is that real invoices rarely look like that. Tables span multiple pages. Rows nest under category headers. Cells merge across columns. And the moment the structure gets complex, most tools return incomplete data or map values to the wrong fields.
A gas distribution company processing over 20,000 invoices a month ran into exactly this. Their rent invoices from Linde contained nested tables where each category line (marked with RNT unit numbers) needed to be split into individual product lines with calculated pricing. "Those nested rent tables, that's the hardest thing," their operations lead told us. Their previous extraction tool couldn't parse them at all. Lido handled the nested rent table extraction during a single demo session, on the same document type that had been a dead end with their prior tool.
The issue isn't limited to nesting. A construction company extracting bills of materials from multi-page engineering drawings needed the same item consolidated when it appeared across different pages, with quantities summed. A single fitting might show up on page 3, page 7, and page 14. Each instance needed to be identified, matched, and combined into one row with the total count. Their team put it plainly: "It's not necessarily uploading the document and having it do its thing. It's tailoring it from that point."
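To make the consolidation step concrete, here is a minimal Python sketch of summing quantities for an item that recurs across pages. The row shape (`item`, `qty`, `page`), the function name `consolidate_items`, and the sample fittings are illustrative assumptions, not Lido's implementation. Matching on a normalized description is the simplest possible rule; real drawings would likely need part numbers or fuzzy matching.

```python
from collections import defaultdict


def consolidate_items(rows):
    """Merge rows that describe the same item and sum their quantities."""
    totals = defaultdict(int)
    pages = defaultdict(list)
    for row in rows:
        # Naive matching key (assumption): trimmed, lowercased description.
        key = row["item"].strip().lower()
        totals[key] += row["qty"]
        pages[key].append(row["page"])
    # One output row per distinct item, in first-seen order.
    return [
        {"item": key, "qty": totals[key], "pages": sorted(pages[key])}
        for key in totals
    ]


# Hypothetical rows as they might come out of a 14-page drawing set.
rows = [
    {"item": "2in Elbow Fitting", "qty": 4, "page": 3},
    {"item": "Gate Valve", "qty": 1, "page": 3},
    {"item": "2in elbow fitting ", "qty": 6, "page": 7},
    {"item": "2in Elbow Fitting", "qty": 2, "page": 14},
]
consolidated = consolidate_items(rows)
```

Keeping the list of source pages on each consolidated row makes it easier to audit a summed quantity back to the drawing.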
This is the line-item extraction problem most tools don't advertise: it's not about whether they can read a table. It's about whether they can read your tables.
You can extract line items from any invoice, but if you can't define your own fields, you're limited to whatever the tool decided matters. Most invoice extraction platforms ship with a fixed schema — invoice number, date, vendor name, total, maybe a basic line-item table. Custom fields, if supported at all, are typically limited to 5 or 10 predefined slots.
This matters because real invoice processing requires fields that no default schema anticipates. An IT services company in Australia needed to extract multiple serial numbers per line item from their supplier invoices, with each serial number generating its own output row. That's not a standard field. It's not even a standard structure — one input row becomes many output rows. Their team noted they were "quite impressed with the instructions" once they could define this behavior in plain language rather than rigid templates.
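The one-row-in, many-rows-out structure can be sketched in a few lines of Python. Everything here is assumed for illustration: the field names, the comma-separated `serials` string, and the choice to keep a single row with `serial=None` for lines that carry no serial numbers.

```python
def explode_serials(line_items):
    """Emit one output row per serial number found on each line item."""
    out = []
    for item in line_items:
        # Assumption: serial numbers arrive as one comma-separated string.
        serials = [s.strip() for s in item["serials"].split(",") if s.strip()]
        if not serials:
            # Keep lines without serials (e.g. service fees) as a single row.
            serials = [None]
        for serial in serials:
            out.append(
                {
                    "description": item["description"],
                    "unit_price": item["unit_price"],
                    "serial": serial,
                }
            )
    return out


# Hypothetical supplier invoice lines.
line_items = [
    {"description": "Laptop", "unit_price": 1200.0, "serials": "SN001, SN002, SN003"},
    {"description": "Setup fee", "unit_price": 50.0, "serials": ""},
]
exploded = explode_serials(line_items)
```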
A fashion company processing 1,000 sales orders a month needed computed fields that don't exist on the source document at all. Their POs from retailers like Ross arrive with a total quantity (say, 900 units) but no size breakdown. The team has to look up a separate reference table to split that into S, M, L, and XL quantities based on percentage ratios. The calculated size-level quantities need to appear in the extracted output even though they never appear on the PO itself. Lido's computed columns and reference table integration handle this: the size split is calculated automatically during extraction rather than patched together in a spreadsheet afterward.
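As a rough illustration of the size-split arithmetic, the Python sketch below divides a total quantity by percentage ratios and hands any leftover units to the sizes with the largest fractional share, so the parts always add back up to the total. The 20/30/30/20 ratios and the function name `split_by_size` are hypothetical; the source does not state the actual ratios or how Lido rounds.

```python
def split_by_size(total_qty, ratios):
    """Split total_qty across sizes; ratios maps size -> integer percent."""
    if sum(ratios.values()) != 100:
        raise ValueError("ratios must sum to 100 percent")
    # Floor each share first.
    split = {size: total_qty * pct // 100 for size, pct in ratios.items()}
    leftover = total_qty - sum(split.values())
    # Largest-remainder rule: give leftover units to the biggest fractions.
    by_fraction = sorted(
        ratios, key=lambda size: (total_qty * ratios[size]) % 100, reverse=True
    )
    for size in by_fraction[:leftover]:
        split[size] += 1
    return split


# Hypothetical reference-table ratios for one retailer.
ratios = {"S": 20, "M": 30, "L": 30, "XL": 20}
sizes = split_by_size(900, ratios)
```

With these assumed ratios, a 900-unit order becomes 180 S, 270 M, 270 L, and 180 XL. The rounding rule only matters when the total does not divide evenly.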
Not all extraction is created equal, and understanding where your documents fall on the difficulty spectrum explains why your current tool might handle some invoices perfectly and others not at all.
Level 1: Header fields. Invoice number, date, vendor name, total amount. These sit in consistent locations and use predictable labels. Nearly every OCR tool handles these reliably.
Level 2: Simple line items. Description, quantity, unit price, line total. When the table is clean and single-page, most modern tools get this right.
Level 3: Complex tables. Nested structures, multi-page tables, merged cells, category headers mixed with data rows. This is where most tools start failing. The gas distribution company's nested rent tables fall here.
Level 4: Business logic. Tax calculations applied conditionally, size breakdowns computed from reference tables, unit conversions. Almost no extraction tool handles this natively.
Level 5: Cross-document logic. Matching extracted data against reference files, deduplicating items across pages, PO matching. This requires an entirely different approach than document-level extraction.
Most tools market themselves based on how well they handle the first two levels. But most AP teams live in levels three through five. Lido handles levels three through five using computed columns, conditional extraction, and reference table integration — the capabilities that separate reading a document from understanding the business logic behind it.
Tax extraction sounds simple until you see how taxes actually work on real invoices. It's rarely a single tax rate applied to a subtotal. Many invoices apply different tax rates to different line items, include multiple tax jurisdictions, or calculate taxes based on item-level flags that aren't obvious to a machine reading the document.
A restaurant group processing around 4,000 pages per week across 13 companies encountered this with their local vendor invoices. Their suppliers, many of them small local businesses writing invoices by hand in Vietnamese, mark individual items with a "T" to indicate they're taxable. The sales tax percentage applies only to those flagged items. Getting the tax calculation right means reading the "T" flag on each line, identifying the tax rate, and applying it selectively. "Accounting is like, 'This doesn't match'" was a constant refrain before the group adopted Lido. Lido handles the conditional tax logic by reading the "T" flag on each line and applying the tax rate only to flagged items during extraction.
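The flag-driven calculation is simple to state in code. The Python sketch below is an assumption-laden illustration, not Lido's logic: the 8.75% rate, the line fields, and the per-line half-up rounding are all hypothetical choices.

```python
from decimal import Decimal, ROUND_HALF_UP

CENTS = Decimal("0.01")


def apply_flagged_tax(lines, tax_rate):
    """Add tax only to lines whose flag is "T" (case-insensitive)."""
    out = []
    for line in lines:
        taxable = line.get("flag", "").strip().upper() == "T"
        if taxable:
            # Assumption: tax is rounded per line, half-up, to the cent.
            tax = (line["amount"] * tax_rate).quantize(CENTS, rounding=ROUND_HALF_UP)
        else:
            tax = Decimal("0.00")
        out.append({**line, "tax": tax, "total": line["amount"] + tax})
    return out


# Hypothetical handwritten-invoice lines after OCR.
lines = [
    {"description": "Napkins", "amount": Decimal("20.00"), "flag": "T"},
    {"description": "Rice", "amount": Decimal("45.50"), "flag": ""},
    {"description": "Cleaning supplies", "amount": Decimal("12.40"), "flag": "t"},
]
taxed = apply_flagged_tax(lines, Decimal("0.0875"))
total_tax = sum(line["tax"] for line in taxed)
```

Whether tax is rounded per line or once on the taxable subtotal is a business rule worth confirming with accounting, since the two methods can differ by a cent and produce exactly the kind of mismatch described above.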
This is the gap between extracting a tax amount printed on the page and understanding the tax logic behind it. The former is OCR. The latter is business logic, and it requires the extraction tool to interpret relationships between fields, not just read them.
Multi-tax invoices add another layer. When an invoice includes state tax, county tax, and a regulatory surcharge, each applied to different subsets of line items, the extraction tool needs to parse which tax applies to which line and output the breakdown correctly. Most tools flatten this into a single tax field. That might be acceptable for expense reporting, but it won't pass an AP audit.
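A per-tax breakdown, as opposed to a single flattened tax field, might look like the following Python sketch. The tax names, rates, and the idea that each line carries a list of applicable taxes are assumptions made for the example.

```python
from decimal import Decimal

# Hypothetical rates; real ones depend on jurisdiction.
RATES = {
    "state": Decimal("0.06"),
    "county": Decimal("0.01"),
    "surcharge": Decimal("0.02"),
}


def tax_breakdown(lines, rates):
    """Total each tax separately, counting only the lines it applies to."""
    breakdown = {name: Decimal("0.00") for name in rates}
    for line in lines:
        for name in line["taxes"]:
            breakdown[name] += (line["amount"] * rates[name]).quantize(Decimal("0.01"))
    return breakdown


# Hypothetical lines, each tagged with the taxes that apply to it.
lines = [
    {"description": "Equipment", "amount": Decimal("100.00"), "taxes": ["state", "county"]},
    {"description": "Supplies", "amount": Decimal("50.00"), "taxes": ["state"]},
    {"description": "Hazmat handling", "amount": Decimal("200.00"), "taxes": ["surcharge"]},
]
breakdown = tax_breakdown(lines, RATES)
```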
When you process invoices from vendors across states, countries, or just different accounting systems, formats diverge. Dates arrive as MM/DD/YYYY, DD/MM/YYYY, YYYY-MM-DD, or written out as "January 15, 2026." Number formats use periods or commas as decimal separators. Currency symbols change. Unit measurements toggle between imperial and metric.
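One way to handle this divergence is to record each vendor's format once and normalize everything to a single representation. The Python sketch below assumes the format is known per vendor, because a string like 03/04/2026 cannot be disambiguated from the text alone. The function names are illustrative, and parsing month names with `%B` depends on the process locale being English.

```python
from datetime import datetime
from decimal import Decimal


def normalize_date(raw, fmt):
    """Parse a date using the vendor's known format; return ISO 8601."""
    return datetime.strptime(raw.strip(), fmt).date().isoformat()


def normalize_amount(raw, decimal_sep="."):
    """Convert '1,234.56' or '1.234,56' style strings to a Decimal."""
    thousands_sep = "," if decimal_sep == "." else "."
    cleaned = raw.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


# The same date in the four formats mentioned above.
dates = [
    normalize_date("01/15/2026", "%m/%d/%Y"),
    normalize_date("15/01/2026", "%d/%m/%Y"),
    normalize_date("2026-01-15", "%Y-%m-%d"),
    normalize_date("January 15, 2026", "%B %d, %Y"),
]
us_amount = normalize_amount("1,234.56", decimal_sep=".")
eu_amount = normalize_amount("1.234,56", decimal_sep=",")
```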
A construction company extracting materials from engineering drawings dealt with this at the measurement level. Quantities arrived in feet, inches, or a combined format like "10 foot 2 inches." Their downstream system needed everything in inches. The extraction tool had to not only read the measurement but convert it — recognizing the mixed format, parsing each component, and outputting a single value in the target unit.
This kind of format normalization happens silently in manual data entry — a human reads "10 foot 2 inches" and types "122" without thinking about it. But when you automate extraction, every format inconsistency becomes a potential data error unless the tool can interpret and normalize on the fly.
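The conversion a human does in their head can be written down as a small parser. This Python sketch handles only the spelled-out units mentioned above (feet, inches, or both); the accepted unit words and the function name `to_inches` are assumptions, and fractional inches or tick-mark notation are out of scope.

```python
import re

# Optional feet part followed by optional inches part, e.g. "10 foot 2 inches".
_MEASURE = re.compile(
    r"^\s*(?:(?P<feet>\d+(?:\.\d+)?)\s*(?:foot|feet|ft))?"
    r"\s*(?:(?P<inches>\d+(?:\.\d+)?)\s*(?:inches|inch|in))?\s*$",
    re.IGNORECASE,
)


def to_inches(text):
    """Convert a feet/inches measurement string to total inches."""
    match = _MEASURE.match(text)
    if not match or (match.group("feet") is None and match.group("inches") is None):
        # Refuse to guess when no recognizable unit is present.
        raise ValueError(f"unrecognized measurement: {text!r}")
    feet = float(match.group("feet") or 0)
    inches = float(match.group("inches") or 0)
    return feet * 12 + inches


combined = to_inches("10 foot 2 inches")
feet_only = to_inches("5 ft")
inches_only = to_inches("7 inches")
```

Raising an error on a bare number such as "10" is deliberate in this sketch: silently assuming a unit is how a 10-foot pipe becomes a 10-inch one in the downstream system.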
Credit notes, debit adjustments, and return annotations add complexity that goes beyond standard line-item extraction. These documents modify previous transactions, which means the extraction tool needs to capture not just what's on the page but the relationship to prior invoices.
The restaurant group's managers regularly annotate invoices by hand — crossing out items, writing "return" next to lines, changing quantities. These handwritten modifications need to be captured as adjustments, not ignored as noise. For their previous extraction tool, a crossed-out line was either invisible or an error. For their accounting team, it was critical data.
Handling credit notes also means understanding negative amounts, return quantities, and reference invoice numbers that tie back to the original transaction. If your extraction tool treats every document as a standalone invoice, it will mishandle anything that references or modifies a prior one.
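To show what tying a credit note back to its original transaction means in data terms, here is a minimal Python sketch. The document shape (`type`, `number`, `references`, `amount`) and the sample numbers are hypothetical, and using the absolute value is an assumption to cope with vendors that print credits as either positive or negative figures.

```python
from decimal import Decimal


def net_by_invoice(documents):
    """Net each invoice against the credit notes that reference it."""
    net = {}
    for doc in documents:
        if doc["type"] == "invoice":
            key, sign = doc["number"], 1
        else:
            # A credit note reduces the invoice named in its reference field.
            key, sign = doc["references"], -1
        net[key] = net.get(key, Decimal("0.00")) + sign * abs(doc["amount"])
    return net


# Hypothetical documents: one invoice and one credit note for a return.
documents = [
    {"type": "invoice", "number": "INV-100", "amount": Decimal("500.00")},
    {
        "type": "credit_note",
        "number": "CN-7",
        "references": "INV-100",
        "amount": Decimal("-120.00"),
    },
]
net = net_by_invoice(documents)
```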
Lido approaches line-item extraction differently from traditional OCR or template-based tools. Rather than mapping fields to fixed templates, Lido lets you describe what you need in plain language, then applies computed columns, conditional logic, and reference table lookups as part of the extraction itself. Tax calculations, unit conversions, size-split lookups, and cross-page deduplication all happen during extraction — not as manual post-processing in a spreadsheet. This is why the gas distribution company's nested rent tables, the fashion company's size-split POs, and the restaurant group's conditional tax invoices all work inside the same platform without custom code or per-document configuration.
Solving the line-item extraction problem at the levels where most tools fail — complex tables, business logic, and cross-document operations — requires a fundamentally different approach from traditional OCR or template-based extraction.
First, the tool needs to understand document structure, not just text. Reading characters on a page is not the same as understanding that rows 3 through 7 are nested under a category header on row 2, or that a table continues on the next page with the same columns but no repeated header.
Second, it needs to support business logic as part of the extraction pipeline. Tax calculations, unit conversions, computed fields, and conditional rules shouldn't be a post-processing step in Excel. If they're part of what you need from the document, they should be part of the extraction.
Third, it needs to handle cross-document relationships. When a 900-unit PO needs to be split by size using a reference table, or when duplicate items across 14 pages of engineering drawings need to be consolidated with summed quantities, the tool needs access to more context than a single page provides.
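The first requirement, structural understanding, is the easiest to illustrate. The Python sketch below walks rows page by page, treats any row without a quantity as a category header, and carries the current category across page breaks where the header is not repeated. The header heuristic, the row fields, and the sample products are assumptions for the example; real documents usually need stronger signals such as indentation or font weight.

```python
def attach_categories(pages):
    """Flatten per-page table rows, tagging each data row with its category."""
    current_category = None
    flat = []
    for page_number, rows in enumerate(pages, start=1):
        for row in rows:
            if row.get("qty") is None:
                # Assumed heuristic: no quantity means a category header row.
                current_category = row["text"]
                continue
            flat.append(
                {
                    "category": current_category,
                    "item": row["text"],
                    "qty": row["qty"],
                    "page": page_number,
                }
            )
    return flat


# Hypothetical two-page table; page 2 continues "Cylinders" with no header.
pages = [
    [{"text": "Cylinders", "qty": None}, {"text": "Oxygen 300cf", "qty": 4}],
    [
        {"text": "Argon 250cf", "qty": 2},
        {"text": "Bulk tanks", "qty": None},
        {"text": "Nitrogen 1500gal", "qty": 1},
    ],
]
flat = attach_categories(pages)
```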
If you're evaluating extraction tools for line-item level data, test with your hardest documents, not your cleanest ones.
Nested tables. Find an invoice with sub-items grouped under categories, or a multi-page table where data continues across page breaks. Run it through the tool and check whether the hierarchy is preserved or flattened.
Conditional tax logic. Use an invoice where tax applies to some items but not others. Check whether the tool calculates per-line tax correctly or just pulls the total tax amount from the bottom of the page.
Custom fields. Try to extract a field that doesn't exist in the tool's default schema. If you can't define arbitrary fields — or if you're limited to a handful — you'll hit a wall as soon as your requirements go beyond the basics.
Computed values. Test whether the tool can generate values that aren't on the document — calculated columns, lookups from reference tables, unit conversions. If all it can do is read what's printed, you'll still need manual post-processing.
Multi-page consolidation. Upload a document where the same item appears on multiple pages. Check whether the tool can identify duplicates and sum quantities, or whether it just gives you redundant rows.
Lido uses a custom blend of AI vision models, OCR, and LLMs to extract structured data from any document — including line-level details, nested tables, and custom fields — without templates or model training. You describe what you need in plain language, and the system interprets the document structure, applies business logic, and outputs clean, structured data.
When your extraction needs go beyond headers and into the line-item details that actually drive your accounting, the tool you use matters more than the tool you're sold.