
Smoker CPA: 2 Hours to 7 Minutes with AI Extraction

March 20, 2026

Smoker CPA reduced the time spent on data extraction per client engagement from 2 hours to 7 minutes using Lido’s AI document extraction. The firm processes 11 different document types across 600+ clients, including W-2s, 1099s, K-1s, bank statements, and payroll reports. The 94% time reduction per engagement freed the team to focus on advisory work instead of manual data entry.

CPA firms have a document problem that no amount of hiring solves. Every client sends different documents from different institutions in different formats. A W-2 from ADP looks nothing like a W-2 from Paychex. A 1099 from Charles Schwab has a different layout than a 1099 from Fidelity. Bank statements from a regional credit union bear no resemblance to bank statements from Chase.

Smoker CPA deals with this across 600+ clients and 11 distinct document types. Before Lido, extracting data from a single client’s documents took approximately 2 hours. After implementing AI document extraction, that same work takes 7 minutes. That’s a 94% reduction in time spent on the part of client work that generates the least value.

Why CPA document processing is uniquely hard

Most industries that struggle with document processing struggle with volume. A property management company processes 2,000 utility bills a month. A healthcare BPO processes 6,000 claims. The documents are repetitive. The challenge is scale.

CPA firms face a different problem. The challenge is variety.

Smoker CPA processes 11 document types, including W-2s, 1099-NECs, 1099-INTs, 1099-DIVs, 1099-Rs, K-1s, bank statements, payroll reports, and brokerage statements, along with other tax-related documents. Each type has a canonical set of fields. But the actual format of those fields changes with every issuing institution. A K-1 from a large partnership fund has a different layout, different supplemental schedules, and different formatting than a K-1 from a small family LLC.

Across 600+ clients, the number of format variations is effectively unlimited. You cannot build templates for all of them. You cannot train a model on all of them. There are always new institutions, new form revisions, and new clients whose documents you have never seen before.

This is the same problem Legacy CPA identified when they described processing 3,500 audits per year across “thousands of payroll formats.” The format variability exceeds any team’s ability to pre-configure for it. The only viable approach is a system that reads documents the way a human does: by understanding what the fields mean, regardless of where they appear on the page. Our deep dive on how CPA firms handle format variability covers this dynamic in detail.

11 document types, 600+ clients

The math of CPA document processing during tax season is punishing. If each of 600 clients submits an average of 8 documents, that’s 4,800 documents to process in roughly four months. At 2 hours per engagement for data extraction alone, the firm was spending 1,200 hours per season on work that does not require CPA expertise.

Those 1,200 hours are not evenly distributed. Tax season is compressed. The bulk of documents arrive in January and February, with a second wave before the April deadline. The work comes in surges, and each surge pushes against the firm’s capacity ceiling.

The 11 document types add complexity beyond the format variability problem. Each type has different fields to extract. A W-2 has wages, federal tax withheld, state tax withheld, Social Security wages, and employer identification. A K-1 has ordinary business income, rental income, interest, dividends, capital gains, self-employment earnings, and often dozens of supplemental fields. The person processing these documents needs to know what to look for on each type, which fields matter for the return, and how to handle edge cases.
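To make the idea of a per-type field set concrete, here is a minimal sketch of how a firm might write down the target fields for two of the document types mentioned above. The class, the field names, and the `missing_fields` helper are all hypothetical labels that paraphrase the lists in this article; they are not Lido’s configuration format or official IRS box names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentSchema:
    """Target fields for one document type (illustrative names only)."""
    doc_type: str
    fields: tuple[str, ...]


# Hypothetical schemas paraphrasing the W-2 and K-1 fields listed in the article.
SCHEMAS = {
    "W-2": DocumentSchema("W-2", (
        "wages",
        "federal_tax_withheld",
        "state_tax_withheld",
        "social_security_wages",
        "employer_identification",
    )),
    "K-1": DocumentSchema("K-1", (
        "ordinary_business_income",
        "rental_income",
        "interest",
        "dividends",
        "capital_gains",
        "self_employment_earnings",
    )),
}


def missing_fields(doc_type: str, extracted: dict[str, str]) -> list[str]:
    """Return schema fields that are absent or blank in a record, in schema order."""
    schema = SCHEMAS[doc_type]
    return [name for name in schema.fields if not extracted.get(name)]
```

A reviewer-facing check could call `missing_fields("W-2", record)` to flag a record where, say, the state withholding was never captured, whichever payroll provider issued the form.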

This is data entry that demands tax knowledge. And that mismatch is what makes it so expensive. The keying itself is spreadsheet-level work, but only someone who understands the forms can do it reliably, so you end up paying CPA-level salaries for it.

What the old workflow looked like

Before Lido, the workflow for a single client engagement went something like this:

Receive the client’s document package. Open each document. Identify the document type and issuing institution. Locate the relevant fields on each form. Manually key the values into the firm’s tax preparation software or a working spreadsheet. Cross-check the entered values against the source documents. Reconcile any discrepancies. Move on to the next document.

Two hours per engagement. Every engagement. Every client. Every season.

The 2-hour figure includes the time spent locating fields on unfamiliar formats. When a client’s brokerage switches from one statement layout to another, the person processing it spends extra time figuring out where the dividend income line moved to. When a new client brings documents from an institution the firm hasn’t seen before, that lookup time is even longer.

Hiring more staff doesn’t solve this efficiently. Training a new hire on 11 document types from hundreds of institutions takes weeks. And the fundamental bottleneck remains: there are always new formats, and the person processing them always has to figure out each one from scratch.

How Smoker CPA uses Lido now

The new workflow replaces manual field-by-field data entry with AI extraction and human review.

Documents go into Lido. The system reads each document, identifies the relevant fields based on extraction instructions the firm has configured, and outputs structured data. The firm’s team reviews the output, verifies it against the source document, and exports the clean data into their tax preparation workflow.

Seven minutes per engagement. That includes upload, extraction, review, and export.

The shift from 2 hours to 7 minutes comes from eliminating nearly all of the typing. The human role changes from data entry (read the document, type the value) to quality assurance (check the extracted value, confirm or correct). Reading and confirming is faster than reading and typing, and the error rate is lower because the cognitive load is lighter.
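The confirm-or-correct step can be sketched in a few lines. This is an illustration of the review pattern only, under the assumption that extracted values arrive as simple name/value pairs; `ExtractedField` and `review` are hypothetical names, not part of Lido’s product or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedField:
    """One extracted value plus an optional reviewer override (hypothetical model)."""
    name: str
    extracted_value: str
    corrected_value: Optional[str] = None  # set only when the reviewer overrides

    @property
    def final_value(self) -> str:
        # A correction wins; otherwise the extracted value is treated as confirmed.
        if self.corrected_value is not None:
            return self.corrected_value
        return self.extracted_value


def review(fields: list[ExtractedField], corrections: dict[str, str]) -> dict[str, str]:
    """Apply reviewer corrections and return the values to export."""
    for f in fields:
        if f.name in corrections:
            f.corrected_value = corrections[f.name]
    return {f.name: f.final_value for f in fields}
```

The reviewer only types when a value is wrong, so a clean document needs no keystrokes beyond confirmation. That is the difference between the two workflows described above.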

Lido handles the format variability natively. A W-2 from ADP, a W-2 from Gusto, and a W-2 from a hand-typed small business all contain the same fields. The AI locates those fields regardless of layout. The firm doesn’t build templates for each institution or train models on sample documents. The extraction works on the first document from every institution because the system understands what a W-2 is, not what an ADP W-2 looks like at specific pixel coordinates.

The same principle applies across all 11 document types, including K-1s, 1099s, bank statements, and payroll reports. Each has a defined set of target fields. The AI finds them. The human confirms them.

The results: 2 hours to 7 minutes

The numbers tell a clear story.

Before Lido, each client engagement required 2 hours of data extraction. Across 600+ clients, that added up to 1,200+ hours per season dedicated to manual data entry.

After implementing Lido, the same work takes 7 minutes per engagement. Across 600+ clients, that’s approximately 70 hours per season, a reduction of over 1,100 hours.
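The arithmetic behind these figures is easy to reproduce. The sketch below assumes exactly 600 clients (the floor of “600+”) and one engagement per client per season; both are simplifications for illustration, and the variable names are my own.

```python
CLIENTS = 600          # floor of the "600+" figure; an assumption for illustration
MINUTES_BEFORE = 120   # 2 hours of manual extraction per engagement
MINUTES_AFTER = 7      # upload, extraction, review, and export


def season_hours(clients: int, minutes_per_engagement: int) -> float:
    """Total extraction hours per season, assuming one engagement per client."""
    return clients * minutes_per_engagement / 60


hours_before = season_hours(CLIENTS, MINUTES_BEFORE)   # 1200.0
hours_after = season_hours(CLIENTS, MINUTES_AFTER)     # 70.0
hours_saved = hours_before - hours_after               # 1130.0
reduction_pct = 100 * (MINUTES_BEFORE - MINUTES_AFTER) / MINUTES_BEFORE

print(f"{hours_saved:.0f} hours saved, {reduction_pct:.1f}% reduction")
```

The per-engagement reduction works out to about 94.2%, which matches the 94% quoted throughout, and the 1,130 hours saved is the basis for “over 1,100.”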

Those 1,100 hours don’t disappear. They convert to capacity for other work. Advisory services. Tax planning. Client communication. Business development. Or, in the bluntest terms, the firm can serve more clients without hiring more people.

The 94% time reduction also changes the economics of flat-fee engagements. If a client pays a flat rate for tax preparation and the data extraction portion used to consume 2 hours of staff time, that was a significant chunk of the engagement’s margin. At 7 minutes, the data extraction cost per engagement drops to near zero. The firm keeps the same fee with dramatically lower cost of delivery.

For firms billing hourly, the calculus is different but the outcome is similar. The hours freed from data extraction become available for higher-value billable work. At a typical CPA billing rate of $150-250/hour, 1,100 recovered hours represent $165,000-$275,000 in potential additional revenue capacity per season.
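The revenue range is a straight multiplication, sketched below. It assumes every recovered hour is rebilled at the full rate, so it is best read as an upper bound on capacity and not as a forecast; the function name is illustrative.

```python
def revenue_capacity(recovered_hours: int, low_rate: int, high_rate: int) -> tuple[int, int]:
    """Revenue range if every recovered hour were billed at the given hourly rates."""
    return recovered_hours * low_rate, recovered_hours * high_rate


low, high = revenue_capacity(1_100, 150, 250)
print(f"${low:,} to ${high:,} per season")
```

A firm that expects to rebill only part of the freed time can scale `recovered_hours` down accordingly before running the same calculation.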

This pattern echoes what other accounting firms have experienced. Legacy CPA, processing 3,500 audits per year, found that template-based extraction was structurally impossible at their level of format variability. The solution in both cases is the same: AI that reads documents contextually rather than matching them to preconfigured layouts.

Why CPA firms face unique document challenges

Other industries have document processing problems. Manufacturing deals with purchase orders. Healthcare deals with claims. Logistics deals with bills of lading. But CPA firms face a combination of factors that makes their problem particularly resistant to traditional automation.

Seasonal compression. Tax season creates a 4-month window where the majority of document processing must happen. There is no option to spread the work evenly across the year. Automation needs to handle peak volume, not average volume.

Client-controlled inputs. CPA firms do not control what documents their clients send or what format those documents are in. A manufacturing company can standardize its PO format. A CPA firm cannot tell Charles Schwab to redesign their 1099-DIV.

Regulatory precision. Extracted data flows directly into tax returns filed with the IRS. Errors have consequences beyond operational inefficiency. An incorrect W-2 wage figure means an incorrect tax liability, which means an amended return, which means unhappy clients and potential penalties.

High document type count. Most industries deal with 2-4 document types at high volume. CPA firms deal with 10+ document types, each with its own field schema. The extraction system needs to handle all of them, and the team needs to configure and validate extraction for each type.

These factors together explain why accounting was among the last industries to adopt document automation. The tools that worked for invoice processing in AP departments couldn’t handle the variety, precision requirements, and seasonal intensity of tax document processing. AI-first extraction, which works on any document format without templates, is the first approach that fits the CPA use case structurally.

Frequently asked questions

What document types can Lido extract for CPA firms?

Lido extracts data from all standard tax and financial documents: W-2s, 1099 variants (NEC, INT, DIV, R, MISC), K-1s, bank statements, brokerage statements, payroll reports, and other tax-related documents. The system works across formats from any issuing institution without requiring templates or training per institution. Smoker CPA processes 11 distinct document types across 600+ clients using Lido.

How accurate is AI extraction on tax documents?

On clean, digitally generated tax forms like W-2s and 1099s, Lido achieves 99%+ field-level accuracy. On scanned or lower-quality documents, accuracy depends on input quality but typically exceeds 95%. The workflow includes a human review step where the CPA or staff member confirms extracted values against the source document, catching any errors before data enters the tax preparation system.

Does this work for firms of all sizes?

Yes. Solo practitioners and small firms see the largest per-person impact because every hour saved goes directly back to the owner or a small team. Smoker CPA uses it across 600+ clients. Larger firms like Legacy CPA, which handles 3,500 audits per year, use the same approach at higher volume. The time savings scale with client count regardless of firm size.

How long does it take to set up Lido for tax document processing?

Initial setup involves configuring extraction instructions for each document type you process. For a firm handling 11 document types, expect a few hours of setup across all types. Once configured, the extraction instructions are reusable across every client and every institution’s format of that document type. Most firms are processing their first documents within minutes and have a production workflow within a day or two.

Can Lido handle K-1s with supplemental schedules?

Yes. K-1s are among the most complex tax documents because they vary significantly by entity type and often include supplemental schedules with footnotes, separately stated items, and partner-specific allocations. Lido extracts from both the standard K-1 fields and supplemental pages, handling the layout variations between K-1s from different partnerships, S-corps, trusts, and estates.

What happens when a client sends documents from an institution Lido has never seen?

Lido extracts from it on the first attempt. There are no templates to build and no models to train per institution. The system reads the document contextually, locating the fields you’ve defined (wages, tax withheld, dividends, etc.) regardless of where they appear on the page. This is the core advantage for CPA firms: with 600+ clients using hundreds of different financial institutions, every season brings formats the system has never seen. They all work.
