How to Copy and Extract Tables from PDF to Excel

June 16, 2026

Copy-paste works on native (digitally created) PDFs with simple table layouts. For scanned PDFs, multi-page tables, or complex formatting, you need a tool that understands table structure. Adobe Acrobat exports PDFs to Excel with decent table preservation, Python libraries like tabula-py extract tables programmatically, and AI-powered tools like Lido extract table data from both native and scanned PDFs without templates or manual cleanup.

The right method depends on what kind of PDF you have and how often you need to do this. A one-time copy-paste from a native PDF takes 30 seconds. Extracting tables from 50 scanned invoices every week is a completely different problem. The five methods below cover the full range, from the simplest manual option to fully automated AI extraction, so you can pick the one that fits your situation.

The core problem is that PDFs do not store data in rows and columns. A PDF stores text as individually positioned characters at specific x-y coordinates on a page. What looks like a table to you is just characters that happen to be aligned in a grid. Every method for extracting a table from a PDF has to reconstruct the row-and-column structure from those character positions. That reconstruction breaks in predictable ways depending on table complexity.
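To make that reconstruction step concrete, here is a minimal Python sketch. The sample coordinates, the `group_into_rows` helper, and its `y_tolerance` parameter are illustrative assumptions, not the internals of any particular tool.

```python
from collections import namedtuple

# A positioned character, roughly as a PDF page describes it (hypothetical sample type).
Char = namedtuple("Char", ["text", "x", "y"])

def group_into_rows(chars, y_tolerance=2.0):
    """Cluster characters into rows by y coordinate, then order each row by x."""
    rows = []
    # PDF y coordinates grow upward, so sort descending to read top to bottom.
    for ch in sorted(chars, key=lambda c: -c.y):
        if rows and abs(rows[-1][0].y - ch.y) <= y_tolerance:
            rows[-1].append(ch)  # close enough vertically: same row
        else:
            rows.append([ch])    # otherwise start a new row
    return ["".join(c.text for c in sorted(row, key=lambda c: c.x)) for row in rows]

# Characters arrive in arbitrary order with slightly uneven baselines.
chars = [
    Char("B", 80.0, 700.2),
    Char("C", 72.0, 688.0),
    Char("A", 72.0, 700.0),
    Char("D", 80.0, 688.1),
]
rows = group_into_rows(chars)
print(rows)
```

This recovers rows only. Deciding where one column ends and the next begins is a separate step, and it is the one that usually goes wrong.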

Method 1: Copy and paste (free, instant, limited)

Open your PDF in any PDF reader (Adobe Reader, Preview on Mac, Chrome's built-in viewer, or Edge). Select the table text by clicking and dragging across the table area. Copy it (Ctrl+C or Cmd+C). Open Excel and paste (Ctrl+V or Cmd+V).

When this works: Native PDFs (created digitally by software, not scanned from paper) with simple tables that have clear column separation, no merged cells, no spanning headers, and fit on a single page. If you can select the text in your PDF reader and the selection highlights the text in reading order, copy-paste has a reasonable chance of producing usable output.

When this breaks:

Fix for single-column paste: If everything pastes into one column, select the column in Excel, go to Data > Text to Columns, choose “Fixed width,” and manually set column breaks. This works for simple tables with consistent spacing but fails on tables where values have different widths across rows.
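If you would rather script the same fix, the sketch below does what Text to Columns does in fixed-width mode. It assumes the pasted text kept its column alignment as spaces. The `split_fixed_width` helper, the sample lines, and the break positions are made up for illustration.

```python
def split_fixed_width(line, breaks):
    """Split one pasted line at the given character offsets and trim each cell."""
    edges = [0, *breaks, len(line)]
    return [line[start:end].strip() for start, end in zip(edges, edges[1:])]

# Two pasted lines whose columns line up at character offsets 12 and 16.
pasted = [
    "Widget A    12   4.50",
    "Widget BB    3  19.99",
]
rows = [split_fixed_width(line, breaks=[12, 16]) for line in pasted]
for row in rows:
    print(row)
```

The break positions are chosen by eye, exactly as in Excel. A row whose values spill across a break will be cut in the wrong place, which is the failure described above.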

Use copy-paste for a single, simple table from a native PDF that you need once. Beyond that, the cleanup time exceeds whatever you save by skipping a proper tool.

Method 2: Adobe Acrobat export (paid, better tables)

Adobe Acrobat Pro can export a PDF directly to an Excel (.xlsx) file. Open the PDF in Acrobat, click File > Export a PDF > Spreadsheet > Microsoft Excel Workbook, choose your save location, and open the resulting file in Excel. The entire process takes under a minute for most documents.

Adobe created the PDF format, and Acrobat's export engine works directly from the PDF's internal positioning data to identify table boundaries, column structure, and cell alignment. For native PDFs with well-structured tables, Acrobat typically preserves the row-and-column layout accurately enough to work with immediately.

When this works well:

Limitations:

Acrobat is the strongest option among traditional file-conversion tools. If you already have an Acrobat subscription and process native PDFs with reasonably clean table formatting, it handles most single-page tables well. For scanned documents or high-volume processing, the cleanup time per document adds up.

Method 3: Free online converters (free, privacy trade-off)

Dozens of free websites convert PDF to Excel: Smallpdf, ILovePDF, PDF24, Zamzar, PDF2Go, and others. The workflow is identical across all of them: upload your PDF, wait for processing, download the Excel or CSV file. No software installation, no account required (on most), and results in under a minute for small files.

How they compare:

| Tool | Free tier limits | Table quality | Scanned PDFs | Privacy |
|---|---|---|---|---|
| Smallpdf | 2 files/day | Basic | Limited OCR | Files deleted after 1 hour |
| ILovePDF | 1-2 files/day | Basic | Limited OCR | Files deleted after 2 hours |
| PDF24 | Unlimited | Basic | Basic OCR | Desktop app available (local processing) |
| Zamzar | 2 files/day, 50MB max | Basic | No | Files stored 24 hours |
| PDF2Go | 3 files/day | Basic | OCR available | Files deleted after 24 hours |

When to use them: Occasional one-off conversions of simple, non-sensitive PDFs with basic table layouts. If you need to convert a single PDF once and the table is straightforward, a free converter saves you from installing software.

Why they fail on real-world tables:

Free online converters occupy the space between copy-paste and paid tools. They handle simple tables better than copy-paste but fall short of Acrobat or AI extraction on anything complex. If privacy matters or you process more than a few documents per week, they are not the right solution.

Method 4: Python libraries (free, technical, scalable)

For developers and technical users, Python libraries offer precise control over PDF table extraction. The two primary options are tabula-py (a Python wrapper around Tabula) and Camelot. Both are free, open-source, and run locally on your machine.

tabula-py detects tables in native PDFs and extracts them as pandas DataFrames. Basic usage:

```python
import tabula

# Extract all tables from a PDF (returns a list of pandas DataFrames)
tables = tabula.read_pdf("invoice.pdf", pages="all")

# Export the first table to Excel
tables[0].to_excel("output.xlsx", index=False)
```

tabula-py uses two extraction modes: “lattice” for tables with visible borders (gridlines), and “stream” for tables without borders (using whitespace alignment). You can also specify exact page areas to extract from, which avoids capturing non-table content.
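As a rough sketch of how mode selection and area targeting fit together: tabula-py's `read_pdf` accepts `lattice`, `stream`, and an `area` given as top, left, bottom, right in points. The `area_in_points` and `read_bordered_table` helpers, the file name, and the region measurements below are hypothetical.

```python
POINTS_PER_INCH = 72  # PDF coordinates default to 72 points per inch

def area_in_points(top_in, left_in, bottom_in, right_in):
    """Convert a page region in inches to tabula-py's [top, left, bottom, right] in points."""
    return [round(v * POINTS_PER_INCH, 2) for v in (top_in, left_in, bottom_in, right_in)]

def read_bordered_table(pdf_path, page, area):
    """Read tables with visible gridlines from one page region using lattice mode."""
    import tabula  # imported here so the converter above works without tabula-py installed
    return tabula.read_pdf(pdf_path, pages=page, lattice=True, area=area)

# Region measured with a ruler on the printed page (assumed values).
area = area_in_points(2.0, 0.5, 6.5, 8.0)
print(area)

# With tabula-py and Java installed:
# tables = read_bordered_table("invoice.pdf", page=1, area=area)
# For a borderless table, pass stream=True instead of lattice=True.
```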

Camelot offers similar functionality with more control over table detection parameters:

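```python
import camelot

# Extract tables using stream mode (borderless tables).
# Camelot reads only page 1 unless you pass pages, e.g. pages="all".
tables = camelot.read_pdf("invoice.pdf", flavor="stream")

# Export the first table to Excel
tables[0].to_excel("output.xlsx")
```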

Camelot reports a per-table accuracy score, letting you identify tables that may need manual review. It also handles tables with merged cells better than tabula-py in most cases.
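One way to act on that score is to route low-scoring tables to a person. The sketch below works on the dictionaries Camelot exposes as `parsing_report` on each table. The `flag_for_review` helper, the 90.0 threshold, and the sample numbers are assumptions for illustration.

```python
def flag_for_review(reports, min_accuracy=90.0):
    """Return the indexes of tables whose reported accuracy is below the threshold."""
    return [
        i for i, report in enumerate(reports)
        if report.get("accuracy", 0.0) < min_accuracy
    ]

# With Camelot installed, the reports would come from the extracted tables:
#   tables = camelot.read_pdf("invoice.pdf", flavor="stream")
#   reports = [t.parsing_report for t in tables]
# Made-up reports in the same shape, for demonstration only:
reports = [
    {"accuracy": 99.1, "whitespace": 4.2, "order": 1, "page": 1},
    {"accuracy": 71.5, "whitespace": 38.0, "order": 2, "page": 1},
]
needs_review = flag_for_review(reports)
print(needs_review)
```

The right threshold depends on your documents, so treat 90.0 as a starting point rather than a recommendation.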

Strengths of the Python approach:

Limitations:

Python libraries are the best option for developers who process native PDFs from a small number of consistent sources and need programmatic control. They are not practical for non-technical users, for scanned documents, or for workflows with high format diversity. For a broader view of table extraction approaches, see our table extraction software guide.

Method 5: AI-powered extraction (handles everything)

AI-powered table extraction reads a PDF the way a person does: it looks at the page, identifies where tables are, understands what each column represents, and outputs structured data with correct column headers and row alignment. It works on native PDFs, scanned documents, photos of printed pages, and any layout the other methods struggle with.

With Lido, the process is: upload a PDF (or forward it by email, or connect a cloud storage folder), and the AI identifies every table in the document and returns structured data in Excel, Google Sheets, CSV, or JSON. No templates to define, no extraction zones to draw, no training data to provide, and no Python code to write.

What AI extraction handles that other methods cannot:

When to choose AI extraction over other methods:

Lido offers 50 free pages per month, with paid plans starting at $29/month. For a detailed comparison with other extraction tools, see how to extract data from any PDF.

Why PDF tables break during extraction (the technical reason)

Understanding why table extraction is hard helps you choose the right tool and set realistic expectations for each method.

A typical PDF does not contain a “table” object. Unless the file carries optional accessibility tags, there is no internal marker that says “this is a table with 5 columns and 12 rows.” Instead, a PDF stores a sequence of text-drawing instructions: “place the character ‘I’ at position (72.3, 401.2), place ‘n’ at (77.1, 401.2), place ‘v’ at (82.0, 401.2)...” and so on for every character on the page. Table borders are separate line-drawing instructions that happen to form a grid shape.

Any tool that extracts tables from a PDF must reverse-engineer the table structure from those raw positioning instructions. This involves:

Simple tables with visible borders, consistent spacing, and no merged cells are straightforward for any tool to parse. The difficulty escalates rapidly with: borderless tables (no gridlines to anchor column detection), multi-line cells (break row assumptions), merged cells (break column assumptions), multi-page spans (no structural continuity signal between pages), and irregular spacing (different column widths in different rows).
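The sketch below shows one simple way to guess column boundaries in a borderless table, and how a single irregular row defeats it. It looks for horizontal ranges that no word occupies in any row. The `find_column_gaps` helper, the `min_gap` value, and the word positions are illustrative assumptions, not how any specific tool works.

```python
def find_column_gaps(rows, min_gap=8.0):
    """Find x ranges that no word covers in any row.

    Each row is a list of (x_start, x_end) word extents in points.
    """
    spans = sorted(span for row in rows for span in row)
    if not spans:
        return []
    gaps, covered_until = [], spans[0][1]
    for start, end in spans[1:]:
        if start - covered_until >= min_gap:
            gaps.append((covered_until, start))  # empty vertical strip: likely a column break
        covered_until = max(covered_until, end)
    return gaps

# Two tidy rows with three columns each.
regular = [
    [(72, 120), (200, 230), (300, 340)],
    [(72, 110), (205, 230), (310, 340)],
]
# One extra row whose long first cell runs into the second column.
irregular = regular + [[(72, 215), (300, 340)]]

print(find_column_gaps(regular))    # two gaps, so three columns
print(find_column_gaps(irregular))  # one gap left, so the first two columns merge
```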

This is why each method has a different failure point. Copy-paste fails on anything beyond the simplest tables because the clipboard captures characters in reading order without column structure. Acrobat fails on complex layouts because its parser uses heuristics for column detection that break on irregular spacing. Python libraries fail on scanned PDFs because a scan is an image and contains no text positions to analyze. AI extraction handles the broadest range because it reasons about document structure visually, the same way a person would when reading a table they have never seen before.

Which method should you use?

Match your situation to the right approach:

| Your situation | Best method | Why |
|---|---|---|
| Simple native PDF, one-time need | Copy-paste | Free, instant, no setup |
| Native PDF, decent tables, have Acrobat | Adobe Acrobat export | Best file-conversion quality |
| Occasional simple PDF, no paid tools | Free online converter | No install, handles basic tables |
| Developer, batch processing, native PDFs | Python (tabula-py/Camelot) | Free, scriptable, precise |
| Scanned PDFs | AI extraction (Lido) | Only method that handles scans reliably |
| Multi-page tables | AI extraction (Lido) | Merges pages into continuous table |
| Multiple formats, recurring volume | AI extraction (Lido) | No per-format setup, automated |
| Invoices and business documents | AI extraction (Lido) | Semantic field labeling, line-item support |
| Privacy-sensitive, must stay local | Python or PDF24 desktop | No file upload to external server |

Most people start with copy-paste, watch it mangle their table, and escalate through the methods above. The pattern is predictable: simple methods work on simple PDFs, and anything more complex demands a tool that actually understands table structure.

If you process business documents (invoices, purchase orders, bank statements, receipts) at any recurring volume, start with Method 5. On documents like these, the other four methods usually produce output that needs manual cleanup. AI extraction produces structured, labeled data that is ready to use immediately. The best PDF to Excel converters all use some form of AI extraction because it is the only approach that scales across document diversity without per-format configuration.

Frequently asked questions

Can I copy a table from a scanned PDF to Excel?

Not with copy-paste or basic converters. Scanned PDFs are images with no selectable text, so there is nothing to copy. You need a tool with OCR (optical character recognition) that converts the image to text and then reconstructs the table structure. Adobe Acrobat Pro includes OCR but produces inconsistent table quality on scans. Python libraries like tabula-py cannot process scanned PDFs at all. AI extraction tools like Lido handle scanned PDFs natively, running OCR and table extraction in a single step and outputting structured data to Excel.

Why does my PDF table paste into one column in Excel?

PDFs store text as positioned characters, not as table cells. When you copy text from a PDF, the clipboard captures a stream of characters in reading order without column separators. Excel receives this as a single text stream and places it in one column. You can try Data > Text to Columns with “Fixed width” selected to set column breaks manually. However, this only works reliably on tables with perfectly consistent spacing. For tables with variable-width values, use Adobe Acrobat export or AI extraction instead of copy-paste.

What is the best free way to extract a table from a PDF?

For native PDFs with visible table borders, tabula-py (Python library) produces the cleanest free results with precise area targeting. For non-technical users, Smallpdf and ILovePDF offer free tiers that handle basic tables adequately. PDF24 is completely free with no daily limits and has a desktop app for local processing. Lido offers 50 free pages per month with AI-powered extraction that handles complex tables, scanned documents, and multi-page tables that free converters cannot process. For simple text-only PDFs, copy-paste costs nothing and works instantly.

How do I extract a multi-page table from a PDF to Excel?

Multi-page tables are where most extraction methods fail. Copy-paste requires extracting each page separately and manually aligning rows. Free online converters typically split each page into a separate table. Adobe Acrobat sometimes handles page continuity but often repeats or drops headers on continuation pages. Python libraries require custom code to merge page results. AI extraction tools like Lido process multi-page PDFs as a single document and merge table content across page breaks automatically, producing one continuous table with a single header row.
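For the Python route, the custom merge code can be quite small when every page has the same columns. The sketch below assumes each page's table is a list of rows and that continuation pages may or may not repeat the header. The `merge_page_tables` helper and the sample rows are hypothetical.

```python
def merge_page_tables(page_tables):
    """Concatenate per-page tables, keeping the header row only once.

    Each table is a non-empty list of rows; the first row of the first table is the header.
    """
    if not page_tables:
        return []
    header = page_tables[0][0]
    merged = [header]
    for table in page_tables:
        # Drop the first row only when it repeats the header.
        body = table[1:] if table and table[0] == header else table
        merged.extend(body)
    return merged

page_1 = [["Date", "Amount"], ["2026-01-03", "120.00"]]
page_2 = [["Date", "Amount"], ["2026-01-09", "75.50"]]  # header repeated
page_3 = [["2026-01-12", "18.25"]]                        # header dropped
merged = merge_page_tables([page_1, page_2, page_3])
for row in merged:
    print(row)
```

This does not handle a row that is split across a page break, which still needs manual review or a more capable tool.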

Can I automate PDF table extraction to Excel?

Yes, through two approaches. Python scripts using tabula-py or Camelot can batch-process native PDFs and export tables to Excel programmatically, which is free but requires coding ability and only works on digitally created PDFs. AI extraction tools like Lido automate the full pipeline: connect an email inbox or cloud storage folder, and every PDF that arrives is automatically processed with table data exported to Excel, Google Sheets, or your ERP. No code required, and it handles scanned documents alongside native PDFs.
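A minimal sketch of the scripted approach might look like the following. The `batch_convert` helper, the folder names, and the output naming scheme are assumptions. The extractor is passed in as a function, so the same loop works with tabula-py or any other library that returns pandas DataFrames.

```python
from pathlib import Path

def batch_convert(folder, extract_tables, out_dir):
    """Run extract_tables on every PDF in folder and save each table as an .xlsx file.

    extract_tables takes a PDF path and returns objects with a pandas-style
    to_excel(path, index=False) method.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        for i, table in enumerate(extract_tables(pdf), start=1):
            target = out / f"{pdf.stem}_table{i}.xlsx"
            table.to_excel(target, index=False)
            written.append(target)
    return written

# With tabula-py, pandas, and openpyxl installed (folder names are examples):
# import tabula
# batch_convert("invoices", lambda pdf: tabula.read_pdf(str(pdf), pages="all"), "excel_out")
```

Run it from a scheduled task or cron job to process a folder on a regular basis. It only helps with native PDFs, since the extractor still needs selectable text.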
