
How to Extract Data From a PDF Form: 5 Methods for 2026

June 7, 2026

To extract data from a PDF form, use an AI-powered tool like Lido that reads both fillable and scanned forms and outputs structured data into Excel, Google Sheets, or CSV. For fillable PDF forms, you can also export form data directly from Adobe Acrobat. For scanned paper forms, you need OCR to read the text from the image before extracting it.

PDF forms are one of the most common document types in business. Tax forms, insurance applications, patient intake forms, and job applications all use PDF format. The data inside them is useful, but getting it out and into a spreadsheet or database is rarely straightforward.

This guide covers how to extract data from a PDF form, whether it is a fillable digital form or a scanned paper form, and how to get the results into Excel.

Two Types of PDF Forms

How you extract data from a PDF form depends on what kind of form you are working with. The two types look identical on screen but are completely different under the hood.

Fillable (Interactive) PDF Forms

These are PDF forms with actual form fields built into the file. Text boxes, checkboxes, dropdown menus, and radio buttons are embedded as interactive elements. When someone fills out the form, their responses are stored as structured data inside the PDF.

You can tell a form is fillable by clicking on a field. If the cursor changes and you can type into the field, it is interactive. Fillable forms are the easiest to extract data from because the field names and values are already structured.

Scanned (Flat) PDF Forms

These are photographs or scans of paper forms saved as PDF files. The PDF contains an image of the page, not actual form fields. Even if someone filled out the paper form by hand or with a typewriter, the responses are part of the image and not stored as data.

You can tell a form is scanned by trying to click on a field. If nothing happens and you cannot interact with any element, it is a flat image. Extracting data from scanned forms requires OCR to read the text from the image first.

Method 1: Export Form Data From Adobe Acrobat

If the PDF form is fillable, Adobe Acrobat can export the form field data directly. Open the form in Acrobat Pro, go to Tools > Prepare Form, then click "More" and select "Export Data." Acrobat exports the field names and values as an FDF, XFDF, XML, or TXT file.
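If you export as XFDF, the result is a small XML file in which each field appears as a <field name="..."> element holding a <value>. The following minimal Python sketch uses only the standard library to flatten an XFDF export into a dictionary. It assumes the standard XFDF namespace (http://ns.adobe.com/xfdf/) and joins nested field names with dots. For list boxes with several <value> elements, it keeps only the first.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://ns.adobe.com/xfdf/"}  # standard XFDF namespace


def parse_xfdf(data: bytes) -> dict[str, str]:
    """Flatten an XFDF export into {fully.qualified.field.name: value}."""
    root = ET.fromstring(data)
    result: dict[str, str] = {}

    def walk(field: ET.Element, prefix: str) -> None:
        name = prefix + field.get("name", "")
        value = field.find("x:value", NS)  # first <value> only
        if value is not None:
            result[name] = value.text or ""
        # Hierarchical fields nest <field> elements; join names with dots.
        for child in field.findall("x:field", NS):
            walk(child, name + ".")

    fields = root.find("x:fields", NS)
    if fields is not None:
        for field in fields.findall("x:field", NS):
            walk(field, "")
    return result
```

To use it, read the exported file in binary mode, for example parse_xfdf(open("application.xfdf", "rb").read()), where the file name is only an illustration.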

This method only works on fillable forms with interactive fields. It does not work on scanned forms, flattened PDFs (where the form fields have been merged into the static page content), or forms where the data was typed as regular text rather than entered into form fields.

Method 2: Copy and Paste

For fillable forms, you can click on each field, select the value, copy it, and paste it into your spreadsheet. For forms with selectable text (not scanned), you can select text regions and copy them directly.

Copy-paste works for extracting a few values from a single form. It becomes impractical at any volume. If you have 20 forms with 15 fields each, that is 300 individual copy-paste operations. It is also error-prone since it is easy to paste a value into the wrong cell.

Method 3: Use a Python Script

Python libraries can extract form data from fillable PDFs programmatically. PyPDF2 reads form field values, and pdfplumber extracts text from specific regions of the page. Here is a simple example using PyPDF2.

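```python
import csv

from PyPDF2 import PdfReader  # PyPDF2 is now maintained as "pypdf" (same PdfReader API)

reader = PdfReader("application.pdf")
# get_fields() returns None when the PDF has no interactive form (AcroForm)
fields = reader.get_fields() or {}

with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Field", "Value"])
    for name, field in fields.items():
        # "/V" holds the field's current value; empty fields have no "/V" key
        writer.writerow([name, field.get("/V", "")])
```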

This script reads every form field from a fillable PDF and writes the field names and values to a CSV file. Wrapped in a loop over a folder of files, it works well for batch processing many forms that share the same fields. It does not work on scanned forms or flattened PDFs, because they contain no interactive form fields to read.

Method 4: Use Google Docs OCR

Upload the scanned PDF form to Google Drive, then open it with Google Docs. Google Docs converts the image to editable text using built-in OCR. You can then copy the text from the converted document.

This method is free and handles simple scanned forms with clear text. It struggles with handwritten responses, checkboxes, and forms with complex layouts. The converted text often loses the relationship between field labels and values, so you need to manually match each response to its corresponding field.
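If your scanned forms print each answer on the same line as its label (for example "Last Name: Smith"), a few lines of Python can handle some of that matching. This minimal sketch assumes that "Label: value" layout and skips lines without a colon. Forms that place values in separate columns or below their labels will still need manual matching.

```python
import re

# Assumes each OCR'd line looks like "Label: value" (e.g. "Last Name: Smith").
# The label is everything before the first colon, up to 60 characters.
LINE = re.compile(r"^\s*([^:]{1,60}?)\s*:\s*(.*?)\s*$")


def pair_labels(ocr_text: str) -> dict[str, str]:
    """Match field labels to values in plain text copied from an OCR'd form."""
    pairs = {}
    for line in ocr_text.splitlines():
        match = LINE.match(line)
        if match:  # lines without a colon are skipped
            label, value = match.groups()
            pairs[label] = value
    return pairs
```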

Method 5: Use AI-Powered Extraction

AI tools like Lido read both fillable and scanned PDF forms using a combination of OCR and machine learning. The AI understands the form layout, identifies field labels and their corresponding values, and outputs structured data with each field properly labeled.

This is the only method that handles every form type reliably. Fillable forms, scanned paper forms, handwritten responses, checkboxes, and multi-page forms all work on the first upload with no templates or configuration. The AI understands which value belongs to which field, even on forms it has never seen before.

How to Extract Data From a PDF Form to Excel

Most teams need form data in Excel for analysis, reporting, or import into another system. Here is how to get PDF form data into Excel with each method.

Acrobat Export to Excel

Acrobat exports form data as FDF, XFDF, XML, or TXT. To get it into Excel, export as TXT (tab-delimited) or XML, then open the file in Excel with File > Open. Excel parses tab-delimited text and XML into rows and columns. This works for fillable forms only.
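When you have many forms, opening each TXT export separately gets tedious. The sketch below merges several exports into a single CSV that Excel opens as one table, with one row per form and a column for each field name. It assumes each export has a header row of field names followed by a row of values, and that the files are UTF-8. Check both assumptions against the files your Acrobat version produces, and pass a different encoding if needed.

```python
import csv
from pathlib import Path


def merge_txt_exports(paths: list[str], out_csv: str, encoding: str = "utf-8") -> int:
    """Combine tab-delimited form exports into one CSV; return the number of rows."""
    rows: list[dict] = []
    columns: list[str] = []  # union of field names, in first-seen order
    for path in paths:
        # Assumed layout: a header row of field names, then one row of values per form.
        with open(path, newline="", encoding=encoding) as f:
            for row in csv.DictReader(f, delimiter="\t"):
                row["Source file"] = Path(path).name
                rows.append(row)
                for name in row:
                    # DictReader files surplus values under the key None; skip them.
                    if name is not None and name not in columns:
                        columns.append(name)
    # utf-8-sig writes a byte-order mark so Excel detects the encoding.
    with open(out_csv, "w", newline="", encoding="utf-8-sig") as f:
        # restval="" leaves a blank cell when a form lacks a field;
        # extrasaction="ignore" drops the surplus values skipped above.
        writer = csv.DictWriter(f, fieldnames=columns, restval="", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```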

Python Script to Excel

Modify the Python script to write directly to an Excel file with the openpyxl library instead of the csv module, or keep the CSV output and open that file in Excel. For batch processing, you can loop through a folder of PDF forms and write all results to a single Excel workbook with one row per form.

Lido to Excel

Lido exports extracted form data directly to Excel. Upload your PDF forms, review the extracted fields, and export. Each form becomes a row in the spreadsheet with field values in the correct columns. Lido also exports to Google Sheets, CSV, and QuickBooks.

For teams that process forms regularly, Lido connects to an email inbox so incoming PDF form attachments are extracted and exported to Excel automatically. No manual uploads, no copy-paste, no cleanup.
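Lido's inbox integration is its own product feature. If you want a rough do-it-yourself version, the first step is pulling the PDF attachments out of each incoming message. The sketch below uses only Python's standard email package. It assumes you already have the raw message bytes (for example, fetched with imaplib); fetching mail and running the extraction are not shown.

```python
from email import policy
from email.parser import BytesParser


def pdf_attachments(raw_message: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, file bytes) for every PDF attached to a raw email."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_message)
    found = []
    for part in msg.iter_attachments():
        name = part.get_filename() or ""
        # Accept parts labeled as PDF, or generic parts whose name ends in .pdf.
        if part.get_content_type() == "application/pdf" or name.lower().endswith(".pdf"):
            found.append((name, part.get_content()))  # decoded bytes
    return found
```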

Lido delivers 99%+ field-level accuracy on form data, handles both fillable and scanned forms, and is SOC 2 Type II and HIPAA compliant. Start with 50 free pages to test it on your own forms.
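When you test on your own forms, it helps to measure accuracy the same way every time. One simple definition of field-level accuracy is the share of fields whose extracted value matches a hand-checked value, and the sketch below uses that definition. The whitespace and case normalization is our assumption, not necessarily how any vendor computes its published figure.

```python
def field_accuracy(extracted: list[dict[str, str]], truth: list[dict[str, str]]) -> float:
    """Share of hand-checked fields whose extracted value matches.

    Values are compared after trimming whitespace and ignoring case;
    a field missing from the extraction counts as wrong.
    """
    if len(extracted) != len(truth):
        raise ValueError("need one extracted record per ground-truth record")
    total = correct = 0
    for got, expected in zip(extracted, truth):
        for field, value in expected.items():
            total += 1
            if got.get(field, "").strip().casefold() == value.strip().casefold():
                correct += 1
    return correct / total if total else 0.0
```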

Now that you know how to extract data from a PDF form, you can choose the method that matches your form type and volume.

Frequently asked questions

How Do I Extract Data From a PDF Form?

For fillable forms, export the form data from Adobe Acrobat or use a Python script with PyPDF2. For scanned paper forms, use an OCR tool or an AI extraction tool like Lido. AI tools handle both types and output structured data to Excel, Google Sheets, or CSV.

How Do I Extract Data From a PDF Form to Excel?

Use an AI tool like Lido that exports form data directly to Excel. Upload the form, review the extracted fields, and export. Alternatively, export from Acrobat as a TXT or XML file and open it in Excel, or use a Python script that writes form field values to an Excel file.

Can I Extract Data From a Scanned PDF Form?

Yes, but you need a tool with OCR. Google Docs offers free OCR for simple forms. AI tools like Lido include OCR and also understand form structure, so they correctly match field labels to their values even on scanned documents.

How Do I Extract Data From Multiple PDF Forms at Once?

Use a Python script to loop through a folder of forms and extract all fields, or use an AI tool like Lido that processes forms in bulk. Lido can also connect to an email inbox to process incoming PDF forms automatically.

Can I Extract Handwritten Data From a PDF Form?

Standard OCR tools struggle with handwriting. AI-powered tools like Lido use machine learning models trained on handwritten text and can read most legible handwritten responses. Accuracy depends on the clarity of the handwriting.

What Is the Difference Between a Fillable and a Flat PDF Form?

A fillable PDF form has interactive fields (text boxes, checkboxes, dropdowns) that store responses as structured data. A flat PDF form is an image of a form with no interactive fields. Fillable forms are easier to extract data from because the values are already structured. Flat forms require OCR to read the text from the image.
