Free · No Upload · 100% In-Browser

Extract PDF Tables
to Excel Spreadsheet

Convert PDF tables and structured data into editable Excel (.xlsx) spreadsheets directly in your browser. Each PDF page becomes a separate worksheet. No server upload, no account required.

.xlsx Excel Output Format
Per-Page Worksheets
0 KB Data Sent to Server
Free, Always

PDF to Excel Converter

Upload a PDF — extract structured data into editable .xlsx worksheets

Drop your PDF here

or click to browse from your device

Tables · Financial Data · Reports · Invoices · Statements
Best results with text-based PDFs containing clear tables and structured data. Scanned image PDFs require OCR processing first. Complex merged cells and multi-column layouts may need minor adjustment after export.

Simple Process

Convert PDF to Excel in Three Steps

1

Upload Your PDF

Drop any text-based PDF containing tables, financial reports, price lists, invoices or structured data. Works best with PDFs that have selectable text rather than scanned images.

2

Configure Extraction

Choose page range, column detection sensitivity and output format. The auto column-detection mode works for most standard table layouts. Adjust to wide or tight for unusual table spacing.

3

Preview and Download

A live data preview shows the extracted content before you download. Each PDF page becomes a separate Excel worksheet. Download as .xlsx or .csv and open straight in Excel or Google Sheets.

Why Choose Us

PDF Table Extraction Right in Your Browser

Powered by PDF.js and SheetJS — two trusted open-source libraries — our converter extracts structured data from PDF documents without sending a single byte to any server.

100% Private

Your PDF — whether it contains financial statements, payroll data, client pricing, or sensitive business reports — never leaves your device. PDF.js and SheetJS run entirely inside your browser's sandbox.

Per-Page Worksheets

Each page of your PDF is extracted into its own named worksheet in the Excel workbook: page 1 becomes the sheet "Page 1", page 2 becomes "Page 2", and so on, keeping your data logically organised by source page.

Live Data Preview

Before downloading, a scrollable table preview shows exactly what data has been extracted from each page. Switch between page tabs to inspect each worksheet. Confirm the extraction looks right before saving.

Adjustable Column Detection

Three column sensitivity modes — Auto, Wide and Tight — handle different table layouts. Auto works for most standard tables. Wide is better for sparsely formatted financial reports. Tight works for dense multi-column data.
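As a rough illustration of how a gap threshold changes the result, the sketch below finds column start positions from a list of x coordinates. The function names and the threshold values (40 and 8 PDF points) are assumptions made for this example, not the tool's documented settings, and Auto mode is left out because it is described as clustering-based.

```javascript
// Assumed gap thresholds in PDF points; the tool's real values are not documented.
const GAP_THRESHOLDS = { wide: 40, tight: 8 };

// Hypothetical helper: sort every cell's x position and start a new column
// whenever the gap to the previous x exceeds the mode's threshold.
function detectColumns(xPositions, mode) {
  const minGap = GAP_THRESHOLDS[mode];
  if (minGap === undefined) throw new Error(`Unknown mode: ${mode}`);
  const sorted = [...xPositions].sort((a, b) => a - b);
  const columns = [];
  let previous = -Infinity;
  for (const x of sorted) {
    if (x - previous > minGap) columns.push(x);
    previous = x;
  }
  return columns;
}

// Hypothetical helper: index of the column start nearest to x.
function columnIndex(x, columns) {
  let best = 0;
  for (let i = 1; i < columns.length; i++) {
    if (Math.abs(x - columns[i]) < Math.abs(x - columns[best])) best = i;
  }
  return best;
}

const xs = [50, 52, 70, 72, 200, 203];
console.log(detectColumns(xs, "tight")); // three column starts: 50, 70, 200
console.log(detectColumns(xs, "wide"));  // two column starts: 50, 200
```

With the same input, the smaller threshold separates the text at x = 70 into its own column, while the larger threshold folds it into the first column.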

Excel and CSV Output

Download as .xlsx for multi-sheet Excel workbooks compatible with Microsoft Excel, Google Sheets, LibreOffice Calc and Apple Numbers. Or export as .csv for simple single-page data ready for database import or scripting workflows.

Custom Page Range

Extract all pages or specify exactly which pages to process. Enter ranges like 1-3, 5, 8-10 to extract only the tables you need from large multi-page financial reports, without processing the entire document.
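A page-range string like the one above can be parsed in a few lines. The following is a minimal sketch; `parsePageRange` is a hypothetical helper written for this example, and clamping to the document length is an assumed behaviour rather than something the tool is known to do.

```javascript
// Hypothetical helper: parse a page-range string such as "1-3, 5, 8-10"
// into a sorted list of unique 1-based page numbers.
function parsePageRange(input, pageCount) {
  const pages = new Set();
  for (const part of input.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(token);
    if (!match) throw new Error(`Invalid page range: "${token}"`);
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${token}"`);
    }
    // Assumed behaviour: clamp to the document length,
    // so "8-10" on a 9-page file yields pages 8 and 9.
    for (let p = start; p <= Math.min(end, pageCount); p++) pages.add(p);
  }
  return [...pages].sort((a, b) => a - b);
}

console.log(parsePageRange("1-3, 5, 8-10", 12)); // pages 1, 2, 3, 5, 8, 9, 10
```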

Why Convert PDF Tables to Excel?

PDF (Portable Document Format) is the universal standard for sharing documents with fixed, reproducible layout. It excels at presenting financial statements, price lists, data reports, invoices and tables in a format that looks identical on every device. However, this presentation-first design is also PDF's greatest limitation for data work: the tabular data inside a PDF is locked — you cannot sort it, filter it, chart it, perform calculations on it or import it into a database without first converting it to an editable format.

Spreadsheet applications such as Microsoft Excel (.xlsx), Google Sheets, LibreOffice Calc and Apple Numbers are among the world's most widely used data manipulation environments. Converting PDF tables to Excel unlocks the data inside your PDFs and makes it immediately actionable: ready for pivot tables, VLOOKUP, conditional formatting, chart creation, statistical analysis, database import and programmatic processing via Python pandas, R or SQL.

"Data trapped in a PDF is just a picture of a spreadsheet. The moment you convert it to Excel, it becomes a living dataset you can analyse, sort, filter, visualise and act on."

The demand for reliable PDF-to-Excel conversion spans virtually every industry and function. Accountants re-key bank statement data. Analysts manually transcribe quarterly report tables. Procurement teams copy supplier price lists. HR professionals extract payroll data from payslip PDFs. Each of these workflows costs significant time and introduces transcription errors that a reliable conversion tool can largely avoid.

What Types of PDFs Work Best for Table Extraction

Understanding which PDFs yield the best extraction results helps you set appropriate expectations and choose the right tool for your specific document.

Text-Based PDFs (Best Results)

PDFs created digitally — exported from Microsoft Word, Excel, PowerPoint, accounting software, ERP systems, web browsers or reporting tools — contain actual text objects with precise position coordinates stored in the PDF content stream. Our converter reads these text objects directly using PDF.js's text content extraction API, obtaining both the text values and their spatial positions (x, y coordinates). This spatial data is used to reconstruct the original table structure with high accuracy.

Examples of text-based PDFs that typically extract well include bank statements exported from online banking portals, financial reports exported from Xero, QuickBooks or SAP, price lists and catalogues created in InDesign or Word, invoice PDFs from billing systems and payroll reports from HR platforms.

Scanned Image PDFs (Require OCR First)

PDFs created by scanning physical documents with a flatbed scanner, multifunction printer or smartphone scanning app contain page images rather than text objects. There is no selectable text in a scanned PDF — the content is purely visual pixels. Our browser-based extractor cannot extract data from scanned PDFs because it reads text objects, not pixels. For scanned PDFs, you first need to apply Optical Character Recognition (OCR) using a tool such as Adobe Acrobat, ABBYY FineReader, or our PDF to OCR tool, which converts the scanned images into a text-based PDF. Once OCR has been applied, the resulting searchable PDF can be processed by our extractor.

Hybrid PDFs (Partial Results)

Some PDFs contain a mix of text-based content and embedded image elements. For example, a financial report might have text paragraphs and data tables as proper PDF text objects, but company logos, charts and graphs as embedded JPEG or PNG images. Our extractor will successfully extract the text-based tables from such documents while ignoring the image-based content, which is the correct behaviour.

How the PDF Table Extraction Works

Our converter uses a spatial text clustering algorithm to reconstruct table structure from the position-annotated text objects extracted by PDF.js. Here is the technical pipeline:

  1. Text Extraction: PDF.js's getTextContent() method extracts all text items from each PDF page. Each item includes the text string and a transformation matrix encoding its x, y position, font size and rotation.
  2. Row Grouping: Text items are grouped into rows by their Y coordinate (vertical position). Items whose Y coordinates fall within a configurable tolerance band are considered to belong to the same row. This tolerance accounts for slight Y-axis variation between characters in the same line due to font baseline alignment differences.
  3. Column Sorting: Within each row, text items are sorted by their X coordinate (horizontal position) from left to right, reconstructing the reading order of each row.
  4. Column Alignment: The column detection mode controls how X-position clusters are identified across rows to align cells into consistent columns. Auto mode uses k-means-style clustering of X positions across all rows. Wide mode uses a larger minimum gap threshold. Tight mode uses a smaller threshold.
  5. Worksheet Assembly: SheetJS (xlsx library) assembles each page's row/column data into an Excel worksheet. Multiple page worksheets are combined into a single .xlsx workbook.
  6. Download: The workbook is serialised to binary Excel format and delivered as a file download.
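Steps 2 and 3 of the pipeline above can be sketched as follows. This is an illustrative reconstruction, not the converter's actual source: `groupIntoRows` and the default tolerance of 2 points are assumptions. The input mimics the items returned by PDF.js's getTextContent(), where `str` is the text and `transform[4]` and `transform[5]` hold the x and y position.

```javascript
// Hypothetical helper: group PDF.js-style text items into rows by y,
// then sort each row by x. `yTolerance` is an assumed default.
function groupIntoRows(items, yTolerance = 2) {
  const positioned = items
    .filter((item) => item.str.trim() !== "")
    .map((item) => ({
      text: item.str,
      x: item.transform[4],
      y: item.transform[5],
    }))
    // PDF y grows upward, so descending y gives top-to-bottom order.
    .sort((a, b) => b.y - a.y);

  const rows = [];
  for (const cell of positioned) {
    const current = rows[rows.length - 1];
    // Compare against the y of the first cell that opened the row.
    if (current && Math.abs(current.y - cell.y) <= yTolerance) {
      current.cells.push(cell);
    } else {
      rows.push({ y: cell.y, cells: [cell] });
    }
  }
  // Left-to-right reading order within each row.
  return rows.map((row) =>
    row.cells.sort((a, b) => a.x - b.x).map((cell) => cell.text)
  );
}

// Sample items with slightly different baselines on the same visual line.
const sampleItems = [
  { str: "Total", transform: [1, 0, 0, 1, 50, 700] },
  { str: "Item", transform: [1, 0, 0, 1, 50, 720] },
  { str: "12.50", transform: [1, 0, 0, 1, 200, 700.8] },
  { str: "Price", transform: [1, 0, 0, 1, 200, 719.5] },
];
console.log(groupIntoRows(sampleItems));
// two rows: ["Item", "Price"] and ["Total", "12.50"]
```

The resulting array of arrays is the shape that a worksheet builder typically consumes, one inner array per spreadsheet row.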

Key Use Cases for PDF to Excel Conversion

Financial Statement Analysis

Investment analysts, financial controllers, management accountants and CFOs regularly receive income statements, balance sheets, cash flow statements and financial model outputs as PDF reports from accounting systems (Xero, Sage, Oracle, SAP), auditors, subsidiary companies and portfolio companies. Converting these to Excel enables ratio analysis, trend modelling, variance analysis and consolidation into master financial models — workflows that are impossible to perform on static PDF data without manual re-entry.

Bank and Credit Card Statement Processing

Bookkeepers, accounts payable teams and finance professionals processing bank reconciliations need to import bank statement transaction data into accounting software. Bank statements downloaded as PDFs from online banking portals (Barclays, HSBC, Lloyds, NatWest, JPMorgan, Bank of America, Chase, Wells Fargo) can be converted to Excel and then cleaned and imported into Xero, QuickBooks, Sage or Dynamics via CSV upload, removing most manual transaction entry.

Supplier Price List Management

Procurement managers, purchasing officers and category buyers regularly receive supplier catalogues and price lists as PDF files. Converting these to Excel enables price comparison across multiple suppliers, percentage markup calculations, conditional formatting to highlight price changes versus the previous period, and VLOOKUP-based integration with internal product databases and ERP systems such as SAP Ariba, Oracle Procurement or Microsoft Dynamics 365.

Scientific and Research Data

Researchers and data scientists frequently need to extract numerical data tables from PDF journal articles — experimental results, measurement datasets, comparison tables, statistical outputs and literature review matrices — for meta-analysis, replication studies and systematic reviews. Converting these to Excel enables direct use of the data in statistical software including SPSS, Stata, R and Python without error-prone manual transcription.

Legal and Regulatory Data Extraction

Legal professionals, compliance officers and regulatory analysts working with court judgements, HMRC tax schedules, FCA regulatory returns, Companies House filings, SEC EDGAR submissions and government statistical publications routinely need to extract tabular data for legal analysis, compliance modelling and regulatory reporting. PDF-to-Excel conversion is a standard capability in legal technology (LegalTech) and RegTech workflows.

Tips for Getting the Best PDF to Excel Results

  • Use text-based PDFs: If you have a scanned PDF, run OCR on it first using our PDF to OCR tool or Adobe Acrobat before attempting Excel extraction. Scanned image PDFs contain no text objects, so the extractor has nothing to read until OCR adds a text layer.
  • Try Wide mode for financial reports: Financial statements and accounting reports often use wide column spacing with numeric values right-aligned at large distances from their labels. Wide column detection mode handles this spacing pattern better than Auto.
  • Use Tight mode for dense tables: Price lists, data matrices and multi-column tables with minimal whitespace between columns often extract better with Tight mode, which uses a smaller gap threshold to separate adjacent columns.
  • Extract specific pages for large documents: For annual reports or regulatory filings with many pages, use the custom range to extract only the pages containing the tables you need. This is faster and produces cleaner output than processing an entire 200-page document.
  • Clean up in Excel after export: Even the best browser-based extraction may require minor cleanup — removing header rows that repeated on each page, merging split cells or reformatting numeric strings that extracted as text. This cleanup is far faster than manual transcription of the original data.
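For readers who prefer to script the cleanup instead of doing it by hand, the repeated-header case can be handled with a small filter. This is a sketch under the assumption that rows from several pages have already been combined into one array and that the first row is the header; `removeRepeatedHeaders` is a hypothetical name.

```javascript
// Hypothetical helper: drop rows that exactly repeat the first (header) row,
// as happens when a table header is printed again on every page.
function removeRepeatedHeaders(rows) {
  if (rows.length === 0) return [];
  const headerKey = JSON.stringify(rows[0]);
  return rows.filter(
    (row, index) => index === 0 || JSON.stringify(row) !== headerKey
  );
}

const combined = [
  ["Date", "Amount"],
  ["01 Jan", "10.00"],
  ["Date", "Amount"], // header repeated at the top of page 2
  ["02 Jan", "20.00"],
];
console.log(removeRepeatedHeaders(combined));
// keeps the first header and both data rows
```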
Got Questions?

Frequently Asked Questions

Is my PDF data sent to any server?
No. All processing runs in your browser using PDF.js for text extraction and SheetJS for Excel file creation. Your PDF — including any financial, legal or confidential data it contains — never leaves your device at any point during the conversion process.
Why does my scanned PDF produce an empty or garbled Excel file?
Scanned PDFs contain page images rather than text objects. Our extractor reads text content from PDF content streams, not pixels. If your PDF was created by scanning a physical document, you need to apply OCR (Optical Character Recognition) first to convert it into a text-based PDF. Use our PDF to OCR tool or Adobe Acrobat to add a text layer before re-attempting the Excel extraction.
Can I extract from a PDF with multiple tables per page?
Yes, but with caveats. Our extractor processes each page as a single text grid. If your page contains two separate tables with a gap between them, the extractor will include all rows from both tables in the same worksheet. You can manually delete the rows belonging to one table in Excel after export. If the tables sit on different pages, use the custom page range to process only the page containing the table you need.
Why are numbers extracting as text strings in Excel?
PDF text extraction returns all values as strings. Numbers formatted with currency symbols, thousand separators or percentage signs often do not auto-convert to Excel number format. In Excel, select the column, use Data > Text to Columns with Delimited, then Finish to convert strings to numbers. Alternatively, use Find & Replace to remove unwanted characters before converting.
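The same conversion can be scripted before the data reaches a spreadsheet. The sketch below is one possible approach, with `toNumberIfNumeric` as a hypothetical helper; it assumes "," as the thousands separator, "." as the decimal point, and treats parentheses as accounting-style negatives.

```javascript
// Hypothetical helper: convert strings such as "$1,234.50", "(200.00)" or
// "12.5%" to numbers; return the original string when it is not numeric.
function toNumberIfNumeric(text) {
  let s = text.trim();
  let sign = 1;
  if (/^\(.*\)$/.test(s)) {
    sign = -1; // accounting-style negative: (200.00)
    s = s.slice(1, -1);
  } else if (s.startsWith("-")) {
    sign = -1;
    s = s.slice(1);
  }
  const isPercent = s.endsWith("%");
  if (isPercent) s = s.slice(0, -1);
  s = s.replace(/^[$£€]\s*/, "").trim(); // strip one leading currency symbol
  // Accept plain digits or correctly grouped thousands, optional decimals.
  if (!/^(\d{1,3}(,\d{3})+|\d+)(\.\d+)?$/.test(s)) return text;
  const value = sign * Number(s.replace(/,/g, ""));
  return isPercent ? value / 100 : value;
}

console.log(toNumberIfNumeric("$1,234.50"));  // 1234.5
console.log(toNumberIfNumeric("(200.00)"));   // -200
console.log(toNumberIfNumeric("2023-01-15")); // left as the string "2023-01-15"
```

Percentages are divided by 100 so that the value matches how spreadsheets store a cell formatted as a percentage.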
What column detection mode should I use?
Start with Auto for most PDFs. If columns are merging incorrectly (text from different columns appearing in the same cell), switch to Tight mode, which uses a smaller gap threshold and so separates columns that sit close together. If columns are splitting incorrectly (single values split across two cells), switch to Wide mode, which uses a larger gap threshold and so keeps nearby text in the same column.
Can I export to CSV instead of Excel?
Yes. Select "CSV (.csv) — first page" from the Output Format dropdown. The CSV file contains the extracted data from the first page of your PDF (or the first page of your selected range). CSV is ideal for database imports, scripting workflows and tools that don't support .xlsx format. For multi-page extraction in separate files, use the Excel option and then save individual sheets as CSV from Excel.
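To show what a CSV export has to take care of, here is a minimal serialiser that follows the quoting rules of RFC 4180. It is a stand-in written for illustration, with `rowsToCsv` as a hypothetical name; the tool itself presumably relies on SheetJS for this step.

```javascript
// Hypothetical helper: serialise rows of cell values to CSV text.
// Fields containing a comma, double quote or line break are wrapped in
// double quotes, and embedded double quotes are doubled (RFC 4180).
function rowsToCsv(rows) {
  const escapeCell = (value) => {
    const text = String(value ?? "");
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  return rows.map((row) => row.map(escapeCell).join(",")).join("\r\n");
}

const csv = rowsToCsv([
  ["Item", "Price"],
  ["Widget, large", "1,200.00"],
  ['12" pipe', 5],
]);
console.log(csv);
```

Quoting matters for extracted financial data in particular, because values such as 1,200.00 contain the delimiter itself.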
Does the tool work with password-protected PDFs?
No. PDFs that need a password to open (a user password) cannot be processed. Remove the password first using Adobe Acrobat or a PDF unlocker tool, then use our converter. PDFs protected only by an owner password (edit restrictions) may open without a password, but text extraction could still be restricted depending on the permission flags set by the document creator.
Which applications can open the .xlsx output file?
The .xlsx format (Office Open XML) is supported by Microsoft Excel (all versions from 2007 onwards), Google Sheets (via import or direct open), LibreOffice Calc, Apple Numbers, WPS Office and virtually every modern spreadsheet application. It can also be read programmatically, for example with pandas read_excel() or openpyxl in Python. Note that xlrd 2.0 and later reads only the legacy .xls format, not .xlsx.

Ready to Extract Your PDF Tables?

Drop your PDF into the tool above. Free, private and instant.

Start Extracting Now