In this article:

Import Data from a Website to Google Sheets [2024 Update]

What is XPath?

XPath, which stands for XML Path Language, is used to access data stored in XML files. XML files are used to store data, and its XML format is also used as the basis for HTML format today. As HTML format is used in web pages, XPath can also be used to access data from web pages.

The HTML syntax consists of a system of tags and attributes that dictate how the content should be displayed. The tags describe in general terms how the content should be displayed. The attributes further refine the formatting and also include ways to identify the specific data stored in the given tag. 

Here are some examples of the HTML tags with their corresponding XPaths for the different types of data stored in a webpage:

Page title: //title

Meta description: /html/head/meta[@name='description']/@content

H1: //h1

H2: //h2

H3: //h3

Links: //@href

Tables: //table[n]/tbody/*

Bulleted Lists: //ul/li

Numbered Lists: //ol/li

Often HTML pages will use a certain tag multiple times and when you use it with the IMPORTXML function, you can get more data than you need. In constructing the XPath, you can include the tags enclosing the data with their unique attributes so that you can filter the output to the ones you need. 

If you want to include the class attribute of the tag, you add 

[contains(@class, ‘class_value’)]

Where the class_value is the class value attribute. For example, if the data we want to include is inside the following code:

<div class=”toc”>

<ul>

<li>Entry one</li>

<li>Entry two</li>

<li>Entry three</li>

.

.

.

</ul>

</div>

Then the XPath is

//div[contains(@class,'toc')]/ul/li/*

Google Sheets will identify the div tag that contains the attribute class with value “toc”, then go further to the ul and li tags to extract the specific contents for each li tag.

How to Import Data from a Website to Google Sheets

1. Identify the Data to Import from a Webpage

The data in websites come in different formats. They can be in the form of 

  • Table
  • Lists
  • Paragraphs
  • Etc. 

For example, we want to extract the list of spreadsheet software from this Wikipedia page:

Import data from website to google sheets

2. Right-Click the Data then Select Inspect

Once we identified the data, we then right-click the data we want to import then select Inspect. The Inspect option allows you to view the HTML code used to build the webpage. The Inspect option also centers the code to the portion containing the data.

Import website data google sheets‍

The Inspect tab shows that the items are stored in <li> tags. 

3. Assemble the XPath

When you move your cursor through the HTML code of the webpage, you will notice that the highlighted part of the webpage changes. You can use it to identify the topmost tag containing the data you need. For our example, it is the div tag with the class mw-content-ltr mw-parser-output

Import website data to google sheets

Additionally, we just want the name of the software. Fortunately, all of them are hyperlinks.

We can include the <a> tag in the XPath. 

We can then construct the XPath by taking note of the div class and the list tags present. The XPath becomes:

//div[contains(@class,'mw-content-ltr mw-parser-output')]/ul/li/a

4. Use the Formula =IMPORTXML(URL, xpath)

The formula to use is 

=IMPORTXML(URL, xpath)

Where URL  is the URL of the webpage while xpath is the XPath we use to extract the selected data. We can enclose The URL is 

https://en.wikipedia.org/wiki/List_of_spreadsheet_software

And the XPath is

//div[contains(@class,'mw-content-ltr mw-parser-output')]/ul/li

The formula is

=IMPORTXML("https://en.wikipedia.org/wiki/List_of_spreadsheet_software","//div[contains(@class,'mw-content-ltr mw-parser-output')]/ul/li/a")

The result is:

Google sheets import website data

FAQs

What are some Common IMPORTXML function Errors in Google Sheets?

Here are two common errors:

Import Xml content cannot be parsed. It means that the XPath input does not follow the correct syntax; thus Google Sheets cannot import any data from the webpage. Check the formula to make sure there is no typo and that the proper syntax is followed.

Imported content is empty. It can mean two things:

  • The XPath is not a valid HTML element; thus Google Sheets cannot identify any content corresponding to it.
  • There is no content corresponding to the given XPath as interpreted by Google Sheets.

You may need to check the webpage itself to identify the proper XPath.

Can all website data be imported to Google Sheets using IMPORTXML function?

No. There are websites that actively block codes that import their content. This will cause Google Sheets to output errors in the IMPORTXML function when otherwise the XPath should work.

Additionally, there is data that you can only access by logging in to your account. These include important data about your customers, inventory, and sales stored in the online eCommerce and marketing platforms you use. 

These can be accessed by using a built-in API that the platform accesses. It is tricky to set up the access via API in Google Sheets! There is a workaround, however. Try Lido. Unlike Google Sheets, importing data is a built-in feature, and Lido is optimized to import data from different eCommerce and marketing platforms you use such as Hubspot, Salesforce, MySQL databases, etc. 

Try it now for free!

Automate repetitive tasks with Lido

Save hours on repetitive and tedious work. Lido is a new spreadsheet that connects your spreadsheets, forms, PDFs, and email inbox.

Trigger
Action
Select trigger
When a cell value in Google Sheets changes
Check Google Sheets for today's date
When a new row is added to Google Sheets
On new Google Form submission
Send me a daily reminder
On new Typeform submission
When a cell value in database changes
Check database for today's date
When a new row is added in database
When a new HubSpot customer is created
Select action
ADDCALENDAREVENT
Creates a one- or multiple-day calendar event with optional attendees
ADDHUBSPOT
Adds an object to Hubspot
ADDSLACKCHANNEL
Create a Slack channel, and optionally add a topic or members
CALLURL
Makes any HTTP request
CREATEGOOGLEDOC
Replaces [@column_name] values in a Google Doc with the corresponding table row's values
CREATEPDF
Replaces [@column_name] values in a Google Doc with the corresponding table row's values, then export it as a PDF
FETCH
Makes any HTTP request and returns the response
INSERTROWS
Inserts given array below defined values in given worksheet
SENDGMAIL
Sends an email using your Google account
SENDOUTLOOK
Sends an email using your Microsoft Outlook account
SENDSLACK
Sends a Slack message
SENDSMS
Sends an SMS message using your Twilio account
UPDATECELL
Updates cells with given values
UPDATEHUBSPOT
Updates a property of a Hubspot object
Trigger is required
Action is required
Get started