How to write a Google scraper using the official API
15.02.2025
What this article is about and who it is for
This article is about writing your own search results parser, for free and in five minutes, without proxies or bs4, and without third-party programs that bypass captchas or imitate user activity in the browser (Selenium, for example).
It is intended for beginner SEO specialists who are somewhat savvy in programming and understand Python syntax, but who do not have a lot of spare money.
So how am I going to parse Google search results? It's simple: I will connect to the Google Search API, which has a free plan of 100 requests per day. For the owner of a small site, that is just right. Here is a ready-made Google search results parser project.
Next, fill in all the fields and you're done. Now, finally, create an API key. Go to Credentials, either by the link or by the button (。・∀・)ノ゙:
Create an API key:
After all these steps, you have created your own API key for your own search engine. Copy it and save it somewhere.
Writing a scraper
Basic setup and preparation
Now we have everything we need; all that's left is to write a scraper. Let's create a virtual environment, install the necessary packages, and create a couple of directories:
For Windows/PowerShell
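Something along these lines should do; requests is the assumption here for the API calls, while pandas and openpyxl are only needed for XLSX export:

```powershell
# Create and activate a virtual environment
python -m venv venv
.\venv\Scripts\Activate.ps1

# Install the packages used by the scraper
pip install requests pandas openpyxl

# Directory for temporary JSON files and the results
mkdir data
```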
For Linux/Bash
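And the Bash equivalent of the same sketch:

```bash
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install the packages used by the scraper
pip install requests pandas openpyxl

# Directory for temporary JSON files and the results
mkdir data
```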
Installing pandas and openpyxl is optional: if you don't want to save the parsing results to XLSX files, you don't have to. I will, because it's more convenient for me. The data directory will store our temporary JSON files and the results themselves, either as JSON or as XLSX tables.
Configuration file
My parser will also have a configuration file, config.json, from which it finds out how it should process requests. Here is the content of the configuration file; copy and paste it:
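Something along these lines, with your own key and cx substituted; the exact values of every field are explained right below:

```json
{
    "key": "YOUR_API_KEY",
    "cx": "YOUR_SEARCH_ENGINE_ID",
    "save_to": "exel",
    "depth": 10,
    "title": true,
    "description": true,
    "url": true
}
```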
Here is a general description of each key:
key - the API key we just created
cx - the ID of the custom search engine created at the beginning
save_to - defines how to save the result; valid values are exel and json
depth - how many pages of search results to parse; Google allows you to get a maximum of 10 pages with 10 positions each
title, description and url - which fields of each result to save
Script
This script is designed to accept arguments from the command line. The first is -q, the query itself. The second, -C, is the path to the configuration file. I did this using the argparse Python module. All of its functionality is implemented in the run function:
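A minimal sketch of such a run function might look like this; the flag names -q and -C and the helper functions come from the description, everything else is just one possible way to wire it up:

```python
import argparse
import json


def run():
    # Command-line interface: -q is the search query, -C is the path to config.json
    parser = argparse.ArgumentParser(description="Google SERP scraper")
    parser.add_argument("-q", "--query", required=True, help="Search query")
    parser.add_argument("-C", "--config", default="config.json",
                        help="Path to the configuration file")
    args = parser.parse_args()

    # Read the configuration file described above
    with open(args.config, encoding="utf-8") as f:
        config = json.load(f)

    # First download the raw SERP pages, then extract the configured fields
    serp_scrape_init(args.query, config)
    serp_page_scrape(config)
```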
In this function, an argparse parser is created and configured, and then the configuration file is read. At the very bottom of the function, serp_scrape_init and serp_page_scrape are called. Let's look at them one by one.
The first function, serp_scrape_init, works with the Google Search API. Although it's hard to call it work: we simply make a request to this URL:
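That is the Custom Search JSON API endpoint, with key, cx and q substituted from the configuration:

```
https://www.googleapis.com/customsearch/v1?key=API_KEY&cx=SEARCH_ENGINE_ID&q=QUERY
```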
It is important to understand that we need to go through all possible pages that Google returns. For this, two parameters are used in the address: num and start. The first controls how many sites to return in one request (maximum 10). The second steps through the pages in increments of 10. There are many more query parameters; you can see all of them here. As a result, our function looks like this:
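A sketch of serp_scrape_init under these assumptions; the requests library for HTTP and the data/serp_page_N.json naming scheme are choices of this sketch, not requirements:

```python
import json

import requests

API_URL = "https://www.googleapis.com/customsearch/v1"


def serp_scrape_init(query, config):
    # Walk through the SERP pages: start = 1, 11, 21, ... with 10 results per request
    for page in range(config["depth"]):
        params = {
            "key": config["key"],
            "cx": config["cx"],
            "q": query,
            "num": 10,
            "start": page * 10 + 1,
        }
        response = requests.get(API_URL, params=params)
        response.raise_for_status()

        # Save the raw response; serp_page_scrape will process these files later
        with open(f"data/serp_page_{page + 1}.json", "w", encoding="utf-8") as f:
            json.dump(response.json(), f, ensure_ascii=False, indent=2)
```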
As a result, the function creates JSON files, which will then be processed by serp_page_scrape. So let's talk about that function next.
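A possible shape for serp_page_scrape; in the API response the organic results sit under the items key, with title, snippet and link fields, while the output file names here are arbitrary:

```python
import glob
import json

import pandas as pd


def serp_page_scrape(config):
    rows = []
    # Go through the JSON files produced by serp_scrape_init
    for path in sorted(glob.glob("data/serp_page_*.json")):
        with open(path, encoding="utf-8") as f:
            page = json.load(f)
        for item in page.get("items", []):
            row = {}
            # Keep only the fields enabled in config.json
            if config.get("title"):
                row["title"] = item.get("title")
            if config.get("description"):
                row["description"] = item.get("snippet")
            if config.get("url"):
                row["url"] = item.get("link")
            rows.append(row)

    # Save the result either as an XLSX table (requires openpyxl) or as JSON
    if config["save_to"] == "exel":
        pd.DataFrame(rows).to_excel("data/result.xlsx", index=False)
    else:
        with open("data/result.json", "w", encoding="utf-8") as f:
            json.dump(rows, f, ensure_ascii=False, indent=2)
```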
Nothing extraordinary: it just opens the previously created JSON files and saves what was specified in the configuration file. And that's it, we now have a small Google in the console. Here's an example of usage:
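Assuming the script is saved as main.py, a run might look like this:

```bash
python main.py -q "python argparse tutorial" -C config.json
```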
Here is the full code of the script and the main.py file:
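Putting the pieces together, main.py boils down to the imports, the functions sketched above and a small entry point along these lines:

```python
if __name__ == "__main__":
    run()
```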
Conclusion
You know, I originally planned to write the parser on the BeautifulSoup4 + Selenium + Python stack. But after googling for a bit, I didn't find an official tutorial from Google on how to create a legal search results parser; all I was getting were websites of agencies and companies that offer to do the same thing, only for money.
Sure, if you are a large company and need to make thousands of requests, the Google Search API can provide additional limits for a small fee. Very small, compared to the “unnamed” companies and websites. That's how it is. If you want to learn more about the Google Search API, check out their official blog. It is very informative.