

    How to scrape Google SERP via a Google SERP checker API

    15.02.2025 / 11.03.2026 / 6 minutes / 792 views

    What is this article about and who is it for

    This article shows how to write your own search results parser, for free and in about five minutes, without proxies or bs4, and without third-party programs such as Selenium for bypassing captchas or imitating user activity in the browser.
    It is intended for beginner SEO specialists who know a little programming and understand Python syntax, but who do not have a lot of spare money.
    And how am I going to parse Google search results? It's simple: I will connect to the Google Search API, which has a free plan of 100 requests per day. For the owner of a small site, that is just right. Here is a ready-made Google search results parser project.

    Creating an API key and a search engine ID

    To use Google's API, we need an access key and a search engine ID. First, let's create our own search engine. Go to https://programmablesearchengine.google.com/controlpanel/all, click Add, and fill in all the form fields.
    You will be redirected to the next page, where you can copy your search engine ID.
    With the ID in hand, all that remains is to get an API key. Go to this page, registering if necessary: https://console.cloud.google.com/apis/dashboard?inv=1&invt=AbppVQ There you will need to create a new project. Click here:
    Next, fill in all the fields and you're done. Now, finally, create the API key itself. Go to Credentials, either by the link or by the button (。・∀・)ノ゙:
    Create an API key:
    After all these steps, you have created your own API key for your own search engine. Copy it and save it somewhere safe.
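    Once you have both credentials, you can sanity-check how they plug into a request URL before writing the scraper itself. This is a minimal sketch; the helper name build_search_url and the placeholder credential values are my own, not part of the article's script:

```python
from urllib.parse import urlencode

# Hypothetical placeholders -- substitute the key and ID you just created.
API_KEY = "YOUR_API_KEY"
CX = "YOUR_SEARCH_ENGINE_ID"

def build_search_url(query: str, start: int = 1, num: int = 10) -> str:
    """Assemble a Custom Search JSON API request URL from the credentials."""
    params = urlencode({"key": API_KEY, "cx": CX,
                        "q": query, "num": num, "start": start})
    return f"https://www.googleapis.com/customsearch/v1?{params}"

print(build_search_url("cats"))
```

    Opening the printed URL in a browser (with real credentials) should return a JSON document rather than an error, which confirms the key and ID work.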

    Writing a scraper

    Basic setup and preparation

    Now we have everything we need; all that's left is to write the scraper. Let's create a project directory structure, a virtual environment, and install the necessary packages:
    For Windows/PowerShell:

    mkdir MyParser
    mkdir MyParser/data
    mkdir MyParser/data/serp
    mkdir MyParser/data/temp
    New-Item MyParser/main.py
    New-Item MyParser/config.json
    python -m venv .venv
    ./.venv/Scripts/activate
    pip install requests pandas openpyxl

    For Linux/Bash:

    mkdir MyParser
    mkdir MyParser/data
    mkdir MyParser/data/serp
    mkdir MyParser/data/temp
    touch MyParser/main.py
    touch MyParser/config.json
    python -m venv .venv
    source ./.venv/bin/activate
    pip install requests pandas openpyxl
    Installing pandas and openpyxl is optional: if you don't want to save the parsing results to XLSX files, you don't have to. I will, because it's more convenient for me. The data directory will store our temporary JSON files and the results themselves, as either JSON or XLSX tables.

    Configuration file

    My parser will also have a configuration file, config.json, from which it finds out how it should process requests. Here is the content of the configuration file; copy and paste it:

    {
      "key": "11111111111111111111111111111111111",
      "cx": "11111111111111111",
      "save_to": "exel",
      "title": true,
      "description": false,
      "url": true,
      "depth": 1
    }

    Here is a general description of each key:
    1. key - the API key we just created
    2. cx - the ID of the custom search engine created at the beginning
    3. save_to - defines how to save the result; valid values are exel and json (note the spelling exel, which is the literal string the script checks for)
    4. depth - how many pages of search results to parse; Google allows a maximum of 10 pages with 10 positions each
    5. title, description and url - which fields to scrape
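    To make the key list above concrete, here is a small sketch of loading and sanity-checking such a config file. The helper name load_config and the early depth clamp are my own additions for illustration; the article's actual script does this inline inside run:

```python
import json

# Every key the config file is expected to contain.
REQUIRED_KEYS = {"key", "cx", "save_to", "title", "description", "url", "depth"}

def load_config(path: str) -> dict:
    """Load config.json and verify all expected keys are present."""
    with open(path, "r", encoding="utf-8") as fh:
        cfg = json.load(fh)
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise KeyError(f"config is missing keys: {sorted(missing)}")
    # Google's API serves at most 10 pages of 10 results each.
    cfg["depth"] = min(int(cfg["depth"]), 10)
    return cfg
```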

    Script

    This script is designed to accept arguments from the command line. The first is -q, the query itself. The second is -C, the path to the configuration file. I did this using Python's argparse module. All of this is implemented in the run function:
    def run():
        parser = argparse.ArgumentParser(add_help=True)
        parser.add_argument('-q', type=str, help='Query to parse',
                            metavar='QUERY', required=True, nargs='*')
        parser.add_argument('-C', type=str, help='Path to config, in json format',
                            metavar='CONFIG_FILE', required=True, nargs=1)
        args = parser.parse_args()

        # Build and URL-encode the query
        raw_query = ' '.join(args.q)
        if not raw_query:
            return
        query = quote(raw_query)

        # Read the config file and check that every key is known
        options = {'key': '', 'cx': '', 'save_to': '', 'title': '',
                   'description': '', 'url': '', 'depth': ''}
        with open(args.C[0], 'r') as file:
            data = json.loads(file.read())
        for key in data:
            if options.get(key) is not None:
                options[key] = data[key]
            else:
                print(f'ERROR: Something went wrong in your config file, {key}')
                return False

        # Google Search API serves at most 100 results (10 pages of 10)
        if options['depth'] > 10:
            print('WARNING: Google Search API allows only 100 search results')
            options['depth'] = 10

        serp_scrape_init(query, options)
        serp_page_scrape(query, options)
    In this function, an argparse parser is created and configured, and then the configuration file is processed. At the very bottom, serp_scrape_init and serp_page_scrape are called. Let's look at them one by one.
    The first function, serp_scrape_init, works with the Google Search API. Although it's hard to call it work: we simply make a GET request to this URL:
    https://www.googleapis.com/customsearch/v1?key={options['key']}&cx={options['cx']}&q={query}&num=10&start={i * 10 + 1}
    It is important to understand that we need to go through all the pages that Google returns. Two URL parameters handle this: num and start. The first sets how many results to return per request (maximum 10). The second steps through the pages in increments of 10. There are many more query parameters; you can see all of them here. As a result, our function looks like this:
    def serp_scrape_init(query: str, options: dict = {}) -> list:
        for i in range(0, options['depth']):
            response = requests.get(
                'https://www.googleapis.com/customsearch/v1'
                f"?key={options['key']}&cx={options['cx']}&q={query}&num=10&start={i * 10 + 1}")
            save_to_json(f'./data/temp/{query}_{i*10 + 1}-{i*10 + 10}.json', response.json())
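    The start arithmetic in the loop above is easy to verify in isolation. A tiny hypothetical helper (not part of the article's script) that lists the start values requested for a given depth:

```python
def start_offsets(depth: int) -> list[int]:
    """Return the 1-based 'start' parameter for each requested page.

    Page 0 asks for results starting at 1, page 1 at 11, and so on;
    the API serves at most 10 pages, so depth is clamped there.
    """
    return [i * 10 + 1 for i in range(min(depth, 10))]

print(start_offsets(3))  # [1, 11, 21]
```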
    As a result, the function creates JSON files, which will then be processed by serp_page_scrape. So let's talk about it.
    def serp_page_scrape(query: str, options: dict) -> list:
        data = []
        for i in range(0, options['depth']):
            try:
                with open(f'./data/temp/{query}_{i*10 + 1}-{i*10 + 10}.json',
                          'r', encoding='utf-8') as file:
                    data_temp = json.loads(file.read())
                for item in data_temp['items']:
                    title = item['title'] if options['title'] else None
                    description = item['snippet'] if options['description'] else None
                    url = item['link'] if options['url'] else None
                    data.append({'title': title, 'description': description, 'url': url})
            except (OSError, KeyError):
                # A page file may be missing, or a response may contain no 'items'
                pass
        if options['save_to'] == 'json':
            save_to_json(f'./data/serp/{query}.json', data)
        else:
            save_to_exel(f'./data/serp/{query}.xlsx', data)
        return data
    Nothing extraordinary: it just opens the previously created JSON files and keeps what was specified in the configuration file. And that's it. We now have a small Google in the console. Here's an example of usage:
    python main.py -q The biggest cats in the world -C config.json
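    As a side note, the run function URL-encodes the query with urllib.parse.quote before it is placed into the request URL (and serp_scrape_init decodes it back with unquote for printing). A quick sketch of what that round trip looks like:

```python
from urllib.parse import quote, unquote

# quote() percent-encodes characters that are unsafe in a URL, e.g. spaces.
q = quote("The biggest cats in the world")
print(q)  # The%20biggest%20cats%20in%20the%20world

# unquote() reverses the encoding, recovering the original query.
assert unquote(q) == "The biggest cats in the world"
```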
    Here is the full code of the script, the main.py file:
    import json
    import argparse

    import requests
    import pandas
    from urllib.parse import quote, unquote


    def save_to_json(path, data):
        with open(path, 'w', encoding='utf-8') as file:
            json.dump(data, file, indent=2, ensure_ascii=False)


    def save_to_exel(path, data):
        frame = pandas.DataFrame({'title': [], 'link': [], 'description': []})
        for indx, entry in enumerate(data):
            frame.at[indx, 'title'] = entry['title']
            frame.at[indx, 'link'] = entry['url']
            frame.at[indx, 'description'] = entry['description']
        frame.to_excel(path, index=False)


    def serp_page_scrape(query: str, options: dict) -> list:
        data = []
        for i in range(0, options['depth']):
            try:
                with open(f'./data/temp/{query}_{i*10 + 1}-{i*10 + 10}.json',
                          'r', encoding='utf-8') as file:
                    data_temp = json.loads(file.read())
                for item in data_temp['items']:
                    title = item['title'] if options['title'] else None
                    description = item['snippet'] if options['description'] else None
                    url = item['link'] if options['url'] else None
                    data.append({'title': title, 'description': description, 'url': url})
            except (OSError, KeyError):
                # A page file may be missing, or a response may contain no 'items'
                pass
        if options['save_to'] == 'json':
            save_to_json(f'./data/serp/{query}.json', data)
        else:
            save_to_exel(f'./data/serp/{query}.xlsx', data)
        return data


    def serp_scrape_init(query: str, options: dict = {}) -> list:
        print(f'Query: {unquote(query)},\n'
              f"Options: title={options['title']} | description={options['description']} | "
              f"urls={options['url']} | depth={options['depth']} | save to={options['save_to']}")
        for i in range(0, options['depth']):
            response = requests.get(
                'https://www.googleapis.com/customsearch/v1'
                f"?key={options['key']}&cx={options['cx']}&q={query}&num=10&start={i * 10 + 1}")
            save_to_json(f'./data/temp/{query}_{i*10 + 1}-{i*10 + 10}.json', response.json())


    def run():
        # Standalone entry point: read the query and options from the CLI
        parser = argparse.ArgumentParser(add_help=True)
        parser.add_argument('-q', type=str, help='Query to parse',
                            metavar='QUERY', required=True, nargs='*')
        parser.add_argument('-C', type=str, help='Path to config, in json format',
                            metavar='CONFIG_FILE', required=True, nargs=1)
        args = parser.parse_args()

        # Build and URL-encode the query
        raw_query = ' '.join(args.q)
        if not raw_query:
            return
        query = quote(raw_query)

        # Read the config file and check that every key is known
        options = {'key': '', 'cx': '', 'save_to': '', 'title': '',
                   'description': '', 'url': '', 'depth': ''}
        with open(args.C[0], 'r') as file:
            data = json.loads(file.read())
        for key in data:
            if options.get(key) is not None:
                options[key] = data[key]
            else:
                print(f'ERROR: Something went wrong in your config file, {key}')
                return False

        # Google Search API serves at most 100 results (10 pages of 10)
        if options['depth'] > 10:
            print('WARNING: Google Search API allows only 100 search results')
            options['depth'] = 10

        serp_scrape_init(query, options)
        serp_page_scrape(query, options)


    if __name__ == "__main__":
        run()

    Conclusion

    You know, I originally planned to write this parser on the BeautifulSoup4 + Selenium + Python stack. But after googling a bit, I did not find an official tutorial from Google on how to create a legal search results parser. All I got were websites of agencies and companies that offer to do the same thing, only for money.
    Sure, if you are a large company and you need to make 1000 requests per second, the Google Search API can provide additional limits for a small fee. Very small, compared to those "unnamed" companies and websites. That's how it is. If you want to learn more about the Google Search API, check out their official blog. It is very informative.


    Do not forget to share, like and leave a comment :)
