Client side rendering(CSR)

It is a JavaScript rendering method that uses JavaScript to render a website or application in the browser. With CSR, the processing and rendering of the content happens in the browser rather than on the server.

Is a high-level, interpreted programming language that is commonly used for web development. It is an essential part of web applications, enabling interactive features and dynamic content on websites.

Is a set of instructions written in a programming or scripting language that is executed by a runtime environment rather than being compiled into machine code. Scripts are typically used to automate tasks or to control the behavior of applications and systems.

In the context of computing and web development, refers to a program or script that is designed to extract data from websites. This process is known as web scraping. Scrapers can automatically navigate through web pages, retrieve specific information, and store that data in a structured format, such as CSV, JSON, or a database.

What is the best web scraping tool?

The choice of a scraping tool depends on the nature of the website and its complexity. As long as the tool can help you get the data quickly and smoothly with acceptable or zero cost, you can choose the tool you like.

How to avoid being blocked when scraping a website?

Many websites would block you if you scraped them too much. To avoid being denied, you should make the scraping process more like a human browsing a website. For example, adding a delay between two requests, using proxies, or applying different scraping patterns can help you avoid being blocked.

What is the difference between web scraping and web crawling?

Web scraping and web crawling are two related concepts. Web scraping, as we mentioned above, is a process of obtaining data from websites; web crawling is systematically browsing the World Wide Web, generally for the purpose of indexing the web.

Manual scraping, what is it?

This is a process of extracting data from web resources or documents that is manual, that is, performed by a person without the help of any auxiliary scripts or programs.

Scraping data from Instagram is illegal?

If the data you are going to collect is public and accessible to everyone, then it is definitely allowed. Plus, Instagram provides a special API for scraping so there should be no problems.

Cloud scraping what is it

This is a service for collecting information from various sources and grouping them in various formats, which is carried out on the cloud servers of the provider of this service.

It all depends on what to scrape and what to scrape with. You can scrape documents and tables, or you can scrape a websites. Moreover, websites are more difficult to scrape from than documents, because there are many websites and each has its own architecture, which greatly complicates scraping.

Which language to use to write a scraper

The single and universal language of all parsers is python. The fact is that for any information, in whatever form and format it was presented, there is a library. Although there are alternatives in the form of JavaScrip, Ruby, Go, C++, PHP

Remove all

LOADING ...

Content

How to make your own google serp scraper, with an example

15.02.2025

13.04.2026

6 minutes

1021

Tags:

Scraper

CLI

Search result parser series

Python

Search result scraper

What is this article about and for whom

This article is about writing your own search results parser, for free and in 5 minutes. Without using such things as proxy or bs4. No third-party programs to bypass captcha and (or) imitate user activity in the browser, i.e., Selenium, for example.

It is intended for beginner SEO specialists who are a little savvy in programming and understand python syntax, but who do not have a lot of spare money.

And how am I going to parse Google search results? It's simple, I will connect to the Google Search API, which has a free plan of 100 requests per day. For the owner of a small site, it's just right. Here is a ready-made Google search results parser project.

Creating an API key and search engine ID

To be able to use Google's API, we need to get an access key and a search engine ID. First, let's create our own search engine. Go to https://programmablesearchengine.google.com/controlpanel/all, click Add and fill in all the form fields.

You will be redirected to the next page. Now on this page you can get your ID.

You get the ID, all that remains is to get an API key. Go to this page, register if necessary https://console.cloud.google.com/apis/dashboard?inv=1&invt=AbppVQ On this page you will need to create a new project. Click here:

Next, fill in all the fields and you're done. Now, finally, create an API key. Go to Credentials, either by the link or by the button (｡･∀･)ﾉﾞ:

Create an API-key:

After all these steps, you have managed to create your own API key for your own search engine. Copy it and save it somewhere.

Writing a scraper

Basic setup and preparation

Now we have everything we need; all that's left is to write a scraper. Let's create a virtual environment, install the necessary packages, and create a couple of directories:

mkdir MyParser mkdir MyParser/data mkdir MyParser/data/serp mkdir MyParser/data/temp New-Item MyParser/main.py New-Item MyParser/config.json python -m venv .venv ./.venv/Scripts/activate pip install requests pandas openpyxl

For Windows/PowerShell

mkdir MyParser mkdir MyParser/data mkdir MyParser/data/serp mkdir MyParser/data/temp touch MyParser/main.py touch MyParser/config.json python -m venv .venv source ./.venv/bin/activate pip install requests pandas openpyxl

For Linux/Bash

Installing pandas and openpyxl is optional, because if you don't want to save the parsing results to XLSX files, then you don't have to. I will, because it's more convenient for me. The data directory will store our temporary JSON files and the results themselves, either the same JSON or XLSX tables.

Configuration file

My parser will also have a configuration file - config.json, from where it can find out how it should process requests. Here is the content of the configuration file, copy and paste:

{ "key": "11111111111111111111111111111111111", "cx": "11111111111111111", "save_to": "exel", "title": true, "description": false, "url": true, "depth": 1 }

Here is a general description of each key:

key - the API key we recently created
cx - the ID created at the beginning for the custom search engine
save_to - allows you to define how to save the result, valid values are exel and json.
depth - how many pages of search results to parse; Google allows you to get a maximum of 10 pages with 10 positions each
title, description and url - what to scrape

Script

This script is designed to accept arguments from the command line. The first is -q, the query itself. The second is the path to the configuration file -C. I did this using the argparse python module. All of its functionality is implemented in the run function:

def run(): parser = argparse.ArgumentParser(add_help=True) parser.add_argument('-q', type=str, help='Query to parse', metavar='QUERY', required=True, nargs='*') parser.add_argument('-C', type=str, help='Path to config, in json format', metavar='CONFIG_FILE', required=True, nargs=1) args = parser.parse_args() # query raw_query = ''.join(word + ' ' for word in args.q) if raw_query is None: return query = quote(raw_query) # check if config exist options = { 'key': '', 'cx': '', 'save_to': '', 'title': '', 'description': '', 'url': '', 'depth':'' } with open(args.C[0], 'r') as file: data = json.loads(file.read()) for key in data: if options.get(key) is not None: options[key] = data[key] else: print(f'ERROR: Something went wrong in your config file, {key}') return False # check depth if options['depth'] > 10: print('WARNING: Google Search API allowed only 100 search results to be available') options['depth'] = 10 else: options['depth'] = data['depth'] serp_scrape_init(query, options) serp_page_scrape(query, options)

In this function, an argparse object is created, configured, and then the configuration file is processed. At the very bottom of this function, serp_scrape_init and serp_page_scrape are called. Let's look at them one by one.

The first function serp_scrape_init works with Google Search API. Although, it's hard to call it work. We simply make a request to this URL:

https://www.googleapis.com/customsearch/v1?key={options['key']}&cx={options['cx']}&q={query}&num=10&start={i * 10 + 1}

It is important to understand that we need to go through all possible pages that Google returns. For this, the following parameters are used in the address, num and start. The first is responsible for how many sites to return in one request (maximum 10). The second parameter goes through all pages with a step of 10. In general, there are many more parameters for queries, you can see all of them here. As a result, our function looks like this:

def serp_scrape_init(query: str, options: dict = {}) -> list: for i in range(0, options['depth']): response = requests.get(f'https://www.googleapis.com/customsearch/v1?key={options['key']}&cx={options['cx']}&q={query}&num=10&start={i * 10 + 1}') save_to_json(f'./data/temp/{query}_{i*10 + 1}-{i*10 + 10}.json',response.json())

As a result of the work, the function creates JSON files, which will then be processed by serp_page_scrape. Actually, let's talk about it.

def serp_page_scrape(query: str, options: dict) -> list: data = [] for i in range(0, options['depth']): try: with open(f'./data/temp/{query}_{i*10 + 1}-{i*10 + 10}.json', 'r+', encoding='utf-8') as file: data_temp = json.loads(file.read()) for item in data_temp['items']: title = None if options['title']: title = item['title'] description = None if options['description']: description = item['snippet'] url = None if options['url']: url = item['link'] data.append({ 'title': title, 'description': description, 'url': url, }) except: pass if options['save_to'] == 'json': save_to_json(f'./data/serp/{query}.json', data) else: save_to_exel(f'./data/serp/{query}.xlsx', data) return data

Nothing extraordinary, it just opens previously created JSON files and saves what was specified in the configuration file. And that's it. Now, we actually got a small Google in the console. Here's an example of usage:

python main.py -q The biggest cats in the world -С config.json

Here is the full code of the script and the main.py file:

import json import argparse import requests import pandas from urllib.parse import quote, unquote def save_to_json(path, list): with open(path, 'w', encoding='utf-8') as file: json.dump(list, file, indent=2, ensure_ascii=False) file.close() def save_to_exel(path, data): frame = pandas.DataFrame({ 'title': [], 'link': [], 'description': [] }) for indx, entry in enumerate(data): frame.at[indx, 'title'] = entry['title'] frame.at[indx, 'link'] = entry['url'] frame.at[indx, 'description'] = entry['description'] frame.to_excel(path, index=False ) def serp_page_scrape(query: str, options: dict) -> list: data = [] for i in range(0, options['depth']): try: with open(f'./data/temp/{query}_{i*10 + 1}-{i*10 + 10}.json', 'r+', encoding='utf-8') as file: data_temp = json.loads(file.read()) for item in data_temp['items']: title = None if options['title']: title = item['title'] description = None if options['description']: description = item['snippet'] url = None if options['url']: url = item['link'] data.append({ 'title': title, 'description': description, 'url': url, }) except: pass if options['save_to'] == 'json': save_to_json(f'./data/serp/{query}.json', data) else: save_to_exel(f'./data/serp/{query}.xlsx', data) return data def serp_scrape_init(query: str, options: dict = {}) -> list: print(f'Query: {unquote(query)},\nOptions: title={options['title']} | description={options['description']} | urls={options['url']} | depth={options['depth']} | save to={options['save_to']}') for i in range(0, options['depth']): response = requests.get(f'https://www.googleapis.com/customsearch/v1?key={options['key']}&cx={options['cx']}&q={query}&num=10&start={i * 10 + 1}') save_to_json(f'./data/temp/{query}_{i*10 + 1}-{i*10 + 10}.json',response.json()) def run(): # This is going to be only in standalone script # Get the options and query from CLI parser = argparse.ArgumentParser(add_help=True) parser.add_argument('-q', type=str, help='Query to parse', metavar='QUERY', required=True, nargs='*') parser.add_argument('-C', type=str, help='Path to config, in json format', metavar='CONFIG_FILE', required=True, nargs=1) args = parser.parse_args() # query raw_query = ''.join(word + ' ' for word in args.q) if raw_query is None: return query = quote(raw_query) # check if config exist options = { 'key': '', 'cx': '', 'save_to': '', 'title': '', 'description': '', 'url': '', 'depth':'' } with open(args.C[0], 'r') as file: data = json.loads(file.read()) for key in data: if options.get(key) is not None: options[key] = data[key] else: print(f'ERROR: Something went wrong in your config file, {key}') return False # check depth if options['depth'] > 10: print('WARNING: Google Search API allowed only 100 search results to be available') options['depth'] = 10 else: options['depth'] = data['depth'] serp_scrape_init(query, options) serp_page_scrape(query, options) if __name__ == "__main__": run()

Conclusion

You know, I originally planned to write a parser on the BeautifulSoup4 + Selenium + Python stack. But after googling a bit, no, I didn't find an official tutorial from Google on how to create a legal search results parser. I was getting websites of agencies and companies that offer to do the same thing, only for money.

Sure, if you are a large company, and you need to make 1000 requests per second, then Google Search API can provide additional limits for a small fee. Very small, compared to “unnamed” companies and websites. That's how it is. If you want to learn more about Google Search API, check out their official blog. It is very informative.

Fast implementation of a RESTful API for a Django website

Next article

Full localization for a Django website

Previous article

Do not forget to share, like and leave a comment :)

Comments

(0)

Send

LOADING ...

It's empty now. Be the first (oﾟvﾟ)ノ

External links

Other

Used termins

Client side rendering(CSR) ⟶ It is a JavaScript rendering method that uses JavaScript to render a website or application in the browser. With CSR, the processing and rendering of the content happens in the browser rather than on the server.
Javascript ⟶ Is a high-level, interpreted programming language that is commonly used for web development. It is an essential part of web applications, enabling interactive features and dynamic content on websites.
Script ⟶ Is a set of instructions written in a programming or scripting language that is executed by a runtime environment rather than being compiled into machine code. Scripts are typically used to automate tasks or to control the behavior of applications and systems.
Scraper ⟶ In the context of computing and web development, refers to a program or script that is designed to extract data from websites. This process is known as web scraping. Scrapers can automatically navigate through web pages, retrieve specific information, and store that data in a structured format, such as CSV, JSON, or a database.