An introduction to, and continuation of, the semantic core series.
I will show what clustering is, why we do it and, most importantly, how it is done manually and how it can be automated with third-party services. But before answering the question of why, we need to answer the question of what. Namely: what is the clustering of search queries?
You can cluster in two ways:
- The first method relies on the lexical similarity of the keywords being grouped.
- The second method relies on the user's intent and groups queries by the similarity of their search results. It splits into two subcategories:
  - Soft clustering. It is enough for the pages in the search results to match only partially.
  - Hard clustering. The pages in the search results must match completely.
A third type of clustering is sometimes also distinguished: logical clustering, which emphasizes combining queries by meaning. For example, grouping all queries into a "Purchases", "Services", or "Products" cluster, and so on. But as I will show below, logical and lexical clustering are barely distinguishable from each other. More on that later.
Lexical clustering of keywords
There is not much to say about the first method. Suppose we have the following table of keywords:
| Keyword | Impressions |
|---|---|
| Telegram Bot Menu | 500 |
| Webhook Telegram Bot | 500 |
| YouTube MP3 Telegram | 500 |
| Telegram SetWebhook | 500 |
| Telegram Bot Menu Button | 500 |
| YouTube Converter MP3 Telegram | 500 |
To cluster these keywords with the first method, we group them by the presence of common words. The table above, for example, would be grouped like this:
| Keyword | Cluster | Impressions |
|---|---|---|
| Telegram Bot Menu | Telegram Bot Menu | 500 |
| Telegram Bot Menu Button | Telegram Bot Menu | 500 |
| Webhook Telegram Bot | Webhook Telegram Bot | 500 |
| Telegram SetWebhook | Webhook Telegram Bot | 500 |
| YouTube MP3 Telegram | YouTube MP3 Telegram | 500 |
| YouTube Converter MP3 Telegram | YouTube MP3 Telegram | 500 |
This is a fairly simple way of clustering that requires only basic knowledge of the language, nothing more.
Logical clustering works by combining keywords according to their logical similarity. If there are many queries about developing Telegram bots, it is probably worth combining them into one cluster and calling it something like "Telegram bot development services".
In practice, it is best to combine these two methods: the first is easily automated, while the second requires your intervention. In effect, they are just one method.
Clustering by top pages in SERP
But simple as it is, it is not the method I would recommend. Why? It cannot capture the user's intent in its clusters. Or rather, it cannot reflect the intent as assumed by the search engine, i.e. Google or Yandex.
Search engines do not literally know a query's intent, but thanks to their algorithms they surface exactly those pages that best answer the hidden intention of the user who entered the query.
So to "grab" the user's intent, we need the second method: clustering by the top pages in the SERP.
How does it work? Suppose our semantic core contains the following queries (we will reuse the queries mentioned earlier):
- Webhook Telegram Bot
- Telegram SetWebhook
- YouTube MP3 Telegram
Now we enter these queries into the target search engine to find out which pages rank for them (below is a table with the results of analyzing Google's search results).
| Top URLs for "Webhook Telegram Bot" | Top URLs for "Telegram SetWebhook" | Top URLs for "YouTube MP3 Telegram" |
|---|---|---|
| https://stackoverflow.com/questions/42554548/how-to-set-telegram-bot-webhook | https://stackoverflow.com/questions/36905455/how-to-use-setwebhook-in-telegram | https://t.me/ytbaudiobot |
| https://habr.com/en/companies/digitalleague/articles/716760/ | https://telegram-bot-sdk.readme.io/reference/setwebhook | https://telegram.me/convert_youtube_to_mp3_bot |
| https://core.telegram.org/bots/webhooks | https://core.telegram.org/bots/webhooks | https://www.telegrambots.info/bots/ytbaudiobot |
| https://timeweb.cloud/tutorials/nodejs/otlichie-polling-i-webhook-v-telegram-botah | https://decovar.dev/blog/2018/12/02/telegram-bot-webhook-en/ | https://ichip.ru/podborki/programmy-prilozheniya/5-telegram-botov-kotorye-pomogut-skachat-video-s-youtube-i-socsetej-834852 |
| https://core.telegram.org/bots/api | https://core.telegram.org/bots/api | https://lifehacker.ru/download-audio-youtube/ |
The second query, although similar to the first, shows slightly different results. This is easily explained: the first query is rather broad, while the second is much more specific, since it is about a particular method of setting webhooks. The two result lists share only two addresses.
The third query is completely different: not a single matching address.
This table makes it easy to explain the difference between hard and soft clustering. With soft clustering and a similarity threshold of 2 (i.e. how many addresses must match), the first and second queries would fall into one cluster.
With hard clustering, none of the queries would end up in a common cluster, because hard clustering requires all addresses in the search results to match.
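The comparison can be sketched in a few lines, using the top-5 URL lists from the table above. The soft threshold of 2 matching addresses follows the example in the text.

```python
# Top-5 search results per query, copied from the table above.
serps = {
    "Webhook Telegram Bot": {
        "https://stackoverflow.com/questions/42554548/how-to-set-telegram-bot-webhook",
        "https://habr.com/en/companies/digitalleague/articles/716760/",
        "https://core.telegram.org/bots/webhooks",
        "https://timeweb.cloud/tutorials/nodejs/otlichie-polling-i-webhook-v-telegram-botah",
        "https://core.telegram.org/bots/api",
    },
    "Telegram SetWebhook": {
        "https://stackoverflow.com/questions/36905455/how-to-use-setwebhook-in-telegram",
        "https://telegram-bot-sdk.readme.io/reference/setwebhook",
        "https://core.telegram.org/bots/webhooks",
        "https://decovar.dev/blog/2018/12/02/telegram-bot-webhook-en/",
        "https://core.telegram.org/bots/api",
    },
    "YouTube MP3 Telegram": {
        "https://t.me/ytbaudiobot",
        "https://telegram.me/convert_youtube_to_mp3_bot",
        "https://www.telegrambots.info/bots/ytbaudiobot",
        "https://lifehacker.ru/download-audio-youtube/",
    },
}

def soft_match(a: set[str], b: set[str], min_common: int = 2) -> bool:
    """Soft clustering: enough that at least min_common URLs coincide."""
    return len(a & b) >= min_common

def hard_match(a: set[str], b: set[str]) -> bool:
    """Hard clustering: all URLs must coincide."""
    return a == b

# The first two queries share 2 URLs, so soft clustering merges them...
print(soft_match(serps["Webhook Telegram Bot"], serps["Telegram SetWebhook"]))   # True
# ...but hard clustering does not, because the lists are not identical.
print(hard_match(serps["Webhook Telegram Bot"], serps["Telegram SetWebhook"]))   # False
# The third query shares no URLs with the first, so neither method merges it.
print(soft_match(serps["Webhook Telegram Bot"], serps["YouTube MP3 Telegram"]))  # False
```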
Large core clustering
Now that we know what clustering is, what types exist, and why it is used, you may wonder: "But what if my semantic core has 1000 or more queries/keywords?"
You have three ways:
- Find suitable online tools and pay with money
- Build your own equivalents and pay with time
- Sort all the keywords manually and, again, pay with time
The third option may not seem like an option at all. But if your semantic core has up to 50 keywords, or if you already have a clustered core and are simply maintaining it, you can cluster it gradually this way.
If you decide to spend your time and build a semantic core clusterer, you will need to implement two interconnected tools:
- A search engine scraper. I have an article on my website about building such a scraper for Google, along with a ready-made Python script.
- The clusterer itself, which compares the resulting lists of addresses. Maybe someday I will build it myself.
And if you have a little money, you can use ready-made solutions. I can name only one that completely covers my clustering needs: the clusterizer from Arsenkin.

I recommend it; I use it myself. It works on a system of paid limits. The cheapest option is 28k limits for 30 days for $8, which is enough to cluster roughly 15k keywords with frequency data.
If you feed it the list of keywords from this table (obtained in the previous article), we get this table as output. How to use a clustered core is probably the next question on your mind, and I will try to answer it in the next chapter.
In which I will explain how to use it
Many articles describe the process and ways of building a clustered semantic core, but almost never how to use it. Everyone talks about its benefits without showing any benefit in practice. I want to fix that by walking through a previously clustered core.
The table has 9 columns; let's go through each of them:
- Search queries - the keyword itself
- Cluster - the cluster's general name, taken from the first search query in the group
- Queries in cluster - the number of keywords in the cluster
- Impressions - the number of impressions per month
- Cluster's impressions sum - the total monthly impressions of the entire cluster
- % Aggregators - the share of aggregator pages in the search results
- Main pages - the number of main (home) pages in the search results
- Toponym in the query - whether the query contains a toponym (place name)
- URLs - the search results for the query
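If you are building the table yourself, the derived columns are trivial to compute. A sketch for the "Cluster's impressions sum" column, over a few hypothetical rows (the row data here is illustrative, not from the article's core):

```python
from collections import defaultdict

# Hypothetical rows of a clustered core (only 3 of the 9 columns shown).
rows = [
    {"query": "webhook telegram bot", "cluster": "webhook telegram bot", "impressions": 500},
    {"query": "telegram setwebhook", "cluster": "webhook telegram bot", "impressions": 500},
    {"query": "youtube mp3 telegram", "cluster": "youtube mp3 telegram", "impressions": 500},
]

def cluster_impressions(rows: list[dict]) -> dict[str, int]:
    """Sum monthly impressions per cluster ("Cluster's impressions sum")."""
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["cluster"]] += row["impressions"]
    return dict(totals)

print(cluster_impressions(rows))
# {'webhook telegram bot': 1000, 'youtube mp3 telegram': 500}
```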
I usually add two more columns. One holds the addresses of the pages on my site that use a particular cluster, so I can track the progress of "implementing" the semantic core on the site. The other indicates the cluster type, so I understand what I should use the cluster for.
Once we have a clustered core, we need to look through all its clusters and decide what to use them for. I, for example, divide clusters by destination page type. The types are as follows:
| Abbreviation | Page type | Page description | Number of impressions* |
|---|---|---|---|
| A | Article | Informative material: a guide, tutorial, or case study. | 10-100 |
| BA | Big article | A very large article that combines several ordinary articles. Such pages are also called pillar pages. | 100-1000 |
| T | Tool | A simple terminal tool, or in our case a Telegram bot. | 10-100 |
| BT | Big tool | By a big tool I mean not only one that is complex to build, but also one that is online, i.e. available to everyone. | 100-1000 |
| PP** | Pagination page | A page that groups either articles or tools. | 1000-10000 |
* - the numbers here are approximate and depend on the core itself and on the competition between sites for rankings in search.
** - pagination pages are worth optimizing only if you can implement selective indexing for them; otherwise thousands upon thousands of such pages will eat up your site's crawl budget.
How do you decide which page type fits which cluster? Well, if you understand the topic and are a specialist in your site's area, a quick glance should be enough to tell which page type a cluster is best combined with. But what if not?
I can give only a few recommendations:
- Look at the user's intent. If you can see from the query that a person is looking for, say, free Telegram bots, the best page for such a cluster would be either a pillar page or a pagination page.
- If you see a huge cluster of 20-40 keywords, it may be worth using it for a pillar page and breaking it into several smaller articles.
- Do not try to stuff every keyword into your page. Choose only a few, and for some time after publication keep adjusting them to find the optimal phrases and combinations.
When you're done, you'll get something like this table, which you will later use as a compass for choosing the next article topic, and which will help you track what you have already done and what is still ahead. Just pick a cluster you like and start writing. But keep in mind that...
It is better to start with low-competition queries, where you can easily climb to the TOP 10 or TOP 5 of the search results. How do you determine that a cluster is low-competition? Look at columns F (% Aggregators) and G (Main pages).
The higher the percentage of aggregator pages, the easier it is to rank there. Why? Aggregators bring nothing new to the search; they simply copy other people's content, so they carry little or no weight in the search engine's eyes.
The fewer main pages in the results, the easier it is to rank there. Why? A main page usually means someone built a highly specialized site intent on taking all the impressions for that query. Such competitors are hard to outrank, if it is possible at all.
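These two signals can be folded into a rough sorting heuristic. The weighting below and the sample numbers are assumptions for illustration, not values from the article's core:

```python
def competition_score(aggregator_pct: float, main_pages: int) -> float:
    """Lower score = easier cluster: reward aggregators, penalize main pages.
    The 10x weight on main pages is an arbitrary assumption."""
    return main_pages * 10 - aggregator_pct

# Hypothetical clusters with their column F and G values.
clusters = [
    {"cluster": "webhook telegram bot", "aggregator_pct": 40.0, "main_pages": 1},
    {"cluster": "youtube mp3 telegram", "aggregator_pct": 10.0, "main_pages": 4},
]

# Sort easiest-first: more aggregators and fewer main pages float to the top.
clusters.sort(key=lambda c: competition_score(c["aggregator_pct"], c["main_pages"]))
print(clusters[0]["cluster"])  # the cluster with more aggregators, fewer main pages
```

This only orders clusters by the two columns discussed above; it is a starting point for prioritization, not a replacement for actually looking at the SERP.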
Conclusion
Of course, there are other online clusterers such as PixelPlus, Serpstat, and Topvisor, but I did not really like their output, and this article is not about the top clusterers but about the process as a whole.
The usefulness of clustering is obvious: if you need to understand which keywords to use in a particular article, structure the site, understand the direction of the site's development, or decide which keywords to promote in search advertising, then clustering is necessary.
And although clustering the core is not difficult, using it certainly will be. After all, there is an opinion that the "semantic core" is a kind of fiction invented by SEO companies and specialists to create the appearance of work and raise their prices.
An interesting point of view, but as we saw in the previous chapter on its use, the semantic core is far more useful than many imagine. Do not treat the semantic core as the end result of the work, but as a tool that helps achieve specific results (clicks, impressions, conversions).
How do you use the semantic core? Do you use it at all? I will be glad to read about it in the comments.