
    What Are AI Agents and How to Build Your Own: A Step-by-Step Python Guide

    Published: 21.11.2025 · 14 minutes read

    Introduction to AI

    In this article, I'll explain and demonstrate how AI agents can be used to automate keyword research given only an article idea (as an example). I'll also describe how these agents work, how they make decisions, and what types and varieties of them exist.
    I've always been skeptical of new technologies, distrusting hype and trends. Especially AI. It seemed cool, but for me, it wasn't very useful.
    Sure, it was fun to just chat for a while, but it got boring, and I won't even mention the errors, inaccuracies, and overconfidence in what they generate. They're only as good as the initial training data. And what data do they train on? That's right, websites and their content.
    For example, ask Gemini the most overused question among front-end developers: "How do I center an element?" Gemini will give you a complete, detailed, and exhaustive answer. Note that I asked a very general question; I could just as well have meant how to do this in Figma or any other editor. But since Gemini uses search results as a tool, it also picks up the intent from there.
    The thing is, intent is determined by the search algorithms of the search engine being asked (Google, Bing, Yandex, etc.). More precisely, users determine the intent of a particular search query by selecting and spending more time on resources that best match the intent specified in the query.
    But if you ask, "Describe in detail the course and results of the 1804 'Glass War' between Switzerland and Mongolia," the AI will try to play along and may, with complete confidence, tell you what "actually happened" in 1804.
    I understand that this example is very exaggerated, and even the most modern models won't buy it, but in any case, you should always check what they've generated for you. Although, to be fair, it should be said that the same applies to anything written and published online, because it's the internet. Anyone can write anything, however they want.
    But one thing today's AI can do well is aggregate and summarize large volumes of data, and do it quickly. And I believe these are very good qualities for automating certain processes.
    It's precisely this ability to quickly process data that makes AI an ideal candidate for automation. But to turn a simple chatty bot into a reliable assistant, we need to give it autonomy and the right to act. This is where AI Agents come in.

    What is an AI Agent and what does it look like?

    According to an internet definition that is definitely not AI-generated:
    An AI Agent is an autonomous program that uses artificial intelligence to independently perform tasks and achieve goals without constant human intervention. Unlike a simple chatbot, it can perceive its environment, make decisions, use various tools (such as internet search or databases), and learn from the information it receives to achieve results.
    In other words, to consider a regular Python script an AI Agent, we need:
    1. A language model—what will actually generate the corresponding output.
    2. Tools—what enables the language model to interact with the outside world.
    3. A workflow control system (The Orchestration Layer)—that is, a system that controls the "Think, Act, Observe" cycle—a state machine, if you will, that determines which model to use and what tools to provide to that model.
    In the next chapter, I'll describe creating an AI Agent using the LangChain library. To create an agent, you'll need to specify the tools, the model, and some context. In this case, the library itself plays the role of the "Workflow Control System."

    How agents work and think

    Fundamentally, all agents operate in a special cycle called "Think, Act, Observe." It consists of five key steps:
    1. Get a Mission — the agent receives an initial goal to act upon. This mission can be defined by either the user or the developer during creation.
    2. Context Analysis — at this stage, the agent analyzes the entire context provided, including the developer's initial prompts, descriptions of the provided tools, and what's in its temporary memory.
    3. Think — the model now begins to think about solving the user's request, developing a plan for further action, and, accordingly, which tools can be used.
    4. Act — the agent, having selected the necessary tools, begins to use them in accordance with the requirements for completing the given mission.
    5. Observe and Repeat — each tool must have an output; this output is added to the context, and the agent returns to step 3 to analyze the new, updated context.
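    The five steps above can be sketched in plain Python. This is a schematic illustration only, not a real LangChain API: the model_think callable and the tools dictionary are hypothetical stand-ins for the model and the tool registry.

    ```python
    # A schematic sketch of the "Think, Act, Observe" loop.
    # model_think() and the tools dict are hypothetical stand-ins,
    # not a real library API.

    def run_agent(mission, tools, model_think, max_steps=10):
        context = [f"Mission: {mission}"]          # 1. Get a Mission
        for _ in range(max_steps):
            # 2-3. Context Analysis + Think: the model reads the whole
            # context and decides on the next action (or finishes).
            decision = model_think(context)
            if decision["action"] == "finish":
                return decision["answer"]
            # 4. Act: call the chosen tool with the chosen arguments.
            tool = tools[decision["action"]]
            observation = tool(**decision["args"])
            # 5. Observe and Repeat: the tool's output is appended to the
            # context, and the loop returns to the Think step.
            context.append(f"Observation: {observation}")
        return "Step limit reached"
    ```

    A real orchestration layer adds error handling, token budgeting, and tool-call parsing, but the control flow is essentially this loop.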

    What kinds of agents are there?

    In the previous chapter, I showed how an agent works in its most basic, simplest form. But did you know that agents have a hierarchy and can become even more complex and intelligent? As a result, they can solve much harder problems.
    Hierarchy of Agent Systems. Taken from https://www.kaggle.com/whitepaper-introduction-to-agents
    Level 0. The core reasoning system. A large language model on its own.
    1. Example: You asked Gemini, "Write an essay about the impact of the Industrial Revolution on the environment."
    2. Concept: The model uses only its internal knowledge, acquired during training. It doesn't access the internet or run code. If it doesn't have this knowledge or it's outdated, it will start hallucinating.
    Level 1. The connected problem-solver. At this level, tools are attached to the language model, allowing it to explore and interact with the environment.
    1. Example: You asked, "What's the weather like in Warsaw right now?"
    2. Concept: The model understands that it doesn't know the answer, but it has a tool (function) called get_weather(city). It calls this tool, receives data, and generates a response in natural language. This is where the connection with the outside world begins.
    Level 2. The strategic problem-solver. At this level, the agent acquires a context, and this context can be used to limit and frame the agent or customize its behavior.
    1. Example: The agent we create in the article (SEO specialist).
    2. Concept: You don't just provide a tool; you define a role and context: "You are an SEO specialist, your goal is to expand the semantic core." The agent remembers its mission and can perform several actions in a row (find competitors -> extract keywords -> filter) within a single session, maintaining the task context.
    Level 3. The Collaborative Multi-Agent System. At this level, different agents with their own tools and contexts unite to solve a common problem.
    1. Example: "Blog Editing."
    2. Concept: There are three agents: the Researcher (searches for topics online), the Copywriter (writes text based on what they find), and the Editor (checks the text and corrects style). You assign the task "Write an article about AI," and agents pass the work down the chain, like coworkers in an office, until the result is ready.
    Level 4. The self-evolving system. This is a system that is able to understand its own shortcomings and limitations and create additional tools and/or agents to address them in order to complete the task.
    1. Example: A travel booking agent faced with a new task.
    2. Concept: You ask the agent to book a hotel on a specific website whose API the agent doesn't know. A Level 4 agent won't crash. It will read the website's documentation, write a new tool (code) to interact with the API, test it, and complete your task.
    A couple of notes on some levels. Fundamentally, you can equate levels 2 and 3. So, instead of creating dozens of separate agents with their own tools, you can simply create all the necessary tools and delegate them to a single agent. This is much simpler to implement. You don't have to worry about connecting multiple agents and how they will interact.
    By this logic, a self-adaptive system doesn't necessarily have to consist of multiple agents. It can simply create or search for the necessary tools to perform its tasks.
    However, there are a couple of limitations that prevent the approach I described above.
    1. First, context. Each agent has a limited context, and therefore, it's impossible to cram all the tools into a single agent. That's why they (agents) will have to be multiplied.
    2. Second, agents lack long-term memory. With each new interaction, the context has to be rebuilt and the agent re-instructed to perform a particular task.
    Later in this article, I'll demonstrate the creation of a Level 2 AI Agent, as they are currently the easiest to develop and suitable for most routine tasks. At least mine are. And this is certainly not because I don't yet know how to create Level 3 and 4 agents.
    And as you may have noticed, starting from Level 1, the key difference between an agent and a simple chatbot is the ability to interact with its environment. And they do this with the help of tools. Let's figure out what types there are.

    About tools for AI agents

    Tools are functions that the Large Language Model can use to perform assigned tasks. All tools can be roughly divided by the type of action:
    1. Those that get something are used to obtain information, such as querying a database.
    2. Those that do something are used to perform actions on existing systems, such as creating a database entry.
    These tools can also be divided by their origin:
    1. External tools are those that are not built into the Language Model and must be specified when creating an agent.
    2. Built-in tools are those that the Language Model can use without explicitly specifying them in the toolchain when creating an agent. For example, Gemini has the following built-in tools: Google Search, Code Execution, URL-Context, and Computer Use. More built-in tools for Gemini can be found here.
    3. Agent as a tool - i.e., any agent can also be used as a tool.
    Not all language models have built-in tools; at the time of writing, providers such as Google (Gemini), OpenAI, Anthropic, and Groq offer them. And not all models support the A2A protocol, meaning not all models can use other agents as tools.
    The A2A protocol is a communication protocol between agents for transferring structured data between them. More information about this protocol can be found on the official website.
    When developing tools for your agent, you should also consider the best practices:
    1. Document everything thoroughly—that is, use clear names, describe all input parameters, explain what your function returns, provide default values, and add examples. All of this becomes part of the context, and the agent needs it to understand what the tool is for.
    2. Describe the result, not the execution flow—that is, explain what needs to be done, not how to do it, and don't try to explain the sequence of actions.
    3. Tools should be simple—don't overcomplicate things or write functions with multiple purposes.
    4. Build tools so that they don't return large volumes of data, but only the essentials—for example, instead of returning a table with a large number of rows, simply store the table name for future reference.
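    As an illustration of practices 1-4, here is a hypothetical tool that follows them: the function name, parameters, and stubbed data are all invented for the example, not taken from any real API.

    ```python
    def get_top_pages(topic: str, limit: int = 5) -> list:
        """Return the top-ranking pages for a topic.

        arguments:
            topic (str): The topic to search for, e.g. "python scraping".
            limit (int): Maximum number of pages to return. Default is 5.
        return:
            list: Dicts with "title" and "url" keys only.
        """
        # A real tool would call a search API here; this stub just shows
        # the shape of the output. Note that it returns only the
        # essentials (title + url), not entire page contents.
        results = [
            {"title": f"Result {i} for {topic}", "url": f"https://example.com/{i}"}
            for i in range(limit)
        ]
        return results
    ```

    The docstring describes the result (what comes back) rather than the execution flow, the function does exactly one thing, and the output stays small: exactly what an agent needs to reason about the tool from its context window.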
    And this is just the basics. It all seems very complex and confusing, but in reality, you can build your first agent (level 2) in 15 minutes. Let's write an agent that automates work with meta tags in SEO.

    Let's make the simplest, most functional AI Agent

    What you will need and what you should consider before starting

    It took me a lot of effort to come up with something that would be both easy to implement and genuinely useful. This is what we'll be writing: an AI Agent that identifies keywords for search engine optimization from the description of an article idea, using the top 20 Google search results.
    It's not complicated; we'll need the following components:
    1. One Google model - gemini-2.5-flash.
    2. A search results parsing tool - Tavily.
    3. And the language we'll be writing it in - Python/LangChain.
    By the way, you can use my search results parser written in Python instead of Tavily. You can read about how to write and use it in the corresponding article at the link.
    Just a heads up: Gemini isn't available in some countries (like mine). You can switch to a different model (here's a list of supported models from LangChain) or use a VPN.
    Why did I choose this particular stack for my agent? Well, it's all pretty straightforward: they have free limits you can practice on and get the hang of. And, in general, the limits they offer are more than enough for personal use. And getting API keys for them is very easy, a matter of minutes.

    Obtaining an API key to use Gemini

    Visit Google's official site for creating, integrating, and configuring your own AI chats: https://aistudio.google.com/app/api-keys. You'll need a Google account, of course. On the dashboard:
    Click Create API Key, enter a name for the key, and select "Default Gemini Project." The API key is ready; copy and save it, as we'll need it later.

    Obtain an API key to use Tavily

    Now let's get an API key for Tavily. We'll need it to get search results not only from Google but also from other search engines.
    Although that's not exactly how it works. Tavily is a standalone search engine, with no external dependencies on Google, Bing, or Yandex. And frankly, it's not entirely suitable for tools designed for SEO optimization. But I'll show you how to use it, and then we'll migrate to Google's built-in search tool.
    To do this, first, you need to register and go to the home page. Then, create and copy your API key.
    Tavily API Key Creation Page

    AI agent code

    Now we have everything we need to create our first AI Agent. Let's create a separate directory, a virtual environment, install the necessary libraries, and create several files (the agent configuration file and the agent file itself).
    Create a separate directory, a virtual environment, and activate it:
    cd AI_Agent_Post_Ideas
    python -m venv .venv
    ./.venv/Scripts/activate
    Let's install the necessary packages:
    pip install langchain langchain-core langchain-community langchain-google-genai tavily-python pydantic
    Create two files in the same directory: settings.json and main.py. In the first one, place your API keys like this:
    {
      "LLM_API_KEY": "kjfoweijfoeefoi_kjdsjfkdf_kdfsjdlfk",
      "TAVILY_API_KEY": "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"
    }
    The official LangChain project website does it differently. They set environment variables, but I don't do that because I'd have to search for these keys again every time I launch a new terminal. That's why I have a separate file for keys.
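    If you want both conveniences, you can combine the approaches: read the key from settings.json and fall back to an environment variable of the same name. This is a small sketch; the file name and key names match the ones used in this article.

    ```python
    import json
    import os

    def load_key(name, settings_path='settings.json'):
        """Read an API key from a settings file, falling back to an
        environment variable of the same name if the file is missing
        or doesn't contain the key."""
        try:
            with open(settings_path, 'r', encoding='utf-8') as f:
                value = json.load(f).get(name)
                if value:
                    return value
        except FileNotFoundError:
            pass
        return os.environ.get(name)

    # Usage: LLM_API_KEY = load_key("LLM_API_KEY")
    ```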
    The second file will contain the actual logic of our agent. Here's its code; more details are below.
    import json
    from langchain_google_genai import ChatGoogleGenerativeAI
    from langchain.agents import create_agent
    from tavily import TavilyClient


    def to_file(filename, data):
        with open(filename, 'w', encoding='utf-8') as file:
            json.dump(data, file, indent=2)


    with open('settings.json', 'r', encoding='utf-8') as settings_file:
        settings = json.load(settings_file)

    LLM_API_KEY = settings['LLM_API_KEY']
    TAVILY_API_KEY = settings['TAVILY_API_KEY']


    def get_search_results(request):
        """
        Get generic search results

        arguments:
            request (str): A string that represents the user's article idea
        return:
            dict: A dict that represents search engine result pages
        """
        tavily_client = TavilyClient(api_key=TAVILY_API_KEY)
        response = tavily_client.search(request, max_results=20)
        return response


    def run():
        print("This AI agent takes an article idea and tries to apply it to SEO")
        article_idea = input("Enter your article idea and hit ENTER to proceed ... ")
        agent = create_agent(
            model=ChatGoogleGenerativeAI(model="gemini-2.5-flash", google_api_key=LLM_API_KEY),
            tools=[get_search_results],
            system_prompt="You are an SEO specialist looking for opportunities to expand your existing semantic core.",
        )
        # Run the agent
        result = agent.invoke(
            {"messages": [{"role": "user", "content": f"Generate only possible keyword ideas, separated by commas, for this article idea: {article_idea}"}]}
        )
        data_str = result["messages"][1].content[0]['text']
        data = {"keywords": []}
        for keyword in data_str.split(','):
            data["keywords"].append(keyword.strip())
        to_file('keywords.json', data)


    if __name__ == "__main__":
        run()
    The external tool in this case is the get_search_results function. The workflow control system is the langchain package itself, which is given a model (gemini-2.5-flash) as input, get_search_results as the only tool, and the communication context setup.
    Next, we invoke the agent and provide it with even more context for working with the input data. Note that I haven't written anything about how or what tools it should use. But I have specified the exact result I expect.
    Then I structure it and save it to a file.
    data_str = result["messages"][1].content[0]['text']
    data = {"keywords": []}
    for keyword in data_str.split(','):
        data["keywords"].append(keyword.strip())
    to_file('keywords.json', data)

    AI Agent Code, Enhanced

    Our agent works, but plain text output isn't always convenient for programmatic processing. Furthermore, we're using a third-party search engine when the model itself has powerful built-in capabilities. Let's refactor it and make the agent more professional and autonomous.
    This can be done in the following way:
    1. First, use Gemini's built-in search tool.
    2. Second, use structured output.
    Let's look at the improved version, and below I'll explain step-by-step what we changed and why.
    import json

    from pydantic import BaseModel, Field
    from langchain_google_genai import ChatGoogleGenerativeAI
    from langchain.agents import create_agent


    def to_file(filename, data):
        with open(filename, 'w', encoding='utf-8') as file:
            file.write(data)


    with open('settings.json', 'r', encoding='utf-8') as settings_file:
        settings = json.load(settings_file)

    LLM_API_KEY = settings['LLM_API_KEY']


    class KeywordsInfo(BaseModel):
        """The keywords list with related urls from SERP"""
        keywords: list = Field(description="The list of keywords")
        urls: list = Field(description="The list of urls from SERP")


    def run():
        print("This AI agent takes an article idea and tries to apply it to SEO")
        article_idea = input("Enter your article idea and hit ENTER to proceed ... ")
        agent = create_agent(
            model=ChatGoogleGenerativeAI(model="gemini-2.5-flash", google_api_key=LLM_API_KEY),
            system_prompt="You are an SEO specialist looking for opportunities to expand your existing semantic core.",
            response_format=KeywordsInfo,
        )
        # Run the agent
        result = agent.invoke(
            {"messages": [{"role": "user", "content": f"Generate (based on search result data) possible keyword ideas for this article idea: {article_idea}. After this, provide a list of URLs from the SERP (at minimum 5) that you used for keyword idea generation."}]}
        )
        to_file('keywords.json', result["structured_response"].model_dump_json())


    if __name__ == "__main__":
        run()
    As you can see, the code has become a bit cleaner and more understandable. We've also removed our tool and now use Gemini's built-in Google search (I discussed built-in tools earlier in the chapter above).
    Also, when calling the agent, we no longer specify how to format and return our data. Now, we specify the format of the returned data in a specially defined class — KeywordsInfo.
    from pydantic import BaseModel, Field

    # Other code

    class KeywordsInfo(BaseModel):
        """The keywords list with related urls from SERP"""
        keywords: list = Field(description="The list of keywords")
        urls: list = Field(description="The list of urls from SERP")
    You can find more information about how to specify structured results for agents here. Roughly speaking, when invoking an agent, we specify what we want to get from it, and in the KeywordsInfo class, we refine and structure it accordingly.
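    One refinement worth considering: a plain list field accepts items of any type, while typing the fields as list[str] lets Pydantic validate each item as a string. This variant is a suggestion of mine, not the code used elsewhere in the article.

    ```python
    from pydantic import BaseModel, Field

    class KeywordsInfo(BaseModel):
        """The keywords list with related urls from SERP"""
        # list[str] tells Pydantic to validate every item as a string,
        # which plain `list` does not do.
        keywords: list[str] = Field(description="The list of keywords")
        urls: list[str] = Field(description="The list of urls from SERP")

    # Validation example with hand-written sample data:
    info = KeywordsInfo(keywords=["ai agents", "langchain tutorial"],
                        urls=["https://example.com/post"])
    ```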
    Not all models support structured data. Always read the LangChain documentation first to be sure.
    We create an agent with the appropriate specification of the results we want to get:
    agent = create_agent(
        model=ChatGoogleGenerativeAI(model="gemini-2.5-flash", google_api_key=LLM_API_KEY),
        system_prompt="You are an SEO specialist looking for opportunities to expand your existing semantic core.",
        response_format=KeywordsInfo,
    )
    Afterwards, we launch the agent:
    result = agent.invoke(
        {"messages": [{"role": "user", "content": f"Generate (based on search result data) possible keyword ideas for this article idea: {article_idea}. After this, provide a list of URLs from the SERP (at minimum 5) that you used for keyword idea generation."}]}
    )
    We receive the data and do what we need with it. In my case, I simply save it to a JSON file. Perhaps I'll then send it to the server or pass it on to the next agent for further tasks.
    def to_file(filename, data):
        with open(filename, 'w', encoding='utf-8') as file:
            file.write(data)

    # Other code

    to_file('keywords.json', result["structured_response"].model_dump_json())
    This was an example of a very simple AI agent, shown both with our own tool and with the built-in search tool, and both with and without a structured response.

    Conclusions, or why generative AI is good for automating "some" processes

    That's what AI Agents are. They're not particularly difficult to write, and they're only as cool as the tools and models that drive them.
    I realize that in this article, I didn't cover important topics like testing, error handling, working with files (or artifacts), and deploying autonomous agents to servers. But that wasn't necessary; in this article, I wanted to show that creating your own AI Agent is even easier than writing a Telegram bot. And that it can be very useful when paired with certain tools.
    This article is just the first, introductory part of working with AI Agents. More and more interesting things will follow. To stay up to date with future articles, subscribe to the corresponding RSS feed or email newsletter, and don't forget to leave a comment if I missed anything or made a mistake. Bye.

