#3 - AI Agents: What are they & How to code them
In this article, we will try to understand what AI agents are and how they came into existence, and then build a simple agent for ourselves.

I'm passionate about continuous learning, keeping myself up to date with latest changes in the IT field. My interest is in the areas of Web Development (JavaScript/TypeScript), Blockchain and GenAI (focusing on creating and deploying memory aware AI-powered RAG applications using LangGraph, LangFuse, QdrantDB and Neo4J). I welcome professional connections to explore new ideas and collaborations.
INTRODUCTION
Language models can be thought of as super-smart brains trapped in a digital box, with no way to reach the outside world. They can chat, write, and even code with impressive fluency, but they cannot work with live data, because they only know what was in their training data up to their knowledge cutoff. But there is a way to equip these digital brains with a ‘body’ that allows them to interact with the internet or with the computer. This is where the fascinating realm of AI agents begins.
AI AGENTS
AI Agents are essentially LLMs equipped with the ability to call upon and utilize functions (which are referred to as ‘tools’) to perform actions with dynamic data in the real world. This simple yet powerful combination unlocks a whole new level of potential. Some of the things an agent could do are:
Fetching Real-Time Data
Interacting with different APIs
Managing Files
Automating Complex Workflows
Take an AI application like Cursor. It leverages an LLM to edit, refactor, and even generate code based on your project context, but at its core it is powered by internal "tools" that allow it to interact with your codebase. This is a prime example of how an AI agent can significantly enhance productivity in a specific domain.
SIMPLIFIED AGENT CREATION
Creating these sophisticated AI agents might sound like something only a programmer could do, and traditionally, it did require a fair bit of coding. However, platforms like OpenAI have made this process significantly more accessible with features like Assistant function calling.
Essentially, this allows you to describe the functions (i.e. "tools") to the OpenAI Assistant. When you interact with the Assistant, it can intelligently determine if calling one of these functions would be helpful to answer your query. If so, it generates a JSON payload specifying which function to call and with what parameters. Your application then executes the function and provides the result back to the Assistant, which can then use this information to formulate its final response.
In other words:
Instead of you explicitly telling ChatGPT to use a specific tool, you simply ask your question, and its internal reasoning figures out the best tool to use.
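To make this exchange concrete, here is a minimal sketch of the two pieces your application supplies: a tool description and a dispatcher for the model's JSON payload. The schema follows the shape of the `tools` parameter in OpenAI's Chat Completions API as I understand it. The `get_weather` stub, `TOOL_REGISTRY`, and `dispatch_tool_call` are illustrative names of my own, and the payload at the bottom is hand-written rather than produced by a real model.

```python
import json

# Hypothetical stand-in for a real tool: returns canned text instead of calling an API.
def get_weather(city: str) -> str:
    return f"The weather in {city} is sunny, 25 C."

# Tool description in the shape used by the Chat Completions `tools` parameter.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Returns the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Maps tool names to the Python callables that implement them.
TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the tool the model asked for and return its result as text."""
    if name not in TOOL_REGISTRY:
        return f"Unknown tool: {name}"
    arguments = json.loads(arguments_json)  # arguments arrive as a JSON string
    return str(TOOL_REGISTRY[name](**arguments))

# Hand-written example of the kind of payload a model might produce.
result = dispatch_tool_call("get_weather", '{"city": "Paris"}')
print(result)
```

The string returned by the dispatcher is what your application would send back to the model so it can write its final answer.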
LET’S BUILD AN AGENT
This example demonstrates the core principles we've been discussing using OpenAI libraries. It might seem daunting at first. So let's break down what we need to code in pseudo-code below (full code at the end):
Defining Tools:
The code should first define a dictionary called avaiable_tools. This dictionary should hold the definitions of the tools our agent can use. Each tool should have a name (‘get_weather’, ‘run_command’, etc.), a function (‘fn’) that actually performs the action, and a description that explains what the tool does. This ‘description’ is crucial because the LLM uses it to decide when and how to use the tool.
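As a small sketch of that structure, the snippet below registers one tool and renders the descriptions as text that could be pasted into a system prompt. The `add_numbers` tool and the `describe_tools` helper are hypothetical and exist only for illustration; the dictionary keeps the `avaiable_tools` spelling used in this article.

```python
def add_numbers(text: str) -> str:
    """Hypothetical tool: sums whitespace-separated integers given as one string."""
    return str(sum(int(part) for part in text.split()))

# Each entry pairs the callable ("fn") with the description the LLM reads.
avaiable_tools = {
    "add_numbers": {
        "fn": add_numbers,
        "description": "Takes space-separated integers as input and returns their sum",
    },
}

def describe_tools(tools: dict) -> str:
    """Render one '- name: description' line per tool for use in a system prompt."""
    return "\n".join(f"- {name}: {spec['description']}" for name, spec in tools.items())

print(describe_tools(avaiable_tools))
print(avaiable_tools["add_numbers"]["fn"]("2 3"))
```

Generating the prompt text from the dictionary keeps the descriptions the model sees in sync with the functions that actually exist.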
The System Prompt:
The system_prompt is the ‘mantra’ of our agent's operation. It instructs the LLM on how to behave. Ideally, it should follow the "start, plan, action, observe" cycle, which is a common framework (in the ‘chain of thought’ prompting style) for building agents:
Start: The agent receives a user query.
Plan: The agent analyzes the query and decides on the steps needed to fulfill it, potentially including which tools to use.
Action: If a tool is deemed necessary, the agent selects the appropriate tool and provides the required input.
Observe: The agent receives the output from the tool call.

The prompt should also specify a JSON output format that the LLM should adhere to at each step. This structured output makes it easy for the code to parse the agent's reasoning and actions.
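Because the code depends on that JSON format, it can help to check each reply before acting on it. The following is a minimal sketch, assuming the step names used in this article; `parse_step` and `VALID_STEPS` are hypothetical helpers, and the reply string is hand-written for illustration.

```python
import json

VALID_STEPS = {"plan", "action", "observe", "output"}

def parse_step(raw: str) -> dict:
    """Parse one model reply and check that it follows the agreed JSON format."""
    step = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(step, dict) or step.get("step") not in VALID_STEPS:
        raise ValueError(f"Unexpected step: {step!r}")
    if step["step"] == "action" and not step.get("function"):
        raise ValueError("An action step must name a function")
    return step

# Hand-written reply in the format the system prompt asks for.
reply = '{"step": "action", "function": "get_weather", "input": "new york"}'
parsed = parse_step(reply)
print(parsed["step"], parsed["function"], parsed["input"])
```

Rejecting malformed steps early means a confused model produces a clear error instead of silently skipping a tool call.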
The Conversation Loop:
A while True loop should keep the agent running, waiting for user input. For each query:
The user's query should be added to a list called messages, which keeps track of the conversation history.
The code should then enter another while True loop to handle the multi-step interaction with the LLM.
Inside it, the client.chat.completions.create() function should send the conversation history (including the system prompt and the user query) to the model (in this case, the OpenAI model gpt-4o).
The model should respond with a JSON object in the format set by the system_prompt. The code should parse this JSON output and act accordingly:
If the step is "plan", it should print the agent's thought process.
If the step is "action", it should extract the function name and input, call the corresponding function from avaiable_tools, and then add an "observe" step with the tool's output back to the conversation.
If the step is "output", it should print the final response to the user and break out of the inner loop to wait for the next user query.
This code, while simple, beautifully illustrates the core mechanism of how an LLM can be guided to use external tools to solve problems beyond its initial knowledge base. The system prompt acts as the director, and the structured JSON output allows for seamless communication between the LLM and the Python code that executes the tools.
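The inner loop can also be exercised without an API key. The sketch below replaces the model with a scripted stand-in so the plan, action, observe, output flow is visible offline. `fake_model`, `fake_weather`, `run_agent`, and the scripted replies are all hypothetical, and the loop is bounded by `max_steps` instead of `while True`, which is my own choice rather than part of the original design.

```python
import json

def fake_weather(city: str) -> str:
    """Hypothetical tool that returns canned data instead of calling a weather API."""
    return f"The weather in {city} is 12 degrees."

tools = {"get_weather": fake_weather}

# Scripted replies standing in for what the model would return, one per call.
scripted_replies = iter([
    '{"step": "plan", "content": "The user wants the weather for new york"}',
    '{"step": "action", "function": "get_weather", "input": "new york"}',
    '{"step": "output", "content": "It is 12 degrees in new york."}',
])

def fake_model(messages: list) -> str:
    """Stand-in for the chat completion call: ignores messages and replays the script."""
    return next(scripted_replies)

def run_agent(query: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": query}]
    for _ in range(max_steps):  # bounded so a misbehaving model cannot loop forever
        step = json.loads(fake_model(messages))
        messages.append({"role": "assistant", "content": json.dumps(step)})
        if step["step"] == "plan":
            continue  # nothing to execute; ask the model for its next step
        if step["step"] == "action":
            tool = tools.get(step["function"])
            output = tool(step["input"]) if tool else f"Unknown tool: {step['function']}"
            # Feed the tool result back as an "observe" step.
            messages.append({"role": "assistant",
                             "content": json.dumps({"step": "observe", "output": output})})
            continue
        if step["step"] == "output":
            return step["content"]
    raise RuntimeError("Agent did not finish within max_steps")

answer = run_agent("What is the weather of new york?")
print(answer)
```

Swapping `fake_model` for a real chat completion call turns this into the same loop as the full code at the end of the article.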
CONCLUSION
The development of AI agents is still in its early stages, but the potential is immense. As LLMs become even more sophisticated and tool integration becomes more seamless, we can expect to see agents that are capable of handling increasingly complex tasks autonomously. From personalized assistants that manage our schedules and finances to sophisticated problem-solving systems in various industries, the possibilities are truly exciting. My journey into understanding AI agents has only just begun, and I'm eager to see how this field continues to evolve and shape the future of how we interact with technology. It's no longer just about intelligent language models; it's about giving them the means to act and interact with the world around us, and that's a prospect that truly sparks the imagination.
FULL CODE
If you’d like to experiment with creating an AI agent yourself, try this Python code, which becomes an AI agent once combined with an OpenAI API key (check the comments if you feel stuck, or ask ChatGPT!):
import json
import subprocess

import requests
from dotenv import load_dotenv
from openai import OpenAI

# load OPENAI_API_KEY from a .env file into the environment
load_dotenv()

# creating a client instance (it reads OPENAI_API_KEY from the environment)
client = OpenAI()


# tool to run terminal commands and return what they print
# (this executes whatever the model asks for, so only try it in a safe environment)
def run_command(command: str):
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr


# tool to fetch the current weather from the wttr.in weather service
def get_weather(city: str):
    # print to console every time this tool is used
    print("🔨 Tool Called: get_weather", city)
    url = f"https://wttr.in/{city}?format=%C+%t"
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        return f"The weather in {city} is {response.text}."
    return "Something went wrong"


# creating a dictionary of available tools, each with a description of what it does
avaiable_tools = {
    "get_weather": {
        "fn": get_weather,
        "description": "Takes a city name as an input and returns the current weather for the city"
    },
    "run_command": {
        "fn": run_command,
        "description": "Takes a command as input to execute on system and returns output"
    }
}
# system prompt which instructs the LLM to be a helpful agent that can use tools,
# if need be, to resolve user queries
system_prompt = f"""
You are a helpful AI Assistant who is specialized in resolving user queries.
You work in start, plan, action, observe mode.
For the given user query and available tools, plan the step-by-step execution. Based on the planning,
select the relevant tool from the available tools, and based on the tool selection perform an action to call the tool.
Wait for the observation, and based on the observation from the tool call resolve the user query.

Rules:
- Follow the Output JSON Format.
- Always perform one step at a time and wait for the next input
- Carefully analyse the user query

Output JSON Format:
{{
    "step": "string",
    "content": "string",
    "function": "The name of function if the step is action",
    "input": "The input parameter for the function"
}}

Available Tools:
- get_weather: Takes a city name as an input and returns the current weather for the city
- run_command: Takes a command as input to execute on system and returns output

Example:
User Query: What is the weather of new york?
Output: {{ "step": "plan", "content": "The user is interested in weather data of new york" }}
Output: {{ "step": "plan", "content": "From the available tools I should call get_weather" }}
Output: {{ "step": "action", "function": "get_weather", "input": "new york" }}
Output: {{ "step": "observe", "output": "12 Degree Cel" }}
Output: {{ "step": "output", "content": "The weather for new york seems to be 12 degrees." }}
"""
messages = [
    { "role": "system", "content": system_prompt }
]

# this is where we code the agent
while True:
    # get the query from the user
    user_query = input('> ')
    # and add it to the messages as a user query
    messages.append({ "role": "user", "content": user_query })

    while True:
        # set the model to use and the response format to follow
        response = client.chat.completions.create(
            model="gpt-4o",
            response_format={"type": "json_object"},
            messages=messages
        )

        # add the model's output to the history so it has context for the next step
        parsed_output = json.loads(response.choices[0].message.content)
        messages.append({ "role": "assistant", "content": json.dumps(parsed_output) })

        # print to console if the step followed by the model is 'plan' to see its thinking
        if parsed_output.get("step") == "plan":
            print(f"🧠: {parsed_output.get('content')}")
            continue

        # if the step is 'action', look up the requested tool, run it,
        # and feed the result back to the model as an 'observe' step
        if parsed_output.get("step") == "action":
            tool_name = parsed_output.get("function")
            tool_input = parsed_output.get("input")

            if tool_name in avaiable_tools:
                output = avaiable_tools[tool_name]["fn"](tool_input)
            else:
                # tell the model the tool does not exist so it can re-plan
                output = f"Unknown tool: {tool_name}"
            messages.append({ "role": "assistant", "content": json.dumps({ "step": "observe", "output": output }) })
            continue

        # break out of the inner loop if the step is 'output', then wait for the next user input
        if parsed_output.get("step") == "output":
            print(f"🤖: {parsed_output.get('content')}")
            break




