AI Agents: Understanding and Coding Basics

INTRODUCTION

Language models can be thought of as super-smart brains trapped in a digital box with no way to reach the outside world. They can chat, write, and even code with impressive fluency, but they cannot work with live data, because everything they know comes from the data they were trained on before their knowledge cutoff. But there is a way to equip these digital brains with a ‘body’ that lets them interact with the internet or with a computer. This is where the fascinating realm of AI agents begins.

AI AGENTS

AI Agents are essentially LLMs equipped with the ability to call upon and utilize functions (which are referred to as ‘tools’) to perform actions with dynamic data in the real world. This simple yet powerful combination unlocks a whole new level of potential. Some of the things an agent could do are:

Fetching Real-Time Data
Interacting with different APIs
Managing Files
Automating Complex Workflows

Take an AI application like Cursor. It leverages an LLM to edit code, refactor, and even generate new code snippets based on your project context, but at its core it is powered by internal "tools" that allow it to interact with your codebase. This is a prime example of how an AI agent can significantly enhance productivity in a specific domain.

SIMPLIFIED AGENT CREATION

Creating these sophisticated AI agents might sound like something only a programmer could do, and traditionally, it did require a fair bit of coding. However, platforms like OpenAI have made this process significantly more accessible with features like Assistant function calling.

Essentially, this allows you to describe the functions (i.e. "tools") to the OpenAI Assistant. When you interact with the Assistant, it can intelligently determine if calling one of these functions would be helpful to answer your query. If so, it generates a JSON payload specifying which function to call and with what parameters. Your application then executes the function and provides the result back to the Assistant, which can then use this information to formulate its final response.
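
As a rough sketch of that round trip, the snippet below describes one tool in the JSON-schema style used by OpenAI function calling, then handles a payload of the kind the model would send back. No API call is made: model_payload is a hand-written stand-in for the model's reply, get_weather is a stub with made-up return text, and registry and execute_tool_call are hypothetical names chosen for illustration.

```python
import json

# Stub tool for illustration only; a real one would call a weather service.
def get_weather(city: str) -> str:
    return f"(stub) weather report for {city}"

# Tool description sent to the model alongside the conversation.
# The model only ever sees this description, never the Python function.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Returns the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Hand-written stand-in for the model's reply when it decides to use a tool:
# the function name plus its arguments encoded as a JSON string.
model_payload = {"name": "get_weather", "arguments": '{"city": "Paris"}'}

# The application, not the model, looks up and runs the function.
registry = {"get_weather": get_weather}

def execute_tool_call(payload: dict) -> str:
    fn = registry[payload["name"]]
    kwargs = json.loads(payload["arguments"])
    return fn(**kwargs)

result = execute_tool_call(model_payload)
print(result)
```

In a real application the returned string would be sent back to the model as the tool result so it can write its final answer.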

In other words:

Instead of you explicitly telling ChatGPT to use a specific tool, you simply ask your question, and its internal reasoning figures out the best tool to use.

LET’S BUILD AN AGENT

This example demonstrates the core principles we've been discussing using the OpenAI Python library. It might seem daunting at first, so let's first break down what the code needs to do (full code at the end):

Defining Tools:
The code should first define a dictionary called avaiable_tools. This dictionary should hold the definitions of the tools our agent can use. Each tool should have a name (‘get_weather’, ‘run_command’, etc.), a function (‘fn’) that actually performs the action, and a description that explains what the tool does. This ‘description’ is crucial because the LLM uses it to decide when and how to use the tool.
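
Because the description is what the LLM actually reads, one option is to generate the tool list in the prompt from the same dictionary, so the two cannot drift apart. The sketch below assumes stub tools and a hypothetical helper, render_tool_list; the dictionary keeps the avaiable_tools spelling used in the full code.

```python
# Stub tools for illustration; the real versions appear in the full code.
def get_weather(city: str) -> str:
    return f"(stub) weather for {city}"

def run_command(command: str) -> str:
    return f"(stub) would run: {command}"

# Each entry pairs the callable ("fn") with the text the LLM will read.
avaiable_tools = {
    "get_weather": {
        "fn": get_weather,
        "description": "Takes a city name as an input and returns the current weather for the city",
    },
    "run_command": {
        "fn": run_command,
        "description": "Takes a command as input to execute on system and returns output",
    },
}

def render_tool_list(tools: dict) -> str:
    """Build the 'Available Tools' lines for the system prompt (hypothetical helper)."""
    return "\n".join(f"- {name}: {spec['description']}" for name, spec in tools.items())

print(render_tool_list(avaiable_tools))
```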
The System Prompt:
The system_prompt is the ‘mantra’ of our agent's operation. It instructs the LLM on how to behave. Ideally, it should follow the "start, plan, action, observe" cycle, a common framework (in the ‘chain of thought’ prompting style) for building agents:
- Start: The agent receives a user query.
- Plan: The agent analyzes the query and decides on the steps needed to fulfill it, potentially including which tools to use.
- Action: If a tool is deemed necessary, the agent selects the appropriate tool and provides the required input.
- Observe: The agent receives the output from the tool call.

The prompt should also specify a JSON output format that the LLM should adhere to at each step. This structured output makes it easy for the code to parse the agent's reasoning and actions.
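
To illustrate why the structured format helps, here is a minimal sketch of parsing and checking a single step before acting on it. The helper parse_step and the set VALID_STEPS are assumptions made for this example and are not part of the full code at the end.

```python
import json

# The step names used by the start, plan, action, observe cycle.
VALID_STEPS = {"plan", "action", "observe", "output"}

def parse_step(raw: str) -> dict:
    """Parse one model reply and check it has the fields the loop relies on."""
    step = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(step, dict):
        raise ValueError("expected a JSON object")
    if step.get("step") not in VALID_STEPS:
        raise ValueError(f"unknown step: {step.get('step')!r}")
    if step["step"] == "action" and not step.get("function"):
        raise ValueError("an action step must name a function")
    return step

reply = '{"step": "action", "function": "get_weather", "input": "new york"}'
print(parse_step(reply)["function"])
```

Rejecting a malformed step early gives the code a chance to ask the model again instead of failing halfway through a tool call.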

The Conversation Loop:
A while True loop should keep the agent running, waiting for user input. For each query:
- The user's query should be added to a list called messages, which should keep track of the conversation history.
- The code should then enter another while True loop to handle the multi-step interaction with the LLM.
- Use the client.chat.completions.create() function, which should send the conversation history (including the system prompt and the user query) to the model (in this case, the OpenAI model gpt-4o).
- The model should respond with a JSON object based on the system_prompt.
- The code should parse this JSON output and act accordingly:
  - If the step is "plan", it should print the agent's thought process.
  - If the step is "action", it should extract the function name and input, call the corresponding function from avaiable_tools, and then add the "observe" step with the tool's output back to the conversation.
  - If the step is "output", it should print the final response to the user and break out of the inner loop to wait for the next user query.
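
The inner loop can be tried without an API key by swapping the model for a scripted stand-in. The sketch below is only an approximation of the steps above: make_scripted_model, run_agent and the max_steps cap are hypothetical additions, and the stub get_weather returns a fixed string.

```python
import json

def get_weather(city: str) -> str:
    return "12 Degree Cel"  # stub value, no network call

tools = {"get_weather": get_weather}

def make_scripted_model(replies):
    """Return a fake model that replays canned JSON replies in order."""
    remaining = iter(replies)
    def model(messages: list) -> str:
        return next(remaining)
    return model

def run_agent(query: str, model, tools: dict, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": query}]
    for _ in range(max_steps):  # cap the loop instead of running forever
        raw = model(messages)
        messages.append({"role": "assistant", "content": raw})
        step = json.loads(raw)
        if step["step"] == "plan":
            print("plan:", step["content"])
        elif step["step"] == "action":
            output = tools[step["function"]](step["input"])
            observation = json.dumps({"step": "observe", "output": output})
            messages.append({"role": "assistant", "content": observation})
        elif step["step"] == "output":
            return step["content"]
    raise RuntimeError("agent did not finish within max_steps")

script = [
    '{"step": "plan", "content": "The user wants weather data for new york"}',
    '{"step": "action", "function": "get_weather", "input": "new york"}',
    '{"step": "output", "content": "The weather for new york seems to be 12 degrees."}',
]
answer = run_agent("What is the weather of new york?", make_scripted_model(script), tools)
print(answer)
```

Replacing the scripted model with a function that calls client.chat.completions.create() gives the same control flow against a live model.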

This code, while simple, beautifully illustrates the core mechanism of how an LLM can be guided to use external tools to solve problems beyond its initial knowledge base. The system prompt acts as the director, and the structured JSON output allows for seamless communication between the LLM and the Python code that executes the tools.

CONCLUSION

The development of AI agents is still in its early stages, but the potential is immense. As LLMs become even more sophisticated and tool integration becomes more seamless, we can expect to see agents that are capable of handling increasingly complex tasks autonomously. From personalized assistants that manage our schedules and finances to sophisticated problem-solving systems in various industries, the possibilities are truly exciting. My journey into understanding AI agents has only just begun, and I'm eager to see how this field continues to evolve and shape the future of how we interact with technology. It's no longer just about intelligent language models; it's about giving them the means to act and interact with the world around us, and that's a prospect that truly sparks the imagination.

FULL CODE

If you’d like to experiment with creating an AI agent yourself, try this Python code, which becomes an AI agent when combined with an OpenAI API key (check the comments if you feel stuck, or ask ChatGPT!):

import json
import requests
from dotenv import load_dotenv
from openai import OpenAI
import os

load_dotenv()

# creating a client instance (reads OPENAI_API_KEY from the environment)
client = OpenAI()

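# tool to run terminal commands and return what they print
def run_command(command):
    # os.system() only returns the exit status, so capture the output with os.popen()
    with os.popen(command) as stream:
        return stream.read()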

# tool to fetch current weather using the weather api
def get_weather(city: str):
    # print to console every time this tool is used
    print("🔨 Tool Called: get_weather", city)

    url = f"https://wttr.in/{city}?format=%C+%t"
    response = requests.get(url)

    if response.status_code == 200:
        return f"The weather in {city} is {response.text}."
    return "Something went wrong"

# creating a dictionary of available tools, each with a description of what it does
avaiable_tools = {
    "get_weather": {
        "fn": get_weather,
        "description": "Takes a city name as an input and returns the current weather for the city"
    },
    "run_command": {
        "fn": run_command,
        "description": "Takes a command as input to execute on system and returns output"
    }
}

# system prompt which instructs the LLM to be a helpful agent that can use tools
# when needed to resolve user queries
system_prompt = f"""
    You are a helpful AI Assistant who is specialized in resolving user queries.
    You work on start, plan, action, observe mode.
    For the given user query and available tools, plan the step by step execution. Based on the planning,
    select the relevant tool from the available tools, and based on the tool selection perform an action to call the tool.
    Wait for the observation and based on the observation from the tool call resolve the user query.

    Rules:
    - Follow the Output JSON Format.
    - Always perform one step at a time and wait for next input
    - Carefully analyse the user query

    Output JSON Format:
    {{
        "step": "string",
        "content": "string",
        "function": "The name of function if the step is action",
        "input": "The input parameter for the function"
    }}

    Available Tools:
    - get_weather: Takes a city name as an input and returns the current weather for the city
    - run_command: Takes a command as input to execute on system and returns output

    Example:
    User Query: What is the weather of new york?
    Output: {{ "step": "plan", "content": "The user is interested in weather data of new york" }}
    Output: {{ "step": "plan", "content": "From the available tools I should call get_weather" }}
    Output: {{ "step": "action", "function": "get_weather", "input": "new york" }}
    Output: {{ "step": "observe", "output": "12 Degree Cel" }}
    Output: {{ "step": "output", "content": "The weather for new york seems to be 12 degrees." }}
"""

messages = [
    { "role": "system", "content": system_prompt }
]

# this is where we code the agent
while True:
    # get the query from user
    user_query = input('> ')
    # and add it to the messages as user query
    messages.append({ "role": "user", "content": user_query })

    while True:
        # set the model to use and response format to follow
        response = client.chat.completions.create(
            model="gpt-4o",
            response_format={"type": "json_object"},
            messages=messages
        )
        # parse the model's output and add it to the conversation history for context
        parsed_output = json.loads(response.choices[0].message.content)
        messages.append({ "role": "assistant", "content": json.dumps(parsed_output) })

        # print to console if the step followed by the model is 'plan' to see its thinking
        if parsed_output.get("step") == "plan":
            print(f"🧠: {parsed_output.get('content')}")
            continue

        # if the step is 'action' look in the available tools to find something which can be 
        # useful in resolving the user's query
        if parsed_output.get("step") == "action":
            tool_name = parsed_output.get("function")
            tool_input = parsed_output.get("input")

            if tool_name in avaiable_tools:
                output = avaiable_tools[tool_name]["fn"](tool_input)
            else:
                # tell the model the tool does not exist so it can pick another one
                output = f"Unknown tool: {tool_name}"
            messages.append({ "role": "assistant", "content": json.dumps({ "step": "observe", "output": output }) })
            continue
        # break out of the second loop if the step is 'output' waiting for the next user input
        if parsed_output.get("step") == "output":
            print(f"🤖: {parsed_output.get('content')}")
            break
