AI agents are sophisticated systems

Let's delve into the toolbox 🧰 of AI agents and understand the core components that enable them to interact with the world and achieve their goals. These aren't your run-of-the-mill software programs; AI agents are sophisticated systems that combine reasoning, logic, and the ability to access external information 🧠. Their functionality stems from a combination of models, tools, and an orchestration layer, all working together to accomplish complex tasks.

AI Agents Toolbox

At the heart of an AI agent lies a language model (LM). This model acts as the central decision-maker, guiding the agent's processes. The model can be a general-purpose one, a multimodal one, or a fine-tuned one, depending on the agent's specific needs. It uses reasoning frameworks like ReAct, Chain-of-Thought (CoT), or Tree-of-Thoughts (ToT) to make decisions about its next steps. It is important to note that models are typically not trained with the agent's specific configuration (i.e., its tool choices). However, they may be refined for the agent's tasks by giving them examples of how to use tools or perform reasoning steps.

But a model on its own is limited. To truly act, agents need tools 🛠️ that connect them to the outside world. These tools allow agents to go beyond their training data and interact with external systems and data. Let's explore the main types of tools:

  • Extensions: These are like keys 🔑 that unlock access to external APIs. Extensions allow agents to seamlessly execute APIs, regardless of their underlying implementation. An extension teaches an agent how to use an API endpoint through examples and by specifying what arguments or parameters are required to make a successful API call. For instance, an agent can use a Google Flights API extension to book a flight. Extensions are crafted independently of the agent, allowing agents to select the most appropriate tool for the task at hand based on the available examples.
  • Functions: Think of these as specialized code modules ⚙️ that agents can call to perform specific actions. Unlike extensions, functions are executed on the client side, not the agent side, which gives the developer more control over data flow and system execution. With functions, a developer can iterate on agent development without needing additional infrastructure, and run operations asynchronously. For example, a function could retrieve a list of cities from an API that can be used to download images or data. With functions, the model outputs a function name and its arguments, but it does not make a live API call itself. The client-side application then manages the actual API call using the parameters provided by the model.
  • Data Stores: These are like vast libraries 📚 that give agents access to both structured and unstructured data. Data stores provide agents with information in its original format, eliminating the need for data transformations, model retraining, or fine-tuning. They typically take the form of vector databases, which the agent can use to extract information using techniques like Retrieval Augmented Generation (RAG). For instance, an agent can use a data store to access website content, spreadsheets, or PDF documents. A query is turned into embeddings and matched against the vector database; the matched content is then returned to the agent, which formulates a response.
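
To make the function pattern above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `fake_model` stands in for a real model call, `get_cities` is a hypothetical client-side function with hard-coded data, and the JSON shape (`name` plus `args`) is just one plausible way a model could express a call. The point it tries to show is that the model only names the function and its arguments, while the client decides whether and how to run it.

```python
import json

# Hypothetical client-side function; a real one might call an external API.
def get_cities(country: str, limit: int = 3) -> list[str]:
    catalog = {"France": ["Paris", "Lyon", "Marseille", "Nice"]}
    return catalog.get(country, [])[:limit]

# Registry of the functions the client is willing to execute.
FUNCTIONS = {"get_cities": get_cities}

def fake_model(prompt: str) -> str:
    """Stand-in for the model: it returns a function name and arguments as
    JSON and never calls the function itself."""
    return json.dumps(
        {"name": "get_cities", "args": {"country": "France", "limit": 2}}
    )

def run_function_call(model_output: str) -> list[str]:
    """Client-side step: parse the model's output, check the function name
    against the registry, then execute the call."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in FUNCTIONS:
        raise ValueError(f"unknown function: {name}")
    return FUNCTIONS[name](**call["args"])

result = run_function_call(fake_model("List two cities in France"))
print(result)  # ['Paris', 'Lyon']
```

Because execution happens in `run_function_call`, the client can add validation, logging, or asynchronous execution there without changing anything on the model side.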

The orchestration layer ⚙️ is like the agent's control center. It's a cyclical process that governs how the agent takes in information, reasons, and decides on its next actions. The complexity of this layer depends on the agent and task. Some loops can be simple calculations while others use techniques like chained logic, machine learning algorithms or probabilistic reasoning. The orchestration layer also manages the agent's memory and state. Prompt engineering and associated frameworks help to guide reasoning and planning in this layer, enabling the agent to interact effectively with its environment. Reasoning techniques such as ReAct, CoT and ToT are commonly used to structure the agent's thought process.

Cognitive architectures define how agents operate. They include the orchestration layer and are responsible for maintaining memory, state, reasoning and planning. They guide agents through the cycle of taking in information, planning, executing, and making adjustments. An agent can be programmed to use a reasoning framework such as ReAct, where it follows a loop of question, thought, action, and observation until it can give the user a final answer. For instance, a ReAct loop might start with a user's query. The agent then "thinks" about what tool to use, "acts" by choosing a specific tool, "observes" the result, and goes back to refine its approach until the user's goal is fulfilled.
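
The thought, action, observation cycle described above can be sketched as a small loop. This is only an illustration under stated assumptions: `scripted_policy` is a hard-coded stand-in for the model's reasoning, `calculator` is a toy tool, and the dictionary keys (`thought`, `action`, `input`, `final`) are names invented for this example rather than part of any ReAct library.

```python
def calculator(expression: str) -> str:
    """Toy tool: evaluates 'a + b' or 'a * b' for integer operands."""
    left, op, right = expression.split()
    a, b = int(left), int(right)
    if op == "+":
        return str(a + b)
    if op == "*":
        return str(a * b)
    raise ValueError(f"unsupported operator: {op}")

TOOLS = {"calculator": calculator}

def scripted_policy(question: str, history: list) -> dict:
    """Stand-in for the model: picks the next step from what has been
    observed so far. A real agent would prompt a language model here."""
    if not history:
        return {"thought": "I should compute this with a tool.",
                "action": "calculator", "input": "6 * 7"}
    last_observation = history[-1][2]
    return {"thought": "I now have the result.",
            "final": f"The answer is {last_observation}"}

def react_loop(question: str, policy, max_steps: int = 5):
    history = []  # (action, input, observation) tuples act as memory
    for _ in range(max_steps):
        step = policy(question, history)                    # think
        if "final" in step:
            return step["final"], history
        observation = TOOLS[step["action"]](step["input"])  # act
        history.append((step["action"], step["input"], observation))  # observe
    return "No answer within the step budget.", history

answer, trace = react_loop("What is 6 times 7?", scripted_policy)
print(answer)  # The answer is 42
```

The `max_steps` cap is a design choice for the sketch: it keeps the loop from running forever if the policy never produces a final answer.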

To further enhance model performance, there are several targeted learning approaches that can be used:

  • In-context learning: This involves providing the model with prompts, tools, and a few examples at inference time, allowing it to learn "on the fly" how and when to use tools for a specific task. ReAct is an example of this approach.
  • Retrieval-based in-context learning: Here, the prompt is populated dynamically with the most relevant information, tools, and examples retrieved from external memory. Vertex AI extensions and data stores are examples of this.
  • Fine-tuning based learning: In this approach, the model is trained on a larger dataset of specific examples before inference, which helps it understand when and how to apply certain tools.
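
As a rough illustration of the retrieval-based approach in the list above, the sketch below populates a prompt with the snippet most similar to the user's query. It rests on several simplifying assumptions: word counts stand in for the learned embeddings a real vector database would use, the three documents are made up, and `embed`, `retrieve`, and `build_prompt` are hypothetical helper names.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (an assumption for this
    sketch; real systems use learned dense embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Made-up snippets standing in for the contents of a data store.
DOCUMENTS = [
    "refund policy: refunds are issued within 14 days",
    "shipping times: orders ship within 2 business days",
    "store hours: open 9 to 5 on weekdays",
]

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Populate the prompt dynamically with the retrieved context."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("how long do refunds take", DOCUMENTS))
```

The resulting prompt would then be passed to the model, which answers using the retrieved context rather than relying only on its training data.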

In summary: Tools extend the capabilities of AI agents, connecting them to a wide array of external systems and data. Extensions, functions, and data stores each serve unique purposes, allowing developers flexibility in how they structure their applications and control the flow of information. Using these tools, in combination with advanced reasoning techniques and targeted learning approaches, AI agents can autonomously perform complex tasks and solve real-world problems. AI agents are more than just chatbots; they are intelligent systems that can reason, plan, and act, leading to new possibilities in automation and beyond.