
Agents

Agents let LLMs use user-defined tools to complete complex, specialized tasks.

Overview

What is an agent?

An agent is an AI component that uses a large language model (LLM) as its interpreter and decision maker. Unlike a bare LLM, an agent does not have to answer a user's request immediately. Instead, it can call user-defined tools to gather specialized information. These tools are functions that let the agent perform specific tasks, such as calculations, web searches, or retrieving custom data from private knowledge bases.

The main purpose of an agent is to extend the capabilities of the underlying LLM. By accessing external resources and performing tasks beyond the model's built-in knowledge, the agent can interact with users more effectively. In effect, the LLM becomes a versatile assistant that delegates complex tasks to specialized tools.


Stateless Nature of Agents

SGP Agents are intentionally stateless: each request executes a single step and returns one output. Client-side applications are responsible for managing the message history, executing tools, and handling responses. This gives users the flexibility to write and run custom tools, along with explicit control over their message history.

Agent Workflow

  1. User Query: The user sends a single message to start a conversation with the Agent.

  2. Agent Response or Tool Request: The Agent executes one step and either returns a response or requests a tool call to gather more context.

  3. Client-Side Tool Execution: If the Agent requests a tool, the user executes that tool with the arguments the Agent provided.

  4. Message Queue Management: The user appends the Agent's tool request and the tool's output to the message queue, which holds the conversation history, and sends the updated queue back to the Agent.

  5. Formulating Final Response: Steps 2 through 4 repeat until the Agent has gathered enough context to return a complete and accurate response.
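The workflow above can be sketched as a client-side loop. This is a minimal sketch, not an official SGP client: `run_agent` and its parameters are hypothetical names, `post` stands in for the HTTP call to the execute endpoint, and the code assumes the response shape shown in the Quick Start example (an `action` of `"tool_request"` or `"content"`, with details under `context`) and tool arguments encoded as a JSON string.

```python
import json
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_agent(
    post: Callable[[List[Message]], Dict[str, Any]],
    tools: Dict[str, Callable[..., str]],
    user_query: str,
    max_steps: int = 5,
) -> str:
    """Drive a stateless agent until it returns content (hypothetical helper)."""
    # Step 1: the conversation starts with a single user message.
    messages: List[Message] = [{"role": "user", "content": user_query}]
    for _ in range(max_steps):
        # Step 2: send the full history; the agent executes exactly one step.
        result = post(messages)
        context = result["context"]
        if result["action"] == "content":
            # Step 5: the agent has enough context to answer.
            return context["content"]
        # Step 3: run the requested tool locally with the agent's arguments.
        request = context["tool_request"]
        args = json.loads(request["arguments"])  # arguments arrive as a JSON string
        output = tools[request["name"]](**args)
        # Step 4: append the agent's request and the tool output to the history.
        messages.append({"role": "agent", "tool_request": request})
        messages.append({"role": "tool", "name": request["name"], "content": output})
    raise RuntimeError("Agent did not produce a final response within max_steps")
```

To talk to the real API, `post` could wrap something like `requests.post(EXECUTE_AGENT_URL, headers=HEADERS, json={"model": ..., "tools": ..., "messages": messages}).json()`. Passing the HTTP call in as a function keeps the loop testable without network access, and `max_steps` guards against an agent that keeps requesting tools indefinitely.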

Advantages of Using Agents

  1. Adaptability: Agents can gain new capabilities without retraining the LLM. Adding a new tool lets the Agent handle a new kind of task, so a single Agent can cover a wide range of tasks.

  2. Efficient Resource Utilization: Because the LLM inside an Agent only interprets queries and decides which action to take, users can choose smaller LLMs with fewer parameters, reducing operating costs and inference latency.

  3. Custom Data Retrieval: Agents can access user-uploaded data from private knowledge bases. This allows users to tailor applications to different domains without the expensive R&D cost of fine-tuning for each domain.

  4. Customizability: Clients can create custom tools specific to their domain or knowledge base, tailoring the Agent's functionality to their unique needs.

Agents play a crucial role in extending the capabilities of Language Models beyond their pre-trained knowledge. By enabling access to specialized tools, Agents transform LLMs into high-functioning general assistants that can dynamically interact with users, provide real-time information, and perform various tasks. Their adaptability, efficient resource utilization, and ability to handle real-world challenges make them the ideal interface for our generative AI application platform.

Quick Start Example

To show you how Agents work, let's make some simple requests to the Agent API.


For more in-depth material, see the Guides section in the sidebar for comprehensive tutorials on working with Agents, or browse the end-to-end working examples under Recipes in the top navigation bar.

First, let's import the requests library and set our constants:

import requests

EXECUTE_AGENT_URL = "https://api.spellbook.scale.com/egp/v1/agents/execute"
SPELLBOOK_API_KEY = "sampleapikey12345"  # placeholder: replace with your own API key

HEADERS = {
    "accept": "application/json",
    "content-type": "application/json",
    "x-api-key": SPELLBOOK_API_KEY,
}

Next, let's configure our agent to use GPT-3.5 Turbo (gpt-3.5-turbo-0613) and tell it that we have a local function, search_google(query: str), that can run a Google Search.

Then, let's ask the Agent something it won't know without using our tool: "What is an Apple Vision Pro?"

Request 1

data = {
    "model": "gpt-3.5-turbo-0613",
    "tools": [
        {
            "arguments": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The query to search for"
                    }
                }
            },
            "name": "search_google",
            "description": "A Google Search tool. Useful for when you need to answer questions about current events or about something you don't know."
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "What is an Apple Vision Pro?"
        }
    ]
}
response = requests.post(EXECUTE_AGENT_URL, headers=HEADERS, json=data)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json())
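Since the agent is stateless, the tool list is sent again with every request (Request 2 below repeats it verbatim), so it can help to build tool entries programmatically. The sketch below assumes the `arguments` field accepts a JSON Schema-style object like the one in Request 1; `make_tool` is a hypothetical helper, and it only declares string arguments, as in the `search_google` example.

```python
from typing import Any, Dict


def make_tool(name: str, description: str, params: Dict[str, str]) -> Dict[str, Any]:
    """Build a tool entry in the shape used by Request 1 (hypothetical helper).

    `params` maps each argument name to its description; every argument is
    declared with type "string", matching the search_google example.
    """
    return {
        "arguments": {
            "type": "object",
            "properties": {
                arg: {"type": "string", "description": desc}
                for arg, desc in params.items()
            },
        },
        "name": name,
        "description": description,
    }


search_tool = make_tool(
    "search_google",
    "A Google Search tool. Useful for when you need to answer questions about "
    "current events or about something you don't know.",
    {"query": "The query to search for"},
)
```

The resulting `search_tool` dict could then be placed in the `"tools"` list of each request.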

Response 1

{
  "action":"tool_request",
  "context":{
    "content":null,
    "tool_request":{
      "name":"search_google",
      "arguments":"{\n  \"query\": \"Apple Vision Pro\"\n}"
    }
  }
}

Because we're asking about something that didn't exist when the model was trained, the agent asks us for more information by requesting the search_google tool. It also tells us which arguments to pass to the tool function; note that these arrive as a JSON-encoded string rather than as an object.

Normally, we would now execute the tool with the arguments the agent requested:

tool_response = search_google("Apple Vision Pro")
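In a full client, the tool name and arguments would come from the agent's response rather than being typed by hand. A minimal sketch, assuming responses shaped like Response 1: `execute_tool_request` and the `tools` mapping from tool names to local Python functions are hypothetical. Because `arguments` is a JSON-encoded string, it is decoded before being passed as keyword arguments.

```python
import json
from typing import Any, Callable, Dict


def execute_tool_request(
    response: Dict[str, Any], tools: Dict[str, Callable[..., str]]
) -> str:
    """Run the local tool named in a tool_request response (hypothetical helper)."""
    if response["action"] != "tool_request":
        raise ValueError("response does not request a tool")
    request = response["context"]["tool_request"]
    # Decode the JSON string, e.g. '{"query": "Apple Vision Pro"}' -> {"query": ...}
    kwargs = json.loads(request["arguments"])
    # Look up the function by name and call it with the decoded arguments.
    return tools[request["name"]](**kwargs)
```

Raising on a `"content"` response keeps the caller from accidentally treating a final answer as a tool call.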

Since we're only mocking this tool for now, let's assume it returned the following Google search results:

1. Vision Pro is Apple's first 3D camera that allows users to capture spatial photos and videos in 3D.
2. It is an upcoming mixed-reality headset developed by Apple Inc.
3. The Vision Pro headset was announced on June 5, 2023, at Apple's Worldwide Developers Conference.
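To make a call like `search_google("Apple Vision Pro")` actually runnable, here is a hypothetical mock that ignores its query and returns these three results as a single numbered string, the same text used as the tool output in Request 2 below. A real implementation would call a search API instead.

```python
def search_google(query: str) -> str:
    """Mock search_google tool: returns canned results and ignores `query`."""
    results = [
        "Vision Pro is Apple's first 3D camera that allows users to capture spatial photos and videos in 3D.",
        "It is an upcoming mixed-reality headset developed by Apple Inc.",
        "The Vision Pro headset was announced on June 5, 2023, at Apple's Worldwide Developers Conference.",
    ]
    # Number the results and join them into one string for the ToolMessage content.
    return " ".join(f"{i}. {text}" for i, text in enumerate(results, start=1))


tool_response = search_google("Apple Vision Pro")
```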

Because the agent is stateless, we must then make a follow-up Agent API request containing our running chat history. This updated list of messages must include both an AgentMessage, which echoes the tool request from the agent's previous response, and a ToolMessage, which contains the output of our search_google tool.
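Appending these two messages is mechanical, so a small helper can do it. This is a sketch under assumptions: `extend_history` is a hypothetical name, and it assumes the API accepts the agent's `tool_request` echoed back exactly as it was returned (Request 2 below sends an equivalent, reformatted `arguments` string).

```python
from typing import Any, Dict, List


def extend_history(
    messages: List[Dict[str, Any]],
    agent_response: Dict[str, Any],
    tool_output: str,
) -> List[Dict[str, Any]]:
    """Return a new history with the agent's tool request and the tool output appended."""
    request = agent_response["context"]["tool_request"]
    return messages + [
        # AgentMessage: echo back the tool request the agent made.
        {"role": "agent", "tool_request": request},
        # ToolMessage: the output of running that tool locally.
        {"role": "tool", "name": request["name"], "content": tool_output},
    ]
```

Returning a new list instead of mutating the input makes it easy to keep earlier snapshots of the conversation.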

Request 2

data = {
    "model": "gpt-3.5-turbo-0613",
    "tools": [
        {
            "arguments": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The query to search for"
                    }
                }
            },
            "name": "search_google",
            "description": "A Google Search tool. Useful for when you need to answer questions about current events or about something you don't know."
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "What is an Apple Vision Pro?"
        },
        {
            "role": "agent",
            "tool_request": {
                "name": "search_google",
                "arguments": '{"query": "Apple Vision Pro"}'
            }
        },
        {
            "role": "tool",
            "name": "search_google",
            "content": "1. Vision Pro is Apple's first 3D camera that allows users to capture spatial photos and videos in 3D. 2. It is an upcoming mixed-reality headset developed by Apple Inc. 3. The Vision Pro headset was announced on June 5, 2023, at Apple's Worldwide Developers Conference.",
        }
    ]
}
response = requests.post(EXECUTE_AGENT_URL, headers=HEADERS, json=data)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json())

Response 2

{
  "action":"content",
  "context":{
    "content":"The Apple Vision Pro is an upcoming mixed-reality headset developed by Apple Inc. It is designed to display augmented reality content overlaid on the real world, creating a seamless blend of the real and digital worlds. The Vision Pro headset was announced on June 5, 2023, at Apple's Worldwide Developers Conference. Apple has also released a Vision Pro developer kit for developers to create and test apps for visionOS before the headset is available to the public. It offers an immersive and interactive experience for users.",
    "tool_request":null
  }
}

Now that the agent has enough information to answer our question, it is able to output an informed response to our original query! 😄


What’s Next

See the following guide for a working example of this agent, or refer directly to the recipe for a code walkthrough.