Rock Sun
Pydantic AI

Using Agent

What Problems Does Agent Solve

Programming Without Agent

The first AI programs I wrote called the model's own API directly. I wrote a complete function myself to execute a prompt:

def complete(prompt, max_tokens=16192):
    safety_settings = [
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_NONE"
        },
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_NONE"
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_NONE"
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_NONE"
        },
    ]

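    # Gemini (the LLM client class) and cleanup_output (a post-processing helper)
    # come from elsewhere in the original program and are not shown here.
    retries = 3
    for _ in range(retries):
        try:
            text = Gemini(
                model="models/gemini-2.0-flash",
                max_tokens=max_tokens,
                safety_settings=safety_settings,
            ).complete(prompt).text
            return cleanup_output(text)
        except Exception as e:
            print(f"Error completing prompt: {prompt} \n with error: {e}")
    # Every attempt failed: fall back to an empty string.
    return ""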

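As the code above shows, I had to assemble the prompt and configure every parameter myself. And because a model call can fail, there is also a retry loop.

Set aside for now the question of whether an Agent is an autonomous intelligent entity: PydanticAI's Agent is a class that simplifies the operations above.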

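Introducing Agent

The PydanticAI website describes an Agent as a container for the following elements:

  • System prompt(s) - A set of instructions for the LLM, written by the developer.
  • Function tool(s) - Functions that the LLM may call to get information while generating a response.
  • Structured output type - The structured data type the LLM must return at the end of a run, if specified.
  • Dependency type constraint - System prompt functions, tools, and output validators may all use dependencies when they run.
  • LLM model - An optional default LLM model associated with the agent. It can also be specified when running the agent.
  • Model Settings - Optional default model settings that help fine-tune requests. They can also be specified when running the agent.

Let's start with some of the basic features.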

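Requirements

I previously wrote an article translation program with the Gemini API. The program consists of the following steps:

  • Download the web page
  • Convert the web page into a Markdown document
  • Translate the document into Chinese
  • Review the document

That earlier program did not use Agents; I wrote one function per step. Below I try to implement the same functionality with PydanticAI's Agent.

Implementation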

from pydantic_ai import Agent
import os
import httpx
from pydantic_ai.models.gemini import GeminiModel
from pydantic_ai.providers.google_gla import GoogleGLAProvider
from pydantic import BaseModel
from dataclasses import dataclass
import readability
from markdownify import markdownify as md

proxy = os.getenv('HTTP_PROXY')
client = httpx.AsyncClient(proxy=proxy, timeout=120)

model = GeminiModel( 
    model_name='gemini-2.0-flash',
    provider=GoogleGLAProvider(
        http_client=client
    )
)

@dataclass
class MyDeps:  
    url: str

class ArticleMeta(BaseModel):
    url: str
    summary: str

meta_agent = Agent(
    model=model,
    retries=3,
    deps_type=MyDeps,
    output_type=ArticleMeta,
    instructions="""Analyze the content and provide structured response with Chinese."""
)

translator_agent = Agent(
    model=model,
    retries=3,
    instructions="""As a cloud native expert and translator.
Translate this article from English to Chinese. 
Keep the markdown format intact.

TRANSLATION REQUIREMENTS:

- Do not translate names of non-famous people.
- Do not translate the text in the code block.
- Do not print explanation, just print the translation.
- Ensure the text of link will be translated.
- Translate 'observability' into '可观测性'.
- Make sure translate the text into Simplified Chinese."""
)

async def fetch_and_convert_to_markdown(url: str) -> str:
    """
    Fetches and converts the content of the given URL to markdown format.

    Args:
        url (str): The URL to fetch content from.

    Returns:
        str: The content converted to markdown format.
    """
    response = await client.get(url)  # Fetch the URL content
    response.raise_for_status()
    original_html = response.text
    doc = readability.Document(original_html)
    main_html = doc.summary()

    markdown = md(main_html, strip=['img', 'script', 'style'])  # Convert HTML to markdown
    return markdown

async def main():
    url = 'https://thenewstack.io/boost-performance-with-react-server-components-and-next-js/'
    content = await fetch_and_convert_to_markdown(url)
    # print(content)
    result = await translator_agent.run(content)
    print(result.output)
    meta = await meta_agent.run(result.output, deps=MyDeps(url=url))
    print(meta.output)  # the validated ArticleMeta instance

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

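This code already uses a few PydanticAI features. The first is Dependencies: when constructing an Agent you can specify deps_type, and then pass a matching object (here one carrying the url) when calling the Agent's run method. Pydantic is a data validation library, after all, and declaring deps_type pins down the type of what gets passed in, which a type checker can verify and which makes the program more robust. Note that the model does not see Dependencies automatically; they are made available to system prompt functions, tools, and output validators at run time. How an Agent accesses Dependencies at run time is covered later.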

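The Agent also uses output_type, which specifies the type of the object to return. We want a summary of the article, so the output type declares a summary field. The Agent picks up on that intent and returns a summary of the whole article.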
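
What output_type buys us can be illustrated with Pydantic alone. The sketch below is not a description of PydanticAI's internals; it only shows the two things a BaseModel makes possible: a JSON schema describing the expected shape, and validation of a JSON reply against it. The JSON strings are made-up stand-ins for a model's reply.

```python
from pydantic import BaseModel, ValidationError


class ArticleMeta(BaseModel):
    url: str
    summary: str


# A JSON schema that describes the expected shape of the reply.
schema = ArticleMeta.model_json_schema()
print(schema["required"])

# A well-formed reply validates into a typed object.
meta = ArticleMeta.model_validate_json(
    '{"url": "https://example.com/post", "summary": "An example article."}'
)
print(meta.summary)

# A reply that lacks a required field is rejected.
missing = []
try:
    ArticleMeta.model_validate_json('{"url": "https://example.com/post"}')
except ValidationError as exc:
    missing = [err["loc"][0] for err in exc.errors()]
print(missing)
```

When a reply fails validation, PydanticAI can send the error back to the model and ask again, which is where the retries setting comes in.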

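Using Tools

So far, fetching the web page is an ordinary function that we call ourselves. We can also let the Agent call it, in which case the code becomes: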

from pydantic_ai import Agent, RunContext
import os
import httpx
from pydantic_ai.models.gemini import GeminiModel
from pydantic_ai.providers.google_gla import GoogleGLAProvider
from pydantic import BaseModel
from dataclasses import dataclass
import readability
from markdownify import markdownify as md

proxy = os.getenv('HTTP_PROXY')
client = httpx.AsyncClient(proxy=proxy, timeout=120)

model = GeminiModel( 
    model_name='gemini-2.0-flash',
    provider=GoogleGLAProvider(
        http_client=client
    )
)

@dataclass
class MyDeps:  
    url: str


class ArticleMeta(BaseModel):
    url: str
    summary: str

meta_agent = Agent(
    model=model,
    retries=3,
    deps_type=MyDeps,
    output_type=ArticleMeta,
    instructions="""Analyze the content and provide structured response with Chinese."""
)

translator_agent = Agent(
    model=model,
    retries=3,
    instructions="""As a cloud native expert and translator.
Fetch and Translate this article from English to Chinese. 
Keep the markdown format intact.

TRANSLATION REQUIREMENTS:

- Do not translate names of non-famous people.
- Do not translate the text in the code block.
- Do not print explanation, just print the translation.
- Ensure the text of link will be translated.
- Translate 'observability' into '可观测性'.
- Make sure translate the text into Simplified Chinese."""
)


@translator_agent.tool_plain
async def fetch_and_convert_to_markdown(url: str) -> str:
    """
    Fetches and converts the content of the given URL to markdown format.

    Args:
        url (str): The URL to fetch content from.

    Returns:
        str: The content converted to markdown format.
    """
    response = await client.get(url)  # Fetch the URL content
    response.raise_for_status()
    original_html = response.text
    doc = readability.Document(original_html)
    main_html = doc.summary()

    markdown = md(main_html, strip=['img', 'script', 'style'])  # Convert HTML to markdown
    return markdown

async def main():
    url = 'https://thenewstack.io/boost-performance-with-react-server-components-and-next-js/'
    # Pass only the URL; the agent is expected to call the fetch tool itself.
    result = await translator_agent.run(url)
    print(result.output)
    meta = await meta_agent.run(result.output, deps=MyDeps(url=url))
    print(meta.output)  # the validated ArticleMeta instance

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

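That works too? It feels a little magical. What happened behind the scenes? This question comes up often when programming with Agents, and we will answer it with observability tools. See you in the next chapter.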