Using Agents
What problems do Agents solve
Programming without an Agent
The first AI programs I wrote called the model's own API directly. I wrote a complete method myself to run a prompt:
def complete(prompt, max_tokens=16192):
    # Gemini and cleanup_output are defined elsewhere in the original program.
    # Disable blocking for every safety category.
    safety_settings = [
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_NONE"
        },
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_NONE"
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_NONE"
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_NONE"
        },
    ]
    retries = 3
    for _ in range(retries):
        try:
            text = Gemini(model="models/gemini-2.0-flash", max_tokens=max_tokens, safety_settings=safety_settings).complete(prompt).text
            return cleanup_output(text)
        except Exception as e:
            print(f"Error completing prompt: {prompt} \n with error: {e}")
    # Every attempt failed: fall back to an empty string.
    return ""

As the code above shows, I have to assemble the prompt myself and configure every parameter by hand. And because a model call can fail, there is also a retry loop.
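The retry loop above can be factored into a small helper. The following is a minimal sketch using only the standard library, not the original program's code: `call_with_retries` and `flaky_model_call` are hypothetical names, and the fake call stands in for a real model request. Unlike the original, it raises after the last failed attempt instead of returning an empty string, so the caller can tell a failure from an empty reply.

```python
import time
from typing import Callable, Optional


def call_with_retries(fn: Callable[[], str], retries: int = 3, base_delay: float = 0.0) -> str:
    """Call fn up to `retries` times and raise if every attempt fails."""
    last_error: Optional[Exception] = None
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:  # broad on purpose, mirroring the original loop
            last_error = exc
            # Exponential backoff; base_delay defaults to 0 so the demo runs instantly.
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"all {retries} attempts failed") from last_error


# Hypothetical stand-in for a model call that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_model_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "translated text"


result = call_with_retries(flaky_model_call)
print(result, attempts["count"])
```

With an Agent, this bookkeeping moves into the library: the retries argument on the Agent replaces the hand-written loop.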
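Set aside for now the question of whether an Agent is an autonomous, self-directing entity. In PydanticAI, Agent is a class that simplifies the work shown above.
Introducing Agent
The PydanticAI website describes an Agent as a container for the following elements: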
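- System prompt(s) - A set of instructions for the LLM, written by the developer.
- Function tool(s) - Functions the LLM may call to get information while generating a response.
- Structured output type - The structured data type the LLM must return at the end of a run, if specified.
- Dependency type constraint - System prompt functions, tools, and output validators may all use dependencies when they run.
- LLM model - An optional default LLM model associated with the agent. It can also be specified when running the agent.
- Model Settings - Optional default model settings to help fine-tune requests. They can also be specified when running the agent.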
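Let's start with some of the basic features.
Requirements
I previously wrote an article translation program with the Gemini API. The program has the following steps:
- Download the web page
- Convert the web page to a Markdown document
- Translate the document into Chinese
- Check the document
That program did not use an Agent; I wrote one method per step. Below I try to implement the same functionality with PydanticAI's Agent.
Implementation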
import asyncio
import os
from dataclasses import dataclass

import httpx
import readability
from markdownify import markdownify as md
from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.models.gemini import GeminiModel
from pydantic_ai.providers.google_gla import GoogleGLAProvider

# Share one HTTP client (with an optional proxy) between the model and page fetching.
proxy = os.getenv('HTTP_PROXY')
client = httpx.AsyncClient(proxy=proxy, timeout=120)

model = GeminiModel(
    model_name='gemini-2.0-flash',
    provider=GoogleGLAProvider(
        http_client=client
    )
)


@dataclass
class MyDeps:
    url: str


class ArticleMeta(BaseModel):
    url: str
    summary: str


meta_agent = Agent(
    model=model,
    retries=3,
    deps_type=MyDeps,
    output_type=ArticleMeta,
    instructions="""Analyze the content and provide a structured response in Chinese."""
)

translator_agent = Agent(
    model=model,
    retries=3,
    instructions="""You are a cloud native expert and translator.
Translate this article from English to Chinese.
Keep the markdown format intact.
TRANSLATION REQUIREMENTS:
- Do not translate names of non-famous people.
- Do not translate the text in the code block.
- Do not print explanation, just print the translation.
- Ensure the text of links is translated.
- Translate 'observability' into '可观测性'.
- Make sure to translate the text into Simplified Chinese."""
)


async def fetch_and_convert_to_markdown(url: str) -> str:
    """
    Fetches and converts the content of the given URL to markdown format.

    Args:
        url (str): The URL to fetch content from.

    Returns:
        str: The content converted to markdown format.
    """
    response = await client.get(url)  # Fetch the URL content
    response.raise_for_status()
    original_html = response.text
    # Extract the main article body and drop navigation, ads, and similar clutter.
    doc = readability.Document(original_html)
    main_html = doc.summary()
    markdown = md(main_html, strip=['img', 'script', 'style'])  # Convert HTML to markdown
    return markdown


async def main():
    url = 'https://thenewstack.io/boost-performance-with-react-server-components-and-next-js/'
    content = await fetch_and_convert_to_markdown(url)
    # print(content)
    result = await translator_agent.run(content)
    print(result.output)
    # Summarize the translated article as a structured ArticleMeta object.
    meta = await meta_agent.run(result.output, deps=MyDeps(url=url))
    print(meta.output)


if __name__ == "__main__":
    asyncio.run(main())

This code already uses several PydanticAI features. The first is Dependencies. Passing deps_type when constructing an Agent declares the type of the dependency object, and a matching value (here the url) can then be passed to the Agent's run method. Pydantic is best known for data validation, but deps_type itself serves static type checking: a type checker can flag a call that passes the wrong kind of object, which makes the program more robust. Dependencies are not sent to the model automatically. System prompt functions, tools, and output validators read them at runtime, and a later section covers how. In this example nothing reads the dependency yet, so the url field of ArticleMeta is filled in by the model from the text it receives.
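The Agent also uses output_type, which specifies the type of the object it returns. We want a summary of the article, so the output type declares a summary field. PydanticAI passes the schema of that type to the model and validates the reply against it, which is why the Agent returns a summary of the whole article.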
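The validation half of that can be sketched with Pydantic alone. This is an illustration of the idea, not PydanticAI's internal code, and the two JSON replies are hypothetical, written by hand rather than produced by a model.

```python
from pydantic import BaseModel, ValidationError


class ArticleMeta(BaseModel):
    url: str
    summary: str


# The JSON schema lists the fields a reply has to contain.
schema = ArticleMeta.model_json_schema()
print(schema["required"])

# Hypothetical model replies, written by hand for illustration.
good_reply = '{"url": "https://example.com/post", "summary": "A short summary."}'
bad_reply = '{"url": "https://example.com/post"}'

# A well-formed reply becomes a typed object.
meta = ArticleMeta.model_validate_json(good_reply)
print(meta.summary)

# A reply without the summary field fails validation.
try:
    ArticleMeta.model_validate_json(bad_reply)
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

When validation fails inside an Agent run, the retries setting determines how many times the model is asked again.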
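Using Tools
So far, fetching the web page is an ordinary function that we call ourselves. We can also let the Agent call it as a tool. The code becomes: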
import asyncio
import os
from dataclasses import dataclass

import httpx
import readability
from markdownify import markdownify as md
from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.models.gemini import GeminiModel
from pydantic_ai.providers.google_gla import GoogleGLAProvider

# Share one HTTP client (with an optional proxy) between the model and page fetching.
proxy = os.getenv('HTTP_PROXY')
client = httpx.AsyncClient(proxy=proxy, timeout=120)

model = GeminiModel(
    model_name='gemini-2.0-flash',
    provider=GoogleGLAProvider(
        http_client=client
    )
)


@dataclass
class MyDeps:
    url: str


class ArticleMeta(BaseModel):
    url: str
    summary: str


meta_agent = Agent(
    model=model,
    retries=3,
    deps_type=MyDeps,
    output_type=ArticleMeta,
    instructions="""Analyze the content and provide a structured response in Chinese."""
)

translator_agent = Agent(
    model=model,
    retries=3,
    instructions="""You are a cloud native expert and translator.
Fetch and translate this article from English to Chinese.
Keep the markdown format intact.
TRANSLATION REQUIREMENTS:
- Do not translate names of non-famous people.
- Do not translate the text in the code block.
- Do not print explanation, just print the translation.
- Ensure the text of links is translated.
- Translate 'observability' into '可观测性'.
- Make sure to translate the text into Simplified Chinese."""
)


# tool_plain registers a tool that does not need access to the run context.
@translator_agent.tool_plain
async def fetch_and_convert_to_markdown(url: str) -> str:
    """
    Fetches and converts the content of the given URL to markdown format.

    Args:
        url (str): The URL to fetch content from.

    Returns:
        str: The content converted to markdown format.
    """
    response = await client.get(url)  # Fetch the URL content
    response.raise_for_status()
    original_html = response.text
    # Extract the main article body and drop navigation, ads, and similar clutter.
    doc = readability.Document(original_html)
    main_html = doc.summary()
    markdown = md(main_html, strip=['img', 'script', 'style'])  # Convert HTML to markdown
    return markdown


async def main():
    url = 'https://thenewstack.io/boost-performance-with-react-server-components-and-next-js/'
    # Only the URL is passed in; the model is expected to call the tool to fetch the page.
    result = await translator_agent.run(url)
    print(result.output)
    # Summarize the translated article as a structured ArticleMeta object.
    meta = await meta_agent.run(result.output, deps=MyDeps(url=url))
    print(meta.output)


if __name__ == "__main__":
    asyncio.run(main())

That works too? It feels a bit magical. What is happening behind the scenes? This question comes up often when programming with Agents, and we will answer it with observability tools. See you in the next chapter.