[pydantic-ai] Pydantic AI introduces tool search, reshaping how agents manage their tools

May 13, 2026 (updated: May 13, 2026)

PR link: pydantic/pydantic-ai#5143 | Status: Merged | Changes: +0 / -0

Introduction

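As agents built on large language models (LLMs) mature, the number of tools available to a single agent often grows quickly. The challenge is no longer just offering many tools: the agent also has to find the right one efficiently. In the traditional approach, every tool definition is placed in the model's initial context, which inflates input token usage and degrades tool selection accuracy. The problem gets worse when there are dozens of tools or when the tools span several distinct domains.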

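A recent Pydantic AI PR addresses this by introducing tool search. The feature has two main goals. First, it integrates with the native tool search offered by major model providers such as Anthropic and OpenAI, so the provider manages which tools are visible to the model. Second, it offers customizable search strategies that work whether or not the provider supports native search. This post walks through the code changes to show how the PR changes tool management in Pydantic AI agents and what that means in practice.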

Code Analysis

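The core of this PR is the defer_loading option together with the ToolSearch capability, which let an agent handle large toolsets efficiently. The changes span the documentation, the agent graph logic, and the provider-specific model integrations.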

1. Documentation update (`docs/tools-advanced.md`)

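The documentation substantially expands its explanation of why and how to use defer_loading, and of how the ToolSearch capability works. In particular, it emphasizes the defer_loading=True option as the way to cut the tokens spent on tool definitions and to improve selection accuracy.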
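To make the opt-in semantics concrete, here is a minimal pure-Python sketch of what the updated docs describe for toolset-level `.defer_loading()`: passing a list of tool names hides only those tools, while passing `None` hides all of them. `ToolDef`, `apply_defer_loading`, and `visible_tools` are hypothetical stand-ins, not Pydantic AI APIs. The sketch only models which tools would remain in the model's initial context.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToolDef:
    """Hypothetical stand-in for a tool definition (not pydantic_ai's ToolDefinition)."""
    name: str
    description: str
    defer_loading: bool = False


def apply_defer_loading(tools: list[ToolDef], names: list[str] | None) -> list[ToolDef]:
    """Mimic the documented semantics: a list hides the named tools, None hides all."""
    return [
        replace(t, defer_loading=True) if names is None or t.name in names else t
        for t in tools
    ]


def visible_tools(tools: list[ToolDef]) -> list[str]:
    """Names that would still be sent in the model's initial context."""
    return [t.name for t in tools if not t.defer_loading]


tools = [
    ToolDef('github_create_issue', 'Create a GitHub issue'),
    ToolDef('slack_post_message', 'Post a message to a Slack channel'),
    ToolDef('mortgage_calculator', 'Calculate monthly mortgage payment for a home loan'),
]

print(visible_tools(apply_defer_loading(tools, ['mortgage_calculator'])))
# ['github_create_issue', 'slack_post_message']
print(visible_tools(apply_defer_loading(tools, None)))
# []
```

This matches the docs' rule of thumb: keep the few hot tools eager and defer the long tail.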

Diff (`-` lines were removed, `+` lines were added):

```diff
--- a/docs/tools-advanced.md
+++ b/docs/tools-advanced.md
@@ -604,16 +604,32 @@ For more information on how `end_strategy` works with both function tools and ou
 
 ## Tool Search
 
-Agents with many tools (e.g. [MCP servers](mcp/client.md) exposing dozens of endpoints) can suffer from context bloat and degraded tool selection. Marking tools for deferred loading hides them from the model's initial context; a `search_tools` tool is automatically injected so the model can discover hidden tools by keyword when it needs them.
+Agents with many tools (e.g. [MCP servers](mcp/client.md) exposing dozens of endpoints) can spend a lot of input tokens on tool definitions before any work happens, and tool selection accuracy noticeably degrades past ~30–50 available tools. Marking tools for deferred loading hides them from the model's initial context; the model discovers hidden tools by keyword when it needs them.
 
-This is inspired by Anthropic's [Tool Search Tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool#limits-and-best-practices) for managing large tool collections. Tool search is implemented on the Pydantic AI side and works with any model. Native provider support is planned in [#4167](https://github.com/pydantic/pydantic-ai/issues/4167).
+
+Reach for it when:
+
+* the agent exposes ~10+ tools or more than ~10k tokens of tool definitions
+* tools cover distinct domains (e.g. multiple MCP servers) and only a subset is relevant per request
+* the toolset is growing and you want headroom
+
+Skip it when you have a small, hot toolset where every tool is used most turns — deferring everything would just add a discovery round-trip for no benefit. As a rule of thumb, keep your handful of most-used tools eagerly loaded; defer the long tail.
+
+To opt in, set `defer_loading=True` on individual [`Tool`][pydantic_ai.tools.Tool] / [`@agent.tool`][pydantic_ai.agent.Agent.tool] / [`@agent.tool_plain`][pydantic_ai.agent.Agent.tool_plain] registrations, or use [`.defer_loading()`][pydantic_ai.toolsets.AbstractToolset.defer_loading] on a whole toolset (including [MCP servers](mcp/client.md) and [`FastMCPToolset`][pydantic_ai.toolsets.fastmcp.FastMCPToolset]) — pass a list of tool names to hide specific ones, or `None` to hide all.
+
+Once deferred tools exist, search is handled by the auto-injected [`ToolSearch`][pydantic_ai.capabilities.ToolSearch] capability:
+
+* **Native provider search** on supporting models (Anthropic Sonnet 4.5+, Opus 4.5+, Haiku 4.5+ via [BM25/regex](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool); OpenAI Responses on GPT-5.4+). Deferred tools are sent to the provider with `defer_loading` on the wire and the provider manages their visibility.
+* **Custom callable** via [`ToolSearch(strategy=...)`][pydantic_ai.capabilities.ToolSearch] — a user-supplied search function. Executed on our side, but routed through the provider's client-executed native surface (Anthropic `tool_reference` blocks, OpenAI `execution='client'`) where supported so the model sees a tool-search call rather than a regular function tool.
+* **Local fallback** on every other model: a `search_tools` function tool matches keywords against tool names and descriptions.
+
+Pydantic AI prefers native search whenever available because the discovery exchange happens append-only (a `tool_search_call` + `tool_search_output` pair) — the deferred tools never enter the prompt prefix, so prompt caching is preserved across rounds. The local fallback, by contrast, flips each discovered tool's `defer_loading=False` between rounds, which changes the tool-definition prefix and invalidates the cached request prefix on every discovery turn.
+
+For the model to find tools well, give them descriptive names with consistent prefixes (`github_*`, `slack_*`, `mortgage_*`) and put the keywords a user might search for in the tool's description. A search returns a handful of matches at a time, so the model may iterate (search → discover → call → search again) — instructions can nudge it: "Search by topic when you don't see a tool you need."
```

What changed:

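The docs now spell out concrete scenarios for applying defer_loading (for example, roughly 10 or more tools, or more than about 10k tokens of tool definitions) and note that selection accuracy noticeably degrades past about 30 to 50 tools. They also describe in detail how the ToolSearch capability works across its three modes: native provider search, a custom callable strategy, and a local fallback. In particular, they explain why native search preserves prompt caching and why the local fallback can invalidate the cache on every discovery turn.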
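The three modes can be summarized as a small decision function. This is an illustrative sketch of the precedence as the docs describe it, not Pydantic AI's actual dispatch code. `ModelCaps`, `choose_search_mode`, and the returned mode labels are hypothetical, and the `'custom-local'` branch (a custom strategy on a provider without a client-executed search surface) is an assumption the docs do not spell out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModelCaps:
    """Hypothetical capability flags for a model; not a pydantic_ai type."""
    native_tool_search: bool      # provider-managed search (e.g. Anthropic BM25/regex)
    client_executed_search: bool  # provider surface for results computed on our side


def choose_search_mode(caps: ModelCaps, strategy: Optional[Callable] = None) -> str:
    """Illustrative precedence for the three documented tool search modes."""
    if strategy is not None:
        # A custom callable always runs on our side; only the wire surface differs.
        # Assumption: without a client-executed surface it is exposed as a plain tool.
        return 'custom-via-native-surface' if caps.client_executed_search else 'custom-local'
    if caps.native_tool_search:
        return 'native'  # provider hides deferred tools and runs the search itself
    return 'local-fallback'  # a `search_tools` function tool doing keyword matching


print(choose_search_mode(ModelCaps(native_tool_search=True, client_executed_search=True)))
# native
print(choose_search_mode(ModelCaps(native_tool_search=False, client_executed_search=False)))
# local-fallback
```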

The docs also include code examples showing how to set up the ToolSearch capability and how to plug in a custom search function (the fuzzy_search example):

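```python
# tool_search.py example
from pydantic_ai import Agent

agent = Agent('anthropic:claude-sonnet-4-6')


@agent.tool_plain(defer_loading=True)  # hidden from the initial context until discovered
def mortgage_calculator(principal: float, rate: float, years: int) -> str:
    """Calculate monthly mortgage payment for a home loan."""
    # ... (implementation elided here; the full version appears in the next example)
```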

```python
# tool_search_custom.py example
from collections.abc import Sequence
from pydantic_ai import Agent, RunContext
from pydantic_ai.capabilities import ToolSearch
from pydantic_ai.tools import ToolDefinition

def fuzzy_search(
    ctx: RunContext[None], queries: Sequence[str], tools: Sequence[ToolDefinition]
) -> list[str]:
    """Match tools whose name or description contains any query word."""
    # Split every query into lowercase words and return the names of matching tools
    needles = [n for q in queries for n in q.lower().split()]
    return [
        t.name
        for t in tools
        if any(n in t.name.lower() or n in (t.description or '').lower() for n in needles)
    ]

agent = Agent('anthropic:claude-sonnet-4-6', capabilities=[ToolSearch(strategy=fuzzy_search)])

@agent.tool_plain(defer_loading=True)
def mortgage_calculator(principal: float, rate: float, years: int) -> str:
    """Calculate monthly mortgage payment for a home loan."""
    # Standard amortization formula with a monthly rate and monthly payments
    monthly_rate = rate / 100 / 12
    n_payments = years * 12
    payment = principal * (monthly_rate * (1 + monthly_rate) ** n_payments) / ((1 + monthly_rate) ** n_payments - 1)
    return f'${payment:.2f}/month'
```
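Because fuzzy_search never reads ctx, its matching logic can be exercised without an agent or a model. The sketch below copies that logic and swaps pydantic_ai's ToolDefinition for a hypothetical `StubToolDef` that carries only the two fields the function reads, so it runs with the standard library alone.

```python
from collections.abc import Sequence
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubToolDef:
    """Hypothetical stand-in exposing only the fields fuzzy_search reads."""
    name: str
    description: Optional[str] = None


def fuzzy_search(ctx: object, queries: Sequence[str], tools: Sequence[StubToolDef]) -> list[str]:
    """Same matching logic as the example above; ctx is unused, so any value works."""
    needles = [n for q in queries for n in q.lower().split()]
    return [
        t.name
        for t in tools
        if any(n in t.name.lower() or n in (t.description or '').lower() for n in needles)
    ]


tools = [
    StubToolDef('mortgage_calculator', 'Calculate monthly mortgage payment for a home loan.'),
    StubToolDef('github_create_issue', 'Open an issue in a GitHub repository.'),
    StubToolDef('slack_post_message'),  # no description: only the name is searchable
]

print(fuzzy_search(None, ['home loan'], tools))  # ['mortgage_calculator']
print(fuzzy_search(None, ['slack'], tools))      # ['slack_post_message']
```

Since this is plain substring matching, a very short query word such as `'a'` matches every tool above. A production strategy would probably tokenize names or rank results instead.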

2. Agent graph logic (`pydantic_ai_slim/pydantic_ai/_agent_graph.py`)

This file refines how the auto-injected ToolSearchTool is managed. When there are no deferred tools to search, the auto-injected ToolSearchTool is now left out of the request to avoid needless overhead.

Before (conceptual):

The auto-injected ToolSearchTool could end up in the request regardless of whether any deferred tools existed.

After:

```diff
--- a/pydantic_ai_slim/pydantic_ai/_agent_graph.py
+++ b/pydantic_ai_slim/pydantic_ai/_agent_graph.py
@@ -501,6 +502,20 @@ async def _prepare_request_parameters(
                 if t is not None:
                     native_tools.append(t)
 
+    # Drop the auto-injected `ToolSearchTool` native tool when the search corpus is empty —
+    # the toolset has nothing to manage, so emitting the native tool would waste a tool slot
+    # and surface an inert native tool in `ModelRequestParameters` snapshots. Filtering
+    # here (at MRP-construction time) keeps the request shape honest before
+    # `prepare_request` runs. Non-optional `ToolSearchTool` instances (user-passed) are
+    # preserved so the request still fails loudly on unsupported models.
+    has_tool_search_corpus = any(t.with_native == ToolSearchTool.kind for t in function_tools)
+    if not has_tool_search_corpus:
+        # Confine the corpus-empty drop to `ToolSearchTool`: other optional native tools
+        # (e.g. a hypothetical `WebSearchTool(optional=True)`) don't have a corpus and
+        # shouldn't be dropped here — they only get dropped on the unsupported-on-this-model
+        # path in `Model.prepare_request`.
+        native_tools = [t for t in native_tools if not (isinstance(t, ToolSearchTool) and t.optional)]
+
     return models.ModelRequestParameters(
         function_tools=function_tools,
         native_tools=native_tools,
```

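This code checks whether any entry in function_tools has with_native equal to ToolSearchTool.kind, that is, whether there is any deferred tool for native tool search to manage. If there is none, every ToolSearchTool instance with optional=True (the auto-injected one) is removed from native_tools. This keeps an inert ToolSearchTool out of the model request and out of ModelRequestParameters snapshots, and avoids wasting a tool slot. A ToolSearchTool that the user passes explicitly is non-optional, so it is preserved and the request still fails loudly on models that don't support it. Other optional native tools are not touched by this check. They are only dropped on the unsupported-model path in Model.prepare_request.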
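The filter can be reproduced in isolation with stand-in types. In the sketch below, `ToolSearchToolStub`, `WebSearchToolStub`, `FunctionToolStub`, and `drop_inert_tool_search` are hypothetical names, and the `'tool_search'` value of `kind` is an assumption, since the diff only compares against `ToolSearchTool.kind`. It shows the three cases the comments in the diff describe: the auto-injected tool is dropped, the user-passed tool is kept, and an unrelated optional native tool is kept.

```python
from dataclasses import dataclass
from typing import ClassVar, List, Optional, Union


@dataclass
class ToolSearchToolStub:
    """Hypothetical stand-in for ToolSearchTool with only the fields the filter reads."""
    kind: ClassVar[str] = 'tool_search'  # assumed value for illustration
    optional: bool = False  # True for the auto-injected instance


@dataclass
class WebSearchToolStub:
    """Hypothetical optional native tool that has no search corpus."""
    optional: bool = False


@dataclass
class FunctionToolStub:
    """Hypothetical function tool; `with_native` names the native tool that manages it."""
    name: str
    with_native: Optional[str] = None


NativeTool = Union[ToolSearchToolStub, WebSearchToolStub]


def drop_inert_tool_search(
    function_tools: List[FunctionToolStub], native_tools: List[NativeTool]
) -> List[NativeTool]:
    """Mirror of the diff's filter: drop optional ToolSearchTool when the corpus is empty."""
    has_corpus = any(t.with_native == ToolSearchToolStub.kind for t in function_tools)
    if has_corpus:
        return native_tools
    # Only optional ToolSearchTool instances go; user-passed (non-optional) ones and
    # other native tools are kept.
    return [t for t in native_tools if not (isinstance(t, ToolSearchToolStub) and t.optional)]


auto, user, web = ToolSearchToolStub(optional=True), ToolSearchToolStub(), WebSearchToolStub(optional=True)
print(drop_inert_tool_search([FunctionToolStub('github_create_issue')], [auto, user, web]))
# [ToolSearchToolStub(optional=False), WebSearchToolStub(optional=True)]
```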

3. Message handling and type conversion

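The PR introduces new message part types such as ToolSearchCallPart, ToolSearchReturnPart, NativeToolSearchCallPart, and NativeToolSearchReturnPart, so that tool search requests and responses can be distinguished and handled explicitly inside the framework. In addition, the _narrow_tool_call_parts function narrows tool call parts down to these more specific types.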

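Following review feedback, a tool_kind field was added to refine the default behavior of BaseToolCallPart and BaseToolReturnPart. The field holds a value such as Literal['tool_search'], which explicitly marks a part as tool-search related. This is more robust than the previous approach of relying on tool_name, and it avoids collisions with user-defined tool names.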

```python
# Example: telling parts apart with tool_kind
# Before (conceptual): based on tool_name
# if part.tool_name == 'search_tools': ...

def handle_part(part) -> None:
    # After: based on tool_kind
    if part.tool_kind == 'tool_search':
        # ToolSearch-related handling
        ...
```
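To see why a dedicated field beats name matching, consider a user tool that happens to share the fallback tool's name. The sketch uses a hypothetical simplified `CallPart` dataclass, not pydantic_ai's BaseToolCallPart, and models only the two fields involved.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class CallPart:
    """Hypothetical simplified tool-call part; not pydantic_ai's BaseToolCallPart."""
    tool_name: str
    tool_kind: Optional[Literal['tool_search']] = None  # None for ordinary function tools


def is_search_by_name(part: CallPart) -> bool:
    # Fragile: any user tool that happens to be named 'search_tools' is misclassified.
    return part.tool_name == 'search_tools'


def is_search_by_kind(part: CallPart) -> bool:
    # Robust: only parts explicitly tagged by the framework count as tool search.
    return part.tool_kind == 'tool_search'


framework_search = CallPart(tool_name='search_tools', tool_kind='tool_search')
user_tool = CallPart(tool_name='search_tools')  # a user's own tool with a colliding name

print(is_search_by_name(user_tool))  # True (false positive)
print(is_search_by_kind(user_tool))  # False
```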

AbstractBuiltinTool.optional was also promoted to the base class. This generalizes the behavior: not only ToolSearchTool but any built-in tool added in the future can set optional=True and be silently dropped on models that don't support it.
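A minimal sketch of the generalized rule, assuming the behavior described above: an unsupported optional tool is dropped silently, and an unsupported required tool raises an error (the diff comment notes that non-optional tools must still fail loudly). `NativeToolStub`, `UnsupportedNativeToolError`, `prepare_native_tools`, and the kind strings are hypothetical. This is not Model.prepare_request.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class NativeToolStub:
    """Hypothetical native/built-in tool carrying the generalized `optional` flag."""
    kind: str
    optional: bool = False


class UnsupportedNativeToolError(Exception):
    """Hypothetical error for a required native tool the model cannot use."""


def prepare_native_tools(tools: List[NativeToolStub], supported_kinds: Set[str]) -> List[NativeToolStub]:
    """Drop unsupported optional tools silently; fail loudly on unsupported required ones."""
    kept = []
    for tool in tools:
        if tool.kind in supported_kinds:
            kept.append(tool)
        elif tool.optional:
            continue  # silently dropped on this model
        else:
            raise UnsupportedNativeToolError(f'{tool.kind!r} is not supported by this model')
    return kept


print(prepare_native_tools(
    [NativeToolStub('tool_search', optional=True), NativeToolStub('web_search')],
    supported_kinds={'web_search'},
))
# [NativeToolStub(kind='web_search', optional=False)]
```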

4. Fixing Bedrock and Anthropic issues

Detailed reports from reviewer kclisp led to fixes for several issues that surfaced with the Bedrock and Anthropic APIs. The main issues and their fixes are:

* Conflict between defer_loading=True and cache_control: with anthropic_cache_tool_definitions=True set, tools marked defer_loading=True conflicted with the cache settings. The fix makes explicit that deferred tools cannot use prompt caching and adjusts the related settings accordingly.
* Orphaned tool search results: with the Anthropic API (when using tool_search_tool_regex), tool search results were not carried into the next request and went missing. The PR strips these orphaned BuiltinToolSearchCallPart entries from the wire payload so the model never receives an incomplete state.
* pattern vs. query in the regex strategy: Anthropic's regex strategy takes pattern rather than query. This concerns specific tool definitions such as tool_search_tool_regex_20251119, and the handling was corrected so it works properly.
* Bedrock prefix: the Bedrock inference profile prefix (as in us.anthropic.claude-haiku-4-5-20251001-v1:0) was not handled correctly by anthropic_model_profile, so feature flags ended up disabled. The model profile logic now strips or otherwise correctly handles the Bedrock prefix.
* No default bm25 on Bedrock: Bedrock does not support the default bm25 strategy, which caused 400 errors. The fix makes explicit that only tool_search_tool_regex is supported on Bedrock and improves handling for that case.
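Two of these fixes can be illustrated with a small, self-contained sketch. First, it normalizes a Bedrock model ID so that provider feature detection can see the underlying Claude model name. Second, it prunes tool search calls that have no matching result, in the spirit of the orphan fix. It also records the documented pattern/query difference. Everything here is hypothetical: the regex, `normalize_bedrock_model_id`, `search_term`, `Part`, and the part labels are illustrations, not the PR's actual code.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical regex: optional region prefix ("us.", "eu.", ...), the "anthropic." provider
# prefix, then the model name, then an optional "-v<N>:<M>" version suffix.
_BEDROCK_ID = re.compile(r'^(?:[a-z]+\.)?anthropic\.(?P<name>.+?)(?:-v\d+(?::\d+)?)?$')

# Per the fix above: Anthropic's regex strategy takes `pattern`, BM25 takes `query`.
_SEARCH_INPUT_KEY = {'bm25': 'query', 'regex': 'pattern'}


def normalize_bedrock_model_id(model_id: str) -> str:
    """Strip Bedrock-specific prefixes and suffixes; return other IDs unchanged."""
    m = _BEDROCK_ID.match(model_id)
    return m.group('name') if m else model_id


def search_term(strategy: str, tool_input: Dict[str, str]) -> str:
    """Read the search term from a tool search call using the strategy's input key."""
    return tool_input[_SEARCH_INPUT_KEY[strategy]]


@dataclass
class Part:
    kind: str  # hypothetical labels: 'search_call' or 'search_return'
    call_id: str


def drop_orphaned_search_calls(parts: List[Part]) -> List[Part]:
    """Remove search calls whose matching return never arrived."""
    answered = {p.call_id for p in parts if p.kind == 'search_return'}
    return [p for p in parts if not (p.kind == 'search_call' and p.call_id not in answered)]


print(normalize_bedrock_model_id('us.anthropic.claude-haiku-4-5-20251001-v1:0'))
# claude-haiku-4-5-20251001
```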

These fixes landed in commits such as a2cd460c and substantially improve the stability of the Bedrock and Anthropic integrations.

Why is this good?

This PR fundamentally improves how Pydantic AI agents manage and use tools. The main benefits are:

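* Efficient context management: instead of loading every tool definition into the model's initial context, defer_loading=True lets tools be loaded only when they are needed, which substantially reduces the input tokens spent on tool definitions. For agents with many tools, this translates directly into lower cost and better performance.
* Better tool selection accuracy: with a leaner context, the model can identify the relevant tool more accurately amid less noise, which raises the agent's success rate on complex tasks.
* Performance through native integration: by using the native tool search of providers such as Anthropic and OpenAI, tool discovery and loading are handled by the provider itself. The search step can then run on the provider side instead of through an extra client-side round trip, which makes tool management faster and more efficient. Native search is also better at preserving prompt caching.
* Flexible custom strategies: ToolSearch(strategy=...) lets users implement their own search logic. This enables search tailored to a specific domain or requirement, and it works whether or not the provider offers native tool search.
* Stable prompt caching: as noted in review, native tool search appends the discovery exchange to the conversation rather than changing the tool-definition list, so the cached prompt prefix stays valid. The docs also state that the local fallback changes tool definitions on every discovery and can therefore invalidate the cache, which helps users understand the performance trade-off and choose the right strategy.
* A sturdier type system: the new message part types and the tool_kind field make tool-search communication more explicit and safer to handle, which eases debugging and reduces potential bugs.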
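The caching argument can be checked with a toy model. Real provider caches key on the whole request prefix, but this sketch makes a simplifying assumption that the cache key is a hash of the serialized tool definitions. Under that assumption, an append-only (native) discovery round leaves the key unchanged, while the local fallback's `defer_loading` flip changes it. `prefix_key`, `native_round`, and `local_fallback_round` are hypothetical helpers.

```python
import hashlib
import json
from typing import Dict, List, Set, Tuple

ToolDefs = List[Dict[str, object]]


def prefix_key(tool_defs: ToolDefs) -> str:
    """Hash of the tool-definition prefix; a stand-in for a provider's prompt-cache key."""
    return hashlib.sha256(json.dumps(tool_defs, sort_keys=True).encode()).hexdigest()


TOOLS: ToolDefs = [
    {'name': 'github_create_issue', 'defer_loading': False},
    {'name': 'mortgage_calculator', 'defer_loading': True},
]


def native_round(tool_defs: ToolDefs, history: List[str]) -> Tuple[ToolDefs, List[str]]:
    # Native search: the discovery exchange is appended; tool definitions are untouched.
    return tool_defs, history + ['tool_search_call', 'tool_search_output']


def local_fallback_round(
    tool_defs: ToolDefs, history: List[str], discovered: Set[str]
) -> Tuple[ToolDefs, List[str]]:
    # Local fallback: discovered tools flip to defer_loading=False, rewriting the definitions.
    new_defs = [dict(t, defer_loading=False) if t['name'] in discovered else t for t in tool_defs]
    return new_defs, history + ['search_tools call', 'search_tools result']


before = prefix_key(TOOLS)
native_defs, _ = native_round(TOOLS, [])
local_defs, _ = local_fallback_round(TOOLS, [], {'mortgage_calculator'})

print(prefix_key(native_defs) == before)  # True: cached prefix still valid
print(prefix_key(local_defs) == before)   # False: prefix changed, cache miss
```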

General lessons:

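* The value of lazy loading: loading resources on demand instead of all up front can significantly improve performance and efficiency. This pattern matters for tool management in LLM agents and across software development in general.
* Making the most of provider integrations: LLM agent frameworks should take full advantage of provider-specific features to maximize performance. Native feature support directly affects an agent's responsiveness and accuracy.
* Balancing customization and defaults: it is important to ship strong defaults and automation while still letting users customize behavior for their specific needs.
* Clear documentation of performance characteristics: clear docs on how a feature works, especially on performance-sensitive aspects such as prompt caching, are essential for users to understand the feature and use it well.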


⚠️ Note: This analysis was written by AI based on the actual code diff.
