[pydantic-ai] Adding the anthropic_cache_messages setting and automatically limiting cache points

November 25, 2025 | Updated: November 25, 2025

PR link: pydantic/pydantic-ai#3442 | Status: Merged | Changes: +786 / -29

Introduction

Anthropic prompt caching allows at most 4 cache points per request. Previously, users had to insert CachePoint() markers by hand, and enabling system prompt, tool, and message caching together could exceed the limit of 4 and cause an API error. This PR adds the anthropic_cache_messages setting and introduces a _limit_cache_points() method that automatically removes cache points beyond the limit.

Key code analysis

1. Adding the anthropic_cache_messages setting

class AnthropicModelSettings(ModelSettings, total=False):
    anthropic_cache_messages: bool | Literal['5m', '1h']
    """Convenience setting to enable caching for the last user message.
    
    When enabled, this automatically adds a cache point to the last content block
    in the final user message."""

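The setting automatically attaches cache_control to the last content block of the final user message. The value True selects the default TTL, while '5m' or '1h' sets it explicitly. This is useful in multi-turn conversations for caching the preceding conversation history.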

2. Applying the cache to the last message

if anthropic_messages and (cache_messages := model_settings.get('anthropic_cache_messages')):
    ttl: Literal['5m', '1h'] = '5m' if cache_messages is True else cache_messages
    m = anthropic_messages[-1]
    content = m['content']
    if isinstance(content, str):
        m['content'] = [BetaTextBlockParam(
            text=content, type='text',
            cache_control=BetaCacheControlEphemeralParam(type='ephemeral', ttl=ttl),
        )]
    else:
        content = cast(list[BetaContentBlockParam], content)
        self._add_cache_control_to_last_param(content, ttl)

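If the content is a plain string, it is wrapped in a single-element list whose text block carries cache_control. If it is already a list of blocks, cache_control is added to the last block. When the setting is True, the TTL defaults to '5m'.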
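The branching above can be mimicked with plain dictionaries. The following is a minimal sketch, not the library's code: apply_cache_to_last_message is a hypothetical helper, and the dictionary shapes only approximate the Anthropic message format.

```python
from typing import Any, Literal, Union

CacheSetting = Union[bool, Literal['5m', '1h']]


def apply_cache_to_last_message(
    messages: list[dict[str, Any]], cache_messages: CacheSetting
) -> None:
    """Hypothetical helper: mark the last block of the last message as cacheable."""
    if not messages or not cache_messages:
        return  # nothing to do when the setting is off or there are no messages
    # True means the default five-minute TTL; otherwise the value is the TTL itself.
    ttl = '5m' if cache_messages is True else cache_messages
    cache_control = {'type': 'ephemeral', 'ttl': ttl}
    last = messages[-1]
    content = last['content']
    if isinstance(content, str):
        # A bare string has no block to annotate, so wrap it in one text block.
        last['content'] = [
            {'type': 'text', 'text': content, 'cache_control': cache_control}
        ]
    elif content:
        # A list of blocks: annotate only the final block.
        content[-1]['cache_control'] = cache_control


messages = [
    {'role': 'user', 'content': 'What is prompt caching?'},
    {'role': 'assistant', 'content': [{'type': 'text', 'text': 'It reuses a prompt prefix.'}]},
    {'role': 'user', 'content': 'How long does it last?'},
]
apply_cache_to_last_message(messages, True)
```

After the call, only the final message has changed: its string content has become a one-element list with a cache_control entry, and earlier messages are untouched.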

3. Automatically limiting cache points

self._limit_cache_points(system_prompt, anthropic_messages, tools)

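This method is called from both _messages_create() and _messages_count_tokens(). It adds up the cache points on the system prompt (1), the tool definitions (1), and the messages, and if the total exceeds 4, it removes message cache points starting with the oldest.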
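The counting and trimming rule described above can be sketched with plain dictionaries. This is an illustration under assumptions, not the PR's implementation: limit_cache_points and cached_message are hypothetical helpers, the data shapes are simplified, and the real _limit_cache_points() may differ in detail.

```python
from typing import Any

MAX_CACHE_POINTS = 4  # per-request limit described in this article


def limit_cache_points(
    system_blocks: list[dict[str, Any]],
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]],
) -> int:
    """Hypothetical helper: drop the oldest message cache points over the limit.

    Returns the number of cache points removed.
    """
    # Cache points on the system prompt and tools are counted but never removed.
    reserved = sum('cache_control' in block for block in system_blocks)
    reserved += sum('cache_control' in tool for tool in tools)
    budget = max(0, MAX_CACHE_POINTS - reserved)

    # Collect message blocks that carry a cache point, oldest first.
    cached_blocks = [
        block
        for message in messages
        if isinstance(message['content'], list)
        for block in message['content']
        if 'cache_control' in block
    ]
    excess = max(0, len(cached_blocks) - budget)
    for block in cached_blocks[:excess]:
        del block['cache_control']
    return excess


def cached_message(text: str) -> dict[str, Any]:
    """Build a user message whose single block has a cache point."""
    block = {'type': 'text', 'text': text, 'cache_control': {'type': 'ephemeral'}}
    return {'role': 'user', 'content': [block]}


system = [{'type': 'text', 'text': 'You are helpful.', 'cache_control': {'type': 'ephemeral'}}]
tools = [{'name': 'search', 'cache_control': {'type': 'ephemeral'}}]
messages = [cached_message('first'), cached_message('second'), cached_message('third')]
removed = limit_cache_points(system, messages, tools)
```

Here the request starts with five cache points (one system, one tool, three message), so one is removed, and it is the one on the oldest message. The system and tool cache points stay in place.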

Why this is good

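A single line, anthropic_cache_messages=True, enables caching of the conversation history. There is no need to insert CachePoint() markers manually.
The framework manages the 4-cache-point limit automatically, so requests succeed without an API error even when the user enables every cache setting at once.
The removal priority is clear: system and tool cache points are always preserved, and message cache points are removed starting with the oldest.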

Summary

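Manage API limits in the framework: building in automatic adjustment logic so that users do not have to think about the limit greatly improves the developer experience.
Offer both a convenience setting and fine-grained control: providing anthropic_cache_messages (automatic) alongside CachePoint() (manual) supports a range of usage patterns.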

References

pydantic/pydantic-ai#3442: full PR diff
Anthropic Prompt Caching: explanation of the cache point limit

⚠️ Notice: This analysis was written by AI based on the actual code diff.
