[axolotl] Stabilizing the Triton LoRA Kernel Autotune Tests: A Module Isolation Strategy for pytest-xdist

March 19, 2026 | Updated: March 19, 2026

PR link: axolotl-ai-cloud/axolotl#3511 | Status: Merged | Changes: +41 / -10

Introduction

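The tests for the feature that collects autotune configurations from the Triton-based LoRA kernels failed intermittently under pytest-xdist. The cause was test ordering within a worker. Another test, run earlier in the same xdist worker process, had already loaded the real lora_ops module into sys.modules. xdist's assignment of tests to workers can vary from run to run, so whether a given worker had already loaded the real module also varied. That is why the failure was flaky rather than consistent. This PR fixes the problem by fundamentally changing the tests' mocking strategy.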

Key code analysis

1. Making the sys.modules iteration safe to prevent a RuntimeError

_find_lora_ops_module iterates over sys.modules.items(). If another thread added or removed a module during that iteration, the loop could fail with RuntimeError: dictionary changed size during iteration.

Before:

for name, module in sys.modules.items():
    if module is not None and "lora_ops" in name ...

After:

for name, module in list(sys.modules.items()):
    if module is not None and "lora_ops" in name ...

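Wrapping the items in list() makes the loop iterate over a snapshot instead of the live dict. Modules being added or removed while the loop runs no longer break it.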
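The following is a minimal, self-contained sketch of the difference between iterating the live dict and iterating a snapshot. In the PR's scenario another thread causes the mutation, which is hard to reproduce on demand. This sketch triggers the same RuntimeError deterministically instead, by mutating a dict from inside its own loop. find_module_by_substring is a hypothetical helper modeled loosely on _find_lora_ops_module, not axolotl's actual code.

```python
import sys


def find_module_by_substring(substring):
    """Hypothetical helper modeled loosely on _find_lora_ops_module.

    Returns the first loaded module whose name contains `substring`,
    or None if no such module is loaded.
    """
    # list(...) takes a snapshot, so the loop never walks the live dict.
    for name, module in list(sys.modules.items()):
        if module is not None and substring in name:
            return module
    return None


def mutate_live(d):
    """Insert keys while iterating the live dict; return True if it raised."""
    try:
        for key in d:
            d[key + "_new"] = 0
    except RuntimeError:
        # "dictionary changed size during iteration"
        return True
    return False


def mutate_snapshot(d):
    """Insert keys while iterating a list() snapshot; this does not raise."""
    for key, _ in list(d.items()):
        d[key + "_new"] = 0
    return d


print(mutate_live({"a": 1, "b": 2}))              # True
print(sorted(mutate_snapshot({"a": 1, "b": 2})))  # ['a', 'a_new', 'b', 'b_new']
```

The snapshot costs one shallow copy of the module table per call. That is negligible for a lookup that runs once per collection.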

2. Changing the test mock strategy: from patch.dict to patching the function directly

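The old test injected a mock module with patch.dict(sys.modules, ...). If the real lora_ops module was already present in the same worker, however, _find_lora_ops_module found the real module instead of the mock. Assuming the lookup returns the first match, dict insertion order explains why. A real module loaded earlier is reached before the fake entry that patch.dict appends.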
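The failure mode can be reproduced without axolotl. This sketch rests on several assumptions, all hypothetical stand-ins for the real code:

- find_lora_ops_like stands in for _find_lora_ops_module.
- The names pkg_real.demo_lora_ops and pkg_fake.demo_lora_ops are invented.
- The lookup is assumed to return the first match.

The "preloaded" case imitates an earlier test in the same xdist worker that imported the real module.

```python
import sys
import types
from unittest.mock import MagicMock, patch

MARKER = "demo_lora_ops"  # hypothetical stand-in for "lora_ops"


def find_lora_ops_like():
    """Hypothetical stand-in for _find_lora_ops_module: first match wins."""
    for name, module in list(sys.modules.items()):
        if module is not None and MARKER in name:
            return module
    return None


def lookup_under_patch_dict(preloaded):
    """Inject a fake via patch.dict and report which module the finder returns."""
    real = types.ModuleType("pkg_real." + MARKER)
    fake = MagicMock(name="fake_lora_ops")
    if preloaded:
        # Simulate an earlier test in the same worker importing the real module.
        sys.modules[real.__name__] = real
    try:
        # patch.dict appends the fake entry after everything already loaded.
        with patch.dict(sys.modules, {"pkg_fake." + MARKER: fake}):
            found = find_lora_ops_like()
    finally:
        sys.modules.pop(real.__name__, None)
    if found is fake:
        return "fake"
    if found is real:
        return "real"
    return "none"


print(lookup_under_patch_dict(preloaded=False))  # fake
print(lookup_under_patch_dict(preloaded=True))   # real
```

With a fresh worker the mock wins. Once the real module is already registered, the same test silently exercises real code.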

Before:

with patch.dict(sys.modules, {_FAKE_MODULE_NAME: mock_lora_ops}):
    from axolotl.integrations.kernels.autotune_collector import (
        collect_autotune_configs,
    )
    result = collect_autotune_configs()

After:

_FIND_MODULE_PATH = (
    "axolotl.integrations.kernels.autotune_collector._find_lora_ops_module"
)

with patch(_FIND_MODULE_PATH, return_value=mock_lora_ops):
    from axolotl.integrations.kernels.autotune_collector import (
        collect_autotune_configs,
    )
    result = collect_autotune_configs()

By patching _find_lora_ops_module itself, the test now behaves the same no matter what is actually in sys.modules.
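The fixed pattern can be sketched without axolotl installed. The sketch builds a stand-in module, demo_autotune_collector, at runtime with exec, purely to keep the example in one file. Its _find_lora_ops_module and collect_autotune_configs bodies are assumptions, as is the AUTOTUNE_CONFIGS attribute, and none of them reflect axolotl's implementation. A "real" module is deliberately left in sys.modules to show that it no longer matters.

```python
import sys
import types
from unittest.mock import MagicMock, patch

# Hypothetical stand-in for autotune_collector. The function bodies and the
# AUTOTUNE_CONFIGS attribute are assumptions made for this sketch only.
_COLLECTOR_SOURCE = '''
import sys

def _find_lora_ops_module():
    for name, module in list(sys.modules.items()):
        if module is not None and "demo_lora_ops" in name:
            return module
    return None

def collect_autotune_configs():
    module = _find_lora_ops_module()  # looked up in module globals at call time
    return [] if module is None else list(module.AUTOTUNE_CONFIGS)
'''

collector = types.ModuleType("demo_autotune_collector")
exec(_COLLECTOR_SOURCE, collector.__dict__)
sys.modules[collector.__name__] = collector

_FIND_MODULE_PATH = "demo_autotune_collector._find_lora_ops_module"


def run_with_function_patch():
    # A "real" module already loaded by an earlier test in the same worker.
    real = types.ModuleType("pkg_real.demo_lora_ops")
    real.AUTOTUNE_CONFIGS = ["real-config"]
    sys.modules[real.__name__] = real

    mock_lora_ops = MagicMock()
    mock_lora_ops.AUTOTUNE_CONFIGS = ["fake-config"]
    try:
        # Patch the lookup function itself; sys.modules contents no longer matter.
        with patch(_FIND_MODULE_PATH, return_value=mock_lora_ops):
            from demo_autotune_collector import collect_autotune_configs
            return collect_autotune_configs()
    finally:
        sys.modules.pop(real.__name__, None)


print(run_with_function_patch())  # ['fake-config']
```

This works because patch replaces the name where collect_autotune_configs looks it up, which is its module's globals. The patch target is therefore the dotted path to _find_lora_ops_module inside the collector module.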

Why this is better

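The heart of this change is choosing the right level of test isolation. The old approach mocked "state" (sys.modules). Tests that run in the same xdist worker process share that state, so the isolation broke down. The new approach mocks "behavior" (_find_lora_ops_module). Because it does not depend on shared state, it stays stable under parallel execution. Iterating over a list() copy of the dictionary is also a good example of defensive programming for multithreaded code.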

Summary

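Item	Details
Problem	Flaky tests caused by sys.modules state shared within a pytest-xdist worker
Fix	Patch _find_lora_ops_module directly + iterate over a list() copy
Impact	Better CI stability and reliable results in parallel test runs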


Note: This analysis was written by AI based on the actual code diff.
