[Ray Data] StreamingRepartition과 MapBatches 퓨전 규칙 개선

2025년 12월 19일수정: 2025년 12월 19일

PR 링크: ray-project/ray#59476 상태: Merged | 변경: +49 / -9

들어가며

Ray Data의 연산자 퓨전은 연속된 연산자를 합쳐 중간 물질화(intermediate materialization)를 제거하는 최적화다. 기존에는 MapBatches -> StreamingRepartition 퓨전이 batch_size == target_num_rows일 때만 허용되었다. 하지만 분석 결과, batch_size가 target_num_rows의 배수이기만 하면 병렬성에 영향 없이 퓨전이 가능하다.

핵심 코드 분석

퓨전 조건 완화

if isinstance(down_logical_op, StreamingRepartition):
    return (
        isinstance(up_logical_op, MapBatches)
        and up_logical_op._batch_size is not None
        and down_logical_op.target_num_rows_per_block is not None
        and down_logical_op.target_num_rows_per_block > 0
        # batch_size가 target_num_rows의 배수이면 퓨전해도 동일한 블록 시퀀스 생성
        and up_logical_op._batch_size
        % down_logical_op.target_num_rows_per_block
        == 0
    )

StreamingRepartition -> MapBatches는 퓨전하지 않는 이유

구분	MapBatches 태스크 수
퓨전	num_input_blocks (StreamingRepartition 출력 블록 수 이하)
비퓨전	StreamingRepartition 출력 블록 수

StreamingRepartition이 병렬성 증가를 위해 블록을 분할하는 경우, 퓨전하면 그 효과가 사라진다.

MapBatches -> StreamingRepartition 퓨전이 안전한 이유

구분	MapBatches 태스크 수
퓨전	total_rows / batch_size
비퓨전	total_rows / batch_size

병렬성이 동일하므로 퓨전하여 중간 물질화를 제거하는 것이 이득이다.

왜 이게 좋은가

적용 범위 확대: batch_size=40, target_num_rows=20 같은 케이스에서도 퓨전이 적용된다.
중간 물질화 제거: 퓨전을 통해 중간 데이터의 직렬화/역직렬화 오버헤드를 없앤다.
정확한 안전성 분석: 퓨전 여부를 병렬성 변화 관점에서 분석하고 문서화했다.
테스트 추가: batch_size=40, target_num_rows=20 케이스의 퓨전 테스트와 E2E 테스트를 추가했다.

참고 자료

댓글

관련 포스트

PR Analysis 의 다른글

이전글 [Grafana Loki] 스케줄러 Peer 연결 미종료로 인한 메모리 누수 수정
현재글 : [Ray Data] StreamingRepartition과 MapBatches 퓨전 규칙 개선
다음글 [triton] Triton PROTON: CUDA 그래프 프로파일링 오버헤드를 줄이고 MsgPack API를 추가하여 성능을 대폭 개선