#Pixel-grounded Representation

1 post

[Paper Review] Action Images: End-to-End Policy Learning via Multiview Video Generation

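This paper converts 7-DoF robot control inputs into Action Images, integrating them into a visual representation. The proposed model is built on the Wan 2.2 video backbone and combines RGB video with Action Images to model physical dynamics in video space.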

#Review #World Action Model #Robot Policy Learning #Multiview Video Generation #Pixel-grounded Representation #Zero-shot Policy

April 7, 2026