#Zero-Shot Generalization

8 posts

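[Paper Review] Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

This paper addresses the limited data and model scale of existing humanoid motion tracking research and the resulting loss of generalization performance. Prior work has relied mainly on small MLP-based policies, which has led to a persistent trade-off between precise motion tracking and broad generalization.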

#Review #Humanoid Motion Tracking #Transformer #Zero-Shot Generalization #Large-scale Motion Data #Harmonic Motion Embedding #DAgger Distillation

June 2, 2026

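[Paper Review] Steerable Visual Representations

The authors propose SteerViT, which uses text prompts to directly steer the internal layers of a ViT. SteerViT injects text information by inserting lightweight cross-attention layers, which add only 21M parameters, between frozen ViT blocks.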

#Review #Steerable Visual Representations #Vision Transformers #Early Fusion #Cross-Attention #Text-Conditioned Vision #Representational Quality #Zero-Shot Generalization

April 2, 2026

[Paper Review] SimVLA: A Simple VLA Baseline for Robotic Manipulation

In the fast-moving field of VLA research, it is hard to pinpoint the exact source of performance gains. To address this, the paper proposes SimVLA, a simplified VLA baseline.

#Review #Robotic Manipulation #Vision-Language-Action (VLA) Models #Baseline Model #Modular Design #Flow Matching #Zero-Shot Generalization #Standardized Training #Efficiency

February 23, 2026

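[Paper Review] Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning

This paper aims to overcome the limitations of existing MLLMs in fine-grained spatial understanding and continuous action planning, and to present a new paradigm for complex visual reasoning.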

#Review #Video Generation #Visual Reasoning #Zero-Shot Generalization #Test-Time Scaling #Visual Context #Sequential Planning #Continuous Manipulation

February 5, 2026

[Paper Review] Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals

This paper addresses the difficulty that existing video-generation "world models" have in specifying precise goals for complex physical tasks.

#Review #Video Generation #World Models #Physics-Conditioned Goals #Causal Planning #Force Vectors #Zero-Shot Generalization #Diffusion Models #Robotics Planning

January 11, 2026

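[Paper Review] Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models

Vision-Language Models (VLMs) are vulnerable to test-time domain shift (OOD), which degrades their performance. The goal is to address this with an efficient, non-invasive framework that overcomes the constraints of existing Test-Time Adaptation (TTA) methods, namely high computational cost, high memory usage, and the need to modify the frozen encoder.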

#Review #Vision-Language Models #Test-Time Adaptation #Zero-Shot Generalization #Spectral Decomposition #Latent Space Steering #SVD #Out-of-Distribution

November 17, 2025

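[Paper Review] Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels

This paper addresses the problem that existing 3D scene reconstruction models (e.g., NeRF, Gaussian Splatting) focus only on visual appearance and are limited in predicting physical properties.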

#Review #3D Physics Prediction #Supervised Learning #CLIP Features #Neural Radiance Fields #Material Point Method #PIXIEVERSE Dataset #Zero-Shot Generalization

August 27, 2025

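[Paper Review] LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer

This paper aims to generate consistent, cohesive images from multiple visual references and spatial layout information. In particular, it seeks to extend existing single-reference diffusion models to multi-reference scenarios without training, while achieving both entity consistency and precise layout control.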

#Review #Multi-Image Composition #Layout Control #Diffusion Models #Transformer #Attention Mechanisms #Training-Free #Zero-Shot Generalization

August 6, 2025