[논문리뷰] Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language ModelsarXiv에 게시된 'Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models' 논문에 대한 자세한 리뷰입니다.#Review#Vision-Language Models#3D Reasoning#Language-based Localization#Spatial Understanding#Situation Modeling#Global Layout Reconstruction#Monocular Video2026년 3월 19일댓글 수 로딩 중
[논문리뷰] Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language ModelsSong Dai이 arXiv에 게시한 'Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models' 논문에 대한 자세한 리뷰입니다.#Review#Multimodal Large Language Models (MLLMs)#Video-SFT#Temporal Trap#Spatial Understanding#Temporal Budget#Hybrid-Frame Strategy#Negative Transfer2026년 3월 18일댓글 수 로딩 중
[논문리뷰] Enhancing Spatial Understanding in Image Generation via Reward ModelingarXiv에 게시된 'Enhancing Spatial Understanding in Image Generation via Reward Modeling' 논문에 대한 자세한 리뷰입니다.#Review#Image Generation#Reward Modeling#Spatial Understanding#Reinforcement Learning#Visual Language Models#Text-to-Image#Preference Learning2026년 3월 1일댓글 수 로딩 중
[논문리뷰] MiMo-Embodied: X-Embodied Foundation Model Technical ReportarXiv에 게시된 'MiMo-Embodied: X-Embodied Foundation Model Technical Report' 논문에 대한 자세한 리뷰입니다.#Review#Vision-Language Model (VLM)#Embodied AI#Autonomous Driving#Foundation Model#Multimodal Learning#Task Planning#Affordance Prediction#Spatial Understanding#Reinforcement Learning2025년 11월 20일댓글 수 로딩 중
[논문리뷰] Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement LearningarXiv에 게시된 'Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning' 논문에 대한 자세한 리뷰입니다.#Review#Self-supervised learning#Reinforcement Learning#Spatial Understanding#Vision-Language Models#Pretext Tasks#RGB-D Images#Spatial Reasoning2025년 11월 9일댓글 수 로딩 중
[논문리뷰] LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World GenerationZhan Zhao이 arXiv에 게시한 'LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation' 논문에 대한 자세한 리뷰입니다.#Review#Multimodal LLM#3D World Generation#Unreal Engine 5#Procedural Content Generation#Interactive Environments#Sim-to-Real#Spatial Understanding#Multimodal Input2025년 9월 8일댓글 수 로딩 중