#Multi-modal LLMs

4 posts

[Paper Review] Is This Edit Correct? A Multi-Dimensional Benchmark for Reasoning-Aware Image Editing

This paper addresses the problem that current diffusion-based image editing systems focus on surface-level instruction following and therefore produce results that lack logical consistency.

#Review #Image Editing #Reasoning-aware #Benchmark #Diffusion Models #Multi-modal LLMs #Logic Consistency #EditRefine

June 4, 2026

[Paper Review] Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation

This paper aims to identify the root cause of the difficulties that Group Relative Policy Optimization (GRPO) faces in exploration and difficulty adaptation.

#Review #Reinforcement Learning #LLM Reasoning #Group Relative Policy Optimization #Advantage Estimation #Exploration-Exploitation #Curriculum Learning #Multi-modal LLMs

February 12, 2026

[Paper Review] Reasoning in Space via Grounding in the World

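The goal is to address the inability of existing 3D LLMs to smoothly integrate 3D visual grounding and spatial reasoning, which stems from the lack of a unified 3D representation and a reliance on external modules. The study explores how an LLM can perform natural and effective grounding in an autoregressive manner to improve its spatial reasoning ability.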

#Review #3D Visual Grounding #Spatial Reasoning #Large Language Models (LLMs) #Chain-of-Thought (CoT) #Hybrid Representation #Multi-modal LLMs #Point Clouds

October 16, 2025

[Paper Review] Efficient Multi-modal Large Language Models via Progressive Consistency Distillation

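This paper addresses the efficiency loss in multi-modal large language models (MLLMs) caused by the substantial computational resources that visual tokens consume. In particular, it targets the increased learning difficulty and feature-space perturbation that arise during visual token compression, aiming to improve efficiency while minimizing performance degradation.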

#Review #Multi-modal LLMs #Token Compression #Efficiency #Knowledge Distillation #Progressive Learning #Consistency Distillation #MLLM Training

October 6, 2025