Review

[Paper Review] Adapting Vision-Language Models Without Labels: A Comprehensive Survey

This survey addresses the performance degradation that occurs when pre-trained Vision-Language Models (VLMs) are applied to specific downstream tasks without labeled data.

#Review #Vision-Language Models (VLMs) #Unsupervised Adaptation #Test-Time Adaptation (TTA) #Domain Transfer #Multimodal Learning #Label-Free Learning #Zero-Shot Learning

August 11, 2025

[Paper Review] Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling

This work addresses the problem that existing vision-language models (VLMs) are constrained in parameter scale, lack robust self-correction, and perform poorly on document-based tasks that require long visual context and complex reasoning.

#Review #Visual Document Understanding #Visual Question Answering #Multi-Agent System #Test-Time Scaling #Self-Correction #Mixed Reward Modeling #Large Language Models

August 8, 2025

[Paper Review] StrandDesigner: Towards Practical Strand Generation with Sketch Guidance

This work proposes the first sketch-based hair strand generation model, addressing the limited precision and ease of use of text prompts and general image prompts.

#Review #Strand Generation #Sketch Guidance #Diffusion Models #Multi-scale Learning #Adaptive Conditioning #3D Hair Modeling #Computer Graphics

August 8, 2025

[Paper Review] Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression

This paper addresses two major drawbacks of diffusion-based image compression models: excessive decoding latency and low fidelity. Specifically, it aims to propose a single-step diffusion image compression model (SODEC) that simultaneously achieves high perceptual quality, fast decoding, and faithful reconstruction at low bitrates.

#Review #Image Compression #Diffusion Models #One-Step Decoding #Fidelity Guidance #Rate Annealing #VAE #Perceptual Quality

August 8, 2025

[Paper Review] RPCANet++: Deep Interpretable Robust PCA for Sparse Object Segmentation

This paper aims to overcome the limitations of existing Robust PCA (RPCA) models: high computational cost, poor generalization caused by manual tuning, and rigid priors.

#Review #Robust PCA #Deep Unfolding #Sparse Segmentation #Interpretability #Image Decomposition #Computer Vision

August 8, 2025

[Paper Review] REINA: Regularized Entropy Information-Based Loss for Efficient Simultaneous Speech Translation

The main challenge in simultaneous speech translation (SimulST) systems is achieving an optimal balance between translation quality and latency.

#Review #Simultaneous Speech Translation #Adaptive Policy #Entropy-based Loss #Mutual Information #Latency-Quality Trade-off #Speech-to-Text Translation #REINA

August 8, 2025

[Paper Review] R-Zero: Self-Evolving Reasoning LLM from Zero Data

This work aims to overcome the limitation that existing self-evolution approaches for LLMs depend on vast amounts of human-curated data.

#Review #Self-Evolving LLM #Reinforcement Learning #Curriculum Learning #Reasoning #Large Language Models #Self-Play #Zero-Data Training

August 8, 2025

[Paper Review] PRvL: Quantifying the Capabilities and Risks of Large Language Models for PII Redaction

This work focuses on the problem of automatically removing personally identifiable information (PII) from unstructured text.

#Review #PII Redaction #Large Language Models #Instruction Tuning #Retrieval-Augmented Generation #Privacy Preservation #Model Evaluation #Cross-Domain Generalization #Open-Source LLMs

August 8, 2025

[Paper Review] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

The goal is to address the limited generalization of standard Supervised Fine-Tuning (SFT) compared with Reinforcement Learning (RL).

#Review #Supervised Fine-Tuning (SFT) #Reinforcement Learning (RL) #Generalization #Reward Rectification #Dynamic Fine-Tuning (DFT) #LLM #Policy Gradient #Mathematical Reasoning

August 8, 2025

[Paper Review] Marco-Voice Technical Report

This paper aims to develop Marco-Voice, a multifunctional speech synthesis system that integrates voice cloning and emotion control.

#Review #Speech Synthesis #Voice Cloning #Emotion Control #Text-to-Speech #Disentanglement #Contrastive Learning #Flow Matching #Emotional Speech Dataset

August 8, 2025

[Paper Review] MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes

This work addresses the problem that existing Video Object Segmentation (VOS) datasets are biased toward isolated, salient objects that are far from real-world conditions, which limits the real-world applicability of models.

#Review #Video Object Segmentation #Dataset #Complex Scenes #Benchmark #Object Tracking #Computer Vision #Dataset Challenges

August 8, 2025

[Paper Review] InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities

This paper proposes InfiAlign, a scalable and sample-efficient post-training framework for improving the reasoning ability of large language models (LLMs). In particular, it aims to overcome the limitations of existing methods, which are costly in data and compute, and to align LLMs effectively with only a small amount of high-quality data.

#Review #LLM Alignment #Reasoning #Data Curation #Supervised Fine-tuning (SFT) #Direct Preference Optimization (DPO) #Sample Efficiency #Scalability #Multi-dimensional Filtering

August 8, 2025

[Paper Review] I2CR: Intra- and Inter-modal Collaborative Reflections for Multimodal Entity Linking

This paper addresses the problem that existing large language model (LLM)-based multimodal entity linking (MEL) methods suffer degraded performance because they integrate image data unnecessarily and rely on a single extraction of visual features.

#Review #Multimodal Entity Linking #Large Language Models #Collaborative Reflection #Iterative Reasoning #Visual Information #Text-centric

August 8, 2025

[Paper Review] I Think, Therefore I Am Under-Qualified? A Benchmark for Evaluating Linguistic Shibboleth Detection in LLM Hiring Evaluations

This paper addresses the problem that large language models (LLMs) may exhibit demographic bias in hiring evaluations based on linguistic shibboleths, particularly hedging language.

#Review #LLM Bias #Hiring Evaluation #Linguistic Shibboleth #Hedging Language #Fairness #Benchmarking #Sociolinguistics

August 8, 2025

[Paper Review] Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis

The main goal is to diagnose the root causes of why current large language models (LLMs) hallucinate or fail at reasoning on multi-hop question answering tasks.

#Review #Multi-hop Question Answering #Large Language Models #Reasoning Errors #Error Taxonomy #Human Evaluation #Automated Evaluation #Overthinking

August 8, 2025

[Paper Review] Hi3DEval: Advancing 3D Generation Evaluation with Hierarchical Validity

This paper addresses the limitations of existing 2D image-based metrics and the coarse-grained nature of evaluation when assessing the quality of 3D generative models.

#Review #3D Generation Evaluation #Hierarchical Evaluation #Material Properties #Multi-Agent Annotation #Hybrid Scoring System #Video-based Evaluation #Part-level Analysis

August 8, 2025

[Paper Review] Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

This paper presents Genie Envisioner, a unified world foundation platform for robotic manipulation, with the aim of integrating policy learning, evaluation, and simulation within a single video-generative framework. It seeks to overcome the fragmented stages of existing robot development and to build scalable, general-purpose intelligent robotic systems.

#Review #Robotic Manipulation #World Model #Video Generation #Diffusion Model #Embodied AI #Foundation Model #Robotics Simulation #Policy Learning

August 8, 2025

[Paper Review] Evaluating, Synthesizing, and Enhancing for Customer Support Conversation

This paper addresses the lack of strategic guidance and high-quality data in the field of Customer Support Conversation (CSC).

#Review #Customer Support #Dialogue Generation #Large Language Models #Role-Playing #COPC Framework #Synthetic Data #Strategy Prediction #Empathetic AI

August 8, 2025

[Paper Review] Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models

This survey addresses the "overthinking" problem common in R1-style Large Reasoning Models (LRMs) such as DeepSeek R1, and aims to systematically categorize and analyze efficient reasoning methods.

#Review #Large Reasoning Models #Efficient Reasoning #Chain-of-Thought #Model Optimization #Model Collaboration #Overthinking Problem #LLM Efficiency

August 8, 2025

[Paper Review] DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning

This paper addresses the problem that Vision Language Models (VLMs) show limited ability in accurate action planning and spatial and temporal reasoning in complex, dynamic physical environments.

#Review #Vision Language Models (VLMs) #Agentic AI #Physical Reasoning #Benchmark #Simulation Environments #Action Planning #Interactive AI

August 8, 2025