Review

[Paper Review] Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding

This paper aims to resolve the trade-off between draft quality and computational cost in speculative decoding.

#Review #Speculative Decoding #LLM Inference #Autoregressive Drafting #Parallel Drafting #Causal Modeling #Low-Rank Correction

June 1, 2026

[Paper Review] Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

This paper addresses two limitations of automated scientific figure generation: it does not accommodate the diversity of inputs found in real research settings, and the figures it produces cannot be edited.

#Review #Scientific Figure Generation #Multi-Agent Harness #Editable SVGs #Raster-to-Vector Conversion #CraftBench #LLM Agent #Iterative Refinement

June 1, 2026

[Paper Review] Confidence-Adaptive SwiGLU for Mixture-of-Experts

This paper addresses the fact that the gate selectivity of the SwiGLU activation function in MoE models stays fixed throughout training.

#Review #Mixture-of-Experts #SwiGLU #Gate Sharpness #Routing Confidence #Transformer #Activation Function #MoE

June 1, 2026

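[Paper Review] Brain-IT-VQA: From Brain Signals to Answers

This paper aims to overcome the limited performance of existing fMRI-based visual reconstruction and VQA studies, as well as the difficulty of interpreting them from a neuroscientific perspective.

#Review #fMRI #Visual Question Answering #Brain Decoding #Vision-Language Models #Brain-IT #NSD-VQA

June 1, 2026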

[Paper Review] Agent Skills Should Go Beyond Text: The Case for Visual Skills

Because the current paradigm for learning agent skills is text-only, agents hit a "Textual Bottleneck" when performing visual tasks; this paper aims to resolve that bottleneck.

#Review #Multimodal Agent #Visual Skill #Spatial Prior #GUI Grounding #Task Decomposition #Skill Reusability #Textual Degradation

June 1, 2026

[Paper Review] Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation

This study aims to address the structural limitation that keeps existing multilingual embedding models from performing well on low-resource languages such as Turkish.

#Review #Multilingual Embedding Models #Turkish #Tokenizer Surgery #Offline Distillation #Cross-Lingual Transfer #Semantic Search

June 1, 2026

[Paper Review] ACL-Verbatim: hallucination-free question answering for research

This paper aims to address hallucination and answer opacity, two problems inherent to modern Retrieval-Augmented Generation (RAG) systems. Even when conventional LLM-based RAG consults the retrieved documents, their content gets mixed with the model's internal knowledge, so there is a high risk of inaccurate or meaningless answers.

#Review #Retrieval-Augmented Generation #Hallucination-free #Extractive Question Answering #ModernBERT #ACL Anthology #Scientific QA

June 1, 2026

[Paper Review] A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

This paper aims to address two problems with existing tool-use agent benchmarks: severe saturation caused by their reliance on fixed scenarios, and the high, labor-intensive cost of building them.

#Review #Agent Benchmarks #Tool-use #Task Synthesis #Coverage #Difficulty #Adaptive Contrastive n-gram Model

June 1, 2026

[Paper Review] 3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code

Modeling through procedural code generation is becoming increasingly important in modern 3D generation, yet there is no standardized benchmark for evaluating it objectively. This paper aims to fill that gap.

#Review #3D Modeling #Procedural Generation #Vision-Language Models #Agentic Workflow #Benchmark #Human-Preference #Blender

June 1, 2026

[Paper Review] iVGR: Internalizing Visually Grounded Reasoning for MLLMs with Reinforcement Learning

This paper points out that Visually Grounded CoT, which was introduced to improve the fine-grained perception of MLLMs, can actually degrade performance during inference.

#Review #Multimodal Large Language Models #Reinforcement Learning #Visually Grounded Reasoning #Chain-of-Thought #Dual-Stream Training #Test-Time Scaling

May 31, 2026

[Paper Review] dMoE: dLLMs with Learnable Block Experts

This paper aims to improve inference efficiency by eliminating the excessive expert activation that occurs during block parallel decoding in MoE-based dLLMs.

#Review #dLLM #Mixture-of-Experts #Parallel Decoding #Block-level Routing #Expert Compression #Memory-bound

May 31, 2026

[Paper Review] When Confidence Misleads: Suffix Anchoring and Anchor-Proximity Confidence Modulation for Diffusion Language Models

This paper aims to resolve a persistent generation failure in Fully Non-AR DLM decoding. Existing confidence-based decoding assigns excessively high confidence to the EOT (End-of-Text) token, which causes responses to be generated incompletely.

#Review #Diffusion Language Models #Fully Non-Autoregressive Decoding #Suffix Anchoring #Confidence Modulation #Inference Optimization

May 31, 2026

[Paper Review] VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies

This paper aims to resolve the trade-off between accuracy and efficiency that existing VLA models face.

#Review #Vision-Language-Action (VLA) Policies #Visual Intermediate Reasoning #Low-Latency Inference #Task-Adaptive Routing #Embodied Control

May 31, 2026

[Paper Review] VLM3: Vision Language Models Are Native 3D Learners

This paper sets out to demonstrate that standard VLMs can perform 3D understanding without complex, specialized designs.

#Review #Vision Language Models #3D Understanding #Metric Depth Estimation #Pixel Correspondence #Camera Pose Estimation #Focal Length Unification #Scalable Training

May 31, 2026

[Paper Review] Trust-Region Behavior Blending for On-Policy Distillation

This paper aims to address the training instability and low-quality data generation that arise in the early stages of OPD. In existing OPD, when the student model generates low-quality trajectories early in training, the teacher model's supervision ends up concentrated on unproductive regions.

#Review #On-policy Distillation #Trust Region #Knowledge Distillation #Language Model Alignment #Annealed Warmup #Behavior Policy

May 31, 2026

[Paper Review] Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

This paper aims to solve the latency and accuracy problems of high-quality spatial audio generation models meant to deliver immersive experiences in real-time interactive environments.

#Review #Spatial Audio Generation #Autoregressive Diffusion Transformer #Multimodal Learning #Streaming Generation #First-Order Ambisonics #Contrastive Learning #Direct Preference Optimization

May 31, 2026

[Paper Review] The Good, the Bad, and the Ugly of Markov Boundary for Tabular Prediction

This paper investigates why the Markov boundary, although theoretically the most efficient feature set for tabular prediction, falls short of expectations in real ML pipelines.

#Review #Markov boundary #Markov-blanket discovery #Tabular prediction #Feature selection #Causal discovery #Structural causal models

May 31, 2026

[Paper Review] The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement

This paper aims to address two problems in modern RLHF pipelines: the limitations of static RM training data and policy drift (distribution shift).

#Review #RLHF #Reward Model #Self-Supervised Learning #On-Policy Feedback #Value-Anchored #Minimax Optimization #Policy Alignment

May 31, 2026

[Paper Review] Task-Focused Memorization for Multimodal Agents

This paper tackles the problem that multimodal agents must decide on their own "what to memorize" amid massive streams of incoming data.

#Review #Multimodal Agents #Long-term Memory #Reinforcement Learning #Task-Focused Memorization #Direct Preference Optimization #Streaming VQA

May 31, 2026

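[Paper Review] SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue

This paper aims to address the difficulty existing long-form dialogue synthesis has in handling speaker transitions and maintaining emotional continuity and acoustic consistency. The usual workaround, synthesizing each turn separately and then merging the results, cannot capture the overall conversational context, which leads to unnatural transitions and mismatched acoustic environments.

#Review #Zero-Shot TTS #Long-Form Synthesis #Dialogue Synthesis #Flow-Matching #DiffusionNFT #Speech Alignment

May 31, 2026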