Review

[Paper Review] ChildVox: A Speech, Audio, and Large Audio-Language Model Benchmark in Understanding and Characterizing Sound across Childhood

This paper addresses a limitation of prior work on child speech processing: its heavy focus on general ASR (Automatic Speech Recognition), which fails to adequately capture the nonverbal communication signals that are central to child development.

#Review #ChildVox #Child Development #Audio Benchmark #LALMs #Speech Foundation Models #Physiological Sounds #Acoustic Intelligence

May 28, 2026

[Paper Review] CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

This paper proposes CausaLab to address the 'Causal parrot' problem, in which existing causal reasoning benchmarks reward reliance on memorized knowledge rather than evaluating the genuine causal thinking of LLMs.

#Review #Causal Discovery #LLM Agents #Structural Causal Models #Interactive Benchmarking #Scientific Discovery #Mechanism Recovery

May 28, 2026

[Paper Review] Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning

This paper aims to address the fundamental limitations that modern Vision-Language Models (VLMs) face in 3D spatial reasoning.

#Review #Vision-Language Models #3D Spatial Reasoning #Geometric Priors #Correspondence Learning #Depth Consistency #Object Constancy

May 28, 2026

[Paper Review] AsyncTool: Evaluating the Asynchronous Function Calling Capability under Multi-Task Scenarios

Existing research on LLM agents has mostly been evaluated under the assumption of single-task settings and immediate tool responses. In real-world settings, however, tool calls incur latency, and multiple tasks often have to be handled concurrently.

#Review #Asynchronous Tool Calling #Multi-task Scenarios #LLM Agent #Temporal Coordination #Latency #Benchmark

May 28, 2026

[Paper Review] Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases

This paper aims to address a structural limitation of RLHF: the preference datasets it uses for alignment can in fact be contaminated by the model's own outputs. Because conventional RLHF uses only pairwise comparison results without stating why a response was chosen, it unintentionally learns the biases contained in high-quality responses.

#Review #RLHF #Alignment Tampering #Bias Amplification #Reward Hacking #Bias-Quality Correlation

May 28, 2026

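[Paper Review] AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

This paper starts from the concern that the powerful execution capabilities of modern agent systems (e.g., OpenClaw) give rise to a broad range of risks that existing safety frameworks struggle to handle. Prior work has mainly evaluated inputs or outputs at a single point in time, which limits its ability to detect compound risk patterns that accumulate over an entire trajectory.

#Review #Agent Safety #Alignment Framework #AgentDoG 1.5 #Trajectory-level Diagnosis #Reinforcement Learning #Online Guardrail

May 28, 2026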

[Paper Review] AdaState: Self-Evolving Anchors for Streaming Video Generation

This paper aims to address the 'trade-off between consistency and dynamic expressiveness' faced by existing autoregressive video generation models.

#Review #Streaming Video Generation #Autoregressive Diffusion #Adaptive State #Attention Sink #Horizon-Weighted DMD #KV Cache #Temporal Dynamics

May 28, 2026

[Paper Review] VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild

This paper aims to address the 'Evaluation–Experience Gap': LLM-based agents score well on existing benchmarks, yet user satisfaction remains low in real-world use.

#Review #VibeSearch #Proactive Search #Large Language Models #Agent Harness #Knowledge Graph #Benchmark

May 27, 2026

[Paper Review] Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

This paper addresses a problem that arises when formal verification is introduced to guarantee the correctness of code generated by AI coding agents: the 'formal specification' that serves as the reference for that code can itself contain errors.

#Review #Formal Verification #Specification Autoformalization #Agentic Environment #Verus #Codeforces #Executable Specifications

May 27, 2026

[Paper Review] Triplet-Block Diffusion RWKV

This paper aims to address two key limitations of causal Transformer language models (LLMs).

#Review #Triplet-Block Layout #Diffusion Language Models #RWKV #Linear-time Recurrent Networks #Parallel Decoding #Inference Throughput

May 27, 2026

[Paper Review] The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

This study set out to empirically examine how reliable Chain-of-Thought (CoT) monitoring is across diverse languages and model families.

#Review #Chain-of-Thought #CoT Monitorability #Deception #Linguistic Distribution Shift #Mechanistic Interpretability #LLM Safety

May 27, 2026

[Paper Review] SkillGrad: Optimizing Agent Skills Like Gradient Descent

This paper aims to address the problem that Agent Skills, which are meant to improve the domain adaptability of LLM agents, are often incomplete, outdated, or unreliable.

#Review #Agent Skills #Gradient Descent #Skill Evolution #LLM Agents #Procedural Knowledge #Structured Optimization #Textual Momentum

May 27, 2026

[Paper Review] Self-Improving Language Models with Bidirectional Evolutionary Search

This paper aims to address the fundamental constraints of Best-of-N sampling and tree search, the existing approaches to LLM inference and training.

#Review #Large Language Models #Evolutionary Search #Bidirectional Search #Goal Decomposition #Post-Training #Inference Scaling

May 27, 2026

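[Paper Review] ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

This paper aims to address the serious lack of reliability found in the outputs of autonomous research agents. Existing agent systems produce professional papers and competitive solutions, yet they repeatedly exhibit errors that reflect a focus on surface-level polish, such as fabricated citations, unverifiable reported scores, and mismatches between the code implementation and the paper's description.

#Review #Autonomous Research #Chain-of-Evidence #Verifiability #Provenance #Integrity Audit #LLM

May 27, 2026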

[Paper Review] Revealing Algorithmic Deductive Circuits for Logical Reasoning

This paper aims to answer a fundamental question: what internal mechanisms do LLMs use when they carry out complex logical reasoning?

#Review #Large Language Models #Logical Reasoning #Chain-of-Thought #Causal Mediation Analysis #Circuit Interpretability #Attention Heads #Deductive Reasoning

May 27, 2026

[Paper Review] Rethinking Memory as Continuously Evolving Connectivity

This paper aims to address the limitation that the memory systems of existing LLM agents rely on a static repository and therefore cannot reflect dynamic changes in the environment or feedback.

#Review #FluxMem #Memory Connectivity #Heterogeneous Graph #Agentic Memory #Long-term Consolidation #Self-evolving Agents

May 27, 2026

[Paper Review] ResearchMath-14K: Scaling Research-Level Mathematics via Agents

This paper aims to address the shortage of large-scale training data needed to push the latest LLMs beyond basic math-competition level toward solving research-level mathematics problems.

#Review #Research-level Mathematics #Dataset Construction #Agentic Pipeline #Factuality #Reasoning Trajectories #Fine-tuning #Language Models

May 27, 2026

[Paper Review] ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation

This paper aims to address the limitations of Proactive Recommendation, in which a recommender system goes beyond simply imitating past data and expands users' preferences into new areas.

#Review #Proactive Recommendation #Reinforcement Learning #Policy Gradient Estimation #Path Feasibility #Guidance Effectiveness

May 27, 2026

[Paper Review] PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective

This paper points out that modern PEFT methods focus on improving downstream task performance (Plasticity) while overlooking the preservation of pretrained general capabilities (Stability).

#Review #Parameter-Efficient Finetuning #Stability-Plasticity #Orthogonal Finetuning #Representation Geometry #Spectral Analysis #Pathwise Diagnosis

May 27, 2026

[Paper Review] PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft

This paper aims to address the fundamental limitations of the non-parametric memory approach that existing LLM-based embodied agents rely on.

#Review #Embodied Agent #Parametric Memory #Contrastive Learning #Mixture-of-Experts #Continual Learning #Minecraft

May 27, 2026