Latest Posts

[Paper Review] Pruning and Distilling Mixture-of-Experts into Dense Language Models

This work proposes a systematic framework for converting expert-based architectures into efficient dense models, addressing the deployment constraints caused by the high memory requirements of MoE models.

#Review #Mixture-of-Experts #Knowledge Distillation #Model Pruning #D-Optimal Selection #Dense Language Models #Expert Scoring #Submodularity

June 8, 2026

[Paper Review] Phase Marginalization for Patch-Grid Instability in Vision Transformers

This paper aims to address the patch-grid phase instability that arises during patchification in Vision Transformers (ViT).

#Review #Vision Transformers #Patch-Grid Phase #Dense Prediction #Phase Marginalization #Test-Time Augmentation #Aliasing

June 8, 2026

[Paper Review] PIPE-Cypher: Automatic Enterprise Benchmark Generation for Text-to-Cypher Systems

This paper addresses the lack of a repeatable, automated benchmark generation pipeline for trustworthy Text2Cypher evaluation in enterprise Property Graph environments.

#Review #Text2Cypher #Benchmark Generation #Property Graph #Execution Validation #Local LLM #Governed Generation

June 8, 2026

[Paper Review] PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment

This paper aims to address the fundamental limits on credit assignment caused by sparse rewards in long-horizon agentic tasks.

#Review #Reinforcement Learning #Long-Horizon Credit Assignment #Bayesian Inference #Self-Distillation #Search Agents #Agentic RL

June 8, 2026

[Paper Review] Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text

This work aims to address the inefficiency of conventional text-based CoT (Chain-of-Thought) and its limited expressiveness on multimodal tasks.

#Review #Optical Reasoning #Multimodal Large Language Models #Chain-of-Thought #Context Compression #Interleaved-modal Reasoning #Visual Reasoning

June 8, 2026

[Paper Review] On the Geometry of On-Policy Distillation

This paper frames its core problem as follows: although on-policy distillation (OPD) shares characteristics of both SFT and RLVR, its concrete training dynamics in parameter space have not been properly characterized.

#Review #On-policy Distillation #Parameter-space Geometry #Subspace Locking #SFT #RLVR #Large Language Models

June 8, 2026

[Paper Review] OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

This paper points out that existing VLM agent benchmarks report only first-attempt scores and are built mostly around solo play, so they fail to measure an agent's ability to learn and improve.

#Review #VLM Agents #Benchmark #Unreal Engine 5 #Improvement Dynamics #Agentic Reflection #Cold-start #Generalization

June 8, 2026

[Paper Review] OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning

This paper aims to address the lack of systematic evaluation tools for Instruction Following, the ability of omni-modal models to comply with complex user instructions.

#Review #Omni-modal Large Language Models #Instruction Following #Video Captioning #Temporal Grounding #Constraint Framework #Format-Content Tradeoff

June 8, 2026

[Paper Review] OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation

This paper aims to address the shortage of large-scale, high-quality demonstration data for humanoid robot loco-manipulation tasks.

#Review #Humanoid Loco-Manipulation #Simulation Data Collection #Zero-Shot Transfer #Domain Randomization #Visuomotor Policy #Flow Matching #Unitree G1

June 8, 2026

[Paper Review] Light-WAM: Efficient World Action Models with State-Fusion Action Decoding

This paper proposes Light-WAM to address the high training cost and inference latency of existing WAMs built on large-scale generative architectures.

#Review #World Action Models #Robot Manipulation #State-Fusion Action Decoding #Efficient Inference #Latent Space Supervision #Video Co-training

June 8, 2026

[Paper Review] Liberating LLM Capabilities in Full-Duplex Speech Models

This paper points out that existing speech-based LLMs are confined to the limited output channel of spoken responses and therefore cannot fully exploit the structural and logical strengths of text.

#Review #Full-Duplex #Speech LLM #Visible Writing #Tri-channel Paradigm #Token Schema #Real-time Interaction

June 8, 2026

[Paper Review] Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory

This paper addresses the absence of formal methods for modeling, verifying, and debugging LLM agent workflows and execution trajectories.

#Review #Formal Methods #LLM Agent #Lean4 #Workflow Verification #Trajectory Analysis #FormalAgentLib #LeanEvolve

June 8, 2026

[Paper Review] LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents

This paper aims to address the context overhead and security exposure that arise when an LLM agent injects external skills directly into the prompt. Existing In-Context Skill approaches must insert the skill text at every step, which makes inference expensive, and they leave the skill content exposed verbatim in the prompt, which makes them vulnerable to attack.

#Review #LLM Agents #LoRA #Hypernetworks #Skill Composition #Weight Space #Prompt Efficiency #Modular Learning

June 8, 2026

[Paper Review] Latent Spatial Memory for Video World Models

This paper proposes Mirage to address the limited 3D spatial consistency and excessive computational cost of existing video world models.

#Review #Video Generation #Spatial Memory #3D-consistent Video Generation #Video World Models #Latent Space #Diffusion Models

June 8, 2026

[Paper Review] Human Psychometric Questionnaires Mischaracterize LLM Behavior

This paper questions whether using human psychometric questionnaires to assess the values and personality of LLMs reliably predicts their behavior in real user interactions.

#Review #LLM #Psychometrics #Value Portrait #Generation Probability #Alignment #Construct Validity

June 8, 2026

[Paper Review] Honest Lying: Understanding Memory Confabulation in Reflexive Agents

This paper aims to address the 'Memory Confabulation' problem that arises when agents such as Reflexion rely on self-generated feedback. Prior work assumes that agents can accurately diagnose their own failures, but the authors demonstrate that this assumption can fail systematically.

#Review #Reflexive Agents #Memory Confabulation #Reflexion #ALFWorld #LLM Agents #Programmatic Feedback Extraction #Reflection Repetition Rate

June 8, 2026

[Paper Review] Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

This paper points out the vulnerability of the outcome verifiers in modern agent benchmarks and proposes a systematic method for hardening them automatically. Existing practice relies on a reactive approach in which developers manually patch the verifier each time a new type of attack is discovered, which is hard to scale.

#Review #Agentic Evaluation #Reward Hacking #Adversarial Robustness #LLM Benchmarks #Hacker-Fixer Loop #Verifiers #Defense Pool

June 8, 2026

[Paper Review] FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

This paper aims to address the KV cache memory bottleneck that arises when processing ultra-long contexts. Existing LLMs must keep all historical context resident in GPU memory, so GPU memory requirements grow linearly with context length, which is a critical limitation.

#Review #Large Language Models #Ultra-Long Context #Sparse Attention #KV Cache Compression #Lookahead Sparse Attention #Neural Memory Indexer #Decoupled Training

June 8, 2026

[Paper Review] Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

This paper aims to address the limitation that existing medical agents rely on static knowledge or short-term memory and therefore fail to accumulate long-term experience effectively in complex clinical situations.

#Review #Medical Agent #Skill Memory #Self-Evolving #Clinical Reasoning #Value-aware Retrieval #Trajectory-to-Skill Distillation #Non-parametric Reinforcement

June 8, 2026

[Paper Review] Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

This paper aims to address the fragmentation of the current AI evaluation ecosystem, which makes model performance metrics hard to trust or compare. Prior work covers only specific aspects of evaluation or stays at the level of static reports, and so fails to systematically integrate the data produced in real evaluation pipelines.

#Review #AI Evaluation #Reporting Framework #Reproducibility #Transparency #Interpretive Layer #Benchmark Metadata #Rollout Hierarchy

June 8, 2026