Latest Posts

[Paper Review] MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation

This paper addresses a gap in existing agent benchmarks: they focus only on general-purpose tool use and therefore cannot evaluate performance on personalized apps that are tightly coupled to a real user's accounts and local data.

#Review #Model Context Protocol #LLM Agents #Personalized Applications #Environment Simulation #Benchmarking #Tool-Traverse

June 1, 2026

[Paper Review] LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

This paper tackles the error accumulation and identity drift that arise during long-horizon generation in autoregressive (AR) video generation models. For efficiency, existing approaches rely solely on sliding-window attention and either discard the early generated latents or use only fixed anchors.

#Review #Long Video Generation #Autoregressive #Retrieval-Augmented Generation #Video Diffusion #Temporal Consistency #Attention

June 1, 2026

[Paper Review] LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning

This paper addresses the growing memory and compute cost of long-context inference in Large Language Models (LLMs).

#Review #Context Compression #Long-Context Reasoning #Large Language Models #Fine-Tuning #Cross-Attention #Code Reasoning #Cross-Family Generalization #Two-Stage Training

June 1, 2026

[Paper Review] Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

This paper points out that existing LLM watermarking techniques are fundamentally fragile in today's multi-model (multi-provider) ecosystem. Prior work was designed under the assumption that an attacker can access only a single model, whereas in practice users can freely use multiple frontier LLMs.

#Review #Watermarking #LLM #Ensemble #Distributional Perturbation #WASH #Attribution

June 1, 2026

[Paper Review] LVSA: Training-Free Sparse Attention for Long Video Diffusion

This paper addresses the loss of computational efficiency and the quality degradation caused by dense self-attention when video diffusion transformers generate long videos.

#Review #Video Diffusion Transformers #Sparse Attention #Long Video Generation #Training-Free #FlashInfer #Attention Optimization

June 1, 2026

[Paper Review] K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

This paper starts from the observation that, even though the latest frontier models are shifting the paradigm toward agentic capability evaluation, no browsing agent benchmark is specialized for Korean-language environments.

#Review #Web Browsing Agent #Korean Contexts #Agentic Benchmark #Information Retrieval #Multi-hop Reasoning #Synthetic Data Generation

June 1, 2026

[Paper Review] Joint Agent Memory and Exploration Learning via Novelty Signals

This paper addresses the inability of LLM-based agents to explore efficiently in open-ended environments. As the history of interactions with the environment grows, existing agents face enormous computational cost and memory storage demands from retaining the full history.

#Review #Agent Memory #Exploration #Novelty Signals #GUI Agents #Latency #Token Efficiency #Latent Memory

June 1, 2026

[Paper Review] Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

This work addresses the training inefficiency and performance degradation that arise when existing search agents must handle semantic search decisions and complex state management (bookkeeping) at the same time.

#Review #Retrieval-Augmented Generation #Reinforcement Learning #Stateful Harness #Cognitive Offloading #Search Agents

June 1, 2026

[Paper Review] HakushoBench: A Japanese Chart and Table VQA Benchmark from Governmental White Papers

This work addresses the bias of existing VQA benchmarks toward Western data and simple synthetic charts, which leaves materials such as Japan's official administrative documents, with their complex layouts and heavy demand for domain-specific knowledge, under-evaluated.

#Review #VQA #Japanese #Document AI #Multimodal LLMs #Chart Understanding #Table Reasoning #Benchmark

June 1, 2026

[Paper Review] FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search

This paper addresses the sparsity of correct answers faced by existing agentic search models and the reliability limits of existing test-time compute scaling techniques.

#Review #Agentic Search #Test-Time Compute #Self-Verification #Fine-Grained #LLM #Benchmark Auditing

June 1, 2026

[Paper Review] EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers

This paper addresses the limitations that arise because existing diffusion-based 3D generation models handle semantic understanding and geometric reasoning separately.

#Review #Multimodal Large Language Models #Mixture-of-Transformers #3D Native Generation #Context-aware Editing #Flow Matching #Sparse Voxel Representation

June 1, 2026

[Paper Review] ESPO: Early-Stopping Proximal Policy Optimization

This paper proposes ESPO to address the computational inefficiency and misleading training signals that arise during multi-step reasoning in LLMs.

#Review #Reinforcement Learning #Large Language Models #Proximal Policy Optimization #Early Stopping #Reasoning #Compute Efficiency #Credit Assignment

June 1, 2026

[Paper Review] Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding

This paper aims to resolve the trade-off between draft quality and compute cost in speculative decoding.

#Review #Speculative Decoding #LLM Inference #Autoregressive Drafting #Parallel Drafting #Causal Modeling #Low-Rank Correction

June 1, 2026

[Paper Review] Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

This paper addresses two limitations of automated scientific figure generation: it fails to accommodate the diversity of realistic research settings, and its outputs are not editable.

#Review #Scientific Figure Generation #Multi-Agent Harness #Editable SVGs #Raster-to-Vector Conversion #CraftBench #LLM Agent #Iterative Refinement

June 1, 2026

[Paper Review] Confidence-Adaptive SwiGLU for Mixture-of-Experts

This paper addresses the fact that the gate selectivity of the SwiGLU activation function in MoE models stays fixed throughout training.

#Review #Mixture-of-Experts #SwiGLU #Gate Sharpness #Routing Confidence #Transformer #Activation Function #MoE

June 1, 2026

[Paper Review] Brain-IT-VQA: From Brain Signals to Answers

This paper addresses the performance limitations and the difficulty of neuroscientific interpretation in existing fMRI-based visual reconstruction and VQA research.

#Review #fMRI #Visual Question Answering #Brain Decoding #Vision-Language Models #Brain-IT #NSD-VQA

June 1, 2026

[Paper Review] Agent Skills Should Go Beyond Text: The Case for Visual Skills

This paper addresses the "textual bottleneck" that arises in visual tasks because the current agent skill learning paradigm is text-only.

#Review #Multimodal Agent #Visual Skill #Spatial Prior #GUI Grounding #Task Decomposition #Skill Reusability #Textual Degradation

June 1, 2026

[Paper Review] Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation

This work addresses the structural limitation that existing multilingual embedding models underperform on low-resource languages such as Turkish.

#Review #Multilingual Embedding Models #Turkish #Tokenizer Surgery #Offline Distillation #Cross-Lingual Transfer #Semantic Search

June 1, 2026

[Paper Review] ACL-Verbatim: hallucination-free question answering for research

This paper addresses the hallucination and answer-opacity problems inherent in modern Retrieval-Augmented Generation (RAG) systems. Even when it consults documents, a conventional LLM-based RAG system risks mixing them with the model's internal knowledge and producing inaccurate or meaningless answers.

#Review #Retrieval-Augmented Generation #Hallucination-free #Extractive Question Answering #ModernBERT #ACL Anthology #Scientific QA

June 1, 2026

[Paper Review] A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

This paper addresses the severe saturation that results from existing tool-use agent benchmarks relying on fixed scenarios, along with the high, labor-intensive cost of building benchmarks.

#Review #Agent Benchmarks #Tool-use #Task Synthesis #Coverage #Difficulty #Adaptive Contrastive n-gram Model

June 1, 2026