Latest Posts

[Paper Review] OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents

This paper aims to address the opacity and inefficiency of evaluation in the rapidly expanding ecosystem of Skills for LLM agents. There is currently no systematic analysis of whether the many Skills distributed by the community actually improve performance, or of how they interact with specific models and frameworks.

#Review #LLM Agents #Agent Skills #Automatic Evaluation #Skill Ecosystem #Benchmarking #Trajectory Trace Analysis #Artifact Evaluation

May 31, 2026

[Paper Review] One Click per Cell Type Suffices: Training-free Group Interaction for Cell Instance Segmentation

This study aims to address the problem that existing cell instance segmentation models depend on their training data, so their performance drops sharply on Out-of-Distribution (OOD) cell types.

#Review #Cell Instance Segmentation #Foundation Models #Group Prompting #Chain-of-Prompts #Training-free #Histopathology #SAM

May 31, 2026

[Paper Review] Not All Disagreement Is Learnable: Token Teachability in On-Policy Distillation

This paper aims to address a limitation of existing Selective OPD methods, which use only token uncertainty (entropy) or teacher-student disagreement (divergence) as the token selection criterion.

#Review #On-policy Distillation #Knowledge Distillation #Token Teachability #Selective OPD #Teacher-Student Compatibility

May 31, 2026

[Paper Review] Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

This paper shows that batch-1 LLM decode, which is essential in Physical AI settings, is not limited by HBM bandwidth alone but is also heavily constrained by CPU-side launch overhead.

#Review #Batch-1 Inference #LLM Decode #HBM Bandwidth #CUDA Graphs #Launch Overhead #Physical AI

May 31, 2026

[Paper Review] Mellum2 Technical Report

A detailed review of the paper 'Mellum2 Technical Report', posted to arXiv by Marko Kojic.

#Review #LLM #Pretraining #Model Architecture #Technical Report #Evaluation #Training Pipeline

May 31, 2026

[Paper Review] MAAT: Multi-phase Adapter-Aware Targeted Unlearning

This paper aims to address a critical flaw in existing machine unlearning research: 'Why-type' questions, which deal with causal knowledge, have not been evaluated at all.

#Review #Machine Unlearning #LoRA #Causal Knowledge #5WBench #Adapter-Aware #SVD Pruning

May 31, 2026

[Paper Review] Lumos-Nexus: Efficient Frequency Bridging with Homogeneous Latent Space for Video Unified Models

This paper aims to address the difficulty that existing connector-based video generation models have in achieving high visual quality and complex logical reasoning at the same time.

#Review #Video Unified Models #Unified Progressive Frequency Bridging #Reasoning-driven Generation #Connector-based #Flow-matching #Visual Fidelity

May 31, 2026

[Paper Review] LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

This paper aims to address two problems in existing long-context reinforcement learning: low data difficulty and sparse reward signals.

#Review #Long-Context #Reinforcement Learning #Rubric Reward #Search Agent Trajectories #Tiered Distractors #Multi-hop Reasoning

May 31, 2026

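[Paper Review] LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis

Real-world data analysis is not a single step but an iterative process in which state accumulates and changes continuously over a long session. However, existing data analysis benchmarks mostly evaluate independent or short interactive tasks, so they do not sufficiently test an agent's ability to track and revise state within a complex analysis session.

#Review #Agentic Data Analysis #Long-Horizon #State Management #Benchmark #LLM Agents #State-Evolution

May 31, 2026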

[Paper Review] Linear Scaling Video VLMs for Long Video Understanding

This paper aims to solve the quadratic time complexity that modern Video VLMs face when processing long videos or real-time streaming tasks.

#Review #Video VLM #Long-video Understanding #Linear Scaling #StateKV #KV Cache Compression #Attention Approximation

May 31, 2026

[Paper Review] Light Interaction: Training-Free Inference Acceleration for Interactive Video World Models

This paper proposes Light Interaction to address the excessive compute cost and inference latency that arise during long-horizon generation with interactive video world models.

#Review #Interactive Video World Models #Inference Acceleration #Adaptive Context Management #Denoising Cache Acceleration #3D Sparse Attention #Autoregressive Generation

May 31, 2026

[Paper Review] How can embedding models bind concepts?

This paper focuses on the problem that recent Vision-Language Embedding Models such as CLIP recognize individual concepts well but fail at Concept Binding, that is, correctly combining those concepts to compose objects.

#Review #Concept Binding #Embedding Models #Compositional Generalization #Multiplicative Interaction #Representation Geometry #CLIP #Transformer

May 31, 2026

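[Paper Review] Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring

This paper seeks an efficient method for detecting, in real time, the execution failures that general-purpose VLA models encounter when deployed in real-world environments. Existing methods either require expensive step-level failure annotations or incur high computational overhead from action resampling and the use of external VLMs, which makes real-time deployment difficult.

#Review #Vision-Language-Action (VLA) #Failure Detection #Coarsely Supervised Learning #Contrastive Learning #Conformal Prediction #Embodied AI

May 31, 2026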

[Paper Review] GrepSeek: Training Search Agents for Direct Corpus Interaction

This paper aims to address the limitations that arise because existing retrieval-augmented agentic search systems depend on a pre-computed index and a retriever.

#Review #Direct Corpus Interaction #Search Agent #Reinforcement Learning #Sharded-Parallel Execution #Information Retrieval #Agentic Search

May 31, 2026

[Paper Review] GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration

Real-world image restoration (IR) models suffer from a chronic bottleneck: a lack of training data causes their generalization performance in real environments to drop markedly. Synthetic data fails to model the complex degradation processes of the real world, while data captured in the real world is limited in cost, scalability, and scene diversity.

#Review #Image Restoration #Generative Ground Truth #Multimodal Foundation Models #Generalization #Dataset Construction #Quality Control

May 31, 2026

[Paper Review] GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models

dLLMs offer more efficient generation than conventional Autoregressive Models (ARMs), but applying the reinforcement learning (RL) needed for optimal performance runs into a core obstacle: the policy likelihood cannot be computed.

#Review #Diffusion Language Models #Reinforcement Learning #Self-Distillation #Training-Inference Mismatch #Logit Matching

May 31, 2026

[Paper Review] Function2Scene: 3D Indoor Scene Layout from Functional Specifications

Existing text-based 3D indoor layout generation models mostly take an 'object-centric' approach that places a list of furniture, and therefore fail to sufficiently support human activities and functions, which are central to real interior design.

#Review #3D Indoor Scene Synthesis #Functional Specification #Constraint Taxonomy #Iterative Refinement #Agentic Pipeline #Human-Centered Design

May 31, 2026

[Paper Review] From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

This paper points out that multi-step Trojan attacks in Agentic Harness environments are a serious security threat that defeats existing single-turn defenses.

#Review #Agentic Harness #Multi-step Trojan Attack #Prompt Injection #DASGuard #ClawTrojan #Workspace Security

May 31, 2026

[Paper Review] Flat-Pack Bench: Evaluating Spatio-Temporal Understanding in Large Vision-Language Models through Furniture Assembly

This paper starts from the problem that modern LVLMs lack the fine-grained spatio-temporal reasoning needed to solve everyday video understanding and manipulation tasks.

#Review #Large Vision-Language Models #Video Understanding #Spatio-Temporal Reasoning #Furniture Assembly #Object Tracking #Contact Reasoning

May 31, 2026

[Paper Review] FRAPPE: Full Input, Residual Output Autoencoding with Projection Pursuit Encoder

This study aims to address the limitations of real-time video compression technology for running cloud-based AI perception smoothly in resource-constrained settings such as robots and wearable devices.

#Review #Compression #Autoencoder #Projection Pursuit #Asymmetric Codec #Real-time #Resource-constrained #Variable-rate

May 31, 2026