Review

[Paper Review] One-step Language Modeling via Continuous Denoising

Existing discrete diffusion language models rely on a factorized approximation that ignores correlations between tokens, so generation quality degrades sharply in the few-step regime. This paper aims to solve that problem.

#Review #Language Modeling #Continuous Denoising #Flow-based Models #Diffusion Models #One-step Generation #Few-step Sampling #Time Reparameterization #Model Distillation

February 24, 2026

[Paper Review] On Data Engineering for Scaling LLM Terminal Capabilities

This paper addresses the lack of available information on training-data strategies for state-of-the-art terminal agents. It systematically studies data engineering practices for scaling the terminal capabilities of LLMs, and aims to train effective terminal agents through an efficient, scalable data generation framework.

#Review #LLM #Terminal Agents #Data Engineering #Synthetic Data Generation #Supervised Fine-tuning (SFT) #Terminal-Bench #Nemotron-Terminal #Dataset Adapters

February 24, 2026

[Paper Review] OmniOCR: Generalist OCR for Ethnic Minority Languages

Most OCR systems concentrate on well-known scripts, so OCR for ethnic minority languages, which have complex writing systems and scarce data, generalizes poorly in zero-shot settings.

#Review #OCR #Ethnic Minority Languages #Low-Resource #Dynamic LoRA #Parameter-Efficient Fine-Tuning #Multimodal Models #Sparsity Regularization

February 24, 2026

[Paper Review] OCR-Agent: Agentic OCR with Capability and Memory Reflection

The goal is to address the problem that Large Vision-Language Models (VLMs) cannot independently correct their cognitive biases on complex visual understanding tasks, and instead fall into repetitive, inefficient revision loops that fail to improve answer quality reliably.

#Review #OCR #VLM #Self-Correction #Agentic AI #Capability Reflection #Memory Reflection #Iterative Refinement #Chain-of-Thought

February 24, 2026

[Paper Review] LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces

This paper aims to overcome the limitations of existing benchmarks (short task horizons, data contamination, and inadequate evaluation metrics) by proposing LongCLI-Bench, a comprehensive benchmark that rigorously evaluates the long-horizon planning and execution abilities of agentic programming in command-line interface (CLI) environments.

#Review #Agentic Programming #CLI #Benchmark #Long-horizon Tasks #Code Generation #LLM Evaluation #Human-Agent Collaboration #Software Engineering

February 24, 2026

[Paper Review] Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs

This paper addresses the problem that existing embodied LLMs act as fixed oracles that cannot learn from failure or accumulate experience, which leads to repeated mistakes.

#Review #Embodied LLMs #Test-Time Adaptation #Reflection-in-Action #Reflection-on-Action #Robotics #Long-Horizon Planning #Policy Gradient #Self-Supervised Learning

February 24, 2026

[Paper Review] LaS-Comp: Zero-shot 3D Completion with Latent-Spatial Consistency

The goal is to overcome the limitations of existing 3D shape completion methods: handling diverse partial-observation patterns, generalizing across categories, dependence on paired datasets, and imperfect rendering assumptions.

#Review #3D Shape Completion #Zero-shot #Latent-Spatial Consistency #Foundation Models #Diffusion Models #Category-Agnostic #Generative Priors

February 24, 2026

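[Paper Review] Implicit Intelligence -- Evaluating Agents on What Users Don't Say

The goal is to evaluate the ability of AI agents to go beyond following a user's explicit instructions and to infer and satisfy implicit expectations and requirements. Real-world requests are inherently underspecified, and the paper aims to overcome the limitation that existing benchmarks have focused only on explicit instruction following.

#Review #Implicit Intelligence #AI Agents #Agent-as-a-World #Contextual Reasoning #Safety #Privacy #Accessibility #LLM Evaluation

February 24, 2026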

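[Paper Review] From Perception to Action: An Interactive Benchmark for Vision Reasoning

The goal is to address the problem that existing VLM evaluation is structure-agnostic and centered on single-turn visual question answering (VQA), and therefore fails to assess an agent's ability to reason about how geometry, contact, and support relations constrain which actions are possible in dynamic environments.

#Review #Vision-Language Models #Physical Reasoning #Interactive AI #3D Benchmark #Mechanical Puzzles #Spatial Packing #Embodied AI

February 24, 2026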

[Paper Review] FlowPrefill: Decoupling Preemption from Prefill Scheduling Granularity to Mitigate Head-of-Line Blocking in LLM Serving

This paper addresses the head-of-line (HoL) blocking that occurs during the compute-intensive prefill stage in LLM serving systems.

#Review #LLM Serving #Head-of-Line Blocking #Preemption #Prefill Scheduling #Time-to-First-Token (TTFT) #SLO-aware Scheduling #Operator-Level Preemption #Event-Driven Scheduling

February 24, 2026

[Paper Review] DREAM: Deep Research Evaluation with Agentic Metrics

This paper addresses the "Mirage of Synthesis" problem that affects existing evaluation benchmarks for Deep Research Agents (DRAs).

#Review #Deep Research Evaluation #Agentic Evaluation #LLM Evaluation #Capability Parity #Factuality #Temporal Validity #Reasoning Quality #Research Agents #Mirage of Synthesis

February 24, 2026

[Paper Review] Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation

This paper aims to make the evaluation of LLM-based recommender systems in financial advising measure decision quality grounded in real utility, rather than merely how well the system imitates user behavior.

#Review #Financial Recommendation #Conversational AI #Large Language Models #Utility-Grounded Evaluation #Behavioral Finance #Stock Recommendation #Longitudinal Benchmark #Inverse Optimization

February 24, 2026

[Paper Review] Communication-Inspired Tokenization for Structured Image Representations

This paper addresses the limitation that existing image tokenizers focus only on reconstruction and compression, and therefore capture local texture rather than object-level semantic structure.

#Review #Image Tokenization #Structured Representation #Attentive Encoding #Flow Matching #Semantic Alignment #Compositional Generalization #Transformer Architecture

February 24, 2026

[Paper Review] Aletheia tackles FirstProof autonomously

This paper reports the performance of Aletheia, a mathematics research agent, on the FirstProof challenge. Its main goals are to evaluate whether AI can autonomously solve research-level mathematics problems to the rigorous standards of the professional mathematical literature, and to publish the results transparently.

#Review #Mathematics Research Agent #Autonomous Problem Solving #FirstProof Challenge #Gemini 3 Deep Think #Mathematical Proof Generation #Human-AI Interaction #Deep Learning

February 24, 2026

[Paper Review] Adaptive Text Anonymization: Learning Privacy-Utility Trade-offs via Prompt Optimization

This paper addresses the limitation that existing text anonymization methods are manually designed and static, and cannot flexibly accommodate diverse domains and privacy-utility requirements. To that end, it proposes a new task, adaptive text anonymization, in which the anonymization strategy is automatically tuned to specific privacy-utility requirements.

#Review #Text Anonymization #Large Language Models #Prompt Optimization #Privacy-Utility Trade-offs #Evolutionary Algorithms #Multi-objective Optimization #Data Privacy

February 24, 2026

[Paper Review] tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction

This paper aims to overcome the limitations of existing 3D reconstruction models: slow optimization, limited scalability in the number of input views, and limited ability to handle long-sequence context.

#Review #3D Reconstruction #Test-Time Training (TTT) #Autoregressive Modeling #Long-Context #Gaussian Splatting #Neural Radiance Fields #Large Reconstruction Models

February 23, 2026

[Paper Review] VLANeXt: Recipes for Building Strong VLA Models

The goal is to bring structure to the fragmented field of Vision-Language-Action (VLA) model research, and to systematically re-examine the design space of VLA models within a consistent framework and evaluation setup.

#Review #Vision-Language-Action Models #Robotics #Imitation Learning #Foundation Models #Ablation Study #Generalization #LIBERO Benchmark #Time-Series Forecasting

February 23, 2026

[Paper Review] TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics

To address reward modeling, a major bottleneck in robotics, this paper aims to enable zero-shot progress estimation by using the internal token probabilities of pretrained Vision-Language Models (VLMs).

#Review #Robotics #Reward Modeling #Vision-Language Models #Zero-Shot Learning #Token Probabilities #Progress Estimation #Behavior Cloning #Manipulation

February 23, 2026

[Paper Review] SkillOrchestra: Learning to Route Agents via Skill Transfer

This paper addresses the problem of effective orchestration in compound AI systems.

#Review #Agent Orchestration #Skill Transfer #LLM Routing #Performance-Cost Trade-off #Routing Collapse #Multi-turn Dialogue #Skill Handbook #Reinforcement Learning

February 23, 2026

[Paper Review] SimVLA: A Simple VLA Baseline for Robotic Manipulation

To address the difficulty of pinpointing the exact source of performance gains in the fast-moving field of VLA research, this paper proposes SimVLA, a streamlined VLA baseline.

#Review #Robotic Manipulation #Vision-Language-Action (VLA) Models #Baseline Model #Modular Design #Flow Matching #Zero-Shot Generalization #Standardized Training #Efficiency

February 23, 2026