Latest Posts

[Paper Review] Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis

This paper starts from the view that self-improvement in language models must go beyond imitating static datasets: the model itself has to build the new environments it will train in.

#Review #Reinforcement Learning #Reasoning RL #Verifiable Environment Synthesis #Self-Improving LLM #Stable Solve–Verify Asymmetry

May 14, 2026

[Paper Review] LLM-based Detection of Manipulative Political Narratives

This study addresses the lack of a computational framework for identifying and structuring, in real time, the manipulative political narratives surging on social media.

#Review #FIMI #Strategic Narrative #LLM #HDBSCAN #UMAP #Computational Social Science #Manipulation Detection

May 14, 2026

[Paper Review] IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation

This paper points out that existing VLA models, which condition only on individual frames, fail to resolve the short-horizon intent ambiguity that arises under partial observability.

#Review #Vision-Language-Action (VLA) #Robot Manipulation #AliasBench #Short-Horizon Intent #Imitation Learning #Inter-chunk Consistency #Partial Observability

May 14, 2026

[Paper Review] Ideology Prediction of German Political Texts

This paper proposes a new algorithm for quantifying political discourse on a continuous spectrum, overcoming the binary-classification limits of existing political-leaning analysis tools.

#Review #Political Ideology Prediction #Transformer-based Models #Continuous Spectrum #Multilabel Classification #German Political Texts

May 14, 2026

[Paper Review] FutureSim: Replaying World Events to Evaluate Adaptive Agents

This study addresses the lack of a standardized simulation environment for measuring, in practice, how well AI agents make adaptive predictions in a changing real-world environment. Existing game-based or static benchmarks fail to reflect real societal evolution and the chronological nature of events.

#Review #Adaptive Agents #Long-horizon Forecasting #Test-time Adaptation #Chronological Replay #Agentic Search #Brier Skill Score

May 14, 2026

[Paper Review] FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale

This paper proposes FrontierSmith to address the shortage of high-quality data for open-ended coding training.

#Review #FrontierSmith #Open-ended Coding #LLM #Idea Divergence #Automated Data Synthesis #Reinforcement Learning

May 14, 2026

[Paper Review] Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models

This paper addresses the excessive attention compute complexity and memory overhead in autoregressive (AR) video diffusion models. Existing models are forced to attend to the entire KV cache as generated frames accumulate, so efficiency degrades severely for high-resolution and long video generation.

#Review #Autoregressive Video Diffusion #KV Cache Compression #Attention Head Specialization #Inference Efficiency #Video Generation

May 14, 2026

[Paper Review] EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents

In the memory systems of existing LLM agents, the stored data (content) evolves, but the retrieval infrastructure that searches it is fixed at deployment time, so an optimization mismatch develops over time.

#Review #LLM Agents #Long-term Memory #AutoResearch #Self-evolving Architecture #Retrieval-Augmented Generation

May 14, 2026

[Paper Review] Dynamic Latent Routing

This study addresses the structural disruption and training-stage inefficiency of existing discrete latent injection methods in LLM post-training.

#Review #Dynamic Latent Routing #Markov Decision Processes #General Dijkstra Search #Language Model #Representation Engineering #Policy Composition

May 14, 2026

[Paper Review] Does Synthetic Layered Design Data Benefit Layered Design Decomposition?

This study examines the usefulness of purely synthetic data as a scalable and practical alternative for producing high-quality layered graphic design data.

#Review #Layered Design Decomposition #Synthetic Data #Graphic Design #Data-Centric Study #VLM-Guided Inference #CLD Baseline

May 14, 2026

[Paper Review] DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models

This paper sets out to resolve the optimization interference and performance imbalance that existing multi-task reinforcement learning (RL) approaches suffer from.

#Review #Diffusion Models #On-Policy Distillation #Multi-Task Reinforcement Learning #Flow Matching #Preference Alignment

May 14, 2026

[Paper Review] Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning

This paper presents a cost-efficient alternative for improving the reasoning performance of large LLMs: it skips expensive post-training (instruction tuning, RL, and so on) and instead recombines capabilities latent in existing checkpoints.

#Review #Model Merging #Evolutionary Optimization #Large Language Models #Reasoning #Diagnostic-Guided #Training-Free

May 14, 2026

[Paper Review] CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves

This paper addresses the problem that modern Vision-Language Models (VLMs) struggle to accurately recover topological hierarchy from visual input.

#Review #Topological Reasoning #Vision-Language Models #Jordan Curves #Reinforcement Learning #Structured Prediction #Containment Tree

May 14, 2026

[Paper Review] Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation

This paper argues that real-time interactive video generation requires moving to a frame-wise, ultra-low-latency 1–2 step generation regime. Prior work mainly adopted chunk-wise 4-step generation, which limits real-time performance, and properly initializing the few-step AR student model acts as the bottleneck.

#Review #Autoregressive Diffusion #Diffusion Distillation #Real-time Video Generation #Causal Consistency Distillation #Few-Step Inference #World Models

May 14, 2026

[Paper Review] Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation

This paper raises the concern that the benchmark gains reported by modern omni-modal LLMs may come from exploiting visual shortcuts rather than from genuine modality integration.

#Review #Omni-modal LLM #Visual Leakage #OmniClean #Staged Post-Training #Self-Distillation #Reinforcement Learning

May 14, 2026

[Paper Review] Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

This paper addresses the unpredictable failures and structural rigidity that arise as LLM-based multi-agent systems grow highly complex.

#Review #LLM-based Agents #Multi-Agent Systems #Multi-Agent Collaboration #Failure Attribution #Self-Evolution

May 14, 2026

[Paper Review] BOOKMARKS: Efficient Active Storyline Memory for Role-playing

The memory systems of existing Role-playing Agents (RPAs) rely mainly on recurrent summarization, which inevitably loses important details during compression.

#Review #Role-playing Agents #Memory Systems #Search-based Grounding #Active Grounding #Passive Updating #Long-horizon Consistency #Efficiency #Storyline Memory

May 14, 2026

[Paper Review] BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

This paper proposes BEAM to address the redundant computation caused by the fixed Top-K routing of standard MoE models. The existing Top-K mechanism assigns the same number of experts to every token regardless of per-token complexity, which produces unnecessary computation.

#Review #Mixture-of-Experts #Dynamic Routing #Expert Sparsity #Inference Acceleration #Binary Expert Activation Masking #vLLM

May 14, 2026

[Paper Review] Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

This paper identifies and addresses an overlooked bottleneck in On-Policy Self-Distillation (OPSD) for LLM reasoning: teacher-side exposure mismatch.

#Review #Self-Distillation #LLM Reasoning #Teacher Exposure #On-Policy #Adaptive Control #Reinforcement Learning #Beta-policy

May 14, 2026

[Paper Review] Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

This study proposes a simple, unified recipe for building models with gold-medal-level reasoning on advanced math and science Olympiad problems. Existing general-purpose reasoning models achieve short-term gains in mathematical problem solving, but lack the rigorous reasoning and verification that complex proof problems require.

#Review #Olympiad Reasoning #Reinforcement Learning #Test-time Scaling #Supervised Fine-tuning #Reasoning Models #Proof-search #Reverse-Perplexity Curriculum

May 14, 2026