Latest Posts

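[triton] Fixing smem offsets for function calls in membar analysis

This post analyzes a PR that fixes Triton's membar analysis so that the allocation offset is applied correctly when a callee function's shared memory accesses are translated into the caller's context.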

#Triton #Memory Barrier #Shared Memory #Function Call #Bug Fix

February 9, 2026

[triton] Generic multi-CTA convert_layout support

This post analyzes a PR that extends Triton's convert_layout operation to handle multi-CTA environments generically. It looks at how cluster barriers and distributed shared memory are used to transfer data between CTAs.

#Triton #GPU Compiler #Multi-CTA #Layout Conversion #MLIR

February 9, 2026

[Paper Review] Self-Improving World Modelling with Latent Actions

This paper aims to improve the intrinsic world-modeling ability of LLMs (Large Language Models) and VLMs (Vision-Language Models) from state-only sequences, in which actions are treated as latent variables.

#Review #World Modeling #Latent Actions #Self-Improvement #Reinforcement Learning #LLMs #VLMs #Inverse Dynamics Model #Forward World Modelling

February 8, 2026

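[Paper Review] Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training

This paper aims to address the difficulties that long reasoning models (LRMs) face in multilingual settings: a tendency to reason in English on non-English questions, and a marked drop in accuracy when reasoning in the question's language.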

#Review #Multilingual Reasoning #Reinforcement Learning #Machine Translation #Question Understanding #Self-Improvement #Language Models #Cross-Lingual Alignment

February 8, 2026

[Paper Review] SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks

This paper seeks to address the exploration complexity and intent drift that existing multi-turn jailbreak attack methods suffer from.

#Review #Multi-Turn Jailbreaks #LLM Safety #Red Teaming #Reinforcement Learning #Intent Drift #Response-Agnostic Generation #Self-Tuning

February 8, 2026

[Paper Review] RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs

The paper seeks to resolve the severe trade-off between performance and efficiency that arises in extreme 2-bit quantization of LLMs.

#Review #LLM Quantization #2-bit Quantization #Residual Binarization #Quantization-Aware Training (QAT) #Inter-Path Adaptation #Hardware Efficiency #Model Compression #Low-Bit LLMs

February 8, 2026

[Paper Review] PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks

This paper aims to evaluate how well unified multimodal models (UMMs) support planning-oriented computer-use tasks that are closely tied to everyday life.

#Review #Multimodal Models #Image Generation #Image Editing #Benchmark #Computer-Use Tasks #Planning #Evaluation Metrics

February 8, 2026

[Paper Review] POINTS-GUI-G: GUI-Grounding Journey

Starting from a base model with minimal GUI grounding ability, such as POINTS-1.5, this paper aims to build and automate a complete technical pipeline for GUI grounding.

#Review #GUI Grounding #Vision-Language Models (VLMs) #Reinforcement Learning (RL) #Data Engineering #UI Automation #Perception-intensive AI

February 8, 2026

[Paper Review] On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

This paper aims to establish a theoretical understanding of the entropy dynamics that arise during reinforcement fine-tuning (RFT) of LLMs, and to develop practical strategies for optimizing the exploration-exploitation balance.

#Review #Reinforcement Fine-Tuning (RFT) #Large Language Models (LLMs) #Entropy Dynamics #Exploration-Exploitation #Policy Optimization #GRPO #Entropy Control #Discriminator Score

February 8, 2026

[Paper Review] OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale

This paper aims to resolve the inherent trade-off in MoE architectures between the granularity of expert specialization and hardware execution efficiency.

#Review #Mixture-of-Experts (MoE) #Fine-Grained Experts #Efficient Architectures #Transformer #Routing Algorithms #Hardware Acceleration #Sparse Models

February 8, 2026

[Paper Review] OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

The paper points out that current LLM agent evaluation focuses mainly on the deductive paradigm, which limits its ability to measure an agent's inductive ability to autonomously discover the hidden rules of an environment.

#Review #LLM Agents #Benchmarking #Inductive Reasoning #Long-Horizon Tasks #Active Exploration #World Models #Autonomous Discovery

February 8, 2026

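[Paper Review] MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments

This paper raises three problems with existing mobile GUI agent benchmarks: they do not evaluate memory capability systematically, memory-related tasks make up only 5.2-11.8% of their tasks, and they lack any evaluation of cross-session learning.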

#Review #Mobile GUI Agents #Memory Benchmarking #Short-Term Memory #Long-Term Memory #LLM-as-Judge #Dynamic Environments #Evaluation Metrics #Task Automation

February 8, 2026

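[Paper Review] MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration

This paper aims to address training instabilities, such as sudden gradient explosions, that occur during large language model (LLM) pretraining. In particular, it identifies the underlying mechanism of these instabilities and proposes a new optimization technique that effectively prevents them.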

#Review #LLM Training Stability #Gradient Explosion #Stable Rank #Jacobian Alignment #Matrix Sign Operation #Optimizer #Transformer

February 8, 2026

[Paper Review] Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math

Verifying LLM (Large Language Model)-generated solutions to research-level math problems consumes a great deal of expert time, and existing LLM evaluation models are unreliable or biased.

#Review #LLM Evaluation #Mathematical Reasoning #Oracle-Free Validation #Consequence-Based Utility #Solution Quality #In-Context Learning #Research-Level Math

February 8, 2026

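[Paper Review] InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning

This paper aims to address the degradation in reasoning quality caused by the quadratic cost, context-length limits, and 'lost-in-the-middle' phenomenon that the Chain-of-Thought (CoT) approach of large reasoning models faces.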

#Review #Iterative Reasoning #Reinforcement Learning #Large Language Models #Context Management #Summarization #Chain-of-Thought #Efficiency #Mathematical Reasoning

February 8, 2026

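[Paper Review] Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing

This paper seeks to address two limitations of existing individual-agent-centric, tree-structured evolution approaches: inefficient use of exploratory diversity, and limited long-term cumulative progress caused by isolated evolutionary branches. Ultimately, it aims to develop open-ended self-improving agents that enhance their capabilities by modifying their own structural design without human intervention.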

#Review #Open-Ended Learning #Self-Improving Agents #Evolutionary Algorithms #Experience Sharing #Meta-Learning #Code Generation #Agent Frameworks

February 8, 2026

[Paper Review] F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare

The goal is to address the 'policy sharpening' problem in RLVR (Reinforcement Learning with Verifiable Rewards), in which group-sampling-based policy updates become biased toward common solutions and overlook rare but correct ones.

#Review #Reinforcement Learning #LLM #Policy Optimization #Reward Models #Diversity Preservation #Focal Loss #Group Sampling #Mathematical Reasoning

February 8, 2026

[Paper Review] Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers

The paper seeks to improve the efficiency of matrix-based optimizers such as Shampoo, Muon, and SOAP in large language model (LLM) training.

#Review #Distributed Training #Matrix-based Optimizers #Load Balancing #Asynchronous Compute #Data Parallelism #Tensor Parallelism #ZeRO-1 #LLMs

February 8, 2026

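[Paper Review] Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making

This paper seeks to address the passive question-answering style of existing medical LLMs and their hallucinations in open-ended clinical consultations. It develops Baichuan-M3, a clinical-grade decision support system with proactive information acquisition, long-horizon reasoning, and adaptive hallucination suppression, with the goal of reliable medical decision-making.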

#Review #Medical LLM #Clinical Decision Support #Reinforcement Learning #Hallucination Suppression #Multi-task Learning #Speculative Decoding #Quantization #Clinical Inquiry

February 8, 2026

[Paper Review] Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities

This paper aims to address the entropy collapse and mode collapse that occur during RLVR (Reinforcement Learning with Verifiable Rewards) training for LLM reasoning.

#Review #Reinforcement Learning #LLM Reasoning #Exploration-Exploitation #Group Relative Policy Optimization #Entropy Collapse #Generative Models #Confidence-Aware Rewards

February 8, 2026