Latest Posts

[Paper Review] Learning to Discover at Test Time

This study presents a method for using AI to discover new state-of-the-art (SOTA) solutions to scientific problems. In particular, it aims to have an LLM keep learning at test time to solve hard problems that demand new ideas beyond the scope of its training data.

#Review #Test-Time Training #Reinforcement Learning #Scientific Discovery #LLM Optimization #GPU Kernel Engineering #Algorithm Design #Single-Cell Analysis

January 22, 2026

[Paper Review] LLM-in-Sandbox Elicits General Agentic Intelligence

This paper proposes the LLM-in-Sandbox paradigm, which lets an LLM explore inside a code sandbox (a virtual computer) to elicit general agentic intelligence in non-code domains.

#Review #LLM-in-Sandbox #Agentic Intelligence #Code Sandbox #Reinforcement Learning #Generalization #Tool Use #Multi-Modal Generation #Long-Context Processing

January 22, 2026

[Paper Review] HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

The goal is to address the problems that existing Multimodal Large Language Models (MLLMs) face in streaming video understanding, such as unstable performance, high response latency, and high GPU memory usage.

#Review #Streaming Video Understanding #KV Cache Management #Hierarchical Memory #MLLMs #Low Latency #Training-free #Memory Efficiency

January 22, 2026

[Paper Review] EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience

This paper addresses the problem that native computer use agents (CUAs), constrained by the limits of static data scaling, struggle to capture the complex causal dynamics of long-horizon computer use tasks.

#Review #Computer Use Agent #Synthetic Experience #Evolutionary Learning #Reinforcement Learning #Direct Preference Optimization #GUI Automation #Scalable Infrastructure #Verifiable Synthesis

January 22, 2026

[Paper Review] Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

This paper aims to leverage the spatiotemporal priors of large-scale pretrained video generation models for robot policy learning.

#Review #Video Models #Visuomotor Control #Robot Policy #Fine-tuning #Diffusion Models #World Models #Model-based Planning #Imitation Learning

January 22, 2026

[Paper Review] BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries

The goal is to address the difficulty Vision-Language-Action (VLA) models have in generalizing to new instructions and complex multi-task scenarios.

#Review #Vision-Language-Action Models #Bayesian Decomposition #Latent Action Queries #Information Collapse #OOD Generalization #Robot Manipulation #Pointwise Mutual Information

January 22, 2026

[Paper Review] ActionMesh: Animated 3D Mesh Generation with Temporal 3D Diffusion

The paper addresses the limitations of existing animated 3D object generation models: slow optimization, restricted input modalities, low quality, and topology inconsistency.

#Review #3D Mesh Generation #Animated 3D Models #Temporal Diffusion #Video-to-4D #Deep Learning #Generative Models #Topology Consistency

January 22, 2026

[Paper Review] 360Anything: Geometry-Free Lifting of Images and Videos to 360°

This paper aims to overcome the existing reliance on camera metadata (FoV, pose) and to develop a robust, geometry-free framework that converts single-view images and videos into 360° panoramas.

#Review #Panorama Generation #Diffusion Transformers #Geometry-Free Learning #Latent Encoding #Seam Artifacts #Camera Pose Estimation #Video Outpainting

January 22, 2026

[Loki] Marking the shuffle shard cache size setting as experimental

Analyzes a PR that marks Grafana Loki's shuffle-shard-cache-size configuration flag as experimental, making it clear to users that it may change in the future.

#Grafana Loki #Configuration #Experimental #Documentation #Cache

January 22, 2026

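[triton] Accounting for bufferID in the AMD membarFilter

Analyzes a PR that improves the AMD backend's membar analysis to take buffer IDs into account, reducing unnecessary barrier insertion and correctly inserting barriers that were previously missed between reused allocations.

#Triton #AMD GPU #Memory Barrier #Shared Memory #Optimization

January 22, 2026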

[Triton] Improved documentation of the divisibility reset logic in AxisInfo

Clearly documents why divisibility is reset to 1 in MulIOp when contiguity > 1.

#Triton #Documentation #MLIR #AxisInfo #Compiler Analysis

January 22, 2026

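[triton] Removing kernel launch overhead with a variadic pre-compiled CUDA launcher

Analyzes a PR that switches Triton's CUDA/HIP kernel launcher from Python string substitution to a C-based variadic-argument approach, removing launch overhead.

#Triton #CUDA #HIP #Runtime #Performance

January 21, 2026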

[Paper Review] sangkuriang: A pseudo-spectral Python library for Korteweg-de Vries soliton simulation

This paper introduces sangkuriang, an open-source Python library for solving the Korteweg-de Vries (KdV) equation.

#Review #Nonlinear Wave Physics #Soliton Simulation #Korteweg-de Vries Equation #Pseudo-spectral Methods #Adaptive Time Integration #Python Library #Computational Physics

January 21, 2026

[Paper Review] XR: Cross-Modal Agents for Composed Image Retrieval

The goal is to overcome the limits of the existing similarity-based paradigm for Composed Image Retrieval (CIR) in the AI era and to improve the cross-modal reasoning needed to integrate a reference image with text modifications.

#Review #Composed Image Retrieval #Cross-Modal Agents #Multimodal Reasoning #Training-free Framework #Information Retrieval #Agentic AI #Progressive Retrieval

January 21, 2026

[Paper Review] Typhoon OCR: Open Vision-Language Model For Thai Document Extraction

The goal is to address the limitations existing VLMs face with low-resource languages such as Thai, which stem from complex script characteristics (non-Latin characters, no explicit word boundaries, stacked diacritics) and unstructured document layouts.

#Review #Vision-Language Model #OCR #Thai Language Processing #Document Understanding #Low-Resource Language #Data Synthesis #Fine-tuning #Layout Analysis

January 21, 2026

[Paper Review] Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition

This paper aims to overcome the limitations of large offline ASR models (e.g., Whisper), whose high latency makes them impractical for streaming applications, and to develop an efficient streaming solution for low-latency Thai automatic speech recognition (ASR).

#Review #Thai ASR #Real-time Speech Recognition #FastConformer-Transducer #Low-latency #Text Normalization #Dialect Adaptation #Data Curation #Streaming ASR

January 21, 2026

[Paper Review] The Responsibility Vacuum: Organizational Failure in Scaled Agent Systems

This paper defines and analyzes the responsibility vacuum, a structural failure of responsibility attribution that arises in modern AI agent-based systems when decision throughput exceeds human verification capacity. It aims to identify the organizational problem in which formal authority to approve decisions does not match the capacity to understand them.

#Review #Responsibility Vacuum #Scaled Agent Systems #Organizational Failure #CI/CD Pipelines #Human Verification Capacity #Authority-Capacity Mismatch #AI Governance #Ritualized Approval

January 21, 2026

[Paper Review] RoboBrain 2.5: Depth in Sight, Time in Mind

This paper aims to overcome the limitations of 2D pixel-based grounding and sparse temporal supervision in existing embodied AI foundation models, and to improve the reliability of robots' physical interaction and their execution awareness through Precise 3D Spatial Reasoning and Dense Temporal Value Estimation…

#Review #Embodied AI #Foundation Model #3D Spatial Reasoning #Temporal Value Estimation #Robotics #Manipulation #Multimodal Learning

January 21, 2026

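[Paper Review] Rethinking Video Generation Model for the Embodied World

This study aims to address the difficulty of generating high-quality videos that accurately reflect robot interactions, and to overcome the limits on fair comparison and progress caused by the lack of standardized benchmarks. Ultimately, it seeks to make video generation models more applicable in practice for robot learning and action prediction, and to accelerate progress in embodied AI.

#Review #Video Generation #Embodied AI #Robotics Benchmark #RBench #Robotics Dataset #RoVid-X #Physical Plausibility #Task Completion

January 21, 2026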

[Paper Review] Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning

This paper addresses the high computational overhead caused by the excessive verbosity of Chain-of-Thought (CoT) prompting and the opacity of intermediate reasoning.

#Review #Chain-of-Thought (CoT) #Large Language Models (LLMs) #Vision Language Models (VLMs) #Latent Reasoning #Visual Modality #Image Rendering #Computational Efficiency #Knowledge Distillation

January 21, 2026