Latest Posts

[triton] Adding Block Scaled Matmul Support on AMD GPUs

An analysis of a PR that adds AMD CDNA4 GPU support to Triton's block scaled matrix multiplication tutorial and documents the scale pre-shuffling logic.

#Triton #AMD #CDNA4 #MatMul #MXFP #GPU

November 19, 2025

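[Loki] Implementing the UpdateRates RPC with an In-Memory Rate Tracker

An analysis of a PR that implements the UpdateRates RPC in Grafana Loki's ingest limiter using a circular-buffer-based in-memory rate tracker, laying the groundwork for per-stream rate limiting.

#Grafana Loki #Rate Limiting #Go #In-Memory #Circular Buffer

November 19, 2025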

[ultralytics] Ultralytics 8.3.229: Analyzing the 300% Speedup in COCO Segmentation Evaluation

An analysis of a change that removes an external library dependency and introduces optimized, PyTorch-based RLE encoding and mask scaling, improving performance 3x.

#Ultralytics #YOLO #Optimization #PyTorch #ComputerVision

November 18, 2025

[Paper Review] Φeat: Physically-Grounded Feature Representation

Addresses the problem that existing self-supervised visual backbones entangle high-level semantic features with low-level physical factors, which hinders physical reasoning.

#Review #Self-supervised Learning #Physically-Grounded Features #Material Representation #Intrinsic Scene Understanding #Vision Transformer #Synthetic Data #Contrastive Learning

November 18, 2025

[Paper Review] VIDEOP2R: Video Understanding from Perception to Reasoning

Addresses the problem that existing video RFT frameworks treat perception and reasoning as a single process, which makes credit assignment ambiguous and reduces the efficiency of error correction.

#Review #Video Understanding #Reinforcement Fine-Tuning (RFT) #Large Video Language Models (LVLMs) #Perception and Reasoning #Chain-of-Thought (CoT) #Process-Aware Learning #Policy Optimization #Credit Assignment

November 18, 2025

[Paper Review] TopoPerception: A Shortcut-Free Evaluation of Global Visual Perception in Large Vision-Language Models

The goal is to address the problem that Large Vision-Language Models (LVLMs) fail to properly perceive global visual information because of the information bottleneck in the visual encoder and local shortcuts.

#Review #LVLM Evaluation #Global Visual Perception #Topological Properties #Shortcut-Free Benchmark #Visual Bottleneck #Multimodal AI #Synthetic Data

November 18, 2025

[Paper Review] REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

This paper addresses the problem that existing text-based self-reflection mechanisms are limited in handling rich, dynamic visual information and therefore suffer degraded performance on long-form video understanding tasks.

#Review #Multimodal Reasoning #Long-Form Video Understanding #Self-Reflection #Reinforcement Learning #Tool-Augmented MLLMs #Visual Rethinking #Video Question Answering #Causal Attribution

November 18, 2025

[Paper Review] Proactive Hearing Assistants that Isolate Egocentric Conversations

This paper aims to develop a proactive hearing assistant that automatically identifies and isolates the wearer's conversation partners, without explicit prompts from the user, while suppressing other interfering speech. It focuses on operating in real time in complex multi-party conversations and on supporting the wearer's autonomous participation in conversation.

#Review #Proactive Hearing Assistant #Egocentric Audio Processing #Speech Separation #Turn-taking Dynamics #Dual-Model Architecture #Real-time Inference #Wearable Devices #Dialogue Modeling

November 18, 2025

[Paper Review] Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution

This paper seeks to overcome the limitations of existing monolithic Vision-Language Models (VLMs) in precision, deterministic control, and the handling of complex visual tasks.

#Review #Visual Agent #Multimodal Perception #Tool-Augmented LLM #Agentic AI #Visual Reasoning #Computer Vision #Structured Outputs #ReAct Framework

November 18, 2025

[Paper Review] OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

Aims to resolve the compute and memory bottlenecks that omnimodal large language models (OmniLLMs) face because of the excessive number of audio-video tokens and the quadratic complexity of the attention mechanism. In particular, it addresses the difficulty of meeting the joint compression requirements of multimodal tokens with existing single-modality compression methods.

#Review #Omnimodal LLMs #Token Compression #Audio-Video Understanding #Dynamic Pruning #Inference Acceleration #Spatio-Temporal Compression #Large Language Models

November 18, 2025

[Paper Review] Mitigating Label Length Bias in Large Language Models

The paper aims to address the "label length bias" that arises when large language models (LLMs) predict multi-token class labels.

#Review #Large Language Models #Label Bias #Calibration #In-Context Learning #Text Classification #Multi-token Labels #Label Length Bias #Multiple Choice QA

November 18, 2025

[Paper Review] MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs

The goal is to address the problem that existing robustness benchmarks for Large Vision-Language Models (LVLMs) focus only on hallucination or misleading textual inputs and overlook misleading visual inputs when evaluating visual understanding.

#Review #LVLM Robustness #Misleading Visual Inputs #VQA Benchmark #Visual Perception #Visual Reasoning #MVI-Sensitivity #Multimodal AI

November 18, 2025

[Paper Review] Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework

This study aims to make effective use of Large Language Models (LLMs) in Extreme Multi-label Classification (XMC) and to improve performance by efficiently integrating visual information.

#Review #Extreme Multi-label Classification (XMC) #Large Language Models (LLMs) #Multi-modal Learning #Dual-decoder Learning #Vision Transformers #Contrastive Learning #Prompt Engineering

November 18, 2025

[Paper Review] LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost

This paper aims to automate the manual, labor-intensive stages of chaos engineering (CE), namely hypothesis formulation, experiment planning, and system reconfiguration, so that anyone can build resilient software systems at low cost.

#Review #Chaos Engineering #Large Language Models #System Resilience #Kubernetes #Software Automation #AI Agents #Fault Injection

November 18, 2025

[Paper Review] Error-Driven Scene Editing for 3D Grounding in Large Language Models

This paper addresses the problem that current 3D-LLMs fail to accurately ground language in the visual and spatial elements of 3D environments.

#Review #3D Grounding #3D-LLMs #Scene Editing #Counterfactual Augmentation #Error-Driven Learning #Spatial Reasoning #Visual Grounding

November 18, 2025

[Paper Review] Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark

This paper aims to address the lack of a benchmark that can systematically evaluate the Chain-of-Frames (CoF) reasoning ability of state-of-the-art video generation models, that is, the ability to go beyond mere visual quality and reason with an understanding of real-world physical laws and continuity.

#Review #Generative Visual Reasoning #Chain-of-Frames (CoF) #Video Generation Models #World Simulators #AI Benchmarking #Cognitive Reasoning #VLM Evaluation

November 18, 2025

[Paper Review] AraLingBench A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models

This study addresses the problem that existing benchmarks for evaluating Arabic large language models (LLMs) concentrate on factual knowledge and general reasoning and therefore fail to properly measure deep linguistic understanding.

#Review #Arabic LLMs #Linguistic Benchmark #Human Annotation #Natural Language Understanding #Grammar Evaluation #Morphology Analysis #Syntax Assessment #Reading Comprehension

November 18, 2025

[Paper Review] Agent READMEs: An Empirical Study of Context Files for Agentic Coding

This study addresses the lack of a systematic understanding of agent context files, which define and guide how AI coding agents operate.

#Review #Agentic Coding #Context Files #READMEs for Agents #Empirical Study #Software Engineering #Documentation Maintenance #Non-functional Requirements #LLMs

November 18, 2025

[Paper Review] Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

This paper explores how to apply reinforcement learning (RL) effectively to train large language models (LLMs) as agents that carry out complex multi-turn interaction tasks.

#Review #LLM Agents #Reinforcement Learning #Markov Decision Process #Tool Use #Multi-turn Interaction #Policy Optimization #Reward Shaping #Agent Framework

November 18, 2025

[Paper Review] ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

Addresses the difficulty of assessing the true capabilities of the latest large language models (LLMs), which stems from performance saturation in existing benchmarks, their narrow domain focus, simplified answer formats, and data contamination.

#Review #Benchmark #LLMs #Scientific Reasoning #Multidisciplinary #AI4S #Data Contamination #Evaluation #LRM-as-Judge

November 18, 2025