Latest Posts

[Paper Review] Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

This paper aims to reliably evaluate and understand performance differences between language model architectures, without the high noise and cost that come with academic-scale pretraining.

#Review #Language Models #Transformer Architecture #Canon Layers #Synthetic Pretraining #Reasoning Depth #Linear Attention #State-Space Models #NoPE

December 21, 2025

[Paper Review] PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence

This work addresses a shortcoming of existing VLMs (Vision-Language Models): a viewpoint mismatch limits how well they generalize to robots.

#Review #Egocentric Data #Physical Intelligence #VLM #Robot Control #Embodied AI #VQA Supervision #Human-Robot Interaction #Zero-shot Transfer

December 21, 2025

[Paper Review] Meta-RL Induces Exploration in Language Agents

This paper addresses the problem that existing reinforcement learning (RL)-based large language model (LLM) agents struggle to explore an environment actively and to adapt their policies efficiently from trial-and-error experience.

#Review #Meta-RL #LLM Agents #Exploration #Reinforcement Learning #Policy Adaptation #In-context Learning #Self-reflection #Multi-episode tasks

December 21, 2025

[Paper Review] HERBench: A Benchmark for Multi-Evidence Integration in Video Question Answering

Existing VideoQA benchmarks tend to rely on a single cue or on language priors, so they fail to properly evaluate multi-evidence integration; this paper sets out to address that problem.

#Review #Video Question Answering #Multi-evidence Integration #Video-LLMs #Benchmark #Temporal Reasoning #Frame Selection #Evidential Requirement #MRFS

December 21, 2025

[Paper Review] GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation

This study asks a fundamental question: despite the high scores that MLLMs (Multimodal Large Language Models) achieve on existing benchmarks, do they actually have human-like visual grounding ability in complex real-world scenarios?

#Review #Visual Grounding #MLLMs #Benchmark #Multi-Dimensional Evaluation #Rejection Capability #Test-Time Scaling #Data Mixture Training

December 21, 2025

[Paper Review] Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing

This paper points out a limitation of recent Latent Diffusion Models (LDMs): they use the low-level latent space of a Variational Autoencoder (VAE) that is optimized mainly for pixel-level reconstruction.

#Review #Text-to-Image Generation #Image Editing #Representation Encoders #Latent Diffusion Models #Variational Autoencoder (VAE) #Semantic Reconstruction #Off-manifold Latents #DINOv2

December 21, 2025

[Paper Review] Are We on the Right Way to Assessing LLM-as-a-Judge?

This paper addresses the bias, inconsistency, and scalability problems that arise because current LLM-as-a-Judge evaluation methodology depends too heavily on human annotation.

#Review #LLM-as-a-Judge #Evaluation Metrics #Consistency #Robustness #Positional Bias #Transitivity #Situational Preference #Multi-agent Systems

December 21, 2025

[Paper Review] An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges

This paper aims to provide a clear, structured guide to the rapidly changing field of Vision-Language-Action (VLA) models.

#Review #Vision-Language-Action Models #Embodied Intelligence #Robotics #Foundation Models #Multi-modal Learning #Reinforcement Learning #Sim-to-Real Transfer #Human-Robot Interaction

December 21, 2025

[Paper Review] 4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

This paper addresses the problem that existing MLLMs lack the ability to reason about 3D structure and temporal dynamics (4D), and are especially weak at 4D perception and temporal understanding.

#Review #Multimodal LLMs #4D Understanding #Perceptual Distillation #Region-level VQA #Video Question Answering #Temporal Perception #Depth Perception

December 21, 2025

[Paper Review] 3D-RE-GEN: 3D Reconstruction of Indoor Scenes with a Generative Framework

This paper aims to reconstruct, from a single 2D image, editable, production-ready 3D textured mesh scenes that can be used immediately in visual effects (VFX) and game development.

#Review #3D Reconstruction #Generative AI #Indoor Scenes #Compositional Framework #Differentiable Rendering #Image-to-3D #VFX #Game Development

December 21, 2025

[Open WebUI] Fixing a batch-add error by making the meta field of FileMetadataResponse Optional

An analysis of a bug fix in Open WebUI: batch-adding files that have no metadata to Knowledge raised a Pydantic validation error, which was resolved by making the meta field Optional.

#Open WebUI #Python #Pydantic #Bug Fix #Data Validation

December 20, 2025

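[triton] Benchmarking mxfp8 and nvfp4 block-scaled matrix multiplication in Triton using cuBLAS

To validate the performance of Triton's block-scaled matrix multiplication, a cuBLAS-based baseline was introduced and the tutorial was improved.

#Triton #cuBLAS #mxfp8 #nvfp4 #Performance

December 19, 2025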

[Loki] Introducing an LRU cache for partition ring shuffle sharding

A dskit update adds LRU-based bounded memory management to the partition ring shuffle shard cache.

#Grafana Loki #Go #Performance #Memory Management #Caching

December 19, 2025

[triton] Triton AMD backend optimization: improving GEMM performance with subtiling

An analysis of a GEMM optimization that introduces subtiling on AMD GPUs to reduce shared memory usage and eliminate register spills.

#Triton #AMD #GEMM #GPU #Optimization

December 19, 2025

[Triton] Adding TMA multicast support

December 20, 2025

[triton] Triton PROTON: reducing CUDA graph profiling overhead and adding a MsgPack API for a large performance gain

An analysis of code changes to the Triton PROTON library that reduce CUDA graph profiling overhead and add a MsgPack serialization API, improving performance by 3x to 10x.

#Triton #PROTON #CUDA #Profiling #Optimization #MsgPack #C++ #Python

December 19, 2025

[Ray Data] Improving the fusion rule for StreamingRepartition and MapBatches

Operator fusion is now allowed when batch_size is a multiple of target_num_rows, which eliminates intermediate materialization.

#Ray #Operator Fusion #Data Pipeline #Performance

December 19, 2025

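[Grafana Loki] Fixing a memory leak caused by unclosed scheduler peer connections

An analysis of a memory leak in which the peer connection was not closed when streamSink shut down, so Serve() on the worker at the other end never returned; the fix is a defer conn.Close().

#Grafana Loki #Go #Memory Leak #Distributed Systems #gRPC

December 19, 2025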

[triton] Splitting CGAEncodingAttr::getDefault into get1CTALayout/get1DLayout to support multi-CTA

An analysis of splitting the 1CTA-only getDefault function into two clearly named functions and fixing the coalesce utility for multi-CTA environments.

#Triton #MLIR #CGA #Multi-CTA #Encoding #Compiler

December 18, 2025

[Triton] Fixing false-positive deadlock detection in ConSan when a barrier receives multiple arrivals

Models barrier_expect as an arrive, resolving the false-positive deadlocks reported when multiple TMA copies share the same barrier.

#Triton #ConSan #Concurrency Sanitizer #Bug Fix #TMA

December 19, 2025