Latest Posts

[Paper Review] NaTex: Seamless Texture Generation as Latent Color Diffusion

This paper aims to address the limitations of existing Multi-View Diffusion (MVD) models in texture generation: inadequate occlusion handling, the difficulty of precise mesh-texture alignment, and cross-view inconsistency.

#Review #3D Texture Generation #Latent Diffusion Model #Geometry-Aware VAE #Multi-Control DiT #Color Point Cloud #Texture Synthesis #3D Asset Creation

November 20, 2025

[Paper Review] MiMo-Embodied: X-Embodied Foundation Model Technical Report

This paper aims to develop MiMo-Embodied, the first open-source cross-embodied foundation model to unify two core domains, Autonomous Driving and Embodied AI, in a single model.

#Review #Vision-Language Model (VLM) #Embodied AI #Autonomous Driving #Foundation Model #Multimodal Learning #Task Planning #Affordance Prediction #Spatial Understanding #Reinforcement Learning

November 20, 2025

[Paper Review] First Frame Is the Place to Go for Video Content Customization

The main goal is to find a way to customize video content flexibly from multiple reference images while preserving generalization, without architectural changes or large-scale fine-tuning. The paper reinterprets the latent role of the "first frame" in existing models and uses it as a conceptual memory buffer that stores visual entities.

#Review #Video Generation #Content Customization #Few-shot Learning #LoRA #Vision-Language Models (VLMs) #First Frame Conditioning #Reference-based Generation

November 20, 2025

[Paper Review] Draft and Refine with Visual Experts

Recent Large Vision-Language Models (LVLMs) rely too heavily on linguistic priors rather than visual evidence, and so frequently produce ungrounded hallucinations.

#Review #Large Vision-Language Models (LVLMs) #Visual Grounding #Hallucination Mitigation #Agent Framework #Visual Question Answering (VQA) #Expert Coordination #Relevance Map #Multi-modal Reasoning

November 20, 2025

[Triton] Allowing redundant copies along the block dimension in AMD async copy

An analysis of a PR that allows redundant copies along the block dimension in async_copy_global_to_local on AMD GPUs, so that in a multi-CTA setting each CTA correctly copies data into its own shared memory.

#Triton #AMD GPU #Async Copy #Multi-CTA

November 20, 2025

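[Ray] Stabilizing latency by handling prefetch buffering correctly in iter_batches

An analysis of an optimization that matches the iter_batches queue depth to the prefetch count and caps the number of format thread-pool workers, reducing variance in batch consumption latency.

#Ray #Python #Performance #Prefetch #Latency #Data Pipeline

November 20, 2025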

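[Ray] Improving large-scale resource view synchronization in Ray with message batching

Introduces message batching into RaySyncer's gRPC streaming to make resource synchronization more efficient on large clusters.

#Ray #Distributed Systems #gRPC #Performance

November 20, 2025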

[Paper Review] What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity

The paper aims to determine whether ideation diversity is a key bottleneck in the performance of AI research agents, and to understand the factors that decide whether an agent trajectory succeeds or fails.

#Review #AI Research Agents #Ideation Diversity #MLE-bench #LLM Backbones #Agentic Scaffolds #Shannon Entropy #Machine Learning Engineering #Performance Metrics

November 19, 2025

[Paper Review] VisPlay: Self-Evolving Vision-Language Models from Images

This paper aims to autonomously improve the reasoning ability of Vision-Language Models (VLMs) from large-scale unstructured image data, without human annotation or task-specific heuristics. It seeks to overcome the cost and scalability limits of existing reinforcement learning (RL) approaches.

#Review #Self-Evolving #Vision-Language Models #Reinforcement Learning #Self-Play #Unlabeled Data #Multimodal Reasoning #Group Relative Policy Optimization #Hallucination Mitigation

November 19, 2025

[Paper Review] Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks

This paper addresses the lack of a comprehensive benchmark for systematically evaluating the reasoning abilities of video models, in particular reasoning through video generation.

#Review #Video Models #Spatial Reasoning #Maze Solving #Video Generation #Benchmark #Supervised Fine-tuning #Test-Time Scaling #Multimodal Reasoning

November 19, 2025

[Paper Review] Mixture of States: Routing Token-Level Dynamics for Multimodal Generation

This paper aims to solve the problem of effectively aligning text and visual signals in multimodal diffusion models.

#Review #Multimodal Diffusion #Mixture of States (MoS) #Token-Level Routing #Dynamic Conditional Fusion #Text-to-Image Generation #Image Editing #Transformer Architecture

November 19, 2025

[Paper Review] Medal S: Spatio-Textual Prompt Model for Medical Segmentation

The goal is to address the problems caused by diverse modalities and anatomical variation in medical image segmentation, and to overcome the resolution mismatch and sequential-processing inefficiency of existing models.

#Review #Medical Segmentation #Foundation Model #Spatio-Textual Prompts #3D Convolution #Multi-modal Imaging #Dynamic Resampling #Parallel Inference #Iterative Refinement

November 19, 2025

[Paper Review] MHR: Momentum Human Rig

This paper proposes MHR, an expressive and anatomically plausible parametric human body model that can be integrated into industry and AR/VR pipelines. It combines the decoupled skeleton/shape paradigm of the ATLAS model with a flexible, modern rig and pose-corrective system inspired by the Momentum library.

#Review #Parametric Body Model #Human Animation #Character Rigging #Pose Correctives #Skeletal Decoupling #Computer Graphics #AR/VR

November 19, 2025

[Paper Review] Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

This paper aims to address a core challenge in AI/ML: generating high-quality, consistent, and controllable images and videos. Specifically, it develops Kandinsky 5.0, a family of state-of-the-art foundation models for image and 10-second video synthesis, with the goal of combining top-tier quality with operational efficiency.

#Review #Image Generation #Video Generation #Diffusion Models #Flow Matching #Diffusion Transformer #NABLA #RLHF #Supervised Fine-tuning

November 19, 2025

[Paper Review] Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset

This study aims to address two limitations of lesion segmentation models for chest X-rays (CXR): the limited number of target labels and the reliance on expert-level, detailed text input.

#Review #Medical Imaging #Chest X-ray #Lesion Segmentation #Vision-Language Models #Instruction Following #Data Generation #MIMIC-CXR

November 19, 2025

[Paper Review] FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI

This paper aims to overcome the limitations of existing Vision-and-Language Navigation (VLN) systems, such as static instructions, insufficient modeling of social intent, and unrealistic interaction environments.

#Review #Embodied AI #Vision-and-Language Navigation (VLN) #LLM-driven Simulation #Human-Agent Interaction #Closed-Loop #Benchmark Dataset #Social Cognition

November 19, 2025

[Paper Review] Aligning Generative Music AI with Human Preferences: Methods and Challenges

This paper addresses the problems that arise from the fundamental gap between computational optimization and human aesthetic sense in generative music AI systems, and explores ways to align these systems more closely with subtle human musical preferences.

#Review #Generative Music AI #Preference Alignment #Reinforcement Learning from Human Feedback (RLHF) #Direct Preference Optimization (DPO) #Inference-Time Optimization #Music Generation #Human-Computer Interaction

November 19, 2025

[Paper Review] ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries

This paper aims to address the problem that existing video chaptering methods are limited by short, coarse annotations, which makes it hard for them to generalize to the subtle transitions in long videos.

#Review #Video Chaptering #Long-form Video Understanding #Large Language Models #Multimodal Learning #Hierarchical Summarization #Video Segmentation #Reinforcement Learning #Dataset Creation

November 19, 2025

[Triton] Reimplementing tl.cat with permute+reshape+join to guarantee deterministic behavior

An analysis of a change that removes CatOp from Triton's tl.cat operation and replaces it with a combination of permute, reshape, and join to guarantee deterministic results.

#Triton #Compiler #MLIR #Tensor Operations #Determinism

November 19, 2025

[Triton] Introducing a pip cache directory in AMD CI to cope with network failures

Uses a pip cache directory in the AMD GPU CI environment to prevent build failures caused by network delays.

#Triton #AMD #CI/CD #GitHub Actions #DevOps

November 19, 2025