Latest Posts

[Triton] Buffer aliasing support added to ConSan: stronger memory safety analysis

Adds BufferRegion-based aliasing analysis to ConSan (Concurrency Sanitizer) so that it detects concurrency bugs between overlapping buffers.

#Triton #ConSan #Aliasing #Memory Safety #Static Analysis

December 11, 2025

[Triton] Fix for a missing wait insertion in WGMMA register pipelining

Fixes a bug in which the wgmma wait required when accessing the accumulator in the persistent matmul epilogue was missing.

#Triton #NVIDIA #MLIR #Bug Fix #Pipelining

December 11, 2025

[Triton] Forcing mul.bf16x2 in MXFP4→BF16 conversion: 1% MoE performance gain

Works around an LLVM auto-vectorization failure so that ptxas generates the HMUL2 instruction.

#Triton #NVIDIA #Performance #PTX #Inline Assembly

December 11, 2025

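[Paper Review] WonderZoom: Multi-Scale 3D World Generation

This paper addresses the core problem of multi-scale 3D world generation: producing, from a single image, a 3D world that stays consistent across a wide range of spatial scales. It aims to overcome the limits that existing 3D generation models place on interactive exploration and content creation, since they are confined to single-scale synthesis and lack scale-aware 3D representations.

#Review #Multi-Scale 3D Generation #Gaussian Surfel #Progressive Synthesis #Neural Rendering #Scale-Adaptive #Content Creation #Zoom-in

December 10, 2025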

[Paper Review] VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory

This paper targets the chronic problems of autoregressive (AR) video diffusion models, namely error accumulation, motion drift, and content repetition, with the goal of maintaining both minute-scale long-term consistency and gradual dynamic change.

#Review #Autoregressive Video Generation #Diffusion Models #Hybrid Memory #State-Space Models (SSM) #Long Video Synthesis #Temporal Consistency #Interactive AI

December 10, 2025

[Paper Review] UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving

The goal is to address the difficulties that autonomous driving systems face in long-tail scenarios because of limited world knowledge and insufficient visual dynamics modeling.

#Review #Autonomous Driving #End-to-End Learning #Vision-Language Models #World Model #Chain-of-Thought #Video Generation #Trajectory Planning #Multimodal Learning

December 10, 2025

[Paper Review] TED-4DGS: Temporally Activated and Embedding-based Deformation for 4DGS Compression

The goal is to develop a temporally activated, embedding-based deformation scheme for dynamic 3D Gaussian Splatting (4DGS) representations and thereby achieve rate-distortion-optimized compression.

#Review #4D Gaussian Splatting #Dynamic Scene Compression #Rate-Distortion Optimization #Temporal Activation #Embedding-based Deformation #Neural Compression #3D Gaussian Splatting

December 10, 2025

[Paper Review] StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation

The goal is to overcome the limitations of stereo video generation that stem from the lack of stereo capability in existing monocular video generation models and from fragile pose estimation and multi-stage warping pipelines.

#Review #Monocular-to-Stereo #Video Generation #Diffusion Models #Geometry-Aware #XR #IPD-aligned Dataset #Novel View Synthesis

December 10, 2025

[Paper Review] Reinventing Clinical Dialogue: Agentic Paradigms for LLM Enabled Healthcare Communication

Proposes an 'Agentic Paradigm' that overcomes the limitations of existing LLMs in clinical dialogue (their reactive, stateless nature and hallucination) and turns LLMs into autonomous, goal-directed systems.

#Review #Clinical Dialogue #LLM Agents #Healthcare AI #Agentic Paradigm #Medical Decision Support #Knowledge Grounding #AI Safety #Workflow Automation

December 10, 2025

[Paper Review] Pay Less Attention to Function Words for Free Robustness of Vision-Language Models

Aims to resolve the trade-off between robustness and performance in Vision-Language Models (VLMs), and in particular to test the hypothesis that function words cause VLM vulnerability to cross-modal adversarial attacks.

#Review #Vision-Language Models #Adversarial Robustness #Function Words #Cross-Attention #Adversarial Attacks #Differential Attention #Vision-Language Alignment

December 10, 2025

[Paper Review] OmniPSD: Layered PSD Generation with Diffusion Transformer

This paper proposes OmniPSD, a unified framework that addresses a limitation of existing generative models, namely that they output only single flat images, by generating and reconstructing layered PSD files with transparent alpha channels.

#Review #Diffusion Transformer #PSD Generation #Image Decomposition #RGBA-VAE #In-Context Learning #Text-to-PSD #Image-to-PSD

December 10, 2025

[Paper Review] Learning Unmasking Policies for Diffusion Language Models

In masked discrete diffusion language models (dLLMs), the way tokens are unmasked has a significant effect on inference efficiency and generation quality.

#Review #Diffusion Language Models #Reinforcement Learning #Masked Diffusion #Sampling Policy #Inference Optimization #Markov Decision Process #Generative AI #Text Generation

December 10, 2025

[Paper Review] InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models

This work aims to solve the limits on long-context understanding and deployment that existing VLMs face because of quadratic computational complexity and a growing KV cache. In particular, it seeks to overcome two shortcomings: the poor performance of linear attention on information-intensive tasks and the weak long-term memory retention of window-based attention.

#Review #Vision-Language Models #Linear Attention #Sliding Window Attention #Gated DeltaNet #Long-Context Understanding #Efficiency #Hybrid Architecture #Multimodal Learning

December 10, 2025

[Paper Review] IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting

This work starts from the observation that the infrared image understanding ability of Multimodal Large Language Models (MLLMs), which are trained mainly on natural images, remains unexplored.

#Review #Multimodal Large Language Models (MLLMs) #Infrared Image Understanding #Benchmark Dataset #Visual Question Answering (VQA) #Generative Visual Prompting (GenViP) #Domain Adaptation #Image-to-Image Translation

December 10, 2025

[Paper Review] HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

The goal is to solve the temporal myopia and lack of coherence that most Vision-Language-Action (VLA) models suffer on long-horizon tasks because they assume the Markov property.

#Review #Vision-Language-Action #Motion Representation #Temporal Reasoning #Long-Horizon Manipulation #Hindsight #Foresight #Robotics

December 10, 2025

[Paper Review] Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules

This paper addresses the problem that diffusion language models (dLLMs), despite their potential relative to autoregressive models, are held back in practice by a slow, iterative sampling process.

#Review #Diffusion Language Models #Decoding Efficiency #Early Exit #Confidence Schedules #Training-free #Model-agnostic #Progress-aware

December 10, 2025

[Paper Review] EtCon: Edit-then-Consolidate for Reliable Knowledge Editing

This paper addresses the problem that knowledge editing methods for large language models (LLMs) perform well in controlled settings but fail catastrophically in real autoregressive generation and lifelong learning scenarios.

#Review #Knowledge Editing #Large Language Models #Lifelong Learning #Reinforcement Learning #Trust Region Policy Optimization #Chain-of-Thought #Catastrophic Forgetting

December 10, 2025

[Paper Review] Composing Concepts from Images and Videos via Concept-prompt Binding

This paper addresses the problem of accurately extracting complex visual concepts (e.g., style, motion) from image and video inputs and flexibly combining them to produce coherent visual output.

#Review #Visual Concept Composition #Diffusion Models #Text-to-Video Generation #Concept Binding #Hierarchical Binder #Diversify-and-Absorb Mechanism #Temporal Disentanglement #One-shot Learning

December 10, 2025

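[Paper Review] BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain

This paper proposes BrainExplore, an automated framework for discovering and interpreting visual concept representations in the human brain at scale. It aims to overcome the limitations of prior fMRI studies (small scale, manual analysis, and dependence on specific regions) and to automatically identify fine-grained, interpretable patterns of brain activity across a vast space of visual concepts.

#Review #fMRI #Brain Mapping #Visual Representation #Interpretability #Sparse Autoencoders #Vision-Language Models #Unsupervised Learning #Neuroscience

December 10, 2025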

[Paper Review] Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS

The goal is to resolve the fundamental trade-off between context-aware phonemization quality and inference speed in lightweight real-time TTS systems.

#Review #TTS #Phonemization #G2P #Low Latency #Real-time #Service-Oriented Architecture #Context-Aware #Persian Language

December 10, 2025