#Visual Representation

4 posts

[Paper Review] BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain

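This paper proposes BrainExplore, an automated framework for discovering and interpreting visual concept representations in the human brain at scale. It aims to overcome the limitations of prior fMRI studies, namely small scale, manual analysis, and reliance on specific brain regions, and to automatically identify fine-grained, interpretable patterns of brain activity across a vast space of visual concepts.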

#Review #fMRI #Brain Mapping #Visual Representation #Interpretability #Sparse Autoencoders #Vision-Language Models #Unsupervised Learning #Neuroscience

December 10, 2025

[Paper Review] TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

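This paper aims to develop TUNA, a native unified multimodal model (UMM) that seamlessly performs multimodal understanding and generation tasks within a single framework. It seeks to overcome the limitations caused by the decoupled or biased visual representations of existing UMMs and to build a unified continuous visual representation space that is effective for both understanding and generation.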

#Review #Unified Multimodal Models #Visual Representation #VAE #Flow Matching #Multimodal Understanding #Multimodal Generation #Image Editing #State-of-the-Art

December 1, 2025

[Paper Review] VLA-4D: Embedding 4D Awareness into Vision-Language-Action Models for SpatioTemporally Coherent Robotic Manipulation

This paper proposes VLA-4D, a spatiotemporally coherent VLA model for robotic manipulation that addresses the spatiotemporal discontinuity and lack of fine-grained control found in existing VLA models.

#Review #Vision-Language-Action Models #Robotic Manipulation #SpatioTemporal Coherence #4D Awareness #Visual Representation #Action Representation #Cross-Attention

November 23, 2025

[Paper Review] VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

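This paper explores the uncharted territory of vision-centric coding for reasoning and action in the agentic era. It aims to go beyond the limited symbolic abstraction of existing RGB pixel-based image representations by converting images into a compact, interpretable, and executable visual representation such as SVG code.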

#Review #Multimodal AI #Code Generation #SVG #Visual Representation #Benchmark #Large Vision-Language Models #Agentic AI #Reasoning

November 9, 2025