#Visual Prompts

3 posts

[Paper Review] VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation

This paper proposes VideoSeeker to address the ambiguity in spatial and temporal references that existing LLM-based video understanding models struggle with.

#Review #Large Vision-Language Models #Instance-level Video Understanding #Visual Prompts #Agentic Tool Invocation #Reinforcement Learning #Data Synthesis Pipeline

May 18, 2026

[Paper Review] FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching

This paper addresses a limitation of existing multimodal generation: because it relies on language-model-centric pipelines, vision's own reasoning and generation capabilities are constrained.

#Review #Multimodal Generation #Flow Matching #Visual Prompts #Image-in Image-out #Visual Instruction Following #VisPrompt-5M #VP-Bench

April 8, 2026

[Paper Review] Exploring Conditions for Diffusion models in Robotic Control

This paper aims to obtain task-adaptive visual representations by leveraging pretrained text-to-image diffusion models for robotic control.

#Review #Diffusion Models #Robotic Control #Imitation Learning #Task-Adaptive Representations #Visual Prompts #Text-to-Image #Conditioning #Behavior Cloning

October 31, 2025