#Multi-stage Training

4 posts

[Paper Review] VIBE: Visual Instruction Based Editor

This paper aims to overcome the limitations of existing large, costly image editing models by developing an open-source, ultra-fast, compact visual-instruction-based image editing system.

#Review #Instruction-Based Image Editing #Diffusion Models #Vision-Language Models (VLM) #Model Efficiency #Multi-stage Training #Preference Alignment #Source Consistency

January 15, 2026

[Paper Review] Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

The goal is to resolve the consistency problems (semantic drift, geometric misalignment) that arise when editing conventional raster images.

#Review #Image Editing #Diffusion Models #Layer Decomposition #RGBA Layers #Variational Autoencoder (VAE) #Multi-stage Training #Photoshop Documents (PSD) #Inherent Editability

December 17, 2025

[Paper Review] F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions

This paper seeks to overcome the limitations of robots executing language-conditioned tasks in dynamic visual environments.

#Review #Vision-Language-Action #Embodied AI #Visual Foresight #Predictive Inverse Dynamics #Mixture-of-Transformer #Robot Manipulation #Multi-stage Training #Generalization

September 10, 2025

[Paper Review] LongRM: Revealing and Unlocking the Context Boundary of Reward Modeling

Current Reward Models (RMs) are mostly confined to short contexts and focus only on surface-level attributes of a response, such as helpfulness or safety.

#Review #Reward Model #Long Context #LLM Alignment #Multi-stage Training #Context Window Scaling #Preference Learning #Long-RewardBench

October 10, 2025