#Zero-shot Learning

21 posts

[Paper Review] IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools

This paper addresses the drop in anomaly detection performance that, despite the strong zero-shot capabilities of MLLMs, arises from domain mismatch and structural hallucination in industrial settings that demand high precision.

#Review #Industrial Anomaly Detection #Multimodal Large Language Models #Agentic Framework #Reinforcement Learning #Tool Augmentation #Zero-shot Learning

May 20, 2026

[Paper Review] Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video

This paper proposes an efficient method that exploits the latent camera-control capability of pretrained video generation models, without large-scale camera-annotated datasets or complex architectural modifications.

#Review #Video Generation #Camera Control #History Conditioning #LoRA #Zero-shot Learning

May 14, 2026

[Paper Review] SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing

Current instruction-guided video editing models struggle to balance fine-grained semantic modifications with faithful motion preservation.

#Review #Instruction-Guided Video Editing #Diffusion Models #Semantic Anchoring #Motion Alignment #Factorized Pre-training #Zero-shot Learning #Temporal Consistency

March 19, 2026

[Paper Review] DVD: Deterministic Video Depth Estimation with Generative Priors

Existing video depth estimation methods face a fundamental trade-off.

#Review #Video Depth Estimation #Generative Priors #Deterministic Adaptation #Diffusion Models #Latent Manifold Rectification #Global Affine Coherence #Zero-shot Learning #Temporal Consistency

March 12, 2026

[Paper Review] Reward Prediction with Factorized World States

This work aims to develop an accurate reward prediction model that lets AI agents generalize across new goals and environments. In particular, it seeks to address the training-data bias and limited generalization of existing supervised reward models, and to fill the lack of benchmarks for fine-grained, step-wise reward evaluation.

#Review #Reward Prediction #World Models #State Representation #Large Language Models #Zero-shot Learning #Reinforcement Learning #Planning #Factorization

March 10, 2026

[Paper Review] ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors

This work tackles the fundamental problem of synthesizing physically plausible articulated human-object interaction (HOI) without 3D/4D supervision. Its main goal is to overcome the unrealistic interactions that arise because existing zero-shot methods are limited to rigid-object manipulation and lack explicit 4D geometric reasoning.

#Review #Human-Object Interaction (HOI) #4D Reconstruction #Articulated Objects #Video Diffusion Models #Inverse Rendering #Zero-shot Learning #Motion Synthesis #3D Gaussians

March 4, 2026

[Paper Review] Large Causal Models for Temporal Causal Discovery

This paper seeks to overcome the limitations of the per-dataset model training paradigm in causal discovery (CD) on time-series data.

#Review #Causal Discovery #Temporal Models #Foundation Models #Transformer Architecture #Zero-shot Learning #Time-series Data #Scalability #Multi-dataset Pretraining

February 23, 2026

[Paper Review] StereoAdapter-2: Globally Structure-Consistent Underwater Stereo Depth Estimation

This work aims to address the domain shift problem in underwater stereo depth estimation and, in particular, to overcome the efficiency and accuracy limits of existing GRU-based iterative refinement in large-disparity and textureless regions.

#Review #Underwater Depth Estimation #Stereo Matching #State Space Model #Mamba Architecture #ConvSS2D #Data Synthesis #LoRA #Zero-shot Learning #Robotics

February 19, 2026

[Paper Review] Action100M: A Large-scale Video Action Dataset

This work aims to overcome the limited scale and domain diversity of existing video action datasets by building ACTION100M, a large-scale open-vocabulary video action dataset intended to advance AI models that understand the physical world.

#Review #Large-scale Dataset #Video Action Recognition #Open-Vocabulary #Temporal Segmentation #Vision-Language Models #Zero-shot Learning #Data Curation #Self-Refine

January 15, 2026

[Paper Review] Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction

This paper aims to empirically analyze whether LLMs can accurately predict the difficulty of an item (a question or task) as perceived by humans, and in particular whether they can achieve Human-AI Difficulty Alignment under the cold-start problem, where initial data is scarce.

#Review #Large Language Models #Item Difficulty Prediction #Human-AI Alignment #Proficiency Simulation #Metacognition #Curse of Knowledge #Educational Assessment #Zero-shot Learning

December 22, 2025

[Paper Review] In-Video Instructions: Visual Signals as Generative Control

This paper explores the controllability of large-scale video generation models and seeks to overcome a limitation of conventional text prompts, which offer only global, abstract control.

#Review #Video Generation #Controllable AI #Visual Instructions #Image-to-Video #Spatial Control #Zero-shot Learning #Generative Models

November 24, 2025

[Paper Review] Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation

To address challenges in visual content generation such as long-horizon composition, multi-entity relationships, and adherence to nuanced instructions, this paper proposes Thinking-while-Generating (TWIG), a framework that interleaves textual reasoning (think) with the visual generation (generate) process in real time.

#Review #Visual Generation #Textual Reasoning #Interleaving #Large Multimodal Models (LMMs) #Chain-of-Thought (CoT) #Zero-shot Learning #Supervised Fine-tuning (SFT) #Reinforcement Learning (RL)

November 20, 2025

[Paper Review] Step-Audio-EditX Technical Report

This paper proposes Step-Audio-EditX, the first open-source LLM-based audio model to offer expressive, iterative audio editing (including emotion, speaking style, and prosody) together with strong zero-shot text-to-speech (TTS) capabilities.

#Review #LLM-based Audio Model #Audio Editing #Text-to-Speech (TTS) #Zero-shot Learning #Large-Margin Data #Reinforcement Learning (RLHF) #Emotion Control #Speaking Style Transfer

November 9, 2025

[Paper Review] Video models are zero-shot learners and reasoners

This paper hypothesizes that video models can become general-purpose vision foundation models, much as large language models (LLMs) have for language understanding.

#Review #Video Models #Zero-shot Learning #Visual Reasoning #Foundation Models #Generative AI #Perception #Manipulation #Modeling

September 25, 2025

[Paper Review] MedVista3D: Vision-Language Modeling for Reducing Diagnostic Errors in 3D CT Disease Detection, Understanding and Reporting

This work aims to reduce under-reading, inattentional blindness, and communication errors in 3D CT diagnosis.

#Review #3D CT #Vision-Language Model #Medical Imaging #Diagnostic Error Reduction #Multi-scale Alignment #Semantic Enrichment #Radiology Reporting #Zero-shot Learning

September 8, 2025

[Paper Review] From Editor to Dense Geometry Estimator

This paper aims to demonstrate that Diffusion Transformer (DiT)-based image editing models are better-suited foundation models for monocular dense geometry estimation (depth and normal) than conventional text-to-image (T2I) generative models. Building on this, it develops a new framework called FE2E that targets strong zero-shot performance even with limited training data.

#Review #Dense Geometry Estimation #Diffusion Transformer #Image Editing #Zero-shot Learning #Depth Estimation #Normal Estimation #Flow Matching #Logarithmic Quantization

September 5, 2025

[Paper Review] Durian: Dual Reference-guided Portrait Animation with Attribute Transfer

This paper aims to generate dynamic portrait animation videos in a zero-shot manner by transferring facial attributes (e.g., hairstyle, glasses) from a given reference image to the target person.

#Review #Portrait Animation #Attribute Transfer #Diffusion Models #Dual Reference Networks #Zero-shot Learning #Self-Reconstruction #Facial Editing

September 5, 2025

[Paper Review] The Gold Medals in an Empty Room: Diagnosing Metalinguistic Reasoning in LLMs with Camlang

This paper aims to evaluate whether large language models (LLMs) genuinely possess human-like metalinguistic reasoning ability in language learning. It seeks to diagnose whether LLM success stems not from mere pattern matching but from the ability to learn and apply an unfamiliar language through explicit grammar rules and vocabulary.

#Review #LLMs #Metalinguistic Reasoning #Constructed Language #Camlang #Second Language Acquisition #Zero-shot Learning #Natural Language Understanding #Commonsense Reasoning

September 3, 2025

[Paper Review] GLiClass: Generalist Lightweight Model for Sequence Classification Tasks

This work seeks to address the limitations of existing zero-shot text classification models (generative LLMs, cross-encoders, and embedding-based models), namely computational inefficiency, inconsistent instruction following, and poor scalability.

#Review #Sequence Classification #Zero-shot Learning #Few-shot Learning #Transformer #Multi-label Classification #PPO #GLiNER #Computational Efficiency

August 12, 2025

[Paper Review] UniFusion: Vision-Language Model as Unified Encoder in Image Generation

This work aims to overcome the limitation that existing image generation models use separate encoders for images and text, and to improve cross-modal reasoning and knowledge transfer.

#Review #Vision-Language Model #Unified Encoder #Image Generation #Diffusion Models #Multimodal Learning #Text-to-Image #Image Editing #Zero-shot Learning

October 15, 2025

[Paper Review] Detect Anything via Next Point Prediction

This paper aims to resolve problems in MLLM (Multimodal Large Language Model)-based object detection, such as low recall, duplicate predictions, and coordinate misalignment, and to achieve zero-shot object perception performance that matches or exceeds existing regression-based models.

#Review #Multimodal Large Language Models #Object Detection #Coordinate Prediction #Reinforcement Learning #Supervised Fine-tuning #Visual Perception #Zero-shot Learning #Spatial Reasoning

October 15, 2025