#Data Generation

18개의 포스트

[논문리뷰] RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models

Real-world Degradation 상황에서의 Image Restoration은 자율 주행(Autonomous Driving) 및 객체 탐지(Object Detection)와 같은 Downstream Task에 필수적이다.

#Review #Image Restoration #Real-World Degradation #Large-Scale Image Editing Models #Diffusion Models #Data Generation #RealIR-Bench #Zero-shot Generalization #Transfer Learning

2026년 3월 26일

[논문리뷰] CodePercept: Code-Grounded Visual STEM Perception for MLLMs

이 논문은 MLLMs 가 STEM (과학, 기술, 공학, 수학) 분야에서 시각적 추론에 실패하는 근본적인 원인이 인지 능력 부족인지 추론 능력 부족인지를 규명하는 데서 출발합니다. 연구의 핵심 목표는 MLLMs 의 시각적 인지 능력을 체계적으로 향상시키기 위해 실행 가능한 코드를 강력한 인지 매체로 확립하는 것입니다.

#Review #Multimodal Large Language Models (MLLMs)#STEM Visual Reasoning #Code-Grounded Perception #Image-to-Code Translation #Data Generation #Benchmark #Reinforcement Learning #Matplotlib

2026년 3월 11일

[논문리뷰] From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models

본 논문은 기존의 LMM(Large Multimodal Models) 자가 학습 프레임워크가 겪는 해석 가능한 진단 부족과 시각적 다양성 부족이라는 근본적인 한계를 해결하고자 합니다.

#Review #Large Multimodal Models #Iterative Training #Diagnostic-Driven Learning #Reinforcement Learning #Multimodal Reasoning #Data Generation #Agent Systems

2026년 2월 26일

[논문리뷰] SWE-Universe: Scale Real-World Verifiable Environments to Millions

본 논문은 낮은 생산 수율, 취약한 검증기, 과도한 비용 등 기존의 자동화된 소프트웨어 엔지니어링(SWE) 검증 가능 환경 구축의 문제점을 해결하고자 합니다.

#Review #Software Engineering Environments #LLM Agents #Data Generation #Verifiable Tasks #Multilingual #Reinforcement Learning #Self-Verification #Hacking Detection

2026년 2월 2일

[논문리뷰] ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas

논문은 도구-증강 언어 모델 에이전트 훈련의 어려움(수동 개입, 검증 불가능한 시뮬레이션 환경, 불안정한 장기/다중 턴 학습)을 해결하기 위해 완전히 자동화된 종단 간 프레임워크 ASTRA 를 제안합니다.

#Review #LLM Agent #Tool Use #Trajectory Synthesis #Reinforcement Learning #Environment Synthesis #Data Generation #Multi-turn Interaction #Automated Training

2026년 2월 1일

[논문리뷰] Scaling Open-Ended Reasoning to Predict the Future

본 연구는 불확실한 미래에 대한 개방형 예측 질문에 대해 언어 모델(LLM)이 정확하고 신뢰할 수 있는 예측을 할 수 있도록 훈련하는 것을 목표로 합니다.

#Review #Language Models #Forecasting #Open-Ended Reasoning #Reinforcement Learning (RL)#Data Generation #Calibration #Retrieval-Augmented Generation (RAG)#Future Prediction

2025년 12월 31일

[논문리뷰] Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

본 논문은 투명하거나 반사되는 객체에 대한 깊이 및 법선 추정의 고질적인 문제를 해결하고자 합니다.

#Review #Video Diffusion Model #Depth Estimation #Normal Estimation #Transparent Objects #Robotics #Data Generation #LoRA Fine-tuning

2025년 12월 29일

[논문리뷰] N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

본 연구는 기존 멀티모달 모델이 2D 이미지에 의존하여 3D 공간 이해 능력이 부족하다는 한계를 해결하는 것을 목표로 합니다.

#Review #3D Grounding #Spatial Reasoning #Vision-Language Models #Depth Estimation #3D Object Detection #Chain-of-Thought #Data Generation #Multimodal AI

2025년 12월 18일

[논문리뷰] GigaWorld-0: World Models as Data Engine to Empower Embodied AI

본 논문은 GigaWorld-0 라는 통합 월드 모델 프레임워크를 개발하여 Embodied AI 를 위한 확장 가능하고 데이터 효율적인 데이터 엔진 으로 활용하는 것을 목표로 합니다.

#Review #World Models #Embodied AI #Data Generation #Video Generation #3D Scene Reconstruction #Robotics #Vision-Language-Action

2025년 11월 25일

[논문리뷰] Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset

본 연구는 흉부 X-ray(CXR)에서 병변 분할 모델의 제한적인 타겟 레이블 수와 전문가 수준의 상세 텍스트 입력 의존성을 해결하고자 합니다.

#Review #Medical Imaging #Chest X-ray #Lesion Segmentation #Vision-Language Models #Instruction Following #Data Generation #MIMIC-CXR

2025년 11월 19일

[논문리뷰] LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls

기존 LLM 툴 학습의 정적 합성 데이터 파이프라인 이 모델의 약점에 적응하지 못하고 노이즈 있는 레이블을 유지하여 훈련 효율성을 저해하는 문제를 해결합니다.

#Review #Large Language Models (LLMs)#Tool Learning #Data Generation #Model Training #Closed-Loop Framework #Reinforcement Learning (RL)#Data Refinement #Self-Correction

2025년 11월 12일

[논문리뷰] VideoSSR: Video Self-Supervised Reinforcement Learning

본 연구는 Multimodal Large Language Models (MLLMs)의 비디오 이해 능력을 향상시키기 위해, 기존 비디오 데이터셋의 높은 주석 비용, 복잡성 부족, 그리고 주석 과정에서의 편향성이라는 한계를 극복하는 것을 목표로 합니다.

#Review #Video Understanding #Self-Supervised Learning #Reinforcement Learning #MLLMs #Pretext Tasks #Verifiable Rewards #Data Generation #Temporal Grounding

2025년 11월 11일

[논문리뷰] ChartM^3: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension

본 연구는 기존 멀티모달 대규모 언어 모델(MLLM)이 실제 복잡한 차트 이해 작업에서 겪는 한계(제한된 차트 유형 및 복잡성, 낮은 질문 복잡성, 해석력 부족 등)를 해결하고자 합니다.

#Review #Chart Comprehension #Visual Reasoning #Data Generation #Code-Driven Pipeline #Multimodal LLMs #Retrieval-Augmented Generation #Reinforcement Learning #Synthetic Data

2025년 11월 9일

[논문리뷰] Mano Report

본 논문은 시각적 복잡성, 동적 환경, 다단계 추론 요구사항으로 인해 어려운 GUI 상호작용 자동화 문제를 해결하는 것을 목표로 합니다.

#Review #GUI Agent #Multi-modal Foundation Model #Reinforcement Learning #Supervised Fine-tuning #Simulated Environment #Data Generation #Error Recovery #Web Automation

2025년 9월 23일

[논문리뷰] WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents

본 논문은 복잡한 정보 탐색과 다단계 웹 탐색을 요구하는 장기 웹 에이전트 를 훈련하기 위한 핵심 과제인 고품질 훈련 데이터 부족 문제 를 해결하고자 합니다.

#Review #Web Agents #Long-Horizon Reasoning #Large Language Models (LLMs)#Data Generation #Reinforcement Learning (RL)#Supervised Fine-tuning (SFT)#Web Navigation #Information Retrieval

2025년 9월 9일

[논문리뷰] NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks

본 논문은 기존 3D 객체 편집 방법들이 비효율적이고 일관성이 부족하며, 편집되지 않은 영역을 보존하는 데 실패하는 문제를 해결하고자 합니다.

#Review #3D Object Editing #Training-Free #FlowEdit #Mask-Free #Deep Generative Models #TRELLIS #Data Generation #Geometric Consistency

2025년 10월 20일

[논문리뷰] Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

대규모 언어 모델(LLM)이 모방 학습의 한계(훈련-추론 격차, 견고한 추론 능력 부족)를 극복하고 강화 학습(RL)을 통해 더 강력한 능력을 얻도록 하는 것이 목표입니다. 하지만 기존 RL 데이터셋은 웹 스케일 사전 훈련 코퍼스에 비해 규모와 다양성 면에서 현저히 작다는 병목 현상을 해결하고자 합니다.

#Review #Reinforcement Learning (RL)#Large Language Models (LLMs)#Data Pipeline #Web-scale Data #Question-Answering (QA)#Data Generation #Data Diversity #Data Efficiency

2025년 10월 13일

[논문리뷰] MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

현재 Multimodal Large Language Models (MLLM) 은 복잡한 실제 문제 해결에 필수적인 긴 추론 체인(long-chain reflective reasoning) 및 반복적 사고(iterative thinking) 능력에서 한계를 보입니다.

#Review #Multimodal LLMs #Reflective Reasoning #Long-Chain Reasoning #Benchmark #Policy Optimization #Data Generation #Reinforcement Learning #Backtracking

2025년 10월 10일