Latest Posts

[Paper Review] MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts

The paper aims to address the limitations of existing 3D city generation methods, which lack the creative flexibility of text-driven generation, object-level editability, and structural consistency.

#Review #3D City Generation #Natural Language Processing #Aesthetic Adaptation #Controllable Assets #Layout Generation #Interactive Editing #Diffusion Models #Multimodal Dataset

November 25, 2025

[Paper Review] HunyuanOCR Technical Report

The paper aims to resolve the error propagation and high maintenance costs of conventional pipeline-based OCR systems, and to overcome both the heavy compute requirements of large general-purpose VLMs and the incomplete end-to-end optimization of OCR-specialized VLMs.

#Review #Optical Character Recognition #Multimodal Large Language Model #End-to-End Learning #Reinforcement Learning #Document Parsing #Information Extraction #Text Spotting

November 25, 2025

[Paper Review] GigaWorld-0: World Models as Data Engine to Empower Embodied AI

This paper develops GigaWorld-0, a unified world model framework, with the goal of using it as a scalable, data-efficient data engine for Embodied AI.

#Review #World Models #Embodied AI #Data Generation #Video Generation #3D Scene Reconstruction #Robotics #Vision-Language-Action

November 25, 2025

[Paper Review] GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms

This paper introduces GigaEvo, an extensible open-source framework for evolutionary computation driven by large language models (LLMs).

#Review #LLM-driven Evolutionary Computation #Quality-Diversity #MAP-Elites #Program Synthesis #Open-source Framework #Algorithmic Discovery #Genetic Algorithms

November 25, 2025

[Paper Review] Fara-7B: An Efficient Agentic Model for Computer Use

This paper aims to address the shortage of high-quality interaction data for training computer use agents (CUAs) and to develop an efficient agent model that can run on-device with modest compute. In doing so, it seeks to broaden the commercial viability of CUA technology and pave the way for general-purpose personal digital assistants.

#Review #Computer Use Agents #Synthetic Data Generation #Multi-modal LLM #On-device AI #Web Automation #Pixel-in Action-out #Fara-7B #WebTailBench

November 25, 2025

[Paper Review] Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward

This paper investigates whether the "understanding" capability of unified multimodal models (UMMs) actually informs and guides the "generation" process.

#Review #Unified Multimodal Models #Understanding-Generation Gap #Reasoning #Knowledge Transfer #Chain-of-Thought #Self-Training #Synthetic Data #Evaluation Framework

November 25, 2025

[Paper Review] DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection

This paper aims to move beyond existing AI-generated content (AIGC) detection, which focuses on whole-image classification, by simultaneously localizing edited regions and attributing the editing model for diffusion-based local edits.

#Review #AIGC Detection #Diffusion Models #Image Editing #Semantic Segmentation #Localization #Model Attribution #Benchmark #Multi-turn Editing

November 25, 2025

[Paper Review] Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning

This paper tackles the difficulty existing vision-language agents have in continually self-evolving, which stems from the limits of supervised learning on human annotations, the difficulty of verifying complex visual reasoning steps, and evaluation hallucination.

#Review #Self-Evolving Agent #Vision-Language Models #Tool-Integrated Reasoning #Reinforcement Learning #Self-Correction #Multimodal AI #Generative AI

November 25, 2025

[pydantic-ai] Add anthropic_cache_messages setting and automatic cache point limiting

Adds automatic message caching and automatic handling of Anthropic's limit of four cache points.

#Python #Pydantic AI #Anthropic #Feature #Caching

November 25, 2025

[triton] Triton JIT compiler optimization: 10,000x speedup from removing `inspect.getclosurevars`

Removing `inspect.getclosurevars` from the Triton JIT compiler made captured-scope lookups 10,000 times faster.

#Triton #JIT #Performance Optimization #Python #Compiler #inspect

November 25, 2025

[Paper Review] UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios

This paper aims to overcome the limitations that arise when scaling existing Diffusion Transformer (DiT) models to 4K resolution across diverse aspect ratios (ARs).

#Review #Text-to-Image Generation #Diffusion Transformers #4K Resolution #Aspect Ratio Extrapolation #Data-Model Co-Design #VAE Post-training #Positional Encoding #Diffusion Models

November 24, 2025

[Paper Review] Target-Bench: Can World Models Achieve Mapless Path Planning with Semantic Targets?

This paper aims to quantitatively evaluate how well state-of-the-art world models (WMs) perform mapless path planning in real-world environments toward implicit semantic targets specified in text.

#Review #World Models #Mapless Navigation #Semantic Path Planning #Robot Learning #Video Prediction #Benchmark #Trajectory Generation

November 24, 2025

[Paper Review] SyncMV4D: Synchronized Multi-view Joint Diffusion of Appearance and Motion for Hand-Object Interaction Synthesis

This paper addresses the geometric distortion and unrealistic motion of single-view hand-object interaction (HOI) video generation, as well as the limited generalization of 3D HOI methods.

#Review #Hand-Object Interaction #Multi-view Video Generation #4D Motion Synthesis #Diffusion Models #Spatio-temporal Consistency #Geometric Consistency #Appearance and Motion Joint Modeling

November 24, 2025

[Paper Review] Plan-X: Instruct Video Generation via Semantic Planning

The goal is to address the lack of high-level semantic reasoning and planning that existing video diffusion models (DiT) exhibit when handling complex user instructions and long-horizon planning.

#Review #Video Generation #Semantic Planning #Multimodal LLM #Diffusion Transformer #Spatio-temporal Guidance #Visual Hallucination #Prompt Alignment #Instruction Following

November 24, 2025

[Paper Review] Pillar-0: A New Frontier for Radiology Foundation Models

To ease the burden on healthcare systems caused by surging imaging volumes and workforce shortages, this paper proposes Pillar-0, a new radiology foundation model that overcomes the limitations of existing medical AI models.

#Review #Radiology Foundation Model #Volumetric Imaging #Multi-window Tokenization #Multi-scale Attention #Contrastive Learning #Clinical Evaluation #Data Efficiency #Medical Imaging

November 24, 2025

[Paper Review] PRInTS: Reward Modeling for Long-Horizon Information Seeking

This paper aims to overcome the limitations of existing Process Reward Models (PRMs), namely binary judgments over short reasoning units and difficulty handling rapidly growing context.

#Review #Reward Modeling #Long-Horizon Tasks #Information Seeking #Large Language Models #Trajectory Summarization #Reinforcement Learning #Tool Use #Process Reward Models

November 24, 2025

[Paper Review] Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO

This paper addresses the inconsistent performance that large language model (LLM)-based multi-agent systems show in specific domains.

#Review #Multi-Agent Systems #Reinforcement Learning #LLM Training #Hierarchical Credit Assignment #Trajectory Alignment #Group Relative Policy Optimization #Tool-Augmented Reasoning #Vertical Architecture

November 24, 2025

[Paper Review] MIST: Mutual Information Via Supervised Training

This paper addresses the performance degradation that existing mutual information (MI) estimators suffer in settings with high dimensionality, limited samples, complex distributions, and high MI.

#Review #Mutual Information Estimation #Supervised Learning #Meta-Learning #Neural Networks #Uncertainty Quantification #SetTransformer #Quantile Regression

November 24, 2025

[Paper Review] MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models

This study addresses the limitations that existing Vision-Language Models (VLMs) show in physics-based reasoning involving 3D spatial layout, motion patterns, and temporal dynamics.

#Review #Vision-Language Models #Physics Reasoning #Motion Tracking #Spatial-Temporal Grounding #Video QA #AIGC Analysis #Reinforcement Learning

November 24, 2025

[Paper Review] M3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark

Moving beyond existing LLM tool-use benchmarks, which are mostly text-based and focus on linear API planning, this study proposes M³-Bench, the first benchmark for evaluating the realistic tool-use capabilities of multimodal LLM (MLLM) agents.

#Review #Multimodal LLM #Tool Use #Agent Benchmark #Model Context Protocol #Multi-Hop Reasoning #Multi-Threaded Execution #Evaluation Metrics #Similarity Alignment

November 24, 2025