Latest Posts

[Paper Review] StyleBench: Evaluating thinking styles in Large Language Models

This study aims to address the limited understanding of how the reasoning strategies LLMs use, their "thinking styles," interact with model architecture and task type.

#Review #Large Language Models #Reasoning Strategies #Prompt Engineering #LLM Evaluation #Benchmark #Thinking Styles #Scaling Laws #Meta-Reasoning

September 26, 2025

[Paper Review] Seedream 4.0: Toward Next-generation Multimodal Image Generation

This paper aims to develop Seedream 4.0, an efficient, high-performance next-generation multimodal image generation system that unifies text-to-image (T2I) synthesis, image editing, and multi-image composition in a single framework.

#Review #Multimodal Image Generation #Diffusion Transformer #VAE #Image Editing #Text-to-Image #Model Acceleration #Human Evaluation

September 26, 2025

[Paper Review] SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines

This paper proposes SciReasoner, the first scientific reasoning large language model (LLM) that unifies heterogeneous scientific representations with natural language to perform complex scientific reasoning across diverse scientific disciplines.

#Review #Scientific Reasoning #Foundation Models #Multi-modal Learning #Cross-domain Generalization #Chain-of-Thought #Reinforcement Learning #Scientific Discovery #Molecular Design

September 26, 2025

[Paper Review] SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

This paper seeks to address the limitations of existing 3D scene synthesis methods: fixed categories, insufficient object detail, physical inconsistency, and poor alignment with complex user instructions.

#Review #3D Scene Synthesis #Agentic Framework #LLMs #Self-Reflection #Tool-Use #Physical Plausibility #Iterative Refinement #Embodied AI

September 26, 2025

[Paper Review] ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning

This paper proposes ScaleDiff, an efficient pipeline for scaling up the generation of difficult math problems in order to improve complex reasoning ability. It aims to overcome the poor scalability of existing problem-generation approaches, which stems from high cost, complex prompt engineering, and a limited range of difficulty.

#Review #Mathematical Reasoning #Large Reasoning Models (LRMs) #Difficulty Scaling #Data Augmentation #Supervised Fine-Tuning (SFT) #Problem Generation #Solution Distillation

September 26, 2025

[Paper Review] SD3.5-Flash: Distribution-Guided Distillation of Generative Flows

This paper seeks to address the accessibility problem caused by the high computational demands of state-of-the-art generative models, Rectified Flow models in particular.

#Review #Generative AI #Image Generation #Diffusion Models #Rectified Flow #Model Distillation #Few-Step Generation #Computational Efficiency #Prompt Alignment

September 26, 2025

[Paper Review] Residual Off-Policy RL for Finetuning Behavior Cloning Policies

This paper aims to address both the limitations of behavior cloning (BC) policies (data quality, manual data collection, performance saturation) and the difficulties of applying reinforcement learning (RL) directly on real robots (sample inefficiency, safety, sparse rewards).

#Review #Reinforcement Learning (RL) #Behavior Cloning (BC) #Residual Learning #Off-Policy RL #Robot Manipulation #Real-World Robotics #High-DoF Systems #Sample Efficiency

September 26, 2025

[Paper Review] Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution

This paper aims to resolve the confused action sequencing and excessive trial and error that existing browser agents exhibit when carrying out multi-turn, long-horizon trajectories on real-world web pages.

#Review #Multi-Agent System #Browser Automation #Web Reconnaissance #Tool Generation #Task Execution #Self-Evolving AI #LLM/VLM #VisualWebArena

September 26, 2025

[Paper Review] Quantized Visual Geometry Grounded Transformer

The goal is to address the excessive compute and memory costs of large Visual Geometry Grounded Transformers (VGGTs) and to develop an efficient low-bit quantization framework for real-world deployment.

#Review #Quantization #Post-Training Quantization #3D Reconstruction #Visual Transformer #Model Compression #Efficient Inference #Hadamard Rotation #Calibration Sampling

September 26, 2025

[Paper Review] MOSS-ChatV: Reinforcement Learning with Process Reasoning Reward for Video Temporal Reasoning

The paper aims to resolve the process inconsistency problem in video-based MLLMs (Multimodal Large Language Models), in which the intermediate reasoning drifts away from the video's temporal dynamics even when the model reaches the correct final answer.

#Review #Video Temporal Reasoning #Reinforcement Learning #Process Supervision #Dynamic Time Warping #Multimodal Large Language Models #Video State Prediction #Reward Hacking

September 26, 2025

[Paper Review] MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

This paper seeks to address two major limitations that hinder the progress of large multimodal reasoning models.

#Review #Multimodal Reasoning #Reinforcement Learning #Variance-Aware Sampling #Gradient Vanishing #Data Curation #Chain-of-Thought #GRPO

September 26, 2025

[Paper Review] MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model

The paper aims to tackle a realistic SFUDA (Source-Free Unsupervised Domain Adaptation) scenario in which no source-domain data is available and a powerful LALM (Large Audio-Language Model) is accessible only through an API.

#Review #Speech Emotion Recognition #Source-Free Unsupervised Domain Adaptation #Large Audio-Language Models #Label Fusion #Mutual Information #API-Only Models #Domain Mismatch

September 26, 2025

[Paper Review] Interactive Recommendation Agent with Active User Commands

This paper seeks to close the gap between user intent and system interpretation that arises because the passive feedback mechanisms of existing recommender systems fail to capture users' nuanced intent and satisfaction accurately.

#Review #Interactive Recommendation #Large Language Models #Multi-Agent System #Natural Language Processing #Knowledge Distillation #User Control

September 26, 2025

[Paper Review] Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets

The paper seeks to address the problem that existing 3D generative models rely mainly on image or text conditioning and lack fine-grained cross-modal control, which limits their practical use. It aims to improve the controllability and geometric accuracy of 3D asset generation through a unified framework that integrates diverse forms of control signals.

#Review #3D Generation #Controllable Generation #Multi-modal Conditioning #Diffusion Models #Point Clouds #Voxels #Bounding Boxes #Skeletons #Hunyuan3D

September 26, 2025

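[Paper Review] Does FLUX Already Know How to Perform Physically Plausible Image Composition?

This study aims to perform physically realistic image composition, including complex lighting, shadows, and water reflections, without training, by leveraging a pretrained text-to-image (T2I) diffusion model. It seeks to overcome limitations of existing models such as fixed object poses, inadequate resolution handling, and lighting that does not match the context.

#Review #Image Composition #Diffusion Models #Training-Free #Physically Plausible #FLUX #Adapter #Guidance #Benchmark

September 26, 2025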

[Paper Review] Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving

This paper seeks to address the difficulty that existing imitation-learning-based VLA (Vision-Language-Action) models in autonomous driving systems have in inherently encoding physical rules and safety constraints.

#Review #Autonomous Driving #Vision-Language-Action Models #Discrete Diffusion #Reflection Mechanism #Trajectory Generation #Safety Constraints #Imitation Learning

September 26, 2025

[Paper Review] CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling

This study seeks to address the efficient modeling and generation of highly stylized 3D anime hairstyles, which are difficult to handle with existing realistic hair modeling techniques.

#Review #3D Anime Hairstyle #Autoregressive Modeling #Control Points #Parametric Representation #Transformer #Generative AI #Dataset (AnimeHair) #Computer Graphics

September 26, 2025

[Paper Review] CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning

This paper aims to resolve the instability of policy entropy during reinforcement learning (RL) for LLMs (Large Language Models).

#Review #Reinforcement Learning #Large Language Models #Policy Optimization #PPO #Entropy Control #Gradient Clipping #Exploration-Exploitation

September 26, 2025

[Paper Review] Blueprints of Trust: AI System Cards for End to End Transparency and Governance

This paper introduces the Hazard-Aware System Card (HASC), a new framework for strengthening transparency and accountability in the development and deployment of AI systems.

#Review #AI Governance #Transparency #AI System Card #Hazard-Aware System Card #Data Provenance #AI Safety #AI Risk Management #ISO/IEC 42001

September 26, 2025

[Paper Review] Behind RoPE: How Does Causal Mask Encode Positional Information?

This paper aims to clarify the mechanism by which the causal mask encodes positional information in Transformer decoders, apart from explicit positional encodings such as Rotary Positional Embeddings (RoPE).

#Review #Transformer Decoder #Causal Mask #Positional Encoding #RoPE #Attention Mechanism #Length Generalization #Large Language Models

September 26, 2025