Latest Posts

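[Paper Review] T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables

This paper addresses a core challenge in applying the table reasoning capabilities of large language models (LLMs) to industrial applications: converting table information into comprehensive reports. In particular, it tackles two main problems: reasoning performance that degrades on complex and diverse tables, and existing benchmarks that fall short in evaluating real-world applicability.

#Review #Table-to-Report Generation #Large Language Models (LLMs) #Benchmark Dataset #Industrial Applications #Table Reasoning #Evaluation Metrics #Real-world Data

September 2, 2025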

[Paper Review] PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

This study focuses on critic-free reinforcement learning methods for agentic reasoning, and in particular aims to address the limitations of group policies.

#Review #Reinforcement Learning #Critic-Free RL #Agentic Reasoning #Policy Optimization #Large Language Models (LLMs) #Advantage Estimation #Group Sampling #Static Value Estimation

September 2, 2025

[Paper Review] No Label Left Behind: A Unified Surface Defect Detection Model for all Supervision Regimes

This paper addresses the problem that existing surface defect detection models are restricted to specific supervision scenarios or struggle to adapt to different types of data annotation (unsupervised, weakly supervised, mixed, and fully supervised).

#Review #Surface Defect Detection #Anomaly Detection #Mixed Supervision #Deep Learning #Industrial Inspection #Unified Model

September 2, 2025

[Paper Review] How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench

This paper aims to resolve problems that arise when large language model (LLM) agents use tools in complex, dynamic multi-turn environments such as τ-bench, including inconsistent reasoning, non-compliance with domain policies, and failure to extract information over long horizons.

#Review #LLM Agents #Tool Use #Function Calling #Input Reformulation #Dynamic Environments #τ-bench #Context Engineering #Multi-Agent Framework

September 2, 2025

[Paper Review] From reactive to cognitive: brain-inspired spatial intelligence for embodied agents

This paper aims to address the lack of spatial memory in existing reactive approaches and the resulting poor generalization and adaptability in complex real-world environments.

#Review #Spatial Cognition #Embodied Agents #Brain-inspired AI #Cognitive Map #Spatial Memory #MLLMs #Navigation

September 2, 2025

[Paper Review] UItron: Foundational GUI Agent with Advanced Perception and Planning

This paper presents UItron, an open-source foundation model that strengthens the core capabilities of GUI agents that automate complex tasks in mobile and PC environments.

#Review #GUI Agent #Foundational Model #Multimodal LLM #Perception #Planning #Reinforcement Learning #Data Engineering #Chinese App Scenarios

September 1, 2025

[Paper Review] TiKMiX: Take Data Influence into Dynamic Mixture for Language Model Pre-training

During language model pre-training, a fixed data mixing strategy fails to reach optimal performance because the model's learning preferences change dynamically. This paper aims to observe these evolving data preferences efficiently and, based on them, to adjust the data mixing ratios dynamically so as to maximize model performance.

#Review #Language Model Pre-training #Dynamic Data Mixing #Data Influence #Group Influence #Optimization #Regression Model #LLM Training

September 1, 2025

[Paper Review] Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

This paper addresses the problem that large language models (LLMs) are proficient at complex reasoning tasks yet struggle with simple interactive tasks that human children perform easily.

#Review #Large Language Models #Reinforcement Learning #Game AI #Procedural Knowledge #Declarative Knowledge #Explainable AI #Strategic Decision-Making

September 1, 2025

[Paper Review] TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis

This paper aims to resolve the performance degradation that occurs because existing audio-driven talking head synthesis models generalize poorly across diverse human attributes such as ethnicity, language, and age group.

#Review #Audio-Driven Talking Head Synthesis #Large-Scale Dataset #Data Diversity #Data Curation #Evaluation Benchmark #Generalization Gap #Algorithmic Fairness

September 1, 2025

[Paper Review] R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

This paper addresses an inefficiency in existing MLLMs: the step-by-step thinking process that serves them well on complex reasoning problems incurs unnecessary computational overhead on simple ones.

#Review #Multimodal Large Language Models (MLLMs) #Auto-Thinking #Reinforcement Learning (RL) #Bi-mode Annealing #Bi-mode Policy Optimization (BPO) #General-Purpose AI #Reasoning #Efficiency

September 1, 2025

[Paper Review] Morae: Proactively Pausing UI Agents for User Choices

This paper addresses the problem that existing UI agents complete tasks automatically without giving blind and low-vision (BLV) users a choice at important decision points, which undermines user agency.

#Review #UI Agents #Accessibility #Human-Agent Interaction #Mixed-Initiative AI #Large Multimodal Models #Proactive AI #User Choice #Blind and Low-Vision Users

September 1, 2025

[Paper Review] Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery

This paper addresses the problem that existing single-modality approaches (symbolic regression or LLMs) overlook the way physicists rely on phenomenological visual representations, and are therefore weak at interpreting the spatiotemporal patterns inherent in dynamic phenomena.

#Review #Physics Formula Discovery #Multimodal AI #Vision-Language Models #Symbolic Regression #Causal Chain of Thought #Reinforcement Learning #Agentic AI

September 1, 2025

[Paper Review] HERMES: Human-to-Robot Embodied Learning from Multi-Source Motion Data for Mobile Dexterous Manipulation

This paper aims to solve the challenge of effectively converting human motion data from diverse sources into real robot behavior for mobile bimanual dexterous manipulation with complex multi-fingered robot hands.

#Review #Dexterous Manipulation #Mobile Manipulation #Human-to-Robot Learning #Sim2Real #Reinforcement Learning #Depth Image #Visual Localization #Bimanual Control

September 1, 2025

[Paper Review] EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

This study aims to overcome the limited domain coverage and flexibility of existing VLA models and to achieve generalist robot control that enables human-level, flexible multimodal reasoning and physical interaction in open-ended environments.

#Review #Embodied AI #Robot Control #Vision-Language-Action Models #Multimodal Pretraining #Flow Matching #Foundation Models #Generalization #Real-world Robotics

September 1, 2025

[Paper Review] Efficient Code Embeddings from Code Generation Models

This paper addresses two problems faced by existing code embedding models: the scarcity of supervised training data and the underuse of large-scale unaligned code and natural language data.

#Review #Code Embeddings #Code Generation Models #Autoregressive Backbones #Last-Token Pooling #Instruction Tuning #Contrastive Learning #Retrieval-Augmented Generation #MTEB Benchmark

September 1, 2025

[Paper Review] Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation

To address the scarcity of 3D data, this paper aims to improve the generalization of 3D generation models by leveraging commonsense priors obtained from large-scale video data.

#Review #3D Generation #Video Diffusion Models #Spatial Consistency #Semantic Knowledge #Multi-view Synthesis #Large-scale Dataset #Image-to-3D #Text-to-3D

September 1, 2025

[Paper Review] CLIPSym: Delving into Symmetry Detection with CLIP

This paper aims to detect reflection and rotation symmetries in images more accurately and robustly by leveraging CLIP, an existing large-scale vision-language model (VLM).

#Review #Symmetry Detection #Vision-Language Models #CLIP #Equivariant Networks #Prompt Engineering #Geometric Deep Learning

September 1, 2025

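[Paper Review] AHELM: A Holistic Evaluation of Audio-Language Models

This paper aims to address the lack of standardized benchmarks for audio-language models (ALMs) and to overcome the limitation that existing evaluations focus on a narrow set of capabilities while overlooking important aspects such as fairness and safety.

#Review #Audio-Language Models #Holistic Evaluation #Benchmarking #Multimodality #Fairness #Robustness #Reasoning #Bias Detection

September 1, 2025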

[Paper Review] A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

This paper aims to comprehensively analyze the development of scientific large language models (Sci-LLMs) from the perspectives of data foundations and agent frontiers.

#Review #Scientific LLMs #AI for Science #Scientific Data #Agentic AI #Multimodal Integration #Knowledge Representation #Autonomous Discovery #Data Ecosystems

September 1, 2025

[Paper Review] A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

This paper addresses the problem that existing benchmarks for evaluating LLM code generation focus on fragmentary code snippets, use unstable evaluation methods, and fail to reflect real repository context, and therefore do not adequately evaluate the security of AI-generated code.

#Review #AI-Generated Code Security #LLM Evaluation #Repository-Level Benchmark #Code Security #Vulnerability Detection #Static Analysis #Reproducibility #Context-Awareness

September 1, 2025