Latest Posts

[Paper Review] MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments

This paper aims to overcome two limitations of AndroidWorld, an existing mobile GUI agent benchmark: it is saturated (success rates above 90%), and its tasks fall short of realistic complexity.

#Review #Mobile Agents #GUI Benchmarking #Agent-User Interaction #Tool-Augmented Agents #Model Context Protocol (MCP) #Long-Horizon Tasks #Reproducible Evaluation #Android Environment

December 22, 2025

[Paper Review] MatSpray: Fusing 2D Material World Knowledge on 3D Geometry

This paper uses 2D image-based material prediction models to assign physically based rendering (PBR) properties to 3D geometry, with the goal of reconstructing relightable 3D objects that stay consistent across multiple views.

#Review #3D Reconstruction #Material Estimation #Diffusion Models #Gaussian Splatting #Inverse Rendering #PBR #Relighting #Neural Merger

December 22, 2025

[Paper Review] LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding

Although Diffusion Large Language Models (dLLMs) have strong potential for parallel inference, current confidence-driven decoding strategies remain at 1-3 TPF (Tokens Per Forward pass) and leave much of that parallelism unused.

#Review #dLLM #Parallel Decoding #Lookahead #Inference Acceleration #Token Filling Order #Branch Parallelism #Diffusion Models

December 22, 2025

[Paper Review] LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry

This paper aims to address the latency and error accumulation of traditional modular navigation pipelines, and to remove the dependence of existing end-to-end approaches on explicit localization.

#Review #Autonomous Navigation #End-to-end Learning #Localization Grounded #Visual Geometry #Metric-aware Perception #Diffusion Policy #RGB-D

December 22, 2025

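[Paper Review] Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation

This paper tackles camera-controllable video generation of dynamic scenes, aiming to keep high camera pose fidelity and view consistency while reasoning about occluded geometry. In particular, it seeks to overcome the errors that inaccurate depth estimates introduce into depth-reprojection methods, as well as the dataset bias of trajectory-conditioned models.

#Review #Video Generation #Camera Control #Homography #Diffusion Models #Data Augmentation #Novel View Synthesis #Pose Fidelity

December 22, 2025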

[Paper Review] GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

This paper addresses the main bottlenecks in training large language model (LLM) agents: the high cost of real-world interaction data and its static nature.

#Review #LLM Agents #Environment Simulation #Co-evolution #Curriculum Learning #Data Efficiency #Reinforcement Learning #Adaptive Simulation #Difficulty Alignment

December 22, 2025

[Paper Review] Does It Tie Out? Towards Autonomous Legal Agents in Venture Capital

This work aims to automate a complex legal workflow in venture capital: verifying a company's capitalization table, known as a "cap table tie-out".

#Review #Legal AI #Venture Capital #Due Diligence #Capitalization Table #Multi-document Reasoning #Knowledge Graph #World Model #Neuro-Symbolic AI

December 22, 2025

[Paper Review] DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

This paper addresses the fragmentation and lack of standardization in high-quality data preparation pipelines for large language models (LLMs). Specifically, it aims to build a unified, extensible, LLM-driven data preparation framework that effectively supports LLM-based data synthesis and iterative semantic refinement.

#Review #LLM Data Preparation #Workflow Automation #Data-Centric AI #Synthetic Data #Multi-Agent System #Framework #Reproducibility

December 22, 2025

[Paper Review] Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction

This paper empirically examines whether LLMs can accurately predict how difficult humans find an item (a question or task), and in particular whether they can achieve Human-AI Difficulty Alignment in the cold-start setting, where little initial data is available.

#Review #Large Language Models #Item Difficulty Prediction #Human-AI Alignment #Proficiency Simulation #Metacognition #Curse of Knowledge #Educational Assessment #Zero-shot Learning

December 22, 2025

[Paper Review] Brain-Grounded Axes for Reading and Steering LLM States

This work focuses on the problem that interpretability directions in LLMs (large language models) often lack external grounding. To address it, the authors use human brain activity to define a stable, externally grounded coordinate system for reading and steering the internal states of LLMs.

#Review #LLM Interpretability #Brain-Grounded AI #MEG #Phase-Locking Value #ICA #LLM Steering #Neural Decoding #Latent Space

December 22, 2025

[Triton] Fix barrier placement logic in SWP loop lowering

Determines the barrier position between an MMA's non-pipelined operand and tmem_load correctly, based on the linearized schedule.

#Triton #NVIDIA #Pipelining #SWP #Bug Fix

December 22, 2025

[Triton] Optimizing matmul_ogs configuration on AMD RDNA: up to 46% faster

Resolves register spilling on RDNA3/4 GPUs by tuning the block_m/block_n/block_k settings.

#Triton #AMD #RDNA #Performance #Kernel Tuning

December 22, 2025

[Paper Review] When Reasoning Meets Its Laws

This paper proposes the Laws of Reasoning (LORE) framework to systematically theorize the counterintuitive, suboptimal reasoning behaviors of large reasoning models (LRMs) and to characterize desirable reasoning patterns.

#Review #Large Reasoning Models #Reasoning Behaviors #Compute Law #Accuracy Law #Monotonicity #Compositionality #Fine-tuning #LORE-BENCH

December 21, 2025

[Paper Review] Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

This paper addresses the instability and inefficiency of GRPO (Group Relative Policy Optimization) when training multi-turn LLM agents. In particular, it targets scenarios that require long reasoning, where high sampling variance and uneven per-turn contributions lead to inaccurate advantage estimates.

#Review #Multi-Turn Reinforcement Learning #LLM Agents #Proximal Policy Optimization (PPO) #Turn-Level MDP #Advantage Estimation #Generative AI #Deep Reinforcement Learning

December 21, 2025

[Paper Review] StageVAR: Stage-Aware Acceleration for Visual Autoregressive Models

Visual Autoregressive (VAR) models enable high-quality image generation but suffer from high computational complexity and long runtimes, especially in the large-scale stages.

#Review #Visual Autoregressive Models #Image Generation #Model Acceleration #Low-Rank Approximation #Semantic Irrelevance #Stage-Aware Optimization #Text-to-Image Synthesis

December 21, 2025

[Paper Review] Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

This paper aims to improve the efficiency and performance of formal theorem proving on mathematics problems at the undergraduate level, the graduate level, and beyond. In particular, it tackles the high computational cost and other challenges of LLM-based formal proving, and seeks to bridge the gap between natural-language and formal proofs.

#Review #Formal Theorem Proving #Large Language Models #Reinforcement Learning #Agentic Prover #Lean Theorem Prover #Mathematical Reasoning #Test-Time Scaling

December 21, 2025

[Paper Review] SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories

This paper addresses the limitations of existing LLM software engineering benchmarks (e.g., SWE-bench): manual curation, static datasets, a narrow focus on Python bug fixes, and the risk of data contamination.

#Review #Software Engineering Benchmarks #Large Language Models (LLMs) #Code Generation #Automated Benchmark Generation #Multilingual #GitHub Pull Requests #Test Oracle #Fine-tuning

December 21, 2025

[Paper Review] Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

This paper addresses the sharp performance drop of Multimodal Large Language Models (MLLMs) under severe real-world visual degradations.

#Review #Multimodal Large Language Models (MLLMs) #Visual Degradation #Robustness #Reasoning Chains #Supervised Fine-Tuning (SFT) #Reinforcement Learning (RL) #Degradation-Aware Reasoning #Interpretability

December 21, 2025

[Paper Review] RadarGen: Automotive Radar Point Cloud Generation from Cameras

This work addresses the difficulties of generating automotive radar point clouds that stem from their distinctive data characteristics: sparsity, lack of ordering, and RCS/Doppler attributes.

#Review #Radar Point Cloud Generation #Diffusion Models #Camera-to-Radar #BEV Representation #Autonomous Driving #Multi-modal Generative Models #Scene Editing

December 21, 2025

[Paper Review] Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

This paper addresses the lack of a systematic framework and definition for evaluating the Scientific General Intelligence (SGI) of large language models (LLMs).

#Review #Scientific General Intelligence (SGI) #LLMs #Benchmarking #Scientist-Aligned Workflows #Practical Inquiry Model #Multi-modal Reasoning #Code Generation #Test-Time Reinforcement Learning (TTRL)

December 21, 2025