Review

[Paper Review] VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs

A detailed review of the paper "VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs", published on arXiv.

#Review #Vision-Language Models #Object Grounding #Fine-grained Perception #Hybrid Region Encoder #Plug-and-play #Two-stage Training #Visual Reasoning

October 2, 2025

[Paper Review] VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

A detailed review of the paper "VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators", published on arXiv by Zirui Ge.

#Review #Vision-Language-Action Models #Reinforcement Learning #World Models #Fine-tuning #Embodied AI #Robotics #Reward Design #Distribution Shift

October 2, 2025

[Paper Review] Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned

A detailed review of the paper "Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned", published on arXiv.

#Review #Vision-Language Models (VLMs) #Process Reward Models (PRMs) #Multimodal Reasoning #Test-Time Scaling (TTS) #Process Supervision #Dataset Construction #Perception Errors #MCTS

October 2, 2025

[Paper Review] ReSWD: ReSTIR'd, not shaken. Combining Reservoir Sampling and Sliced Wasserstein Distance for Variance Reduction

A detailed review of the paper "ReSWD: ReSTIR'd, not shaken. Combining Reservoir Sampling and Sliced Wasserstein Distance for Variance Reduction", published on arXiv.

#Review #Sliced Wasserstein Distance #Reservoir Sampling #Variance Reduction #Distribution Matching #Diffusion Guidance #Color Correction #Monte Carlo Estimation

October 2, 2025

[Paper Review] PIPer: On-Device Environment Setup via Online Reinforcement Learning

A detailed review of the paper "PIPer: On-Device Environment Setup via Online Reinforcement Learning", published on arXiv.

#Review #Environment Setup #LLMs #Reinforcement Learning #Supervised Fine-tuning #On-device AI #Software Engineering #Verifiable Rewards

October 2, 2025

[Paper Review] On Predictability of Reinforcement Learning Dynamics for Large Language Models

A detailed review of the paper "On Predictability of Reinforcement Learning Dynamics for Large Language Models", published on arXiv by Yuqing Huang.

#Review #Reinforcement Learning #Large Language Models #Parameter Dynamics #Rank-1 Dominance #Linear Dynamics #SVD #Model Acceleration #Predictability

October 2, 2025

[Paper Review] Making, not Taking, the Best of N

A detailed review of the paper "Making, not Taking, the Best of N", published on arXiv.

#Review #LLM Aggregation #Generative Fusion #Best-of-N #Synthetic Data Generation #Test-Time Scaling #Multilingual Models #Ensemble Learning

October 2, 2025

[Paper Review] Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

A detailed review of the paper "Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation", published on arXiv.

#Review #Large Language Models (LLMs) #Reinforcement Learning (RL) #Exploration Budget Allocation #Knapsack Problem #Group Relative Policy Optimization (GRPO) #Mathematical Reasoning #Resource Optimization

October 2, 2025

[Paper Review] JoyAgent-JDGenie: Technical Report on the GAIA

A detailed review of the paper "JoyAgent-JDGenie: Technical Report on the GAIA", published on arXiv.

#Review #Generalist Agent #Multi-Agent System #Plan-Execute #ReAct #Hierarchical Memory #Tool Integration #GAIA Benchmark #LLM Agent

October 2, 2025

[Paper Review] Infusing Theory of Mind into Socially Intelligent LLM Agents

A detailed review of the paper "Infusing Theory of Mind into Socially Intelligent LLM Agents", published on arXiv.

#Review #Theory of Mind #Large Language Models #Social Agents #Dialogue Systems #Mental State Modeling #Look-ahead Planning #Supervised Fine-tuning #Sotopia Benchmark

October 2, 2025

[Paper Review] In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning

A detailed review of the paper "In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning", published on arXiv by Chaehyeon Chung.

#Review #LLM Feedback #Multi-turn Reasoning #In-place Editing #Token Efficiency #Error Correction #Human-AI Interaction #Reasoning Tasks

October 2, 2025

[Paper Review] Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures

A detailed review of the paper "Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures", published on arXiv by Andrea Passerini.

#Review #LLM Interpretability #Vector Symbolic Architectures #Neural Probing #Information Decoding #Hyperdimensional Computing #Latent Representations

October 2, 2025

[Paper Review] GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness

A detailed review of the paper "GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness", published on arXiv by Chien-Sheng Wu.

#Review #GUI Agents #KV Cache Compression #Spatio-Temporal Awareness #Vision-Language Models #Efficiency #Attention Sparsity #QR Decomposition

October 2, 2025

[Paper Review] GEM: A Gym for Agentic LLMs

A detailed review of the paper "GEM: A Gym for Agentic LLMs", published on arXiv.

#Review #Agentic LLMs #Reinforcement Learning #Environment Simulator #Multi-turn Interactions #Return Batch Normalization #Tool Integration #Benchmarking

October 2, 2025

[Paper Review] Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution

A detailed review of the paper "Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution", published on arXiv.

#Review #LLM Agents #Parallel Execution #DAG-based Planning #Tool Orchestration #Web Agents #Reasoning Framework #Efficiency

October 2, 2025

[Paper Review] Eliciting Secret Knowledge from Language Models

A detailed review of the paper "Eliciting Secret Knowledge from Language Models", published on arXiv by Neel Nanda.

#Review #Language Models #Secret Elicitation #Mechanistic Interpretability #Black-box Methods #White-box Methods #AI Auditing #Model Organisms #Prefill Attacks

October 2, 2025

[Paper Review] DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

A detailed review of the paper "DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search", published on arXiv.

#Review #Reinforcement Learning with Verifiable Rewards (RLVR) #Monte Carlo Tree Search (MCTS) #Mathematical Reasoning #Large Language Models (LLMs) #Systematic Exploration #Adaptive Training #Tree-GRPO

October 2, 2025

[Paper Review] CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs

A detailed review of the paper "CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs", published on arXiv by Hengyi Cai.

#Review #Curriculum Learning #LLMs #Reasoning #Gradient Optimization #Reinforcement Learning #Bayesian Inference #Sample Efficiency

October 2, 2025

[Paper Review] Code2Video: A Code-centric Paradigm for Educational Video Generation

A detailed review of the paper "Code2Video: A Code-centric Paradigm for Educational Video Generation", published on arXiv.

#Review #Educational Video Generation #Code-centric AI #Multi-agent Framework #Manim #Vision-Language Models #Knowledge Transfer #Code Generation #MMMC Benchmark

October 2, 2025

[Paper Review] BroRL: Scaling Reinforcement Learning via Broadened Exploration

A detailed review of the paper "BroRL: Scaling Reinforcement Learning via Broadened Exploration", published on arXiv.

#Review #Reinforcement Learning #LLMs #Scaling Laws #Exploration #Rollout Size #Verifiable Rewards #PPO #Mass Balance Equation

October 2, 2025