#Exploration Efficiency

4 posts

[Paper Review] Language-based Trial and Error Falls Behind in the Era of Experience

The goal is to address the poor performance of Large Language Models (LLMs) in novel, non-linguistic environments (e.g., symbolic and spatial tasks).

#Review #Large Language Models #Reinforcement Learning #Exploration Efficiency #Sub-Scale Collaboration #Out-of-Distribution Tasks #Agentic AI #Supervised Fine-Tuning

January 29, 2026

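[Paper Review] Controlled Self-Evolution for Algorithmic Code Optimization

The paper addresses two problems: existing LLM-based code generation models produce code that is functionally correct but inefficient, and current self-evolution methods, because of their low exploration efficiency, fail to find algorithmically optimal code within a limited budget.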

#Review #Self-Evolution #Code Optimization #Large Language Models #Genetic Algorithms #Hierarchical Memory #Algorithmic Code Generation #Exploration Efficiency

January 14, 2026

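[Paper Review] Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding

The goal is to address the exploration inefficiency of existing Reinforcement Learning with Verifiable Rewards (RLVR) methods for strengthening the reasoning ability of large language models (LLMs).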

#Review #RLVR #LLM Reasoning #Adaptive Learning #Hint Scaffolding #Item Response Theory #Exploration Efficiency #Problem Difficulty #Policy Optimization

September 10, 2025

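[Paper Review] ParallelMuse: Agentic Parallel Thinking for Deep Information Seeking

This paper aims to address two weaknesses of existing parallel thinking approaches for deep information-seeking (IS) agents: inefficiency (repetitive rollouts) and the difficulty of integrating long-horizon reasoning trajectories (limited context).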

#Review #Agentic AI #Parallel Thinking #Information Seeking #LLM Agents #Context Window Optimization #Exploration Efficiency #Reasoning Aggregation #Tool Use

October 29, 2025