#Verification

8 posts

[Paper Review] The Verification Horizon: No Silver Bullet for Coding Agent Rewards

This paper points out that, as state-of-the-art coding agents improve, reliably verifying the correctness of generated code has become far harder than generating it.

#Review #Coding Agents #Reward Design #Reward Hacking #Alignment #Verification #Systematic Evaluation

June 25, 2026

[Paper Review] Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models

This paper proposes the DAVinCI framework to address the factual inaccuracy and hallucination that lie behind the fluency of LLMs.

#Review #Attribution #Verification #Dual Framework #Hallucination #Confidence Calibration #Natural Language Inference

April 23, 2026

[SGLang] Tree Search & Verification: Tree-Based Speculation and Verification

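An analysis of SGLang's tree search and verification algorithms. It walks through, with code, how candidate tokens are organized into a tree and verified in parallel, the tree construction strategy, and the acceptance decision.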

#sglang #Tree Search #Verification #Token Tree #Acceptance

April 13, 2026

[Paper Review] MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

Large Language Models (LLMs) have recently made considerable progress in fluent text generation and broad question answering, but many real-world problems, such as scientific analysis, financial reasoning, and open-ended research, go beyond simple conversational ability.

#Review #Research Agents #Long-Horizon Reasoning #Verification #Agentic LLM #Multi-Step Problem Solving #Reinforcement Learning

March 17, 2026

[Paper Review] Thinking with Drafting: Optical Decompression via Logical Reconstruction

This paper aims to resolve the "precision paradox" that multimodal large language models (MLLMs) face in complex reasoning tasks over visual inputs.

#Review #Multimodal Reasoning #Visual Algebra #Domain-Specific Language #Optical Decompression #Logical Reconstruction #Bar Model #MLLMs #Verification

February 12, 2026

[Paper Review] LawThinker: A Deep Research Legal Agent in Dynamic Environments

This paper aims to ensure not only correct final results in legal reasoning tasks but also a procedurally compliant reasoning process.

#Review #Legal Reasoning #AI Agent #Large Language Models #Verification #Knowledge Management #Dynamic Environments #Procedural Compliance #Tool Use

February 12, 2026

[Paper Review] DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems

This paper aims to solve the complex problem of debugging LLM-based multi-agent systems.

#Review #LLM Multi-Agent Systems #Debugging #Intervention-Driven #Failure Attribution #Automated Debugging #Verification #AI Agents #Reliability

December 8, 2025

[Paper Review] SkillFactory: Self-Distillation For Learning Cognitive Behaviors

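This paper proposes the SkillFactory framework, which lets a base language model (LLM) teach itself cognitive skills it does not have from the start (e.g., verification, backtracking, retrying) without relying on a stronger external model. The goal is for the model to generalize better and be more robust on complex reasoning tasks.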

#Review #Self-Distillation #Cognitive Skills #Reinforcement Learning #Supervised Fine-Tuning #Language Models #Reasoning #Verification #Retrying

December 3, 2025