[Paper Review] SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents. A detailed review of the paper 'SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents', posted on arXiv by Huayu Sha. #Review #LLM Agents #Tool-use #Scientific Reasoning #Benchmarking #Interactive Environment #Data Synthesis #Error Recovery #Multi-step Tasks. February 15, 2026
[Paper Review] Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision. A detailed review of the paper 'Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision', posted on arXiv. #Review #LLM #Scientific Reasoning #Co-evolution #Reinforcement Learning #Sparse Supervision #Geometric Consensus #Self-Play #Verifier. February 12, 2026
[Paper Review] P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads. A detailed review of the paper 'P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads', posted on arXiv. #Review #Vision-Language Models #Reinforcement Learning #Curriculum Learning #Physics Olympiads #Scientific Reasoning #Agentic AI #Multimodal AI #Physics. February 10, 2026
[Paper Review] BABE: Biology Arena BEnchmark. A detailed review of the paper 'BABE: Biology Arena BEnchmark', posted on arXiv. #Review #Biology Benchmark #Large Language Models #Experimental Reasoning #Causal Inference #Cross-Scale Inference #Multimodal AI #Scientific Reasoning #Research Agents. February 5, 2026
[Paper Review] Innovator-VL: A Multimodal Large Language Model for Scientific Discovery. A detailed review of the paper 'Innovator-VL: A Multimodal Large Language Model for Scientific Discovery', posted on arXiv. #Review #Multimodal LLM #Scientific AI #Data Efficiency #Reinforcement Learning #Vision-Language Model #Scientific Reasoning #Reproducible AI. January 28, 2026
[Paper Review] Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning. A detailed review of the paper 'Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning', posted on arXiv. #Review #Test-Time Tool Evolution #Scientific Reasoning #Large Language Models #Dynamic Tool Synthesis #Tool Adaptation #AI for Science #Autonomous Agents. January 15, 2026
[Paper Review] A^3-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation. A detailed review of the paper 'A^3-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation', posted on arXiv by Kai He. #Review #Scientific Reasoning #Memory-Driven AI #Benchmarking #Large Language Models (LLMs) #Anchor-Attractor Activation #Episodic Memory #Knowledge Retrieval. January 14, 2026
[Paper Review] ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning. A detailed review of the paper 'ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning', posted on arXiv by Yuqiang Li. #Review #Benchmark #LLMs #Scientific Reasoning #Multidisciplinary #AI4S #Data Contamination #Evaluation #LRM-as-Judge. November 18, 2025
[Paper Review] MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model. A detailed review of the paper 'MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model', posted on arXiv by Bo Yan. #Review #Microscopy VQA #Multimodal LLM #Weak Supervision #Graph Neural Networks #Dataset Generation #Biomedical Imaging #Scientific Reasoning #Cross-Modal Consistency. November 17, 2025
[Paper Review] Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism. A detailed review of the paper 'Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism', posted on arXiv by Shuang Gu. #Review #Scientific Reasoning #Bio-experimental Protocol Generation #LLM #Structured Reward #SciRecipe Dataset #Sketch-and-Fill #Reinforcement Learning #Thoth. October 22, 2025
[Paper Review] ExpVid: A Benchmark for Experiment Video Understanding & Reasoning. A detailed review of the paper 'ExpVid: A Benchmark for Experiment Video Understanding & Reasoning', posted on arXiv. #Review #Experiment Video Understanding #Multimodal Large Language Models (MLLMs) #Scientific Reasoning #Benchmark #Wet-Lab Experiments #Procedural Understanding #Fine-grained Perception #Video QA. October 15, 2025
[Paper Review] SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines. A detailed review of the paper 'SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines', posted on arXiv by Jiabei Xiao. #Review #Scientific Reasoning #Foundation Models #Multi-modal Learning #Cross-domain Generalization #Chain-of-Thought #Reinforcement Learning #Scientific Discovery #Molecular Design. September 26, 2025
[Paper Review] Demystifying Scientific Problem-Solving in LLMs by Probing Knowledge and Reasoning. A detailed review of the paper 'Demystifying Scientific Problem-Solving in LLMs by Probing Knowledge and Reasoning', posted on arXiv by Arman Cohan. #Review #Large Language Models #Scientific Reasoning #Knowledge Retrieval #Reasoning Probing #Benchmarks #Chain-of-Thought #Fine-tuning. August 27, 2025
[Paper Review] CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics. A detailed review of the paper 'CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics', posted on arXiv by Dongchen Huang. #Review #Large Language Models #Condensed Matter Physics #Benchmark #Scientific Reasoning #Evaluation Metric #Expression Edit Distance #Problem Solving. August 27, 2025
[Paper Review] T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation. A detailed review of the paper 'T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation', posted on arXiv by Xihui Liu. #Review #Text-to-Image Generation #Reasoning Benchmark #Idiom Interpretation #Textual Image Design #Entity Reasoning #Scientific Reasoning #Multimodal LLM Evaluation. August 26, 2025