본문으로 건너뛰기

#Large Language Models (LLMs)

163개의 포스트

[논문리뷰] MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants

댓글 수 로딩 중

[논문리뷰] AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

댓글 수 로딩 중

[논문리뷰] InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem

댓글 수 로딩 중

[논문리뷰] BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation

댓글 수 로딩 중

[논문리뷰] HalluCitation Matters: Revealing the Impact of Hallucinated References with 300 Hallucinated Papers in ACL Conferences

댓글 수 로딩 중

[논문리뷰] Facilitating Proactive and Reactive Guidance for Decision Making on the Web: A Design Probe with WebSeek

댓글 수 로딩 중

[논문리뷰] Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction

댓글 수 로딩 중

[논문리뷰] X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework

댓글 수 로딩 중

[논문리뷰] Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs

댓글 수 로딩 중

[논문리뷰] SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories

댓글 수 로딩 중

[논문리뷰] SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs

댓글 수 로딩 중

[논문리뷰] Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates

댓글 수 로딩 중

[논문리뷰] MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning

댓글 수 로딩 중

[논문리뷰] MisSynth: Improving MISSCI Logical Fallacies Classification with Synthetic Data

댓글 수 로딩 중

[논문리뷰] Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning

댓글 수 로딩 중

[논문리뷰] ComProScanner: A multi-agent based framework for composition-property structured data extraction from scientific literature

댓글 수 로딩 중

[논문리뷰] AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library

댓글 수 로딩 중

[논문리뷰] Rewiring Experts on the Fly:Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models

댓글 수 로딩 중

[논문리뷰] ERGO: Entropy-guided Resetting for Generation Optimization in Multi-turn Language Models

댓글 수 로딩 중

[논문리뷰] MoM: Mixtures of Scenario-Aware Document Memories for Retrieval-Augmented Generation Systems

댓글 수 로딩 중

[논문리뷰] Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization

댓글 수 로딩 중

[논문리뷰] DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

댓글 수 로딩 중

[논문리뷰] Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

댓글 수 로딩 중

[논문리뷰] Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels

댓글 수 로딩 중

[논문리뷰] MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook

댓글 수 로딩 중

[논문리뷰] The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward

댓글 수 로딩 중

[논문리뷰] Attributes as Textual Genes: Leveraging LLMs as Genetic Algorithm Simulators for Conditional Synthetic Data Generation

댓글 수 로딩 중

[논문리뷰] Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD

댓글 수 로딩 중

[논문리뷰] OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models

댓글 수 로딩 중