[Paper Review] Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces
A detailed review of the paper 'Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces', posted to arXiv by Yunfei Zhang.
#Review #Large Language Models #User Simulation #Human Behavior Modeling #Long-horizon #Cross-scenario #Benchmark
April 9, 2026
[Paper Review] KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation
A detailed review of the paper 'KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation', posted to arXiv by Guocheng Shao.
#Review #Mobile Agent #Personalization #Proactive Assistance #Interactive Benchmarking #User Simulation #GUI Automation
April 9, 2026
[Paper Review] Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants
A detailed review of the paper 'Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants', posted to arXiv by Yinfei Yang.
#Review #Proactive Assistant #User Simulation #Finite State Machine #Stackelberg POMDP #Multi-app Orchestration #Asymmetric Evaluation
April 1, 2026
[Paper Review] MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences
A detailed review of the paper 'MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences', posted to arXiv by Jianwen Sun.
#Review #Large Language Models #Board Games #Virtual Playtester #User Simulation #Persona Modeling #MDA Framework #Human-AI Collaboration #Critique Generation
January 25, 2026
[Paper Review] User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale
A detailed review of the paper 'User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale', posted to arXiv.
#Review #Multi-Turn Dialogue Generation #Tool Use #Autonomous Agents #Large Reasoning Models #User Simulation #Synthetic Data Generation #SQL-based Tools #Agentic Benchmarks
January 13, 2026