#Memory Benchmarking

2 posts

[Paper Review] MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments

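This paper argues that existing mobile GUI agent benchmarks fail to evaluate memory capabilities systematically: memory-related tasks account for only 5.2-11.8% of their tasks, and none of them evaluate cross-session learning.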

#Review #Mobile GUI Agents #Memory Benchmarking #Short-Term Memory #Long-Term Memory #LLM-as-Judge #Dynamic Environments #Evaluation Metrics #Task Automation

February 8, 2026

[Paper Review] KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

This paper addresses the problem that existing LLM memory benchmarks lean heavily toward simple information retrieval and therefore do not directly measure "person understanding."

#Review #Person Understanding #Lifelong Digital Companions #Memory Benchmarking #Autobiographical Narratives #Cognitive Stream #Flashback Handling #LLM Evaluation #Hierarchical Reasoning

January 13, 2026