#UI Grounding

4 posts

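[Paper Review] GPA: Learning GUI Process Automation from Demonstrations

This paper proposes GPA to overcome the brittleness of conventional RPA and the non-determinism of GUI agents built on large vision-language models (VLMs). Traditional RPA relies on DOM elements or fixed coordinates, so even minor layout changes break its scripts.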

#Review #GUI Process Automation #Robotic Process Automation #Sequential Monte Carlo #UI Grounding #Demonstration-based Learning #Computer-use Agent

April 2, 2026

[Paper Review] FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection

This paper aims to address the high computational overhead and attention dispersion that Vision-Language Models (VLMs) incur in UI grounding, both caused by the thousands of visual tokens in high-resolution UI screenshots.

#Review #UI Grounding #Visual Token Reduction #Position-Preserving #Vision-Language Models (VLMs) #Saliency Scoring #Computational Efficiency #Human-Computer Interaction

January 14, 2026

[Paper Review] Grounding Computer Use Agents on Human Demonstrations

This study aims to improve the reliability of 'grounding', a core challenge for computer-use agents (CUAs).

#Review #Computer Use Agents #UI Grounding #Desktop Applications #Human Demonstrations #Large-Scale Dataset #Vision-Language Models #Supervised Fine-tuning #Reinforcement Learning

November 11, 2025

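[Paper Review] UI-Venus Technical Report: Building High-performance UI Agents with RFT

This paper aims to build UI-Venus, a high-performance UI agent that takes only screenshots as input. It focuses on overcoming the limitations of conventional supervised fine-tuning (SFT), namely poor generalization and high data-collection cost, and on improving navigation and reasoning in complex UI environments.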

#Review #UI Agent #MLLM #RFT #UI Grounding #UI Navigation #GRPO #Data Cleaning #Self-Evolving Trajectory

August 15, 2025