#Human-Computer Interaction

22 posts

[Paper Review] iOSWorld: A Benchmark for Personally Intelligent Phone Agents

This paper points out that existing mobile agent benchmarks lack persistent user data and interconnected personal context, and argues for the need to evaluate agents equipped with "Personal Intelligence."

#Review #iOSWorld #Mobile Agents #Personal Intelligence #Human-Computer Interaction #LLM-as-a-Judge #Multi-app Reasoning #Simulator Benchmark

June 17, 2026

[Paper Review] WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces

Existing benchmarks for computer-use agents are largely confined to single browser-based tasks, which limits their ability to evaluate performance on complex long-horizon tasks in real desktop environments.

#Review #Computer-Use Agent #Long-Horizon #Real-World Benchmark #Hybrid Interface #Human-Computer Interaction #Agent Evaluation

June 11, 2026

[Paper Review] DrawMotion: Generating 3D Human Motions by Freehand Drawing

This paper addresses the difficulty of precisely controlling the complex, fine-grained 3D motion a user intends through text descriptions alone. Existing work either relies on elaborate text descriptions or modifies motion through additional modeling, both of which impose considerable time costs and input burden on the user.

#Review #Diffusion Models #Motion Generation #Human-Computer Interaction #Freehand Drawing #Multi-Condition Fusion #Intermediate Feature Guidance #Neural Collapse

May 20, 2026

[Paper Review] What if AI systems weren't chatbots?

This paper points out that AI is converging too quickly on the conversational chatbot interface, and analyzes the structural social, economic, and environmental harms this paradigm brings.

#Review #Conversational AI #Chatbots #User Agency #Sociotechnical Systems #Human-Computer Interaction #AI Governance #Environmental Justice

May 10, 2026

[Paper Review] VenusBench-Mobile: A Challenging and User-Centric Benchmark for Mobile GUI Agents with Capability Diagnostics

This paper proposes VenusBench-Mobile, which comprises 10 user-intent-centric categories, 149 tasks, and 80 environment variations. To analyze the causes of agent failures in detail, it introduces the PUDAM capability taxonomy and classifies each task's difficulty into four levels (Level 1-4).

#Review #Mobile GUI Agents #User-Centric Benchmark #Capability Diagnostics #Human-Computer Interaction #Performance Evaluation #Robustness

April 8, 2026

[Paper Review] PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents

The goal is to overcome the limitations of current GUI agents, which respond only to explicit instructions, and to develop a proactive AI assistant that predicts a user's implicit intent from continuous visual input (screenshots) and provides timely recommendations.

#Review #Proactive Agents #GUI Automation #Intent Recommendation #Multimodal LLMs #Benchmark #Memory-aware Framework #Human-Computer Interaction

March 9, 2026

[Paper Review] Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control

This paper aims to overcome the limited control signals (text or keyboard) of existing video world models and to generate human-centric interactive virtual environments using tracking data of the user's head and hand movements.

#Review #Video Generation #Extended Reality (XR) #Diffusion Models #Human-Computer Interaction #Hand Pose Estimation #Camera Control #World Simulation #Interactive AI

February 22, 2026

[Paper Review] Continual GUI Agents

This study defines a new task, Continual GUI Agents, in which a GUI (Graphical User Interface) agent learns continually without performance degradation in dynamic digital environments (shifts in data distribution), such as new domains or resolution changes.

#Review #Continual Learning #GUI Agents #Reinforcement Learning #Grounding #Domain Adaptation #Resolution Adaptation #Reward Shaping #Human-Computer Interaction

February 1, 2026

[Paper Review] FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection

This paper aims to address the high computational overhead and attention dispersion that Vision-Language Models (VLMs) face in UI grounding, both caused by the thousands of visual tokens that high-resolution UI screenshots produce.

#Review #UI Grounding #Visual Token Reduction #Position-Preserving #Vision-Language Models (VLMs) #Saliency Scoring #Computational Efficiency #Human-Computer Interaction

January 14, 2026

[Paper Review] ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands

Existing GUI agents rely mainly on discrete click prediction, which makes it hard for them to perform complex GUI interactions that require immediate visual perception and adjustment, such as continuous free-form drags (e.g., drawing, solving CAPTCHAs). This paper addresses that problem.

#Review #GUI Automation #Flow-based Generative Models #Continuous Control #Vision-Language Models #Human-Computer Interaction #ScreenDrag Benchmark #Dexterous Manipulation

January 13, 2026

[Paper Review] DreamOmni3: Scribble-based Editing and Generation

This paper addresses the limitations of text prompts in unified generation and editing models, namely their failure to accurately capture the user's intended edit location and fine visual details.

#Review #Image Editing #Image Generation #Scribble-based Control #Multimodal AI #Diffusion Models #Data Synthesis #Human-Computer Interaction #Instruction-based Editing

December 30, 2025

[Paper Review] Step-GUI Technical Report

This paper addresses the fundamental problem of acquiring high-quality training data efficiently and reliably in GUI automation. It also aims to build a standardized interface across heterogeneous devices to protect user privacy, and to validate agents' practicality through an evaluation benchmark based on real everyday usage patterns.

#Review #GUI Automation #Self-Evolving Pipeline #Reinforcement Learning #Multimodal LLMs #Privacy-Preserving AI #Human-Computer Interaction #Model Context Protocol #Benchmarking

December 17, 2025

[Paper Review] StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

This study aims to evaluate how effectively multimodal large language models (MLLMs) use human gaze signals to perform temporal reasoning and proactive understanding in streaming video settings.

#Review #Streaming Video Understanding #Gaze-Guided AI #Temporal Reasoning #Proactive AI #MLLMs #Eye Tracking #Benchmark #Human-Computer Interaction

December 1, 2025

[Paper Review] Computer-Use Agents as Judges for Generative User Interface

The goal is to address the problem that GUIs, currently designed around humans, force Computer-Use Agents (CUAs) to perform tasks inefficiently.

#Review #Computer-Use Agents #Generative UI #AI-assisted Design #Human-Computer Interaction #LLM #AUI-Gym #Feedback Loop #Agent-centric Design

November 24, 2025

[Paper Review] Aligning Generative Music AI with Human Preferences: Methods and Challenges

This paper addresses the problems that arise in generative music AI systems from the fundamental gap between computational optimization and human aesthetic sense, and explores ways to align these systems more closely with nuanced human musical preferences.

#Review #Generative Music AI #Preference Alignment #Reinforcement Learning from Human Feedback (RLHF) #Direct Preference Optimization (DPO) #Inference-Time Optimization #Music Generation #Human-Computer Interaction

November 19, 2025

[Paper Review] PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits

This paper aims to address the shortage of multimodal datasets for analyzing human behavioral traits, and to enable systematic cross-modal and causal studies by combining behavioral traits inferred by an LLM (Large Language Model) with visual and biographical attributes.

#Review #Multimodal Dataset #LLM Inference #Behavioral Traits #Causal Representation Learning #Big Five #Multimodal AI #Causal Discovery #Human-Computer Interaction

September 16, 2025

[Paper Review] 'Does the cafe entrance look accessible? Where is the door?' Towards Geospatial AI Agents for Visual Inquiries

This paper points out that existing map systems rely on structured GIS data and are therefore limited in answering visual-spatial queries (e.g., "Does the cafe entrance look accessible?", "Where is the door and what does it look like?").

#Review #Geospatial AI #Multimodal AI Agents #Visual Question Answering #Accessibility #Street View Imagery #Spatial Reasoning #Human-Computer Interaction

August 22, 2025

[Paper Review] InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

This paper aims to solve the problem of inefficient exploration for semantic alignment in natural-language-instruction GUI grounding, a core task for GUI agents based on MLLMs (Multimodal Large Language Models).

#Review #GUI Grounding #MLLMs #Reinforcement Learning #Policy Optimization #Exploration Strategy #Semantic Alignment #Adaptive Exploration Reward #Human-Computer Interaction

August 11, 2025

[Paper Review] Sel3DCraft: Interactive Visual Prompts for User-Friendly Text-to-3D Generation

The main goal is to resolve the "blind trial-and-error" prompting problem in text-to-3D (T23D) generation, along with the unpredictable results and inefficient workflows it causes.

#Review #Text-to-3D Generation #Prompt Engineering #Visual Analytics #Human-Computer Interaction #Multi-modal Large Language Models #3D Model Evaluation

August 7, 2025

[Paper Review] C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations

This study addresses the lack of comprehensive benchmarking of how effectively current spoken dialogue models (SDMs) understand and emulate complex human conversation, in particular phonological/semantic ambiguity and context dependency (omission, coreference, multi-turn interaction).

#Review #Spoken Dialogue Models #Bilingual Benchmark #Complex Conversations #Ambiguity Resolution #Context Understanding #LLM Evaluation #Human-Computer Interaction

August 2, 2025

[Paper Review] Can Agent Conquer Web? Exploring the Frontiers of ChatGPT Atlas Agent in Web Games

This paper aims to evaluate the ability of OpenAI's ChatGPT Atlas agent to interact in web environments, specifically through web-based games.

#Review #Web Agent #Large Language Models #Multimodal AI #Browser Automation #Game AI #ChatGPT Atlas #Performance Evaluation #Human-Computer Interaction

October 31, 2025

[Paper Review] Paper2Web: Let's Make Your Paper Alive!

This paper proposes a new task called PAPER2WEB, which converts academic papers into layout-aware, interactive, multimedia-rich web pages.

#Review #Academic Webpage Generation #Multi-Agent Systems #Large Language Models #Model Context Protocol #Interactive Content #Multimedia Dissemination #Evaluation Benchmark #Human-Computer Interaction

October 20, 2025