#Multimodal Large Language Model

5 posts

[Paper Review] GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

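This study was conducted to overcome two obstacles to the systematic evaluation of Multimodal Large Language Model (MLLM)-based game agents: the lack of a standardized interface and the limitations of existing verification methods.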

#Review #Multimodal Large Language Model #Game Agent #Benchmark #Standardized Evaluation #Computer-Use Agent #Semantic Action Parsing #Outcome-based Evaluation

April 15, 2026

[Paper Review] InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

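The paper sets out to resolve the inherent trade-off that unified multimodal models (UMMs) face between strong semantic understanding and powerful generation capability. It proposes InternVL-U, a lightweight 4B-parameter UMM, with the aim of democratizing understanding, reasoning, generation, and editing within a single unified framework.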

#Review #Unified Multimodal Models #Multimodal Large Language Model #Image Generation #Image Editing #Chain-of-Thought #Data Synthesis #Low-parameter Models

March 10, 2026

[Paper Review] HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices

The paper aims to address the difficulty of deploying current multimodal large language models (MLLMs) on-device, which stems from their high compute and memory requirements.

#Review #Multimodal Large Language Model #Edge AI #Efficient Inference #Visual Resolution Compressor #Dual Consistency Learning #Vision Transformer #Quantization #Low-Latency

December 17, 2025

[Paper Review] DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry

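This paper addresses the limitation that existing MLLMs fail to capture the fine-grained visual features of dental imaging data and lack sufficient reasoning ability for precise diagnosis. To that end, it develops DentalGPT, a model specialized for dentistry, with the aim of improving multimodal complex reasoning in automated oral healthcare.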

#Review #Multimodal Large Language Model #Dental Imaging #Complex Reasoning #Domain Adaptation #Reinforcement Learning #Medical VQA #Dental Healthcare

December 14, 2025

[Paper Review] HunyuanOCR Technical Report

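The paper aims to resolve the error propagation and high maintenance costs of existing pipeline-based OCR systems, and to overcome both the high compute requirements of large general-purpose VLMs and the incomplete end-to-end optimization of OCR-specialized VLMs.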

#Review #Optical Character Recognition #Multimodal Large Language Model #End-to-End Learning #Reinforcement Learning #Document Parsing #Information Extraction #Text Spotting

November 25, 2025