Latest Posts

[Paper Review] Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

This paper systematically investigates how convincingly large language models (LLMs) can play characters across the moral spectrum, villains in particular.

#Review #LLM #Role-playing #Safety Alignment #Villain #Persona Simulation #Moral Alignment #Benchmark #Character Fidelity

November 9, 2025

[Paper Review] Real-Time Reasoning Agents in Evolving Environments

This paper addresses the fundamental challenge of real-time reasoning: getting large language model (LLM)-based agents to make logical and timely decisions in environments that change in real time.

#Review #Real-time Reasoning #LLM Agents #Dynamic Environments #Dual-System AI #AgileThinker #Reactive Planning #Cognitive Load #Time Pressure

November 9, 2025

[Paper Review] Jailbreaking in the Haystack

This study analyzes the safety implications of the extended context windows of long-context language models (LMs) and explores how safety behavior degrades even within benign context.

#Review #Jailbreaking #LLM Safety #Long-Context Models #Positional Bias #Attack Success Rate (ASR) #Prompt Engineering #Compute Efficiency #AI Agents

November 9, 2025

[Paper Review] HAFixAgent: History-Aware Automated Program Repair Agent

This study addresses the problem that existing LLM-based automated program repair (APR) systems rely only on local code snapshots and therefore overlook repository history when fixing complex multi-hunk bugs.

#Review #Automated Program Repair #AI Agent #Large Language Models #Repository Mining #Historical Context #Bug Fixing #Defects4J

November 9, 2025

[Paper Review] Dense Motion Captioning

This paper proposes Dense Motion Captioning (DMC), a new task that temporally localizes meaningful actions within 3D human motion sequences and generates a detailed caption for each action.

#Review #3D Human Motion #Dense Captioning #Large Language Models #Motion Understanding #Temporal Localization #Human-Language Datasets #Motion Generation

November 9, 2025

[Paper Review] DeepEyesV2: Toward Agentic Multimodal Model

This paper aims to build an agentic multimodal model that goes beyond simply understanding text and images: it actively invokes external tools such as code execution environments and web search, and integrates those tool operations seamlessly into its reasoning process.

#Review #Agentic AI #Multimodal Models #Tool Use #Reinforcement Learning #Supervised Fine-tuning #Multimodal Reasoning #Web Search #Code Execution

November 9, 2025

[Paper Review] CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?

This study aims to improve the calibration of verbalized confidence, the confidence that large language models (LLMs) express in natural language.

#Review #LLM Calibration #Confidence Calibration #Uncertainty Estimation #Critique Learning #Supervised Fine-Tuning #Natural Language Processing #Self-Critique

November 9, 2025

[Paper Review] V-Thinker: Interactive Thinking with Images

This paper addresses the problem of large multimodal models (LMMs) drifting away from visual information and hallucinating during long reasoning processes.

#Review #Large Multimodal Models #Interactive Reasoning #Vision-Centric Thinking #Reinforcement Learning #Data Synthesis #Visual Tools #Curriculum Learning #Multimodal AI

November 9, 2025

[Paper Review] Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

This paper aims to overcome the limitations of static images and the separation of modalities in the existing 'Thinking with Text' and 'Thinking with Images' paradigms.

#Review #Video Generation #Multimodal Reasoning #Temporal Understanding #Spatial Reasoning #Foundation Models #AI Benchmarking #In-Context Learning #Self-Consistency

November 9, 2025

[Paper Review] The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms

This paper aims to theoretically establish the Strong Lottery Ticket Hypothesis (SLTH) for the Multi-Head Attention (MHA) mechanism, a core component of the transformer architecture that prior work had not covered.

#Review #Strong Lottery Ticket Hypothesis #Multi-Head Attention #Transformers #Neural Network Pruning #Overparameterization #Weight Initialization #Model Compression

November 9, 2025

[Paper Review] Scaling Agent Learning via Experience Synthesis

This paper aims to address the problems facing reinforcement learning (RL) training of large language model (LLM) agents: high cost, limited task diversity, unstable reward signals, and complex infrastructure. It proposes a unified framework that enables effective and scalable RL training while reducing the need for real-environment interaction.

#Review #Reinforcement Learning #LLM Agents #Experience Synthesis #World Models #Curriculum Learning #Sim-to-Real Transfer #Web Agents

November 9, 2025

[Paper Review] SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding

This paper aims to address the difficulty multimodal large language models (MLLMs) have with spatiotemporal reasoning over video.

#Review #Spatial Reasoning #Video Understanding #Simulated Data #Instruction Tuning #Multimodal LLMs #Sim-to-Real Transfer #AI2-THOR

November 9, 2025

[Paper Review] SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning

This paper aims to improve the reasoning ability of multimodal large language models (MLLMs).

#Review #Multimodal Large Language Models #Reinforcement Learning #Post-training #Reasoning #Dual-Reward System #Thinking Reward #Judging Reward #Hallucination Reduction

November 9, 2025

[Paper Review] RDMA Point-to-Point Communication for LLM Systems

This paper aims to provide the flexible point-to-point communication that LLM systems need, and to solve the vendor lock-in and hardware portability problems that arise because existing RDMA implementations are tied to specific NICs (Network Interface Controllers).

#Review #RDMA #LLM #Point-to-Point Communication #Disaggregated Inference #MoE Routing #KvCache #AWS EFA #NVIDIA ConnectX

November 9, 2025

[Paper Review] NVIDIA Nemotron Nano V2 VL

Nemotron Nano V2 VL is a recent vision-language model designed for strong real-world document understanding, long video understanding, and reasoning tasks.

#Review #Vision-Language Model #Hybrid Architecture #Mamba-Transformer #Long-Context Understanding #Quantization #Efficient Inference #Document AI #Video AI

November 9, 2025

[Paper Review] Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots

This study addresses the delayed reactions and inconsistent behavior caused by decoupled modules in existing robot control systems.

#Review #Humanoid Robot #Reinforcement Learning #RoboCup #Soccer Skills #Vision-Driven Control #Adversarial Motion Priors #Sim-to-Real #Perception-Action Coordination

November 9, 2025

[Paper Review] How to Evaluate Speech Translation with Source-Aware Neural MT Metrics

In evaluating automatic speech-to-text translation (ST) systems, source-aware neural machine translation (MT) metrics are hard to apply because no textual source is available.

#Review #Speech Translation #Neural MT Metrics #Source-Aware Evaluation #Automatic Speech Recognition (ASR) #Back-Translation (BT) #Cross-lingual Re-segmentation #COMET #MetricX

November 9, 2025

[Paper Review] GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents

This paper aims to address three major gaps in research on desktop computer-using agents (CUAs): a shortage of real-world CUA tasks, the absence of automated data collection and annotation pipelines, and the lack of a unified benchmark.

#Review #Computer-Using Agents #GUI Grounding #Screen Parsing #Action Prediction #Desktop Automation #Dataset #Benchmark #Multimodal Learning #LLM-augmented Data

November 9, 2025

[Paper Review] EVTAR: End-to-End Try on with Additional Unpaired Visual Reference

This study addresses the problem that existing virtual try-on models depend on complex inputs such as agnostic person images, human pose, and densepose, and lack support for reference images, which makes their results less realistic.

#Review #Virtual Try-on #Diffusion Models #End-to-End Learning #Reference Images #Unpaired Data #Flow Matching #Transformer Architecture #Generative AI

November 9, 2025

[Paper Review] Contamination Detection for VLMs using Multi-Modal Semantic Perturbation

This study aims to develop a reliable, practical, and consistent method for detecting data contamination (test-set leakage), which inflates measured performance in Vision-Language Models (VLMs).

#Review #VLM Contamination #Test-set Leakage #Multi-modal Perturbation #Generative Models #Generalization #Model Memorization #VLMs

November 9, 2025