Review

[논문리뷰] DiscoX: Benchmarking Discourse-Level Translation task in Expert Domains

본 논문은 전문 도메인에서 담화 수준 번역의 평가가 불충분하다는 문제를 해결하고자 합니다. 기존 벤치마크들이 문장 수준의 정확성과 유창성에 초점을 맞춰 담화 일관성, 엄격한 용어 정밀도, 전문가 스타일 표준을 평가하는 데 한계가 있음을 지적합니다.

#Review #Discourse-Level Translation #Expert Domains #Benchmarking #LLM Evaluation #Reference-Free Metric #Chinese-English Translation #Contextual Coherence #Domain-Specific Terminology

2025년 11월 16일

[논문리뷰] CATS-V2V: A Real-World Vehicle-to-Vehicle Cooperative Perception Dataset with Complex Adverse Traffic Scenarios

본 논문은 기존 V2V 협력 인지 데이터셋이 주로 일반적인 교통 시나리오에 초점을 맞추어 Complex Adverse Traffic Scenarios (CATS) 하에서의 협력 인지 연구에 한계가 있음을 지적합니다.

#Review #Cooperative Perception #Vehicle-to-Vehicle (V2V)#Autonomous Driving #Dataset #Adverse Traffic Scenarios #Sensor Fusion #Temporal Alignment #3D Bounding Box Annotation

2025년 11월 16일

[논문리뷰] A Meta-Heuristic Load Balancer for Cloud Computing Systems

클라우드 시스템에서 노드 과부하를 방지하고 시스템 안정성을 유지하며 최소 비용으로 서비스를 할당하는 전략을 개발하는 것이 목표입니다. 특히, 다양한 유형의 자원 활용 및 서비스 마이그레이션 비용을 고려한 추상적인 클라우드 자원 모델을 제시하고 이를 효율적으로 관리할 로드 밸런서의 성능을 평가하고자 합니다.

#Review #Cloud Computing #Load Balancing #Meta-Heuristic #Genetic Algorithm #Simulated Annealing #Tabu Search #Resource Management #Service Migration

2025년 11월 16일

[논문리뷰] UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

본 논문은 전문화된 비디오 AI 모델과 실제 비디오 워크플로우 간의 격차를 해소하여 차세대 비디오 일반 인공지능을 구현하는 것을 목표로 합니다.

#Review #Video Agents #Multi-modal AI #Plan-Act Architecture #Tool-Use #Long-horizon Reasoning #Open-source #Video Generation #Video Understanding

2025년 11월 13일

[논문리뷰] Superpositional Gradient Descent: Harnessing Quantum Principles for Model Training

본 연구는 대규모 언어 모델(LLM) 훈련 시 고차원, 비볼록(non-convex) 손실 함수 공간에서 기존 경사 하강법(Gradient Descent) 의 한계(지역 최적해 수렴, 느린 수렴 속도)를 극복하고자 합니다.

#Review #Quantum Computing #Optimization #Machine Learning #Transformers #Gradient Descent #Superposition #Large Language Models #Hybrid Quantum-Classical

2025년 11월 13일

[논문리뷰] SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control

기존 instruction-based image editing 모델들이 고정된 강도로 편집을 적용하여 개별 편집에 대한 정밀하고 연속적인 제어가 불가능하다는 한계를 해결하고자 합니다.

#Review #Image Editing #Continuous Control #Fine-Grained Control #Instruction-based #Low-Rank Adaptation #Disentanglement #Generative Models

2025년 11월 13일

[논문리뷰] Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following

본 논문은 복잡하고 다중 턴, 시스템 프롬프트 기반의 지시를 따르는 LLM의 능력을 향상시키는 것을 목표로 합니다. 특히, 이러한 고급 Instruction Following (IF) 기능을 평가하고 훈련하기 위한 고품질의 인간 주석 벤치마크와 신뢰할 수 있고 해석 가능한 보상 신호가 부족하다는 문제를 해결하고자 합니다.

#Review #LLM #Instruction Following #Reinforcement Learning #Rubric-based Evaluation #Benchmarking #Reward Shaping #Rubric Verifier #AdvancedIF

2025년 11월 13일

[논문리뷰] ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents

본 연구는 개방형 질문에 대한 심층 연구(Deep Research, DR) 에이전트의 평가가 응답의 길이, 다양성, 동적 정보원 의존성 등으로 인해 어렵다는 문제를 제기합니다.

#Review #Deep Research Agents #LLM Evaluation #Benchmark #Rubrics #Multi-step Reasoning #Cross-document Synthesis #AI Performance #Task Complexity

2025년 11월 13일

[논문리뷰] One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models

본 논문은 기존 확산 모델이 고해상도 이미지를 직접 샘플링할 때 발생하는 속도 저하, 비용 증가, 아티팩트 발생 문제를 해결하고, 사후 픽셀 공간 초해상도(SR) 방식의 추가 지연 및 아티팩트를 극복하는 것을 목표로 합니다.

#Review #Latent Diffusion Models #Super-Resolution #Upscaling Adapter #Image Generation #Latent Space #Multi-scale Learning #Cross-VAE

2025년 11월 13일

[논문리뷰] Music Flamingo: Scaling Music Understanding in Audio Language Models

이 논문은 기존 오디오-언어 모델(ALM)의 표면적인 인식 수준을 넘어 인간과 유사한 심층적인 음악 이해 및 추론 능력을 갖춘 모델을 개발하는 것을 목표로 합니다. 특히 고품질 음악 데이터 부족과 기존 모델의 제한적인 음악 이해 능력을 극복하고자 합니다.

#Review #Audio Language Models #Music Understanding #Chain-of-Thought #Reinforcement Learning #Data Curation #Multimodal AI #Music Information Retrieval

2025년 11월 13일

[논문리뷰] MuSc-V2: Zero-Shot Multimodal Industrial Anomaly Classification and Segmentation with Mutual Scoring of Unlabeled Samples

이 논문은 훈련 데이터셋의 라벨링 없이 산업 제품의 2D 이미지와 3D 포인트 클라우드에서 제로샷(zero-shot) 이상 분류(AC) 및 세분화(AS) 를 수행하는 것을 목표로 합니다.

#Review #Zero-Shot Learning #Anomaly Detection #Anomaly Segmentation #Multimodal #Industrial Inspection #Mutual Scoring #Unsupervised Learning #Transformer

2025년 11월 13일

[논문리뷰] MM-CRITIC: A Holistic Evaluation of Large Multimodal Models as Multimodal Critique

본 논문은 대규모 멀티모달 모델(LMMs) 의 멀티모달 비판 능력에 대한 포괄적이고 신뢰성 있는 평가의 필요성을 제기하며, LMMs의 자가 개선 및 신뢰성 향상을 목표로 합니다. 기존 벤치마크의 이진 선호도 예측 한계를 넘어, 기본, 교정, 비교의 세 가지 비판 차원에서 MM-CRITIC 벤치마크를 제안합니다.

#Review #LMMs #Multimodal Critique #Benchmark #Evaluation #Reward Model #GPT-4o #Scaling Law

2025년 11월 13일

[논문리뷰] Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO

이 논문은 Large Language Models (LLMs) 의 후처리 훈련에 사용되는 분산형 Group Relative Policy Optimization (GRPO) 시스템의 보안 취약점을 탐구합니다.

#Review #Decentralized RL #GRPO #LLM Post-training #Adversarial Attacks #Data Poisoning #Defense Mechanisms #In-context Attack #Out-of-context Attack

2025년 11월 13일

[논문리뷰] Depth Anything 3: Recovering the Visual Space from Any Views

논문은 단일 이미지, 다중 뷰 또는 비디오 스트림과 같은 임의의 시각 입력 으로부터 공간적으로 일관된 3D 기하 정보를 복구 하는 것을 목표로 합니다.

#Review #Depth Estimation #Multi-view Geometry #Transformer Architecture #Teacher-Student Learning #Pose Estimation #3D Reconstruction #Novel View Synthesis #Visual Space Recovery

2025년 11월 13일

[논문리뷰] CC30k: A Citation Contexts Dataset for Reproducibility-Oriented Sentiment Analysis

본 논문은 AI/ML 논문 내 인용 문맥에서 재현성(reproducibility) 지향 감성을 식별하기 위한 CC30k 데이터셋 을 구축하는 것을 목표로 합니다. 이는 계산적 재현성 연구를 위한 자원 부족 문제를 해결하고, 대규모 언어 모델(LLM)이 재현성 관련 감성을 효과적으로 예측하도록 훈련하는 기반을 마련합니다.

#Review #Citation Contexts #Reproducibility #Sentiment Analysis #Large Language Models #Crowdsourcing #Dataset #Machine Learning #Science of Science

2025년 11월 13일

[논문리뷰] Black-Box On-Policy Distillation of Large Language Models

본 논문은 내부 로짓이나 파라미터에 접근할 수 없는 블랙박스(black-box) 대규모 언어 모델(LLM) 을 대상으로, 학생 모델이 교사 모델의 텍스트 출력만을 학습하는 온-정책(on-policy) 증류(distillation) 방법을 개발하는 것을 목표로 합니다.

#Review #Large Language Models (LLMs)#Knowledge Distillation (KD)#Black-box Distillation #Generative Adversarial Networks (GANs)#On-policy Learning #Reinforcement Learning #Minimax Game #Model Compression

2025년 11월 13일

[논문리뷰] Benchmarking Diversity in Image Generation via Attribute-Conditional Human Evaluation

현재 텍스트-투-이미지(T2I) 모델이 종종 동질적인 이미지를 생성하며 다양성이 부족하다는 문제를 해결하고자 합니다.

#Review #Text-to-Image Models #Diversity Evaluation #Human Evaluation #Attribute-Conditional #Vendi Score #Generative AI #Benchmarking

2025년 11월 13일

[논문리뷰] AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models

본 논문은 3D 환경에서 자연어 명령을 기반으로 물체의 상호작용 가능한 요소(affordance elements)를 식별하고, 해당 요소의 3D 마스크 , 동작 유형 , 동작 축 방향 을 포함하는 구조화된 트립렛을 예측하는 Fine-grained 3D Embodied Reasoning 이라는 새로운 태스크를 제안합니다.

#Review #3D Embodied Reasoning #Multimodal Large Language Models (MLLMs)#Chain-of-Thought (CoT)#Affordance Grounding #Motion Estimation #View Synthesis #Active Perception

2025년 11월 13일

[논문리뷰] WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation

본 논문은 기존 Vision-Language Models (VLMs) 기반의 UI-to-Code 접근 방식이 정적인 HTML/CSS 코드만 생성하고 GUI 상호작용을 지원하지 못하는 한계를 극복하고자 합니다.

#Review #UI-to-Code #Vision-Language Models #Agentic Framework #Interactive UI #Web Automation #Code Generation #UI Verification #Supervised Fine-Tuning

2025년 11월 12일

[논문리뷰] WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

VLA 모델이 로봇 조작에 큰 잠재력을 보이지만, 전문가 데모에 의존하여 실패로부터 학습하고 스스로 수정하는 능력이 제한적이라는 문제를 해결하고자 합니다.

#Review #Vision-Language-Action (VLA)#Reinforcement Learning (RL)#Model-based RL #World Models #Policy Optimization #Robotics #Sample Efficiency #Self-correction

2025년 11월 12일