Review

[Paper Review] DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion

This paper aims to enable Diffusion Transformer (DiT) models to generate ultra-high-resolution images (e.g., 16M+ pixels) without retraining.

#Review #Diffusion Models #Transformer Architecture #Positional Encoding #High-Resolution Image Generation #Extrapolation #Dynamic Adaptation #Training-Free

October 24, 2025

[Paper Review] Diff-XYZ: A Benchmark for Evaluating Diff Understanding

This paper proposes the Diff-XYZ benchmark to evaluate how effectively large language models (LLMs) understand and process code diffs.

#Review #Diff Understanding #Code Diff #Benchmark #LLMs #Code Editing #Software Engineering #Unified Diff Format #Search-Replace

October 24, 2025

[Paper Review] Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence

This paper aims to address the ungrounded or hallucinated conclusions that multimodal large language models (MLLMs) often produce because of text-only reasoning or inaccurate evidence localization, and to strengthen their multi-step video reasoning ability.

#Review #Video Reasoning #Multimodal Large Language Models (MLLMs) #Reinforcement Learning (RLVR) #Evidence Grounding #Multi-step Reasoning #Frame Retrieval #Dataset Construction #Progressive Learning

October 24, 2025

[Paper Review] ComProScanner: A multi-agent based framework for composition-property structured data extraction from scientific literature

This paper aims to develop an automated, user-friendly multi-agent framework for extracting structured chemical composition-property data and synthesis information from scientific literature.

#Review #Multi-agent Systems #Large Language Models (LLMs) #Information Extraction #Scientific Literature #Materials Science #Data Curation #Piezoelectric Materials #RAG (Retrieval-Augmented Generation)

October 24, 2025

[Paper Review] AlphaFlow: Understanding and Improving MeanFlow Models

This paper analyzes in depth why MeanFlow models succeed, and aims to resolve the optimization conflict and slow convergence caused by the negative correlation between two components of the MeanFlow training objective: trajectory flow matching and trajectory consistency.

#Review #Generative Models #Flow Matching #Consistency Models #MeanFlow #Curriculum Learning #Few-Step Generation #Image Generation

October 24, 2025

[Paper Review] AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

This paper aims to resolve the mismatch between the draft model and the target model in Speculative Decoding (SD), a technique for speeding up large language model (LLM) inference.

#Review #Speculative Decoding #Knowledge Distillation #LLM Inference #Model Acceleration #Token Filtering #Draft Model #Acceptance Rate

October 24, 2025

[Paper Review] ARGenSeg: Image Segmentation with Autoregressive Image Generation Model

This paper points out that existing MLLM-based segmentation methods are limited in capturing fine-grained, pixel-level visual detail, and proposes ARGenSeg, a new paradigm based on autoregressive generation.

#Review #Image Segmentation #Autoregressive Generation #Multimodal Large Language Models (MLLMs) #Visual Understanding #VQ-VAE #Multi-scale Prediction #Referring Expression Segmentation #Image Generation

October 24, 2025

[Paper Review] olmOCR 2: Unit Test Rewards for Document OCR

This paper proposes olmOCR 2, an OCR system that converts printed documents into clean, naturally ordered plain text. In particular, it aims to substantially improve the handling of complex document structures such as math formulas, table parsing, and multi-column layouts by using reinforcement learning with verifiable rewards (RLVR).

#Review #Document OCR #Vision Language Model #Reinforcement Learning #Unit Tests #Synthetic Data Generation #RLVR #Document Parsing #State-of-the-Art OCR

October 23, 2025

[Paper Review] VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos

This work addresses the difficulty of obtaining the large-scale, manually annotated interaction data needed to train GUI (Graphical User Interface) agents.

#Review #GUI Agents #Video Pretraining #Inverse Dynamics #Action Recognition #Computer Use Automation #Data Synthesis #Multimodal Learning

October 23, 2025

[Paper Review] Unified Reinforcement and Imitation Learning for Vision-Language Models

To address the inefficiency of large Vision-Language Models (VLMs), this paper proposes Unified Reinforcement and Imitation Learning (RIL), an efficient training algorithm for building strong, lightweight VLMs that work even in resource-constrained environments.

#Review #Vision-Language Models #Reinforcement Learning #Imitation Learning #Model Distillation #Lightweight VLMs #LLM-as-a-Judge #Multimodal Learning

October 23, 2025

[Paper Review] RIR-Mega: a large-scale simulated room impulse response dataset for machine learning and room acoustics modeling

This paper aims to address the shortage of large-scale simulated Room Impulse Response (RIR) datasets for AI/ML tasks such as dereverberation, robust speech recognition, sound source localization, and acoustic environment estimation.

#Review #Room Impulse Response #Dataset #Room Acoustics #Machine Learning #Dereverberation #Speech Recognition #Simulation #Hugging Face

October 23, 2025

[Paper Review] ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge

This paper proposes the ProfBench benchmark to move beyond existing LLM evaluation benchmarks, which are limited to easily verifiable tasks, and to evaluate LLM performance on complex, real-world, multi-domain tasks that require expert-level knowledge.

#Review #LLM Evaluation #Rubric-based Benchmark #Professional Knowledge #Multi-domain Tasks #LLM-Judge Bias Mitigation #Cost Reduction #Reasoning Assessment #Open-weight Models

October 23, 2025

[Paper Review] Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

This paper aims to remove a bottleneck in research progress caused by the lack of large-scale, high-quality, publicly accessible datasets for text-guided image editing. By providing a comprehensive and diverse dataset built from real images, it seeks to establish a solid foundation for training and benchmarking next-generation text-guided image editing models.

#Review #Text-Guided Image Editing #Large-Scale Dataset #Multimodal Models #Dataset Curation #Quality Control #Prompt Engineering #Preference Learning #Multi-Turn Editing

October 23, 2025

[Paper Review] OmniNWM: Omniscient Driving Navigation World Models

This paper aims to develop a comprehensive, omniscient panoramic navigation world model for autonomous driving by addressing the limitations of existing driving world models: limited state modalities, short sequence lengths, imprecise action control, and a lack of reward awareness.

#Review #Autonomous Driving #World Models #Multi-modal Generation #3D Occupancy #Plücker Ray-maps #Action Control #Dense Rewards #Long-term Forecasting

October 23, 2025

[Paper Review] Machine Text Detectors are Membership Inference Attacks

This work addresses the inefficiency that arises because two related research areas, membership inference attacks (MIAs) and machine-generated text detection (MGTD), have been studied independently.

#Review #Membership Inference Attacks #Machine-Generated Text Detection #Transferability #Likelihood Ratio Test #Large Language Models #Zero-Shot Detection #Model Security #AI Safety

October 23, 2025

[Paper Review] MINED: Probing and Updating with Multimodal Time-Sensitive Knowledge for Large Multimodal Models

This work addresses the difficulty that large multimodal models (LMMs) have in accurately understanding factual knowledge that changes over time.

#Review #Large Multimodal Models (LMMs) #Time-Sensitive Knowledge #Temporal Reasoning #Knowledge Editing #Multimodal Benchmarking #Temporal Awareness #Dynamic Knowledge

October 23, 2025

[Paper Review] LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts

The goal is to equip large language models (LLMs) with advanced reasoning over long contexts. The paper addresses two gaps: existing RL methods focus mainly on short-context reasoning, and high-difficulty long-context RL data in particular is scarce.

#Review #Reinforcement Learning #Long Context Reasoning #Large Language Models #Multi-hop QA #Data Synthesis #Retrieval-Augmented Generation #Chain-of-Thought

October 23, 2025

[Paper Review] Learning from the Best, Differently: A Diversity-Driven Rethinking on Data Selection

This paper addresses the performance degradation that existing score-based data selection causes in large language model (LLM) pre-training because of insufficient diversity.

#Review #Data Selection #Large Language Models (LLMs) #Data Diversity #Data Quality #Principal Component Analysis (PCA) #Orthogonal Dimensions #Pre-training

October 23, 2025

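[Paper Review] Language Models are Injective and Hence Invertible

The paper challenges the conventional view that Transformer language models lose information because of nonlinear activations, normalization, and similar components, which would make it difficult to exactly recover the input text from hidden representations.

#Review #Language Models #Injectivity #Invertibility #Transformer #Representation Learning #Exact Recovery #SIPIT Algorithm #Real Analysis

October 23, 2025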

[Paper Review] KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints

The paper aims to address the static and limited knowledge of large multimodal models (LMMs) and to mitigate the catastrophic forgetting that occurs when new knowledge is injected.

#Review #Knowledge Injection #Large Multimodal Models #Catastrophic Forgetting #Data Augmentation #Parameter-Efficient Fine-Tuning #Null Space #Continual Learning

October 23, 2025