Latest Posts

[Paper Review] LATO.2: Factorized 3D Mesh Generation with Vertex and Topology Flow

Existing 3D mesh generation models tend to learn vertex positions and topological connectivity jointly in a single shared latent space. This limits how efficiently they can handle these two statistically heterogeneous signals.

#Review #3D Mesh Generation #Flow Matching #Factorized Representation #Vertex Flow #Topology Flow #Latent Representation

July 13, 2026

[Paper Review] EgoSteer: A Full-Stack System Towards Steerable Dexterous Manipulation from Egocentric Videos

This paper tackles two limitations of general robot manipulation models: they lack real-time steerability, and they are restricted to specific robot environments.

#Review #Steerable Dexterous Manipulation #VLA Models #Egocentric Videos #World Model #Robot Learning #DAgger

July 13, 2026

[Paper Review] CtrlVTON: Controllable Virtual Try-On via Visual-Instance-Prompt Segmentation

Existing virtual try-on (VTO) systems do not support fine-grained, user-level control over garment attributes such as style, size, and spatial placement. This paper addresses that limitation.

#Review #Virtual Try-On #Image Editing #Visual-Instance-Prompt Segmentation #Segmentation Masks #Diffusion Transformer #Controllability

July 13, 2026

[Paper Review] AdvancedMathBench: A Benchmark Suite for Advanced Mathematical Proof Generation and Verification

Existing math benchmarks lack the scope and granularity needed to evaluate proof ability in advanced, research-level mathematics. This paper addresses that gap.

#Review #Advanced Mathematics #Proof Generation #Process Verification #LLM-as-Judge #Mathematical Reasoning #Benchmark #Automatic Verification Pipeline

July 13, 2026

[Paper Review] ABot-N1: Toward a General Visual Language Navigation Foundation Model

This paper proposes ABot-N1 to address the navigation limitations and scalability problems of existing single unified (monolithic) policies.

#Review #Visual Language Navigation #Foundation Model #Slow-Fast Architecture #Chain-of-Thought #Pixel Goal #Embodied AI #Cross-Task Generalization

July 13, 2026

[Paper Review] ABot-AgentOS: A General Robotic Agent OS with Lifelong Multi-modal Memory

This work aims to close the "reasoning-execution gap" that arises when high-level semantic reasoning must be turned into multi-step physical execution.

#Review #Embodied Intelligence #Agent Operating System #Multi-modal Memory #Lifelong Self-Evolution #Robot Learning #Hierarchical Reasoning #EmbodiedWorldBench

July 13, 2026

[Paper Review] 4D Human-Scene Reconstruction from Low-Overlap Captures

This paper addresses high-quality 4D human-scene reconstruction from only a small number of low-overlap cameras.

#Review #4D Reconstruction #Gaussian Splatting #Sparse-view #Video Diffusion #Human-Scene Decomposition #Multi-view Pose Estimation

July 13, 2026

[Paper Review] Video Generation Models are General-Purpose Vision Learners

Computer vision is still largely at the stage of specialized models built for individual tasks. This paper addresses that problem.

#Review #Video Generation #Foundation Models #Generalist Vision Intelligence #Diffusion Models #Spatiotemporal Priors #Perception Task-Agnostic #Synthetic Data

July 12, 2026

[Paper Review] VaseMuseum: Digital Intelligent Museum for Ancient Greek Pottery

This work addresses a reliability problem in VLM-based digital museum guide systems for cultural heritage domains such as ancient Greek pottery. Given fragmented or incomplete information, existing models tend to produce overconfident answers (hallucinations) or cite unverified external references.

#Review #Vision-Language Models #Digital Museum #Cultural Heritage #Multimodal Agent #Retrieval-Augmented Generation #Inference-time Reliability Control #GRPO

July 12, 2026

[Paper Review] Trust Region Policy Distillation

This paper targets two problems with existing On-Policy Distillation (OPD) methods: structural instability and low sample efficiency.

#Review #On-Policy Distillation #Trust Region #Policy Gradient #Proximal Teacher #Gradient Variance #Mathematical Reasoning #Post-training

July 12, 2026

[Paper Review] Towards Mechanistically Understanding Why Memorized Knowledge Fails to Generalize in Large Language Model Finetuning

LLMs can successfully memorize new knowledge yet still perform poorly on downstream reasoning tasks that require using it. This paper studies that problem. Prior work has focused mainly on parameter updates or knowledge editing and has not given a mechanistic account of the causal disconnect between knowledge storage and reasoning.

#Review #LLM Finetuning #Knowledge Generalization #Mechanistic Interpretability #Self-Patching #Knowing-Using Gap #Knowledge-Circuit Misalignment

July 12, 2026

[Paper Review] Self-Guided Test-Time Training for Long-Context LLMs

Long-context LLMs perform worse as context length grows. This paper argues that the cause is not simply an inability to hold the entire context. Instead, the model lacks the ability to identify and use the key evidence the question requires.

#Review #Long-Context LLMs #Test-Time Training (TTT) #Evidence Selection #Parameter Adaptation #Context Reasoning #Signal-to-Noise Ratio

July 12, 2026

[Paper Review] Scalable Visual Pretraining for Language Intelligence

Large language models lose information when the visual elements of documents are converted to plain text. To address this, this work proposes VP, which trains directly on the visual documents themselves.

#Review #Visual Pretraining #Foundation Models #Multimodal Learning #Scientific Reasoning #Representation Alignment #Autoregressive Training

July 12, 2026

[Paper Review] Phone Segmentation and Recognition through Phonological Activation Mapping

Modern phonetic analysis models treat segmentation and recognition as separate, complex models, and they need large amounts of labeled data and compute. This paper addresses that problem.

#Review #Self-supervised Speech Models #Phonological Activation Mapping #Phone Segmentation #Phone Recognition #Gradient-descent-free #Sample-efficiency #Generalization

July 12, 2026

[Paper Review] PanoWorld: Real-World Panoramic Generation

This paper starts from the observation that existing panoramic world models struggle to maintain spatial and temporal consistency and physical accuracy in complex outdoor environments.

#Review #Panoramic Generation #World Model #Diffusion Model #Rotation Equivariance #Dense Panoramic Ray-Conditioning #Geometry-aware Memory #World360

July 12, 2026

[Paper Review] MedPMC: A Systematic Framework for Scaling High-Fidelity Medical Multimodal Data for Foundation Models

A key factor limiting the performance of medical AI models is the scarcity of large-scale, high-quality medical multimodal data. This paper addresses that scarcity.

#Review #Multimodal Foundation Models #Medical Data Curation #PubMed Central #Image-Text Pairs #Vision-Language Models #Clinical Transfer Validation #High-Fidelity Pipeline

July 12, 2026

[Paper Review] Long-Horizon-Terminal-Bench: Testing the Limits of Agents on Long-Horizon Terminal Tasks with Dense Reward-Based Grading

This paper tackles two limitations of existing agent benchmarks: they focus too heavily on short-horizon tasks, and they evaluate agents only on final outcomes.

#Review #Autonomous Agents #Long-Horizon Tasks #Terminal Benchmarks #Dense Reward-Based Grading #Subtask-based Evaluation #Failure Analysis #Agentic Workflow

July 12, 2026

[Paper Review] KronQ: LLM Quantization via Kronecker-Factored Hessian

The core problem this work identifies is that existing PTQ methods use only input activation statistics (H_X). As a result, they overlook the asymmetric sensitivity across output channels.

#Review #Post-Training Quantization #Kronecker-Factored Hessian #Gradient Covariance #Mixed-Precision Allocation #Bidirectional Incoherence Processing #LLM

July 12, 2026

[Paper Review] From RGB Generation to Dense Field Readout: Pixel-Space Dense Prediction with Text-to-Image Models

This paper aims to define an optimal dense prediction architecture. The architecture should leverage the strong pretrained knowledge of large-scale T2I models while removing their unnecessary generative output interface.

#Review #Dense Prediction #Text-to-Image Models #Field Readout #LoRA #Vision Transformers #RGB-native

July 12, 2026

[Paper Review] Flow-ERD: Agent-type Aware Flow Matching with Entropy-Regularized Distillation for Diverse Traffic Simulation

Realism and diversity are two key requirements in autonomous driving simulation, and they trade off against each other. This paper aims to resolve that trade-off.

#Review #Multi-Agent Simulation #Flow Matching #Entropy-Regularized Distillation #Autonomous Driving #Traffic Simulation #Realism-Diversity Pareto

July 12, 2026