Latest Posts

[Paper Review] COrigami: An AI Pipeline for Co-Designing Flat-Foldable Visually Recognisable Origami

This paper points out that modern generative AI struggles to design physical artworks that satisfy both physical constraints and human aesthetic preferences.

#Review #Computational Origami #Flat-Foldable #Reinforcement Learning #Vision-Language Model #Neuro-symbolic Pipeline #Box Pleating #Crease Pattern

June 25, 2026

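[sglang] Maximizing Qwen3.5 Performance in SGLang: An Analysis of the Fused QK GemmaRMSNorm + RoPE Kernel Optimization

This post introduces an optimization that fuses the attention-layer operations of the Qwen3.5 model into a Triton kernel, reducing memory accesses and improving inference performance by up to 9.4%.

#SGLang #Triton #LLM #Optimization #Qwen3.5

June 25, 2026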

[Paper Review] When Lower Privileges Suffice: Investigating Over-Privileged Tool Selection in LLM Agents

This paper examines the severity of over-privileged tool selection, in which LLM agents choose tools with unnecessarily high privileges when carrying out tasks, and identifies its underlying behavioral causes.

#Review #LLM Agents #Tool Selection Bias #Least Privilege #Privilege-Aware Post-Training #Agent Safety #ToolPrivBench

June 24, 2026

[Paper Review] What Intermediate Layers Know: Detecting Jailbreaks from Entropy Dynamics

This paper aims to efficiently detect jailbreak attacks, which threaten the safety of large language models (LLMs), by analyzing the model's internal representations. Prior work relies mainly on prompt-level filtering or external classifiers and therefore overlooks semantic changes inside the model.

#Review #Jailbreak Detection #Large Language Models #Predictive Entropy #Logit Lens #Intermediate Layers #Adversarial Robustness #Uncertainty Dynamics

June 24, 2026

[Paper Review] Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models

This paper proposes Wan-Streamer to address the disjointedness of real-time audio-video interaction and the latency between modules. Prior work uses a cascaded approach that combines VAD, ASR, LLM, TTS, and similar modules, and therefore faces waiting time at module boundaries and error accumulation.

#Review #End-to-End #Real-time Interaction #Multimodal Foundation Models #Full-duplex #Streaming Inference #Block-causal Attention #Thinker-Performer Pipeline

June 24, 2026

[Paper Review] V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning

This paper addresses two costs in fine-grained visual reasoning: expensive RL-based exploration and reliance on large-scale text labels.

#Review #Multimodal Large Language Models #On-Policy Distillation #Fine-Grained Visual Reasoning #Contrastive Evidence Gating #Visual Grounding

June 24, 2026

[Paper Review] UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating

This paper addresses the failure of existing multi-shot video generation models to solve two key challenges: maintaining cross-shot coherence and scaling to long-form narratives.

#Review #Multi-shot Video Generation #Memory-driven #Boundary-aware Gating #Diffusion Transformer #Audio-Visual Generation

June 24, 2026

[Paper Review] TryOnCrafter: Unleashing Camera Trajectories for Realistic Video Virtual Try-on via a Renderable 4D Try-on Proxy

Existing Video Virtual Try-on (VVT) methods are tied to the camera trajectory of the input video, a structural limitation that prevents users from viewing the garment from the angles they want.

#Review #Video Virtual Try-on #Camera-controllable #4D Try-on Proxy #3DGS #Diffusion Transformer #CaM-VVTBench

June 24, 2026

[Paper Review] The Hitchhiker's Guide to Agentic AI: From Foundations to Systems

Written for researchers and practitioners who want to understand and build the full stack of modern AI systems, this guide gives an integrated explanation that runs from the foundational architecture of LLMs to autonomous agentic systems.

#Review #LLM #Reinforcement Learning #Agentic AI #System Architecture #Retrieval-Augmented Generation #Chain-of-Thought #Multi-Agent Systems

June 24, 2026

[Paper Review] ShutterMuse: Capture-Time Photography Guidance with MLLMs

This study starts from the observation that existing MLLMs and photography models fail to effectively provide the real-time guidance needed during an actual shoot.

#Review #MLLM #Photography Guidance #Capture-Time Guidance #Composition #Pose Recommendation #Reinforcement Fine-Tuning

June 24, 2026

[Paper Review] RoPE-Aware Bit Allocation for KV-Cache Quantization

This paper aims to solve the information loss that occurs because existing KV-cache quantization methods treat the Key as a simple flat vector.

#Review #KV-Cache Quantization #RoPE #Bit Allocation #LLM Inference #Long-Context #TurboQuant #Block-GTQ

June 24, 2026

[Paper Review] ReNIO: Reweighting Negative Trajectory Importance for LLM On-Policy Distillation

This paper identifies a problem in standard OPD and OPSD: they treat all SGOs uniformly and thereby miss opportunities for efficient learning.

#Review #On-Policy Distillation #Language Model Post-training #Sample Reweighting #Negative Trajectory #Reasoning #Knowledge Distillation #Prefix-based Training

June 24, 2026

[Paper Review] RL-Index: Reinforcement Learning for Retrieval Index Reasoning

This paper sets out to overcome the limitations of existing models on retrieval tasks that require complex logical reasoning. Existing query-rewriting approaches must call an LLM in real time, which incurs substantial online latency.

#Review #Retrieval-Augmented Generation #Reinforcement Learning #Agentic Indexing #Group Relative Policy Optimization #Document Augmentation #Latency Optimization

June 24, 2026

[Paper Review] MVTrack4Gen: Multi-View Point Tracking as Geometric Supervision for 4D Video Generation

June 24, 2026

[Paper Review] Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do

This paper systematically analyzes whether multimodal CoT consistently improves performance across all multimodal tasks, and what its limitations are. CoT has recently become the standard for maximizing reasoning ability in text-centric LLMs, but its usefulness when extended to the multimodal domain remains unclear.

#Review #Multimodal Chain-of-Thought #Visual Reasoning #LLM #Test-Time Scaling #Visual Reflection #Attention Bias

June 24, 2026

[Paper Review] Improved Large Language Diffusion Models

In an LLM ecosystem dominated by the autoregressive paradigm, this paper aims to overcome the limitations of diffusion-based language models and demonstrate their potential.

#Review #Diffusion Language Models #Bidirectional Attention #Masked Diffusion #Instruction Tuning #Large Language Models #Variable-Length Generation

June 24, 2026

[Paper Review] IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation

This paper aims to solve the structural opacity problem, in which modern unified MLLM-based image generation models struggle to follow complex structural requirements (object counts, spatial relations, attribute binding, and so on).

#Review #IV-CoT #Chain-of-Thought #Structure-Aware #Text-to-Image Generation #MLLM-DiT #Latent Reasoning

June 24, 2026

[Paper Review] EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies

This paper addresses the fact that existing robot manipulation benchmarks rely on a single scalar success rate, which masks a model's true capabilities. Current generalist manipulation models report similar success rates, yet their performance diverges sharply in real deployment, a structural limitation of this kind of evaluation.

#Review #EBench #Generalist Mobile Manipulation #VLA (Vision-Language-Action) #Capability Profiling #Embodied AI #Benchmark #Generalization

June 24, 2026

[Paper Review] DomainShuttle: Freeform Open Domain Subject-driven Text-to-video Generation

This paper addresses the problem that existing Subject-driven Video Generation (S2V) models focus on in-domain fidelity but lack flexibility and editing ability in cross-domain settings where style or domain attributes change.

#Review #Subject-driven Video Generation #Open Domain #Domain-MoT #DualRoPE #Cross-Pair Consistent Loss #Video Diffusion Models

June 24, 2026

[Paper Review] Distill Once, Adapt Life-Long: Exploring Dataset Distillation for Continual Test-Time Adaptation

This paper proposes the DO-ALL framework to address catastrophic forgetting and the accumulation of self-training errors in the CTTA setting.

#Review #Continual Test-Time Adaptation #Dataset Distillation #Catastrophic Forgetting #Stability #Source-Free #Plug-and-Play #Representation Alignment

June 24, 2026