Review

[Paper Review] AnalogRetriever: Learning Cross-Modal Representations for Analog Circuit Retrieval

This paper proposes AnalogRetriever to address the difficulty of retrieving across the heterogeneous representations (Netlist, Schematic, Description) that arise in analog circuit design.

#Review #Analog Circuit Retrieval #Cross-Modal Alignment #SPICE Netlists #Relational Graph Convolutional Network (RGCN) #Retrieval-Augmented Generation (RAG) #Curriculum Contrastive Learning

May 3, 2026

[Paper Review] Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising

This paper addresses two shortcomings of existing Unified World Models: they are confined to 2D pixel space and therefore lack an understanding of geometric structure, and they fail to strike an efficient balance between high-dimensional video generation and low-dimensional action prediction.

#Review #Embodied AI #World Models #Diffusion Transformer #3D Reconstruction #Robotic Manipulation #Asynchronous Denoising #Unified Modeling

April 29, 2026

[Paper Review] FASH-iCNN: Making Editorial Fashion Identity Inspectable Through Multimodal CNN Probing

This study addresses the opacity of modern fashion AI systems, which internalize the aesthetic logic of particular fashion houses or editors in their data without disclosing it transparently to users.

#Review #Fashion AI #Multimodal CNN #Visual Channel Probing #Editorial Identity Encoding #Hierarchical Color Prediction #Transparency

April 29, 2026

[Paper Review] Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion

This paper addresses the systemic bottleneck caused by the fragmentation of existing controllable diffusion models. Current control methods are tied to specific backbones and each uses its own training pipeline and runtime hooks, which makes it very difficult to reuse infrastructure or combine multiple control techniques.

#Review #Diffusion Models #Controllable Generation #Plugin Framework #KV-Cache #Template Model #Modular Design

April 29, 2026

[Paper Review] A Survey on LLM-based Conversational User Simulation

This paper addresses the lack of a systematic taxonomy and analysis of the user simulation techniques made possible by advances in LLMs. Earlier user simulation was either limited to specific domains (e.g., recommender systems) or scaled poorly because large-scale data collection is difficult.

#Review #Conversational User Simulation #Large Language Models #Persona Modeling #Synthetic Data Generation #Multi-agent Systems #Dialogue Evaluation

April 29, 2026

[Paper Review] GoClick: Lightweight Element Grounding Model for Autonomous GUI Interaction

This study addresses the excessive compute cost and memory requirements of existing large VLMs so that GUI Agents can be deployed effectively in resource-constrained environments such as mobile devices. Most recent VLMs use 2.5B or more parameters, which makes them hard to use on-device.

#Review #GUI Agent #Vision-Language Model #Visual Grounding #Data Refinement #Model Compression #Encoder-Decoder Architecture

April 28, 2026

[Paper Review] AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark

This paper addresses the problem that current GUI agent evaluation focuses on simple visual element matching and therefore fails to measure an agent's ability to understand complex state changes and GUI dynamics in real digital environments.

#Review #GUI Agents #Multi-Modal Benchmarking #Functional Understanding #Interaction Outcome Prediction #Vision-Language Models #Hierarchical Decomposition

April 28, 2026

[Paper Review] WorldMark: A Unified Benchmark Suite for Interactive Video World Models

The authors propose WorldMark, the first standardized benchmark for Interactive I2V models. At the core of the framework is the Unified Action-mapping Adapter, which converts each model's control scheme into standard WASD actions so that six major models can be compared under identical conditions.

#Review #Interactive World Models #Image-to-Video #Benchmark #Unified Control Interface #World Consistency #Cross-Model Evaluation

April 23, 2026

[Paper Review] WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning

This study addresses the scalability and quality limitations of existing LLM-based website generation approaches.

#Review #Reinforcement Learning #Large Language Models #Website Generation #GRPO #Multimodal Reward #React

April 23, 2026

[Paper Review] VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation

This paper proposes VLAA-GUI to address two fundamental problems facing autonomous GUI agents: early stopping and repetitive loops. Existing agents judge task completion unreliably, so they either declare success while the task is still unfinished or keep repeating the same failing action.

#Review #GUI Automation #Agentic Framework #Completeness Verifier #Loop Breaker #Search Agent #Multimodal LLM

April 23, 2026

[Paper Review] UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling

This paper proposes UniT, which aligns heterogeneous actions into a common latent space through visual anchoring. UniT has a tri-branch architecture consisting of visual, action, and fusion branches, and every branch is quantized into a shared codebook through Residual Quantization (RQ-VAE).

#Review #Humanoid Robotics #Vision-Language-Action Models #Cross-Embodiment Transfer #Latent Action Tokenizer #World Modeling #Visual Anchoring #Cross-Reconstruction

April 23, 2026

[Paper Review] UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection

This paper defines its core problem as follows: although image generation and generated-image detection are closely intertwined in the modern AI ecosystem, existing work optimizes the two independently.

#Review #Multimodal Large Language Models #AI-Generated Image Detection #Image Generation #Co-evolutionary Learning #Unified Architecture #Feature Alignment

April 23, 2026

[Paper Review] Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models

This paper proposes the DAVinCI framework to address the factual inaccuracy and hallucination that lie behind the fluency of LLMs.

#Review #Attribution #Verification #Dual Framework #Hallucination #Confidence Calibration #Natural Language Inference

April 23, 2026

[Paper Review] TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale

This paper addresses the shortcomings of existing systems for detecting risk events in real time from customer feedback in large-scale cloud-native service environments.

#Review #Risk Event Discovery #Large Language Models #Incident Management #Signal-to-Noise Ratio #Event Linking #Enterprise Scale

April 23, 2026

[Paper Review] Test-Time Adaptation for EEG Foundation Models: A Systematic Study under Real-World Distribution Shifts

This paper addresses the severe Distribution Shift problem that EEG foundation models face in real clinical settings.

#Review #Test-Time Adaptation #EEG Foundation Models #Distribution Shift #Benchmark #NeuroAdapt-Bench #T3A

April 23, 2026

[Paper Review] StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition

This paper addresses the problem that existing identity encoders are overfitted to natural photographs, so their identity preservation performance degrades severely on portraits that have been converted to various styles.

#Review #Facial Identity Recognition #Face Stylization #Perception-Aware #Identity Preservation #Deep Learning #Human-Calibration

April 23, 2026

[Paper Review] Seeing Fast and Slow: Learning the Flow of Time in Videos

This study addresses a fundamental limitation of existing video models: they cannot understand or control the flow of time in the physical world.

#Review #Video Generation #Slow-motion #Temporal Super-resolution #Self-supervised Learning #Video Forensics #Time-frequency Scaling

April 23, 2026

[Paper Review] PersonalAI: A Systematic Comparison of Knowledge Graph Storage and Retrieval Approaches for Personalized LLM agents

This paper proposes a structured knowledge graph memory framework to address the complexity and structural limitations of long-term memory management that arise when personalizing LLM-based agents.

#Review #GraphRAG #Knowledge Graph #Personalized LLM Agents #Graph Traversal #Question Answering #Memory Framework

April 23, 2026

[Paper Review] LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics

This paper trains LLaTiSA through a hierarchical curriculum with three levels: L1 (numerical reading), L2 (pattern recognition), and L3 (semantic reasoning). The proposed model, LLaTiSA, adopts a dual-view framework that takes a time series visualization plot and a precise index-value table as input at the same time, securing both visual intuition and numerical accuracy.

#Review #Time Series Reasoning #Large Language Models #Vision-Language Models #Chain-of-Thought #Curriculum Learning #Data Taxonomy

April 23, 2026

[Paper Review] Hybrid Policy Distillation for LLMs

This study addresses the complex interplay among divergence direction, optimization strategy, and data regime that arises during LLM compression.

#Review #Knowledge Distillation #Large Language Models #Forward-Reverse KL #Policy Distillation #Logit-level Reweighting #On-policy Sampling

April 23, 2026