Latest Posts

[Paper Review] Generative Modeling with Orbit-Space Particle Flow Matching

This paper addresses the failure of modern grid-based generative models (Diffusion, Flow Matching) to handle the intrinsic properties of particle systems effectively.

#Review #Generative Modeling #Flow Matching #Particle Systems #Orbit-Space Canonicalization #Geometric Probability Paths #Surface Normals #Arc-Length Terminal Velocity

May 4, 2026

[Paper Review] From Context to Skills: Can Language Models Learn from Context Skillfully?

This paper addresses the problem that LLMs lack the ability to effectively understand and reason over complex contexts they did not learn during pre-training.

#Review #Context Learning #Language Models #Self-evolving Framework #Multi-agent Self-play #Skill Augmentation #Cross-time Replay #Context-specific Skills

May 4, 2026

[Paper Review] ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models

This paper addresses the problem that existing Diffusion models do not sufficiently reflect the combinatorial structure of high-dimensional data, which limits training efficiency and generation performance.

#Review #Diffusion Generative Models #Combinatorial Stochasticity #Structured Data #Asynchronous Inference #Graded Control

May 4, 2026

[Paper Review] AcademiClaw: When Students Set Challenges for AI Agents

Existing benchmarks in the OpenClaw ecosystem focus mainly on simple assistant-level tasks, which limits their ability to evaluate performance on demanding real-world academic and professional work. This narrow evaluation scope leads to a biased perception of what OpenClaw agents can actually do.

#Review #Agent Benchmarking #OpenClaw #Academic-level Tasks #GPU-intensive #Multi-dimensional Evaluation #Behavioral Phenotypes #Autonomous Agents

May 4, 2026

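[sglang] Optimizing DeepSeek-V4 Performance on AMD ROCm: An Analysis of the Aiter MHC Kernel Integration

Strengthening SGLang's AMD support: inference performance was optimized by replacing the DeepSeek-V4 model's MHC operations with dedicated Aiter kernels.

#DeepSeek-V4 #AMD #ROCm #SGLang #Aiter #Performance Optimization

May 4, 2026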

[transformers] Hugging Face Transformers: Resolving a Performance Bottleneck in PreTrainedTokenizer

An analysis of a case where caching the all_special_ids computation, previously repeated on every convert_ids_to_tokens call, improved performance by more than 300x.

#HuggingFace #Transformers #Python #Optimization #Performance

May 4, 2026

[transformers] Hugging Face Transformers: Performance Gains from MoE and FP8 Kernel Optimization

An analysis of a PR that improves performance and stability in the Hugging Face Transformers library through MoE and FP8 kernel optimizations.

#transformers #optimization #MoE #FP8 #performance #kernel

May 4, 2026

[Paper Review] Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction

This paper addresses large-scale web information search, which must satisfy two conflicting demands at once: in-depth reasoning and broad-coverage aggregation of structured data.

#Review #Web-to-Table Search #Multi-Agent Framework #Bi-Level Architecture #External Memory #Self-Evolving Agents #Task Decomposition

May 3, 2026

[Paper Review] UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

Existing video generation research takes a fragmented approach, training a separate model for each problem setting (e.g., Text-to-Video, Inverse Rendering), so it is restricted to fixed input-output mappings and cannot exploit correlations across modalities.

#Review #Video Diffusion Models #Multimodal Video Generation #Intrinsic Decomposition #Diffusion Priors #Stochastic Condition Masking #Decoupled Gated LoRA #Cross-Modal Self-Attention

May 3, 2026

[Paper Review] Trees to Flows and Back: Unifying Decision Trees and Diffusion Models

This study seeks to identify the fundamental mathematical connection between the hierarchical information refinement processes carried out by decision trees, a classical data analysis model, and diffusion models, a modern generative model.

#Review #Decision Trees #Diffusion Models #Global Trajectory Score Matching (GTSM) #Probability Flow ODE #Tabular Data #Knowledge Distillation #Flow Matching

May 3, 2026

[Paper Review] Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling

This paper addresses the limitations of the existing dual-branch diffusion transformer architecture in talking head generation.

#Review #Talking head generation #Joint audio-video generation #Autoregressive modeling #Diffusion transformer #Multimodal generation

May 3, 2026

[Paper Review] Online Self-Calibration Against Hallucination in Vision-Language Models

This paper raises the Supervision-Perception Mismatch problem: existing offline preference alignment methods can actually backfire when used to address hallucination in LVLMs.

#Review #Vision-Language Models #Hallucination #Monte Carlo Tree Search #Preference Alignment #DPO #Generative-Discriminative Gap #Online Learning

May 3, 2026

[Paper Review] Map2World: Segment Map Conditioned Text to 3D World Generation

This paper aims to resolve the constraints of fixed grid-based layouts and the lack of global-scale consistency found in existing 3D World Generation research.

#Review #3D World Generation #Segment Map Conditioning #Latent Fusion #Structured Latent #Detail Enhancer #Rectified Flow

May 3, 2026

[Paper Review] Let ViT Speak: Generative Language-Image Pre-training

This paper seeks to overcome the limitations of contrastive learning and complex encoder-decoder architectures, the existing approaches to training vision encoders for MLLMs.

#Review #Vision Transformer #Generative Pre-training #Multimodal Large Language Models #Gated Attention #Vision-Language Pre-training #Minimalist Architecture

May 3, 2026

[Paper Review] Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization

This paper studies an approach in which agents in a distributed setting design their own algorithms from historical trajectories rather than relying on handcrafted update rules.

#Review #Distributed Black-Box Optimization #Multi-Agent Systems #Large Language Models #Consensus Optimization #Trajectory-Driven Self-Design

May 3, 2026

[Paper Review] LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

This paper addresses speaker recognition errors that occur in multilingual speech synthesis and diarization systems when the same speaker switches languages (scripts).

#Review #Speaker Encoder #Indic Scripts #Gradient Reversal Layer #Speaker Verification #Language Adversarial Training #Voice Cloning #Diarization

May 3, 2026

[Paper Review] From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills

This paper starts from the observation that skill representations used by LLM agent systems remain text-centric and fragmented, which limits machine reasoning and automated verification.

#Review #LLM Agents #Skill Representation #Scheduling-Structural-Logical (SSL) #Skill Discovery #Risk Assessment #Knowledge Representation

May 3, 2026

[Paper Review] End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer

This paper addresses the problem that the existing two-stage training approach causes misalignment between the tokenizer and the generative model, limiting final generation quality.

#Review #Autoregressive Image Generation #1D Vision Tokenizer #End-to-End Training #Semantic Alignment #Vision Foundation Models

May 3, 2026

[Paper Review] AnalogRetriever: Learning Cross-Modal Representations for Analog Circuit Retrieval

This paper proposes AnalogRetriever to address the difficulty of retrieval across the heterogeneous representations (Netlist, Schematic, Description) that arise in analog circuit design.

#Review #Analog Circuit Retrieval #Cross-Modal Alignment #SPICE Netlists #Relational Graph Convolutional Network (RGCN) #Retrieval-Augmented Generation (RAG) #Curriculum Contrastive Learning

May 3, 2026

[cpython] CPython JIT Optimization: Removing Unnecessary Dependencies on Immutable and Immortal Objects

An analysis of a case where performance was improved by skipping watches on Immutable and Immortal classes in the CPython JIT engine.

#CPython #JIT #Optimization #Python-Internals #Performance

May 3, 2026