#Transformer Architecture

53 posts

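[Paper Review] Communication-Inspired Tokenization for Structured Image Representations

This paper addresses a limitation of existing image tokenizers: because they focus solely on reconstruction and compression, they capture local texture rather than object-level semantic structure.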

#Review #Image Tokenization #Structured Representation #Attentive Encoding #Flow Matching #Semantic Alignment #Compositional Generalization #Transformer Architecture

February 24, 2026

[Paper Review] Large Causal Models for Temporal Causal Discovery

This paper aims to overcome the limitations of the per-dataset model training paradigm in causal discovery (CD) on time-series data.

#Review #Causal Discovery #Temporal Models #Foundation Models #Transformer Architecture #Zero-shot Learning #Time-series Data #Scalability #Multi-dataset Pretraining

February 23, 2026

[Paper Review] Arcee Trinity Large Technical Report

This paper aims to develop Trinity Large, a large language model built on a sparse Mixture-of-Experts (MoE) architecture, and to achieve efficient training and inference together with high stability.

#Review #Mixture-of-Experts #Sparse LLM #Training Stability #Load Balancing #MoE #Transformer Architecture #Context Extension #Muon Optimizer

February 19, 2026

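[Paper Review] MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models

The goal is to overcome the limits on reconstruction fidelity and scalability that existing audio tokenizers inherit from their reliance on pretrained encoders, semantic distillation, and heterogeneous CNN-based architectures.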

#Review #Audio Tokenizer #Transformer Architecture #End-to-End Learning #Residual Vector Quantization #Speech Synthesis #Audio Foundation Models #Scalability #Autoregressive Models

February 12, 2026

[Paper Review] HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing

This paper addresses two fundamental limitations of existing sparse attention methods. The first is the added complexity and performance degradation that come from using an extra proxy to predict token importance.

#Review #Sparse Attention #KV Cache Sharing #Hybrid Attention #Long-Context LLMs #Memory Optimization #Token Selection #Transformer Architecture

February 4, 2026

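[Paper Review] DeepSeek-OCR 2: Visual Causal Flow

This paper addresses the problem that existing Vision-Language Models (VLMs) process visual tokens in a fixed raster-scan order, which conflicts with the flexible way humans perceive visual scenes.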

#Review #OCR #Vision-Language Model #Causal Reasoning #Transformer Architecture #Attention Mechanism #Document Understanding #DeepEncoder

January 28, 2026

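[Paper Review] Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Scaling of large language models (LLMs) has hit a wall. Depth scaling in particular offers theoretically superior expressivity, yet existing Transformer architectures are hard to train stably at extreme depth.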

#Review #Transformer Architecture #Layer Normalization #Depth Scaling #Training Stability #Large Language Models #Gradient Flow #Highway Networks #Post-LayerNorm

January 27, 2026

[Paper Review] SkyReels-V3 Technique Report

This paper presents SkyReels-V3, a unified multimodal conditional video generation framework that integrates visual reference, video, audio, and text inputs to enable flexible, controllable video generation.

#Review #Video Generation #Multimodal AI #Diffusion Models #Transformer Architecture #Reference-guided Generation #Video-to-Video #Audio-driven Animation #Temporal Consistency

January 26, 2026

[Paper Review] Parallel Latent Reasoning for Sequential Recommendation

The goal is to capture complex user preferences from sparse user behavior sequences in sequential recommendation systems.

#Review #Sequential Recommendation #Latent Reasoning #Parallel Processing #Computational Scaling #Mixture of Experts #Contrastive Learning #Transformer Architecture

January 6, 2026

[Paper Review] Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

The goal is to reliably evaluate and understand performance differences between language model architectures without the high noise and cost of academic-scale pretraining.

#Review #Language Models #Transformer Architecture #Canon Layers #Synthetic Pretraining #Reasoning Depth #Linear Attention #State-Space Models #NoPE

December 21, 2025

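[Paper Review] REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion

This paper addresses two persistent problems of Latent Diffusion Models (LDMs), a class of state-of-the-art image generators: slow learning of semantic information and limited sample quality.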

#Review #Latent Diffusion Models #Vision Foundation Models #Semantic Compression #Global-Local Semantics #Image Generation #Representation Entanglement #Transformer Architecture

December 18, 2025

[Paper Review] Stronger Normalization-Free Transformers

This paper aims to remove the dependence on normalization layers, long considered essential in Transformer architectures, and to discover a new point-wise function that does not merely match existing normalization layers but outperforms them.

#Review #Normalization-Free Transformers #Point-wise Functions #Error Function #Deep Learning #Transformer Architecture #Generalization #Normalization Layers

December 11, 2025

[Paper Review] MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos

This paper aims to remove the species or template dependence of existing motion capture pipelines and to achieve Category-Agnostic Motion Capture (CAMoCap) for arbitrary rigged 3D assets from a single monocular video.

#Review #3D Motion Capture #Monocular Video #Arbitrary Skeletons #Motion Retargeting #Deep Learning #Inverse Kinematics #Transformer Architecture #Category-Agnostic

December 11, 2025

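[Paper Review] Efficiently Reconstructing Dynamic Scenes One D4RT at a Time

The paper aims to efficiently reconstruct the geometry and motion of complex dynamic scenes from video. It seeks to overcome the limitations of existing fragmented, compute-heavy 3D reconstruction approaches by presenting a 4D understanding framework in which a single unified model infers depth, spatio-temporal correspondence, and full camera parameters.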

#Review #Dynamic Scene Reconstruction #4D Reconstruction #Point Tracking #Transformer Architecture #Feedforward Model #Query-based Inference #Computer Vision #Geometric Consistency

December 9, 2025

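[Paper Review] FMA-Net++: Motion- and Exposure-Aware Real-World Joint Video Super-Resolution and Deblurring

This paper aims to restore sharp, high-resolution (HR) video by addressing the compound degradation that real-world footage suffers from dynamically varying exposure and motion. It seeks to overcome a weakness of existing video restoration methods, which assume a fixed exposure time and therefore break down in real scenarios.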

#Review #Video Super-Resolution #Video Deblurring #Joint Restoration #Exposure-Aware #Motion Compensation #Transformer Architecture #Dynamic Filtering #Real-World Degradations

December 4, 2025

[Paper Review] Mixture of Horizons in Action Chunking

This paper addresses the fundamental limitations caused by a fixed action chunk length (horizon) in Vision-Language-Action (VLA) models.

#Review #Vision-Language-Action Models #Action Chunking #Robotic Manipulation #Multi-horizon Planning #Transformer Architecture #Gated Fusion #Dynamic Inference

December 2, 2025

[Paper Review] Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation

This study aims to answer a fundamental question: does audio-video joint denoising training improve video generation even when video quality is the only concern?

#Review #Video Generation #Audio-Video Multimodal #Joint Denoising #Diffusion Models #Transformer Architecture #World Models #Physical Commonsense #Multimodal Training

December 2, 2025

[Paper Review] Adversarial Flow Models

This paper addresses the training instability of GANs (Generative Adversarial Networks) and, for Flow Matching models, low-resolution discretization error and the cost of iterative inference.

#Review #Generative Models #Adversarial Flow Models #GANs #Flow Matching #Optimal Transport #Single-step Generation #Image Generation #Transformer Architecture

November 30, 2025

[Paper Review] Terminal Velocity Matching

The paper aims to build, in a single training stage, a generative model that produces high-quality samples quickly and efficiently and scales to high-dimensional data.

#Review #Generative Models #Flow Matching #Diffusion Models #One-Step Generation #Few-Step Generation #Wasserstein Distance #Transformer Architecture #Lipschitz Continuity

November 26, 2025

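[Paper Review] ReDirector: Creating Any-Length Video Retakes with Rotary Camera Encoding

This study addresses the limitations of existing video retake generation methods, which are fragile under variable-length inputs, dynamic camera motion, and out-of-distribution camera trajectories, and which often produce warping artifacts or blurry objects.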

#Review #Video Retake Generation #Camera Control #Rotary Position Embedding (RoPE) #Rotary Camera Encoding (RoCE) #Geometric Consistency #Video Generative Models #Transformer Architecture #Multi-view Synthesis

November 25, 2025

[Paper Review] MergeDNA: Context-aware Genome Modeling with Dynamic Tokenization through Token Merging

This paper addresses two challenges in genome sequence modeling: varying information density and the absence of inherent vocabulary units.

#Review #Genome Modeling #Dynamic Tokenization #Token Merging #Context-aware Learning #DNA Foundation Models #Transformer Architecture #Multi-omics

November 23, 2025

[Paper Review] Mixture of States: Routing Token-Level Dynamics for Multimodal Generation

This paper addresses the problem of effectively aligning text and visual signals in multimodal diffusion models.

#Review #Multimodal Diffusion #Mixture of States (MoS) #Token-Level Routing #Dynamic Conditional Fusion #Text-to-Image Generation #Image Editing #Transformer Architecture

November 19, 2025

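[Paper Review] A Brain Wave Encodes a Thousand Tokens: Modeling Inter-Cortical Neural Interactions for Effective EEG-based Emotion Recognition

This paper addresses the dynamic interactions between distinct cortical regions of the brain, which existing EEG-based emotion recognition models have overlooked.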

#Review #EEG #Emotion Recognition #Transformer Architecture #Inter-Cortical Neural Interactions #Multi-Head Attention #Brain-Computer Interface #Affective Computing

November 18, 2025

[Paper Review] Depth Anything 3: Recovering the Visual Space from Any Views

The paper aims to recover spatially consistent 3D geometry from arbitrary visual inputs such as a single image, multiple views, or a video stream.

#Review #Depth Estimation #Multi-view Geometry #Transformer Architecture #Teacher-Student Learning #Pose Estimation #3D Reconstruction #Novel View Synthesis #Visual Space Recovery

November 13, 2025

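[Paper Review] EVTAR: End-to-End Try on with Additional Unpaired Visual Reference

This study addresses the problem that existing virtual try-on models depend on complex inputs such as agnostic person images, human pose, and densepose, and offer little support for reference images, which makes their results less realistic.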

#Review #Virtual Try-on #Diffusion Models #End-to-End Learning #Reference Images #Unpaired Data #Flow Matching #Transformer Architecture #Generative AI

November 9, 2025

[Paper Review] Forget BIT, It is All about TOKEN: Towards Semantic Information Theory for LLMs

This paper aims to build a new semantic information theory framework based on tokens rather than bits, in order to explain theoretically how LLMs (Large Language Models) work internally.

#Review #Semantic Information Theory #Large Language Models #Directed Information #Rate-Distortion Function #Granger Causality #Token Embedding #Transformer Architecture #Variational Inference

November 9, 2025

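[Paper Review] X-Streamer: Unified Human World Modeling with Audiovisual Interaction

The work addresses the context mismatch, latency, and error accumulation caused by conventional modular pipelines in multimodal interactive human agent systems that span computer vision, speech, and text.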

#Review #Digital Human #Multimodal AI #Real-time Streaming #Video Generation #Diffusion Models #Transformer Architecture #Audiovisual Synchronization #World Modeling

September 29, 2025

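[Paper Review] AToken: A Unified Tokenizer for Vision

ATOKEN aims to resolve the fragmentation of existing visual tokenizers across modalities and tasks by developing a general-purpose visual tokenizer that achieves both high-quality reconstruction and deep semantic understanding across images, video, and 3D assets.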

#Review #Unified Visual Tokenizer #Multimodal AI #Transformer Architecture #4D Representation #Adversarial-free Training #Reconstruction #Semantic Understanding #Generative Models

September 19, 2025

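[Paper Review] InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

This paper addresses the problem that, in existing diffusion models, compute requirements grow quadratically with resolution when generating high-resolution images, so that generating a 4K image takes more than 100 seconds.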

#Review #Image Synthesis #Resolution-Agnostic #Diffusion Models #Latent Space #VAE Decoder #High-Resolution Image Generation #Generative AI #Transformer Architecture

September 15, 2025

[Paper Review] Drawing2CAD: Sequence-to-Sequence Learning for CAD Generation from Vector Drawings

This study aims to automatically generate parametric CAD models from 2D vector engineering drawings (in SVG format).

#Review #CAD Generation #Vector Graphics #Sequence-to-Sequence Learning #Transformer Architecture #Engineering Drawings #Multi-modal Learning #Soft Target Loss #Dual Decoder

September 5, 2025

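[Paper Review] MotionFlux: Efficient Text-Guided Motion Generation through Rectified Flow Matching and Preference Alignment

This paper addresses two problems of existing text-guided motion generation methods: imprecise alignment between linguistic descriptions and motion semantics, and slow, inefficient multi-step inference. The ultimate goal is a framework that delivers strong semantic alignment, high-quality motion generation, and real-time synthesis.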

#Review #Text-Guided Motion Generation #Rectified Flow Matching #Preference Alignment #Human Motion Synthesis #Real-time AI #Transformer Architecture #Self-supervised Learning

August 28, 2025

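[Paper Review] Wan-S2V: Audio-Driven Cinematic Video Generation

This study addresses the shortcomings of existing audio-driven character animation models in complex film and TV production scenarios, which involve subtle interactions, realistic body movement, and dynamic camera work.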

#Review #Audio-Driven Video Generation #Cinematic Video #Diffusion Models #Transformer Architecture #Long Video Consistency #Human Animation #Multimodal Control #Data Curation

August 27, 2025

[Paper Review] UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

This paper aims to address the high memory-access cost of Mixture of Experts (MoE) models and to close the gap by which UltraMem, an existing memory-layer architecture, falls short of 8-expert MoE models.

#Review #Memory Networks #Mixture of Experts (MoE) #Long-Context Learning #Sparse Models #Transformer Architecture #LLMs #Efficient Inference

August 27, 2025

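[Paper Review] Advances in Speech Separation: Techniques, Challenges, and Future Trends

This paper provides a comprehensive, systematic survey of DNN-based speech separation techniques aimed at solving the "cocktail party problem." It seeks to resolve the fragmented understanding of this fast-moving field and to fill the gaps left by earlier surveys regarding diverse architectures, learning paradigms, and fair quantitative evaluation.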

#Review #Speech Separation #Deep Neural Networks #Cocktail Party Problem #Transformer Architecture #Unsupervised Learning #Supervised Learning #Evaluation Metrics #Datasets

August 20, 2025

[Paper Review] NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

This paper seeks to overcome two limitations that existing autoregressive (AR) models face in text-to-image generation: quantization loss and dependence on heavy diffusion models.

#Review #Autoregressive Models #Text-to-Image Generation #Continuous Latent Tokens #Flow Matching #Image Editing #Multimodal Learning #Transformer Architecture

August 15, 2025

[Paper Review] Exploitation Is All You Need... for Exploration

In contrast to conventional RL, which grants explicit incentives for exploration, this paper aims to test whether exploratory behavior can emerge naturally from a purely greedy (exploitation-only) objective.

#Review #Reinforcement Learning #Exploration-Exploitation #Meta-RL #Transformer Architecture #Emergent Behavior #Multi-Armed Bandits #Gridworlds #Pseudo-Thompson Sampling

August 5, 2025

[Paper Review] PixNerd: Pixel Neural Field Diffusion

This paper aims to solve the accumulated errors and decoding artifacts caused by existing diffusion models that rely on a Variational Autoencoder (VAE).

#Review #Diffusion Models #Neural Fields #Pixel Space #Generative Models #Image Synthesis #Transformer Architecture #End-to-End Learning

August 4, 2025

[Paper Review] iLRM: An Iterative Large 3D Reconstruction Model

This paper addresses the scalability and high compute cost that generalizable feed-forward 3D reconstruction models, especially recent Transformer-based methods, incur when processing many views or high-resolution images.

#Review #3D Reconstruction #Gaussian Splatting #Iterative Refinement #Transformer Architecture #Multi-view Learning #Scalability #Feed-forward Models

August 2, 2025

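[Paper Review] The End of Manual Decoding: Towards Truly End-to-End Language Models

The goal is to eliminate the inefficiency and suboptimal results that arise because current LLMs depend on manual tuning of non-differentiable decoding hyperparameters (temperature, top-p). The paper proposes a new architecture in which the model learns and dynamically controls its own decoding strategy, enabling truly end-to-end generation.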

#Review #Large Language Models (LLMs) #End-to-End Generation #Dynamic Decoding #Hyperparameter Optimization #Stochastic Sampling #Instruction Following #Transformer Architecture

October 31, 2025

[Paper Review] Scaling Latent Reasoning via Looped Language Models

This paper seeks to overcome the limitations of reasoning in modern LLMs, which depends on explicit text generation (Chain-of-Thought).

#Review #Looped Language Models #Latent Reasoning #Parameter Efficiency #Adaptive Computation #Pre-training Scaling #Knowledge Manipulation #Early Exit Mechanisms #Transformer Architecture

October 30, 2025

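[Paper Review] From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

This paper aims to resolve the strong visual-encoding bias and complex infrastructure of conventional modular Vision-Language Models (VLMs), and to overcome the fundamental constraints of "native VLMs," which are monolithic VLM architectures based on early fusion.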

#Review #Vision-Language Models #Native VLMs #Early Fusion #Multimodal Learning #Transformer Architecture #Rotary Position Embeddings #Pixel-Word Alignment #End-to-End Training

October 17, 2025

[Paper Review] Direct Multi-Token Decoding

This paper aims to accelerate inference by addressing the inefficient use of layers in large language models (LLMs).

#Review #LLM Inference #Multi-token Decoding #Transformer Architecture #Layer Specialization #Cyclical Refilling #Inference Speedup #Model Scaling

October 16, 2025

[Paper Review] Towards Scalable and Consistent 3D Editing

The goal is to resolve the main challenges that arise in 3D editing, the task of locally modifying the geometry or appearance of a 3D asset.

#Review #3D Editing #Generative Models #Transformer Architecture #Dataset Generation #Multimodal Learning #Conditional Generation #Image-to-3D

October 10, 2025

[Paper Review] Online Generic Event Boundary Detection

This paper overcomes the limitations of conventional offline GEBD (Generic Event Boundary Detection) by proposing a new task, online GEBD (On-GEBD), that is closer to human cognitive processing.

#Review #Online Video Analysis #Event Boundary Detection #Event Segmentation Theory #Real-time AI #Anomaly Detection #Transformer Architecture

October 9, 2025

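[Paper Review] Native Hybrid Attention for Efficient Sequence Modeling

To address the O(n²) computational complexity of Transformers and the lower accuracy of linear attention models, this paper aims to develop a new hybrid attention architecture that is efficient yet maintains high accuracy on long contexts.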

#Review #Sequence Modeling #Hybrid Attention #Transformer Architecture #Linear Attention #Sliding Window Attention #Long Context #Large Language Models (LLMs) #Efficiency

October 9, 2025

[Paper Review] Hybrid Architectures for Language Models: Systematic Analysis and Design Insights

The work seeks to overcome the quadratic complexity of the Transformer and the long-context limitations of Mamba in existing large language models (LLMs).

#Review #Hybrid LLM #Transformer Architecture #Mamba #State Space Models (SSM) #Computational Efficiency #Long-Context #Language Model Architectures #Scaling Laws

October 7, 2025

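[Paper Review] Improving GUI Grounding with Explicit Position-to-Coordinate Mapping

This paper aims to address the limitations of existing VLMs (Vision-Language Models) in GUI grounding, the task of mapping natural-language instructions to pixel coordinates. In particular, it targets the unstable coordinate predictions and poor resolution generalization that arise at inference on high-resolution displays unseen during training.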

#Review #GUI Grounding #Vision-Language Models #Positional Embedding #UI Automation #Coordinate Prediction #Resolution Generalization #Transformer Architecture

October 6, 2025

[Paper Review] DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion

This paper aims to enable Diffusion Transformer (DiT) models to generate ultra-high-resolution images (e.g., 16M+ pixels) without retraining.

#Review #Diffusion Models #Transformer Architecture #Positional Encoding #High-Resolution Image Generation #Extrapolation #Dynamic Adaptation #Training-Free

October 24, 2025

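[Paper Review] Attention Sinks in Diffusion Language Models

The goal is to probe the internal mechanisms of Diffusion Language Models (DLMs), in particular to determine whether the "attention sink" phenomenon observed in other Transformer architectures also occurs in DLMs, and to characterize it.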

#Review #Diffusion Language Models #Attention Sinks #Transformer Architecture #Masked Language Modeling #Bidirectional Attention #Generative Models #Robustness #Dynamic Attention

October 23, 2025

[Paper Review] Boolean Satisfiability via Imitation Learning

This paper aims to improve the inefficient branching policy that is a core component of CDCL (Conflict-Driven Clause Learning) SAT solvers.

#Review #Boolean Satisfiability #Imitation Learning #CDCL Solvers #Branching Policy #KeyTrace #Transformer Architecture #Perceiver AR

October 2, 2025

[Paper Review] jina-reranker-v3: Last but Not Late Interaction for Document Reranking

This paper addresses the fundamental trade-off between efficiency and effectiveness in document reranking.

#Review #Document Reranking #Last but Not Late Interaction #Multilingual #Transformer Architecture #Cross-Encoder #InfoNCE Loss #Contextual Embedding #Qwen3

October 1, 2025

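[Paper Review] Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training

The main goal is to resolve the opacity of the internal architectural mechanisms through which post-training techniques (SFT, RL, and others) improve the reasoning ability of large reasoning models.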

#Review #Mechanistic Interpretability #Attention Heads #Post-Training #Supervised Fine-Tuning (SFT) #Reinforcement Learning (RL) #Circuit Analysis #Reasoning Models #Transformer Architecture

October 1, 2025

[Paper Review] The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

This paper points out that existing Transformer models fail to generalize CoT (Chain-of-Thought) reasoning and offer no micro-level interpretation of brain function.

#Review #Large Language Models #Brain-Inspired AI #Graph Neural Networks #Hebbian Learning #Scale-Free Networks #Model Interpretability #Transformer Architecture

October 1, 2025