Latest Posts

[Paper Review] PixelSmile: Toward Fine-Grained Facial Expression Editing

Despite recent advances in diffusion-based image editing models, fine-grained facial expression editing remains a difficult problem.

March 26, 2026

[Paper Review] MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models

Vision Foundation Models (VFMs) now provide strong representations across a wide range of tasks and have become central to computer vision.

#Review #Vision Foundation Models (VFMs) #Multi-Resolution Fusion (MuRF) #Dense Prediction #Anomaly Detection #Multimodal Understanding #Scale-Robust Representation

March 26, 2026

[Paper Review] MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution

Agents built on Large Language Models (LLMs) use external memory banks to support long-term interaction, but most existing systems share a limitation: they handle the memory Construction, Retrieval, and Utilization stages as separate subroutines.

#Review #LLM Agents #Memory Cycle #Multi-Agent Reasoning #Self-Evolution #Long-Horizon Memory #Strategic Blindness #Memory Management

March 26, 2026

[Paper Review] MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

Large Language Models (LLMs) have shown outstanding capabilities across many domains, but they still struggle with long-term, fine-grained memory retention at the scale of millions of tokens.

#Review #Memory Sparse Attention #Long-Context LLMs #Efficient Memory #End-to-End Trainable #KV Cache Compression #Rotary Positional Embedding #Multi-hop Reasoning #Scalability

March 26, 2026

[Paper Review] MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data

Multi-reference image generation is essential for real-world applications such as multi-subject composition, narrative illustration, and novel view synthesis, but current models degrade severely as the number of input references grows.

March 26, 2026

[Paper Review] Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting

Existing feed-forward 3D Gaussian Splatting (3DGS) methods predict pixel-aligned primitives, so the number of primitives grows quadratically with resolution, which makes high-resolution novel view synthesis such as 4K practically infeasible.

#Review #3D Gaussian Splatting #Novel View Synthesis #Feed-Forward #High-Resolution Rendering #Textured Primitives #Geometry-Appearance Decoupling #4K

March 26, 2026
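To make the quadratic growth in the Less Gaussians, Texture More summary concrete, here is a quick count. It is a minimal sketch that assumes exactly one Gaussian per pixel, a simplification not taken from the paper; any fixed per-pixel count gives the same ratio. Doubling both width and height from 1080p to 4K multiplies the number of pixel-aligned primitives by four.

```python
# Assumption for illustration: pixel-aligned prediction emits a fixed number
# of Gaussians per pixel (default 1). The paper's exact count is not stated here.

def pixel_aligned_primitives(width: int, height: int, per_pixel: int = 1) -> int:
    """Number of primitives when every pixel emits `per_pixel` Gaussians."""
    return width * height * per_pixel


resolutions = {
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}

base = pixel_aligned_primitives(*resolutions["1080p"])
for name, (w, h) in resolutions.items():
    n = pixel_aligned_primitives(w, h)
    print(f"{name}: {n:,} primitives ({n / base:.0f}x of 1080p)")
# prints:
# 1080p: 2,073,600 primitives (1x of 1080p)
# 4K: 8,294,400 primitives (4x of 1080p)
```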

[Paper Review] Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

The arrival of Large Language Models (LLMs) and Visual Language Models (VLMs) has transformed artificial intelligence, but building effective foundation models for science (AI for Science, AI4S) remains a major challenge because of the immense diversity and specialization of scientific domains.

March 26, 2026

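[Paper Review] IQuest-Coder-V1 Technical Report

Large Language Models (LLMs) have made major gains in general intelligence through domain specialization, yet in code intelligence a wide gap remains between leading proprietary models such as Claude 4.5 Sonnet and open-weight models.

March 26, 2026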

[Paper Review] FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol

Large Language Models (LLMs) are increasingly acting as agents in financial applications, where they must interpret user requests, call external tools, and carry out multi-step reasoning.

#Review #LLM Agents #Financial Tool Use #Benchmarking #Model Context Protocol #Multi-tool Reasoning #Multi-turn Conversation #Evaluation Metrics

March 26, 2026

[Paper Review] Extending Precipitation Nowcasting Horizons via Spectral Fusion of Radar Observations and Foundation Model Priors

Existing radar-only models lack large-scale atmospheric context, so their performance degrades as the forecast lead time grows [cite: 1, Figure 1].

#Review #Precipitation Nowcasting #Spectral Fusion #Radar Observations #Foundation Model #Pangu-Weather #Frequency Domain #Deep Learning

March 26, 2026

[Paper Review] Electrostatic Photoluminescence Tuning in All-Solid-State Perovskite Transistors

Reversibly tuning a material's optoelectronic properties with an 'electric knob' is an important goal that could greatly broaden its potential applications, yet electrostatic control of properties such as photoluminescence (PL) and photoconductivity (PC) remains relatively unexplored.

#Review #Perovskite #Photoluminescence #Field-Effect Transistor #Electrostatic Tuning #CsPbBr3 #Carrier Recombination #Quantum Efficiency

March 26, 2026

[Paper Review] Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration

The paper 'Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration' by Danil Tokhchukov, Aysel Mirzoeva, Andrey Kuznetsov, and Konstantin Sobolev from MSU and FusionBrain Lab, AXXX, discusses a new method called…

March 26, 2026

[Paper Review] BioVITA: Biological Dataset, Model, and Benchmark for Visual-Textual-Acoustic Alignment

Understanding animal species through multimodal data (visual, textual, acoustic) is a growing challenge at the intersection of computer vision and ecology.

March 26, 2026

[Paper Review] AVControl: Efficient Framework for Training Audio-Visual Controls

Fine-grained control over video and audio generation is essential for real-world creative applications. However, the space of controls spanning modalities such as depth, pose, camera trajectories, and audio transformations is vast.

#Review #Audio-Visual Generation #Video Control #LoRA #Parallel Canvas Conditioning #Diffusion Models #Modularity #Efficiency

March 26, 2026

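[SGLang] Adding a timeout parameter to the flush_cache API

Introduces a timeout parameter that sets how long to wait, so that cache flushes no longer fail while HiCache asynchronous operations are still in progress.

#SGLang #API #Cache Management #HiCache

March 26, 2026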
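The timeout idea behind the flush_cache change can be sketched generically. The function below is hypothetical: it is not SGLang's `flush_cache` implementation or signature. It only illustrates the pattern of waiting a bounded time for in-flight asynchronous operations (such as HiCache transfers) to drain, then flushing, and reporting failure instead of flushing mid-operation. The `has_pending_ops` and `do_flush` callbacks are assumptions made for the example.

```python
import time
from typing import Callable


def flush_cache_with_timeout(
    has_pending_ops: Callable[[], bool],
    do_flush: Callable[[], None],
    timeout: float = 30.0,
    poll_interval: float = 0.05,
) -> bool:
    """Hypothetical sketch: wait up to `timeout` seconds for pending async
    cache operations to finish, then flush.

    Returns True if the flush ran, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while has_pending_ops():
        if time.monotonic() >= deadline:
            return False  # give up rather than flush while ops are in flight
        time.sleep(poll_interval)
    do_flush()
    return True


if __name__ == "__main__":
    remaining = [3]  # pretend three async operations are still in flight

    def has_pending() -> bool:
        remaining[0] = max(remaining[0] - 1, 0)  # one op finishes per poll
        return remaining[0] > 0

    flushed = []
    ok = flush_cache_with_timeout(has_pending, lambda: flushed.append(True), timeout=1.0)
    print(ok, flushed)  # prints: True [True]
```

Polling with `time.monotonic()` keeps the deadline immune to wall-clock adjustments; a real server would more likely wait on an event or future than poll.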

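[sglang] NPU CI optimization: faster installs by caching the PyTorch dependency

An analysis of a change to SGLang's NPU CI configuration that installs PyTorch packages through an internal cache service and removes the dependency on an external mirror.

#SGLang #NPU #CI #GitHub Actions #Caching #Ascend

March 26, 2026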

[PaddleOCR] Expanding VL pipeline deployment options with a FastDeploy-Server backend

Adds a fastdeploy-server backend to the PaddleOCR-VL pipeline, broadening the options for production deployment.

#PaddleOCR #FastDeploy #Inference #Backend #Deployment

March 26, 2026

[triton] Fixing warp free-variable and register zero-base bugs in the AMD async wait count

Analyzes a fix for two problems: irregular warps that skip the async copy, and a register zero base that inflates the instruction count.

#Triton #AMD #GPU #AsyncCopy #WarpSpecialization

March 26, 2026

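[sglang] Aligning sgl-kernel wheel METADATA/WHEEL tags with the CUDA filename

An analysis of a fix to the sgl-kernel wheel build: when the +cu124 suffix is appended to the filename, the internal METADATA Version and WHEEL tag are now updated as well, which resolves pip install errors.

#SGLang #sgl-kernel #Python Packaging #Wheel #CUDA #CI/CD

March 26, 2026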
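The mismatch fixed in the sgl-kernel post comes from the wheel filename and the `Version:` field in METADATA disagreeing once a local version label like `+cu124` is added to only one of them. Below is a minimal sketch, assuming the wheel has already been unpacked and the METADATA text is in hand; `add_local_version` and the version `0.1.0` are hypothetical and not taken from the sgl-kernel build scripts. A real rewrite also has to rename the `.dist-info` directory, update the WHEEL file as the post describes, and regenerate RECORD, all of which this sketch omits.

```python
import re


def add_local_version(metadata_text: str, wheel_filename: str, local: str = "cu124"):
    """Hypothetical helper: apply the same PEP 440 local version label
    (e.g. +cu124) to both the METADATA `Version:` field and the wheel filename.
    """
    match = re.search(r"^Version: (\S+)$", metadata_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("METADATA has no Version field")
    old_version = match.group(1)
    new_version = f"{old_version}+{local}"

    # Anchor at line start so the `Metadata-Version:` header is never touched.
    new_metadata = re.sub(
        r"^Version: .*$", f"Version: {new_version}", metadata_text,
        count=1, flags=re.MULTILINE,
    )

    # Wheel filename layout: {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl
    name, version, rest = wheel_filename.split("-", 2)
    if version != old_version:
        raise ValueError(f"filename version {version!r} != METADATA {old_version!r}")
    return new_metadata, f"{name}-{new_version}-{rest}"


if __name__ == "__main__":
    meta = "Metadata-Version: 2.1\nName: sgl-kernel\nVersion: 0.1.0\n"
    new_meta, new_name = add_local_version(
        meta, "sgl_kernel-0.1.0-cp39-abi3-manylinux2014_x86_64.whl"
    )
    print(new_name)  # prints: sgl_kernel-0.1.0+cu124-cp39-abi3-manylinux2014_x86_64.whl
```

Deriving both names from the single `new_version` string is the point: pip rejects a wheel whose filename version and metadata version disagree, so they should never be edited independently.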

[sglang] Fixing an AMD/ROCm startup crash by lazily importing the CuteDSL KDA kernel

An analysis of a fix in SGLang where a top-level import of the CuteDSL KDA kernel crashed the server at startup on AMD/ROCm; the import is now lazy.

#SGLang #AMD #ROCm #Bug Fix #Lazy Import #Linear Attention

March 25, 2026
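The lazy-import fix in the last post works because a top-level `import` runs as soon as its file is loaded, so a platform that lacks a CUDA-only dependency fails at startup even if that kernel is never called. Below is a minimal sketch of one lazy-import pattern; `LazyModule` and the module name `hypothetical_cutedsl_kda` are illustrative and not SGLang's code, and the actual patch may simply move the `import` statement inside the function that uses the kernel.

```python
import importlib
from types import ModuleType
from typing import Optional


class LazyModule:
    """Defer importing `name` until one of its attributes is first accessed."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None

    def _load(self) -> ModuleType:
        if self._module is None:
            # The import (and any ImportError) happens here, on first use.
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr: str):
        # Only called for attributes not found normally, i.e. module members.
        return getattr(self._load(), attr)


if __name__ == "__main__":
    # Creating the wrapper imports nothing, so this line cannot crash startup
    # even though the (hypothetical) module does not exist.
    kda = LazyModule("hypothetical_cutedsl_kda")

    json_mod = LazyModule("json")
    print(json_mod.dumps([1, 2]))  # prints: [1, 2]
```

The trade-off is that a missing dependency now surfaces on first use rather than at startup, which is the desired behavior when that code path is only taken on supported hardware.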