Latest Posts

[Paper Review] Can Vision-Language Models Solve the Shell Game?

Vision-Language Models (VLMs) perform strongly at overall video understanding and reasoning. However, they hit a significant bottleneck in low-level perceptual abilities such as tracking entities over time (Visual Entity Tracking).

#Review #Visual Entity Tracking #Shell Game #Vision-Language Models (VLMs) #VET-Bench #Spatiotemporal Grounded Chain-of-Thought (SGCoT) #NC1-complete #Transformer-based VLMs

March 15, 2026

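[Uvicorn] Switching from bytes to bytearray to Make HTTP Body Accumulation O(n) Instead of O(n²)

An analysis of accumulating the request body with bytearray += instead of bytes +=. The change cuts total memory copying from O(n²) to O(n), which is an amortized constant cost per appended byte.

#Uvicorn #Python #Performance #HTTP #ASGI #Memory

March 15, 2026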
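Below is a minimal Python sketch of the accumulation pattern covered in the Uvicorn post. It is not Uvicorn's code, and the function names and chunk sizes are hypothetical. `bytes` is immutable, so each `body += chunk` creates a new object and copies everything accumulated so far. A `bytearray` instead grows its own over-allocated buffer in place.

```python
import timeit


def accumulate_bytes(chunks):
    # bytes is immutable: every += allocates a new object and copies the
    # whole body so far, so total copying grows as O(n^2).
    body = b""
    for chunk in chunks:
        body += chunk
    return body


def accumulate_bytearray(chunks):
    # bytearray is mutable and over-allocates, so += appends in place with
    # amortized constant cost per byte: O(n) total.
    body = bytearray()
    for chunk in chunks:
        body += chunk
    return bytes(body)  # one final O(n) copy back to immutable bytes


if __name__ == "__main__":
    chunks = [b"x" * 1024] * 500  # hypothetical: 500 chunks of 1 KiB each
    assert accumulate_bytes(chunks) == accumulate_bytearray(chunks)
    for fn in (accumulate_bytes, accumulate_bytearray):
        print(fn.__name__, timeit.timeit(lambda: fn(chunks), number=3))
```

The final `bytes(body)` conversion adds one more linear copy, so the total cost stays O(n). Whether a caller needs that conversion depends on what the consumer of the body expects.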

[triton] Fixing the Thread Predicate for Tensor Operands in AMD AtomicCAS

An analysis of a fix in the AMD backend. The fix applies the thread predicate correctly to tensor-based atomic CAS operations, which stops redundant threads from wrongly executing the atomic.

#Triton #AMD #GPU #Atomics #BugFix

March 14, 2026

[triton] Fixing a Buffer Race for TDM Loads in AMD Pipelined Loops

Analysis of a PR that fixes a data race in pipelined loops on AMD GPUs. The race occurred when TDM loads were used with too few buffers.

#Triton #AMD #TDM #Pipeline #BufferRace #BugFix

March 14, 2026

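[triton] Optimizing High-Performance 2CTA Block-Scaled Matrix Multiplication with Triton Gluon

How Triton Gluon's 2CTA warp-specialization technique raises the arithmetic intensity of matrix multiplication and optimizes SMEM usage.

#Triton #GPU #CUDA #MatMul #HighPerformanceComputing

March 13, 2026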

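[PaddleOCR] Improving the PaddleOCR-VL Deployment Docs: Docker Images and a Device Compatibility Guide

Restructures the PaddleOCR-VL deployment documentation around Docker. It also adds a device compatibility matrix and a guide for offline environments.

#PaddleOCR #Docker #Deployment #Documentation #DevOps

March 13, 2026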

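[Ray] Autoscaler V2 Scheduling Optimization: Eliminating O(N²M) Work by Caching Infeasible Resource Requests

An analysis of caching identical resource request shapes. The cache sharply reduces the number of try_schedule calls and fixes Autoscaler hangs.

#Ray #Python #Performance #Caching #Autoscaler

March 13, 2026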
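Below is a hypothetical Python sketch of the caching idea from the Ray Autoscaler V2 post. The resource model, `shape_key`, `schedule_all`, and the first-fit `try_schedule` are illustrative stand-ins. They are not Ray's scheduler or API. The sketch rests on one assumption: free capacity only decreases during a single scheduling pass. Under that assumption, a request shape that fits on no node stays infeasible for the rest of the pass, so identical requests can skip the placement attempt entirely.

```python
from typing import Dict, List, Optional, Set, Tuple

Resources = Dict[str, float]
Shape = Tuple[Tuple[str, float], ...]


def shape_key(request: Resources) -> Shape:
    # Canonical, hashable form of a request,
    # e.g. {"GPU": 1, "CPU": 1} -> (("CPU", 1), ("GPU", 1)).
    return tuple(sorted(request.items()))


def try_schedule(request: Resources, nodes: List[Resources]) -> Optional[int]:
    # Hypothetical first-fit placement: reserve the request on the first node
    # with enough free capacity and return that node's index, else None.
    for i, free in enumerate(nodes):
        if all(free.get(k, 0.0) >= v for k, v in request.items()):
            for k, v in request.items():
                free[k] -= v
            return i
    return None


def schedule_all(
    requests: List[Resources], nodes: List[Resources]
) -> Tuple[List[Optional[int]], int]:
    infeasible: Set[Shape] = set()  # shapes that already failed in this pass
    placements: List[Optional[int]] = []
    calls = 0  # number of try_schedule invocations, for comparison
    for req in requests:
        key = shape_key(req)
        if key in infeasible:
            # Free capacity only shrinks within a pass, so this shape
            # cannot fit now either: skip the placement attempt.
            placements.append(None)
            continue
        calls += 1
        node = try_schedule(req, nodes)
        if node is None:
            infeasible.add(key)
        placements.append(node)
    return placements, calls
```

As an example, take two nodes with 2 and 1 free CPUs, then submit four 1-CPU requests followed by one hundred 1-GPU requests. The sketch makes 5 `try_schedule` calls instead of 104: four CPU attempts (the last one fails) and a single GPU attempt that marks the GPU shape infeasible.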

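[Loki] Adding Cache Correctness Tests for the Query Engine

Integration tests that verify the results cache is correct by checking that cache hits and cache misses return identical results.

#Grafana Loki #Go #Performance #Testing #Caching

March 13, 2026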

[Loki] Speeding Up Index Lookups by 25% with a Larger Pointer Read Batch Size

Improves I/O efficiency by raising the pointer/bloom read batch size from 128 to 8192.

#Grafana Loki #DataObj #Batch Size #Performance

March 13, 2026

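[Ray] Stabilizing Memory Pressure Tests by Updating Their Log Patterns

An analysis of updating the expected strings in the memory pressure tests to match a change in the worker termination log message.

#Ray #Python #Testing #Memory Management #Observability

March 13, 2026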

[triton] Triton 2CTA Block-Scaled Matmul: Performance Comparison Against cuBLAS

A 2CTA warp-specialized block-scaled matmul implemented with Triton Gluon, with support for mxfp8/mxfp4/nvfp4.

#Triton #CUDA #Matrix Multiplication #FP8 #Blackwell

March 13, 2026

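[Ultralytics] Auto-Adjusting Instead of Raising an Error When the Calibration Dataset Is Smaller Than the Batch

When the INT8 calibration dataset is smaller than the batch size, export previously raised an error. It now adjusts automatically and issues a warning.

#Ultralytics #YOLO #INT8 #Calibration #Export

March 12, 2026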
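Below is a hypothetical helper that illustrates the behavior change described in the Ultralytics post. It clamps the calibration batch to the number of available samples and warns instead of failing. `resolve_calibration_batch` is not part of the Ultralytics API. The empty-dataset check is an extra assumption of this sketch, added so the function never returns a batch size of zero.

```python
import warnings


def resolve_calibration_batch(num_samples: int, batch: int) -> int:
    """Hypothetical helper: fit the INT8 calibration batch to the dataset size."""
    if num_samples < 1:
        # Assumption of this sketch: an empty dataset is still an error.
        raise ValueError("calibration dataset is empty")
    if num_samples < batch:
        # Instead of failing, shrink the batch to the dataset size and warn.
        warnings.warn(
            f"calibration dataset has {num_samples} samples, fewer than "
            f"batch={batch}; using batch={num_samples} instead",
            stacklevel=2,
        )
        return num_samples
    return batch
```

For example, calling `resolve_calibration_batch(4, 16)` returns 4 and emits a `UserWarning`, while `resolve_calibration_batch(100, 16)` returns 16 unchanged.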

[Paper Review] XSkill: Continual Learning from Experience and Skills in Multimodal Agents

Multimodal agents can now handle complex visual reasoning tasks and a wide range of tools. They still face two fundamental bottlenecks: inefficient tool use and inflexible orchestration in open-ended environments.

#Review #Multimodal Agents #Continual Learning #Experience Learning #Skill Learning #Tool Use #Knowledge Base #Visual Reasoning

March 12, 2026

[Paper Review] WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing

The authors point out that text-centric image editing has significant application potential but remains underexplored within instruction-based image editing.

#Review #Text-centric Image Editing #Diffusion Models #Glyph-Guided Fine-tuning #Reinforcement Learning #Multilingual Benchmark #Dataset Construction

March 12, 2026

[Paper Review] Video-Based Reward Modeling for Computer-Use Agents

Computer-use agents (CUAs) are emerging as a promising paradigm for general-purpose computer automation. Evaluating whether an agent's trajectory actually fulfills the user's instructions, however, remains a hard problem.

#Review #Reward Modeling #Computer-Use Agents #Execution Video #Spatiotemporal Token Pruning #Dataset #Task Success

March 12, 2026

[Paper Review] Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining

Large Language Models (LLMs) have achieved remarkable success in code generation. They still struggle with the deep, long-horizon reasoning that complex software engineering requires.

March 12, 2026

[Paper Review] Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation

Advances in diffusion models and autoregressive models have brought substantial progress in T2I generation and image editing. RL-based approaches for improving these models further, however, depend on reward models whose reliability is in question.

#Review #Reinforcement Learning #Reward Modeling #Image Editing #Image Generation #MLLM #Data Curation #Fidelity #Instruction Following

March 12, 2026

[Paper Review] TeamHOI: Learning a Unified Policy for Cooperative Human-Object Interactions with Any Team Size

Physics-based humanoid control has made considerable progress toward realistic, high-performing single-agent behavior. Extending it to cooperative Human-Object Interaction (HOI) is still a challenge.

#Review #Human-Object Interaction (HOI) #Reinforcement Learning (RL) #Transformer-based Policy #Adversarial Motion Prior (AMP) #Decentralized Policy #Multi-agent Systems #Scalable Coordination

March 12, 2026

[Paper Review] Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Multimodal agents offer a promising path toward automating complex document-based workflows. A fundamental question has remained open, though: do these agents exhibit genuine strategic reasoning, or do they merely rely on stochastic trial-and-error search?

#Review #Multimodal Agents #Document QA #Agentic Reasoning #RAG #Benchmark #PDFs #Effort Calibration

March 12, 2026

[Paper Review] Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

Humans perceive and understand real-world space through a stream of visual observations. Spatial intelligence therefore depends on maintaining and updating spatial evidence in a streaming fashion over potentially unbounded video streams.

#Review #Spatial Intelligence #Test-Time Training #MLLM #Streaming Video #Hybrid Architecture #Spatiotemporal Convolution

March 12, 2026