#Foundation Models

79 posts

[Paper Review] Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

[Paper Review] Audio-Visual Intelligence in Large Foundation Models

[Paper Review] NTIRE 2026 Challenge on Video Saliency Prediction: Methods and Results

[Paper Review] MedGemma 1.5 Technical Report

[Paper Review] The Geometric Alignment Tax: Tokenization vs. Continuous Geometry in Scientific Foundation Models

[Paper Review] SciLT: Long-Tailed Classification in Scientific Image Domains

[Paper Review] ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions

[Paper Review] Layer by layer, module by module: Choose both for optimal OOD probing of ViT

[Paper Review] Model Context Protocol (MCP) Tool Descriptions Are Smelly! Towards Improving AI Agent Efficiency with Augmented MCP Tool Descriptions

[Paper Review] ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning

[Paper Review] MetricAnything: Scaling Metric Depth Pretraining with Noisy Heterogeneous Sources

[Paper Review] Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation

[Paper Review] STEP3-VL-10B Technical Report

[Paper Review] Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

[Paper Review] Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems

[Paper Review] SAM Audio: Segment Anything in Audio

[Paper Review] The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

[Paper Review] In Pursuit of Pixel Supervision for Visual Pre-training

[Paper Review] A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning

[Paper Review] Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation

[Paper Review] DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

[Paper Review] GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation

[Paper Review] SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking

[Paper Review] UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity

[Paper Review] Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning

[Paper Review] SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines

[Paper Review] LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios

[Paper Review] Visual Representation Alignment for Multimodal Large Language Models

[Paper Review] UniVerse-1: Unified Audio-Video Generation via Stitching of Experts

[Paper Review] Does DINOv3 Set a New Medical Vision Standard?

[Paper Review] M3Ret: Unleashing Zero-shot Multimodal Medical Image Retrieval via Self-Supervision

[Paper Review] EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

[Paper Review] Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

[Paper Review] Processing and acquisition traces in visual encoders: What does CLIP know about your camera?

[Paper Review] A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

[Paper Review] The Role of Computing Resources in Publishing Foundation Model Research

[Paper Review] AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning

[Paper Review] Model Merging with Functional Dual Anchors

[Paper Review] Chronos-2: From Univariate to Universal Forecasting

[Paper Review] OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

[Paper Review] Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition

[Paper Review] Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
