[Paper Review] T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation
A detailed review of the paper 'T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation', posted on arXiv.
#Review #Text-to-Audio-Video Generation #Multimodal Evaluation #Benchmark #MLLM-as-a-Judge #Cross-modal Alignment #Instruction Following #Perceptual Realism #Audio Realism
December 24, 2025
[Paper Review] Scaling Language-Centric Omnimodal Representation Learning
A detailed review of the paper 'Scaling Language-Centric Omnimodal Representation Learning', posted on arXiv.
#Review #Multimodal Embeddings #MLLMs #Contrastive Learning #Cross-modal Alignment #Generative Pretraining #Representation Learning #Scaling Laws
October 15, 2025
[Paper Review] Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
A detailed review of the paper 'Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs', posted on arXiv.
#Review #Multimodal AI #Prompt Optimization #MLLMs #Bayesian Optimization #Cross-modal Alignment #Prompt Engineering #Generative AI #Exploration-Exploitation
October 13, 2025
[Paper Review] Discrete Diffusion Models with MLLMs for Unified Medical Multimodal Generation
A detailed review of the paper 'Discrete Diffusion Models with MLLMs for Unified Medical Multimodal Generation', posted on arXiv.
#Review #Discrete Diffusion Models #Multimodal Large Language Models (MLLMs) #Medical Image Generation #Medical Report Generation #Multimodal Generation #Medical AI #Cross-modal Alignment
October 8, 2025
[Paper Review] Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge
A detailed review of the paper 'Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge', posted on arXiv by Wentao Zhang.
#Review #Multimodal Reasoning #Science AI #Caption-assisted Reasoning #SeePhys Challenge #Large Language Models #Visual Question Answering #Physics Problems #Cross-modal Alignment
September 17, 2025