[Paper Review] MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning. A detailed review of the paper 'MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning', posted on arXiv by Hongsheng Li. #Review #Image Metaphor Understanding #Visual Reasoning #Reinforcement Learning #MLLMs #TFQ-GRPO #End-to-End Learning #Cognitive AI. February 12, 2026
[Paper Review] MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models. A detailed review of the paper 'MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models', posted on arXiv. #Review #Audio Tokenizer #Transformer Architecture #End-to-End Learning #Residual Vector Quantization #Speech Synthesis #Audio Foundation Models #Scalability #Autoregressive Models. February 12, 2026
[Paper Review] LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR. A detailed review of the paper 'LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR', posted on arXiv. #Review #OCR #Vision-Language Model #End-to-End Learning #Multilingual #Reinforcement Learning #Document Understanding #Bounding Box Prediction #Task Arithmetic Merging. January 20, 2026
[Paper Review] UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving. A detailed review of the paper 'UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving', posted on arXiv. #Review #Autonomous Driving #End-to-End Learning #Vision-Language Models #World Model #Chain-of-Thought #Video Generation #Trajectory Planning #Multimodal Learning. December 10, 2025
[Paper Review] OpenREAD: Reinforced Open-Ended Reasoing for End-to-End Autonomous Driving with LLM-as-Critic. A detailed review of the paper 'OpenREAD: Reinforced Open-Ended Reasoing for End-to-End Autonomous Driving with LLM-as-Critic', posted on arXiv. #Review #Autonomous Driving #Reinforcement Fine-tuning #LLM-as-Critic #Vision-Language Model #End-to-End Learning #Chain-of-Thought #Trajectory Planning. December 1, 2025
[Paper Review] HunyuanOCR Technical Report. A detailed review of the paper 'HunyuanOCR Technical Report', posted on arXiv. #Review #Optical Character Recognition #Multimodal Large Language Model #End-to-End Learning #Reinforcement Learning #Document Parsing #Information Extraction #Text Spotting. November 25, 2025
[Paper Review] EVTAR: End-to-End Try on with Additional Unpaired Visual Reference. A detailed review of the paper 'EVTAR: End-to-End Try on with Additional Unpaired Visual Reference', posted on arXiv. #Review #Virtual Try-on #Diffusion Models #End-to-End Learning #Reference Images #Unpaired Data #Flow Matching #Transformer Architecture #Generative AI. November 9, 2025
[Paper Review] Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL. A detailed review of the paper 'Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL', posted on arXiv by Liam-Liu. #Review #Chain-of-Agents #Agent Foundation Models #Multi-Agent Systems #Tool-Integrated Reasoning #Multi-agent Distillation #Agentic Reinforcement Learning #LLMs #End-to-End Learning. August 20, 2025
[Paper Review] PixNerd: Pixel Neural Field Diffusion. A detailed review of the paper 'PixNerd: Pixel Neural Field Diffusion', posted on arXiv by Limin Wang. #Review #Diffusion Models #Neural Fields #Pixel Space #Generative Models #Image Synthesis #Transformer Architecture #End-to-End Learning. August 4, 2025