#Mask Tokenization

1개의 포스트

[논문리뷰] SAMTok: Representing Any Mask with Two Words

본 논문은 픽셀 단위의 멀티모달 대규모 언어 모델(MLLMs)이 복잡한 인코더, 전용 디코더, 비호환적인 훈련 목표로 인해 확장성 문제를 겪는 점을 해결하고자 합니다.

#Review #Mask Tokenization #Multimodal LLMs #Pixel-wise Vision-Language #Reinforcement Learning #Segmentation Anything Model #Discrete Representation

2026년 1월 22일