We introduce the Progressive Visual Token Compression (PVC) in large vision-language models (VLMs), which unifies the visual inputs as videos and progressively compresses vision tokens across video ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results