We introduce the Progressive Visual Token Compression (PVC) in large vision-language models (VLMs), which unifies the visual inputs as videos and progressively compresses vision tokens across video ...