FastVLM: Dramatically Faster Vision Language Model from Apple

Source: https://news.ycombinator.com/rss · Hits: 10
Summary

FastVLM: Efficient Vision Encoding for Vision Language Models

This is the official repository of FastVLM: Efficient Vision Encoding for Vision Language Models (CVPR 2025).

Highlights

- We introduce FastViTHD, a novel hybrid vision encoder designed to output fewer tokens and significantly reduce encoding time for high-resolution images.
- Our smallest variant outperforms LLaVA-OneVision-0.5B with 85x faster Time-to-First-Token (TTFT) and a 3.4x smaller vision encoder.
- Our larger variants, using the Qwen2-7B LLM, outperform recent works like Cambrian-1-8B while using a single image encoder, with a 7.9x faster TTFT.
- Demo iOS app to demonstrate the performance of our model on a mobile device.

Getting Started

We use the LLaVA codebase to train FastVLM variants. To train or finetune your own variants, please follow the instructions provided in the LLaVA codebase. We provide instructions for running inference with our models.

Setup

```shell
conda create -n fastvlm python=3.10
conda activate fastvlm
pip install -e .
```

Model Zoo

For detailed information on various evaluations, please refer to our paper.

| Model        | Stage | PyTorch Checkpoint (url) |
|--------------|-------|--------------------------|
| FastVLM-0.5B | 2     | fastvlm_0.5b_stage2      |
| FastVLM-0.5B | 3     | fastvlm_0.5b_stage3      |
| FastVLM-1.5B | 2     | fastvlm_1.5b_stage2      |
| FastVLM-1.5B | 3     | fastvlm_1.5b_stage3      |
| FastVLM-7B   | 2     | fastvlm_7b_stage2        |
| FastVLM-7B   | 3     | fastvlm_7b_stage3        |

To download all the pretrained checkpoints, run the command below (this might take some time depending on your connection, so it might be good to grab ☕️ while you wait).

```shell
bash get_models.sh  # Files will be downloaded to the `checkpoints` directory.
```

Usage Example

To run inference with a PyTorch checkpoint, follow the instructions below.

```shell
python predict.py --model-path /path/to/checkpoint-dir \
                  --image-file /path/to/image.png \
                  --prompt "Describe the image."
```

Inference on Apple Silicon

To run inference on Apple Silicon, PyTorch checkpoints have to be exported to a format suitable for running on Apple Silicon. Detailed instructions and code can be found in the model_export subfolder.

Please see the README there for more ...
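The TTFT gains described above come largely from the encoder emitting fewer vision tokens, since LLM prefill cost grows with the number of input tokens. A minimal sketch of that relationship follows; all latency and token numbers below are illustrative assumptions, not measured FastVLM figures.

```python
# Toy model: time-to-first-token is approximately the vision
# encoder's latency plus the LLM's prefill time over all input tokens.
# (Illustrative sketch only; not FastVLM's actual benchmark code.)
def ttft_ms(encode_ms: float, vision_tokens: int,
            text_tokens: int, ms_per_token: float) -> float:
    """Estimate TTFT as encoder latency + per-token prefill cost."""
    return encode_ms + ms_per_token * (vision_tokens + text_tokens)

# Hypothetical numbers: a conventional encoder emitting many tokens
# vs. a FastViTHD-style encoder emitting far fewer.
baseline = ttft_ms(encode_ms=200.0, vision_tokens=576,
                   text_tokens=64, ms_per_token=0.5)
reduced = ttft_ms(encode_ms=30.0, vision_tokens=64,
                  text_tokens=64, ms_per_token=0.5)

print(f"baseline TTFT: {baseline:.0f} ms, reduced-token TTFT: {reduced:.0f} ms")
print(f"speedup: {baseline / reduced:.1f}x")
```

Cutting both encoder latency and token count compounds: the prefill term shrinks along with the encoding term, which is why fewer, higher-quality vision tokens translate directly into faster TTFT.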

First seen: 2025-05-13 02:29

Last seen: 2025-05-13 11:30