On-device small language models with multimodality, RAG, and Function Calling

Summary

Last year Google AI Edge introduced support for on-device small language models (SLMs) with four initial models on Android, iOS, and Web. Today, we are excited to expand support to over a dozen models including the new Gemma 3 and Gemma 3n models, hosted on our new LiteRT Hugging Face community.

Gemma 3n, available via Google AI Edge as an early preview, is Gemma’s first multimodal on-device small language model supporting text, image, video, and audio inputs. Paired with our new Retrieval Augmented Generation (RAG) and Function Calling libraries, you have everything you need to prototype and build transformative AI features fully on the edge.

Video: Let users control apps with on-device SLMs and our new function calling library

Broader model support

You can find our growing list of models to choose from in the LiteRT Hugging Face Community. Download any of these models and easily run them on-device with just a few lines of code. The models are fully optimized and converted for mobile and web. Full instructions on how to run these models can be found in our documentation and on each model card on Hugging Face.

To customize any of these models, you fine-tune the base model and then convert and quantize the model using the appropriate AI Edge libraries. We have a Colab showing every step you need to fine-tune and then convert Gemma 3 1B.

With the latest release of our quantization tools, we have new quantization schemes that allow for much higher quality int4 post-training quantization. Compared to bf16, the default data type for many models, int4 quantization can reduce the size of language models by a factor of 2.5-4X while significantly decreasing latency and peak memory consumption.

Gemma 3 1B & Gemma 3n

Earlier this year, we introduced Gemma 3 1B. At only 529MB, this model can run up to 2,585 tokens per second pre-fill on the mobile GPU, allowing it to process up to a page of content in under a second.
Gemma 3 1B’s sm...
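
The size and pre-fill figures quoted above can be sanity-checked with some back-of-the-envelope arithmetic. The following is a minimal sketch, not the AI Edge quantization tooling: the parameter count, the per-group scale overhead, and the page length in tokens are all illustrative assumptions, and the helper names are hypothetical. Only the 2,585 tokens per second figure comes from the article.

```python
# Rough arithmetic for int4 vs. bf16 model size and pre-fill time.
# All constants marked "assumed" are illustrative, not from the article.

def model_size_bytes(num_params: int, bits_per_weight: float) -> float:
    """Storage needed for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8


def int4_effective_bits(group_size: int, scale_bits: int = 16) -> float:
    """4-bit weights plus one scale value shared by each group of weights.

    This models a generic group-wise scheme (an assumption); real schemes
    may also keep some layers at higher precision, lowering the ratio.
    """
    return 4 + scale_bits / group_size


def prefill_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to process a prompt at a given pre-fill throughput."""
    return num_tokens / tokens_per_second


NUM_PARAMS = 1_000_000_000   # assumed: a model with one billion weights
GROUP_SIZE = 32              # assumed: one 16-bit scale per 32 weights
PAGE_TOKENS = 1_000          # assumed: roughly one page of text
PREFILL_TPS = 2_585          # from the article: tokens per second on mobile GPU

bf16_bytes = model_size_bytes(NUM_PARAMS, 16)
int4_bytes = model_size_bytes(NUM_PARAMS, int4_effective_bits(GROUP_SIZE))
ratio = bf16_bytes / int4_bytes

print(f"bf16 size: {bf16_bytes / 1e6:.0f} MB")
print(f"int4 size: {int4_bytes / 1e6:.0f} MB")
print(f"reduction: {ratio:.2f}x")
print(f"pre-fill for {PAGE_TOKENS} tokens: "
      f"{prefill_seconds(PAGE_TOKENS, PREFILL_TPS):.2f} s")
```

Under these assumptions the reduction lands inside the quoted 2.5-4X range: pure 4-bit storage would give exactly 4X, and the per-group scales pull the ratio below that.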

First seen: 2025-05-20 19:13

Last seen: 2025-05-20 19:13
