Run LLMs on Apple Neural Engine (ANE)

https://news.ycombinator.com/rss Hits: 17
Summary

ANEMLL

ANEMLL (pronounced like "animal") is an open-source project focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE).

Goals

The goal is to provide a fully open-source pipeline, from model conversion to inference, for common LLM architectures running on the ANE. This enables seamless integration and on-device inference for low-power applications on edge devices, with the privacy and security benefits of keeping data on the device. This is critical for autonomous applications, where models run directly on the device without requiring an internet connection.

We aim to:
- Provide a flexible, easy-to-use library/framework for porting LLMs to the ANE directly from Hugging Face models
- Provide on-device examples for iOS and macOS Swift or C/C++ applications

See the updated Roadmap.md for more details.

Main Components in 0.3.0 Alpha Release

ANEMLL provides five main components for Apple Neural Engine inference development:

Pre-converted Models

We provide sample converted models ready for use:
- LLAMA 3.1 (1B and 8B variants), including iOS "friendly builds"
- DeepSeek distilled models
- DeepHermes distilled models

Note: Quantization should be improved. LUT4 quality is fairly low due to the lack of block quantization on the Apple Neural Engine. Techniques such as GPTQ and SpinQuant should greatly improve LUT4 models.

Visit our Hugging Face repository for the latest converted models.

Important: This is Alpha Release 0.3.0 of the library. It is designed to process model weights directly from Hugging Face models and convert them to the CoreML format for the Apple Neural Engine (ANE).
This release supports only LLaMA models, including DeepSeek and DeepHermes distilled models based on the LLaMA 3.1 architecture. A future release will add support for more models and archi...
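
The note on LUT4 quality can be made concrete with a toy experiment. The sketch below is an illustration only and is not taken from ANEMLL or CoreML: it assumes "LUT4" means a 4-bit lookup table (16 shared values per table), fits that table with a simple 1-D k-means, and compares one table for the whole tensor against one table per block. The helper names (build_lut, quantize, lut_error, make_weights) and the synthetic weights are hypothetical.

```python
import random


def build_lut(values, bits=4, iters=20):
    """Fit a 2**bits entry lookup table with 1-D k-means (Lloyd's algorithm)."""
    k = 2 ** bits
    lo, hi = min(values), max(values)
    if lo == hi:
        return [lo]
    # Start from centroids spaced evenly across the value range.
    lut = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        sums, counts = [0.0] * k, [0] * k
        for v in values:
            j = min(range(k), key=lambda i: abs(v - lut[i]))
            sums[j] += v
            counts[j] += 1
        # Move each centroid to the mean of its members; keep empty ones as-is.
        lut = [sums[i] / counts[i] if counts[i] else lut[i] for i in range(k)]
    return lut


def quantize(values, lut):
    """Replace every value with its nearest lookup-table entry."""
    return [min(lut, key=lambda c: abs(v - c)) for v in values]


def lut_error(weights, block_size=None, bits=4):
    """Mean squared error after LUT quantization.

    block_size=None uses a single table for the whole tensor;
    otherwise each block of block_size values gets its own table.
    """
    if block_size is None:
        block_size = len(weights)
    recon = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        recon.extend(quantize(block, build_lut(block, bits)))
    return sum((w - r) ** 2 for w, r in zip(weights, recon)) / len(weights)


def make_weights(seed=0, block_size=64, scales=(0.01, 0.1, 1.0, 10.0)):
    """Synthetic weights: consecutive blocks with very different magnitudes."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, s) for s in scales for _ in range(block_size)]


weights = make_weights()
print("per-tensor LUT4 MSE:", lut_error(weights))
print("per-block  LUT4 MSE:", lut_error(weights, block_size=64))
```

With a single table, the 16 entries are spread over the full range set by the largest-magnitude block, so the small-magnitude blocks collapse onto one or two entries. Per-block tables avoid this, which is why the lack of block quantization hurts quality in this toy setting.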

First seen: 2025-05-03 16:44

Last seen: 2025-05-04 08:47