Ollama v0.19 integrates an MLX backend for Apple Silicon, delivering significant local inference speed improvements for Mac developers.
Ollama released version 0.19 with MLX backend integration, enabling dramatically faster local model inference on Apple Silicon chips. MLX is Apple's own machine learning framework, built around the unified memory architecture shared by the M-series CPU and GPU. The update makes running local LLMs on MacBooks and Mac Studios substantially faster without any cloud dependency, and it is available now through the standard Ollama update channel.
Ollama v0.19 switches to Apple's MLX backend under the hood, meaning M-series chips now use their GPU and unified memory the way Apple intended. Expect roughly 2–4x faster token generation on common models like Llama 3, Mistral, and Qwen compared to the previous CPU-heavy path. No configuration change is required; it's a free performance unlock the moment you update.
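To confirm a model is actually landing on the GPU after the update, ollama ps reports where each loaded model is running. A quick check (the exact column layout is an assumption based on recent Ollama builds):

# Load the model with a one-shot prompt, then inspect placement
ollama run llama3.2 "hello" > /dev/null
ollama ps
# The PROCESSOR column shows the CPU/GPU split (e.g. "100% GPU")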
Update Ollama to v0.19 and run a before/after tokens-per-second benchmark on your most-used model to decide whether local inference can replace your paid API calls for dev/test workloads, as sketched below.
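A minimal before/after benchmark, assuming ollama run's --verbose flag, which prints timing statistics (including an eval rate in tokens/s) after each response:

# Before updating: record the baseline eval rate
ollama run llama3.2 --verbose "Summarize the history of Unix in 200 words."
# After updating to v0.19: repeat the identical prompt
ollama run llama3.2 --verbose "Summarize the history of Unix in 200 words."
# Compare the "eval rate" lines (tokens/s) between the two runs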
Run: ollama --version to verify you're on v0.19, then ollama pull llama3.2 && ollama run llama3.2 to confirm you're on the latest model
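If the benchmark numbers hold up, pointing dev/test traffic at the local server is a small change for OpenAI-compatible clients, since Ollama serves an OpenAI-style endpoint on port 11434 by default. A sketch (model name and prompt are placeholders):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Write a unit test for fizzbuzz."}]}'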