Qwen3.5-9B scores 93.8% on a real security-workflow benchmark, running fully local on an M5 MacBook Pro with zero API costs.
A developer ran a 96-test benchmark across 15 security-workflow suites covering tool use, event deduplication, and image-based triage, comparing local and cloud models. Qwen3.5-9B running on an M5 MacBook Pro scored 93.8%, just 4.1 points behind GPT-5.4, at 25 tok/s with a 765 ms time to first token (TTFT) while using 13.8 GB of unified memory. The larger Qwen3.5-35B-MoE posted a lower TTFT (435 ms) than every OpenAI cloud model tested. The benchmark was purpose-built for home security AI workflows, not generic chat evaluation.
Qwen3.5-9B fits in 13.8 GB of unified memory and hits 93.8% on a multi-suite agentic benchmark, which makes it production-viable on any M-series Mac without a single API call. Because Ollama serves the model behind an OpenAI-compatible endpoint, swapping it into existing tool-use pipelines requires minimal refactoring; a minimal sketch follows below. At 25 tok/s and 765 ms TTFT, latency is acceptable for async security-triage tasks, though not for a real-time streaming UX.
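To make the "minimal refactoring" claim concrete, here is a minimal sketch in Python: the official OpenAI SDK pointed at Ollama's OpenAI-compatible endpoint on its default port (11434). It assumes Ollama is running locally with `qwen3.5:9b` already pulled; everything else is the same client code you would use against the cloud.

```python
# Minimal sketch: reuse the official OpenAI SDK against a local Ollama server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # required by the SDK, ignored by Ollama
)

resp = client.chat.completions.create(
    model="qwen3.5:9b",
    messages=[{"role": "user", "content": "Classify: motion at front door, 2am."}],
)
print(resp.choices[0].message.content)
```

The only lines that change versus a cloud deployment are `base_url` and `model`; prompts, tool definitions, and response parsing carry over unchanged.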
Pull Qwen3.5-9B via Ollama this week and run your highest-volume classification prompt locally. Measure latency and accuracy against your current OpenAI endpoint (a rough timing harness is sketched below) to get a real cost-elimination number before your next sprint planning.
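A rough A/B timing harness, not a rigorous benchmark: it streams the same prompt to the local Ollama endpoint and a cloud OpenAI endpoint and records TTFT and total latency. The model names, the `OPENAI_API_KEY` environment variable, and the single prompt are assumptions; swap in your own prompt set and cloud model.

```python
# Rough A/B latency harness: local Ollama vs. cloud OpenAI on one prompt.
import os
import time
from openai import OpenAI

ENDPOINTS = {
    "local-qwen": (
        OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),
        "qwen3.5:9b",
    ),
    "cloud-openai": (
        OpenAI(api_key=os.environ["OPENAI_API_KEY"]),  # assumes key is set
        "gpt-4o-mini",
    ),
}

PROMPT = "Classify this event: motion detected at front door, 2am, no known visitor expected."

for name, (client, model) in ENDPOINTS.items():
    start = time.perf_counter()
    ttft = None
    chunks = 0
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        stream=True,
    )
    for chunk in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # time to first streamed chunk
        if chunk.choices and chunk.choices[0].delta.content:
            chunks += 1  # chunk count, a rough proxy for token throughput
    total = time.perf_counter() - start
    print(f"{name}: TTFT {ttft * 1000:.0f} ms, total {total:.2f} s, {chunks} chunks")
```

Run it a few times per model to average out cold-start effects; accuracy comparison still needs a labeled prompt set scored by hand or by a judge model.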
Run `ollama run qwen3.5:9b`, then paste: 'You are a security triage assistant. Classify this event: motion detected at front door, 2am, no known visitor expected. Output: threat_level (low/medium/high), recommended_action, confidence.' Compare the output quality against GPT-4o mini on the same prompt.
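If you want machine-parseable triage output rather than free text, a sketch using Ollama's native `/api/generate` endpoint with its JSON mode. The `threat_level`, `recommended_action`, and `confidence` field names mirror the prompt above and are illustrative, not a fixed schema.

```python
# Sketch: ask Ollama for JSON-constrained output and parse it directly.
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3.5:9b",
        "prompt": (
            "You are a security triage assistant. Classify this event: "
            "motion detected at front door, 2am, no known visitor expected. "
            "Respond as JSON with keys threat_level (low/medium/high), "
            "recommended_action, confidence."
        ),
        "format": "json",  # constrains the model to emit valid JSON
        "stream": False,
    },
    timeout=120,
)
event = json.loads(resp.json()["response"])
print(event["threat_level"], event["recommended_action"])
```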