Qwen 2.5 0.5B (Alibaba) — Pick this if speed is everything. At ~350 MB it loads fastest and works well on low-end hardware. Great for quick rewrites, simple Q&A, and instant demos. Don't expect deep reasoning.
Llama 3.2 1B (Meta) — The recommended default for most people. At ~880 MB it balances download size and quality well. Strong instruction-following, handles summarization, rewrites, and everyday text tasks reliably. Start here if you're unsure.
Llama 3.2 3B (Meta) — A clear step up in coherence and reasoning depth. Handles longer documents, light coding tasks, and more structured outputs. Worth it if you have a modern GPU and don't mind the ~2.1 GB download.
Phi 3.5 Mini (Microsoft) — The highest-quality model in this list. Trained largely on curated synthetic data from Microsoft Research, it is competitive with much larger models on reasoning, coding, and math benchmarks. Best pick for coding assistance, writing KQL (Kusto Query Language) queries, structured JSON extraction, and step-by-step logical tasks. Supports 20+ languages and a 128K-token context window. Choose this when quality matters more than speed.
Gemma 2 2B (Google DeepMind) — Best for natural-sounding, fluent text. It was trained with knowledge distillation from a larger Gemma model, which helps its prose read smoothly for its size. Great for creative writing, polished summaries, and natural rewrites. At ~1.5 GB it's a good middle ground between size and quality.
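
The guidance above can be sketched as a small lookup. This is only an illustration, not part of any real loader: the `MODELS` table, the `best_for` labels, and the `pick_model` function are hypothetical names. The sizes are the approximate figures quoted in the list, and Phi 3.5 Mini is left without a size because the list does not state one.

```python
from typing import Optional

# Approximate download sizes in MB, copied from the list above.
# Phi 3.5 Mini has no size stated there, so it is None (unknown).
# The "best_for" labels are hypothetical shorthand for each entry's advice.
MODELS = {
    "Qwen 2.5 0.5B": {"size_mb": 350, "best_for": "speed"},
    "Llama 3.2 1B": {"size_mb": 880, "best_for": "default"},
    "Gemma 2 2B": {"size_mb": 1500, "best_for": "prose"},
    "Llama 3.2 3B": {"size_mb": 2100, "best_for": "reasoning"},
    "Phi 3.5 Mini": {"size_mb": None, "best_for": "quality"},
}

DEFAULT_MODEL = "Llama 3.2 1B"  # the list's recommended starting point
SMALLEST_MODEL = "Qwen 2.5 0.5B"


def pick_model(priority: str = "default",
               max_download_mb: Optional[int] = None) -> str:
    """Pick a model by priority, optionally capped by a download budget."""
    # Unknown priorities fall back to the recommended default.
    match = next(
        (name for name, info in MODELS.items() if info["best_for"] == priority),
        DEFAULT_MODEL,
    )
    size = MODELS[match]["size_mb"]
    # Assumption: a model with an unknown size is not rejected by the budget.
    if max_download_mb is None or size is None or size <= max_download_mb:
        return match
    # Over budget: use the largest model with a known size that still fits.
    fitting = [
        name for name, info in MODELS.items()
        if info["size_mb"] is not None and info["size_mb"] <= max_download_mb
    ]
    if not fitting:
        return SMALLEST_MODEL
    return max(fitting, key=lambda name: MODELS[name]["size_mb"])


if __name__ == "__main__":
    print(pick_model())                                  # no preference
    print(pick_model("reasoning", max_download_mb=1000)) # budget-limited
```

Calling `pick_model("reasoning", max_download_mb=1000)` falls back from Llama 3.2 3B to the largest listed model under the cap. A real application would probably also weigh available GPU memory, which this sketch ignores.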