No. Everything runs in your browser tab. The only network traffic is the one-time download of the model weights from the Hugging Face CDN. After that, you could go offline and the tool would still work.
The model weights are hundreds of megabytes to a few gigabytes, depending on which model you pick. They are downloaded once and then cached in your browser (via the Cache API). On your next visit, the model loads from that cache in seconds.
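The download-once-then-cache behavior can be sketched as a cache-first fetch. This is an illustration of the general pattern and not this tool's actual code: the function name `loadWeights`, the `ResponseLike`/`CacheLike` shapes, and the cache name in the usage comment are all hypothetical. The interfaces are small stand-ins for the browser's `Response` and `Cache` types so the sketch type-checks without DOM typings.

```typescript
// Structural stand-ins for the browser's Response and Cache objects
// (assumption: only the members used below are modeled).
interface ResponseLike {
  ok: boolean;
  clone(): ResponseLike;
}

interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}

type Fetcher = (url: string) => Promise<ResponseLike>;

// Return the cached copy if there is one; otherwise download, store, return.
async function loadWeights(
  url: string,
  cache: CacheLike,
  fetcher: Fetcher
): Promise<{ response: ResponseLike; fromCache: boolean }> {
  const cached = await cache.match(url);
  if (cached !== undefined) {
    return { response: cached, fromCache: true };
  }

  const response = await fetcher(url);
  if (!response.ok) {
    throw new Error(`Download failed for ${url}`);
  }

  // Store a clone, because a response body can only be read once.
  await cache.put(url, response.clone());
  return { response, fromCache: false };
}

// In a browser this might be wired up roughly as follows
// ("model-weights" is a made-up cache name):
//   const cache = await caches.open("model-weights");
//   const result = await loadWeights(weightsUrl, cache, (u) => fetch(u));
```

On a first visit `fromCache` would be false and the download happens; on later visits the cached copy is returned without touching the network.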
WebGPU is still rolling out. It works well on recent desktop versions of Chrome, Edge, and Brave. On Safari and Firefox you may need to enable it manually. On most mobile browsers it's not yet reliable. See caniuse.com/webgpu for current support.
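A page can check for WebGPU before offering to load a model. The sketch below relies on the standard `navigator.gpu.requestAdapter()` call, which resolves to `null` when no suitable GPU adapter is available. The helper name `hasWebGPU` and the `NavigatorLike` shape are hypothetical, introduced so the example type-checks without WebGPU typings.

```typescript
// Minimal shape of the parts of `navigator` used here (assumption:
// a real browser navigator with WebGPU fits this shape).
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

// True only if the API exists and the browser hands back an adapter.
async function hasWebGPU(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) {
    return false; // API not exposed: unsupported browser or flag disabled
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null; // null means no usable GPU was found
  } catch {
    return false; // treat any failure as "not available"
  }
}

// In a browser, roughly:
//   const supported = await hasWebGPU(navigator as unknown as NavigatorLike);
```

Checking for an adapter, and not just for the presence of `navigator.gpu`, matters because a browser can expose the API while the device still has no usable GPU.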
Because it is hundreds of times smaller. Frontier models have hundreds of billions of parameters, so a 1B model running in your browser is about 0.3% of that size. It's impressive that it works at all. Use it for quick tasks (rewrite, summarize, extract), not for reasoning or factual lookups.
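The 0.3% figure can be checked with quick arithmetic. The 300B parameter count below is an illustrative assumption standing in for "hundreds of billions", not the size of any specific model.

```typescript
// How many times smaller the small model is than the large one.
function timesSmaller(smallParams: number, largeParams: number): number {
  return largeParams / smallParams;
}

// The small model's size as a percentage of the large one.
function percentOf(smallParams: number, largeParams: number): number {
  return (smallParams / largeParams) * 100;
}

const browserModel = 1e9; // 1B parameters
const frontierModel = 300e9; // assumed 300B parameters, for illustration only

console.log(timesSmaller(browserModel, frontierModel)); // prints 300
console.log(percentOf(browserModel, frontierModel).toFixed(2) + "%"); // prints 0.33%
```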
Qwen 2.5 0.5B (Alibaba): Pick this if speed is everything. At ~350 MB it loads fastest and works well on low-end hardware. Great for quick rewrites, simple Q&A, and instant demos. Don't expect deep reasoning.
Llama 3.2 1B (Meta): The recommended default for most people. At ~880 MB it balances download size and quality well. Strong instruction-following; it handles summarization, rewrites, and everyday text tasks reliably. Start here if you're unsure.
Llama 3.2 3B (Meta): A clear step up in coherence and reasoning depth. Handles longer documents, light coding tasks, and more structured outputs. Worth it if you have a modern GPU and don't mind the ~2.1 GB download.
Phi 3.5 Mini (Microsoft): The highest-quality model in this list. Trained largely on curated and synthetic datasets from Microsoft Research, it is reported to compete with much larger models on reasoning, coding, and math benchmarks. Best pick for coding assistance, writing KQL (Kusto Query Language) queries, structured JSON extraction, and logical step-by-step tasks. Supports 20+ languages and a 128K token context window. Choose this when quality matters more than speed.
Gemma 2 2B (Google DeepMind): Best for natural-sounding, fluent text. It was trained with knowledge distillation from larger Gemma models, which helps make its prose unusually smooth and readable for its size. Great for creative writing, polished summaries, and natural rewrites. At ~1.5 GB it's a good middle ground between size and quality.
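The recommendations above can be condensed into a small lookup table. This is only a sketch: the `Priority` labels and the `recommendModel` helper are hypothetical names, the model names are display names and not runtime model IDs, and download sizes appear only where the list states one.

```typescript
// What the user cares about most (labels are made up for this sketch).
type Priority = "speed" | "default" | "long-documents" | "coding" | "fluent-prose";

interface ModelInfo {
  name: string;
  vendor: string;
  downloadMB?: number; // approximate; omitted where no size is stated
}

const RECOMMENDATIONS: Record<Priority, ModelInfo> = {
  speed: { name: "Qwen 2.5 0.5B", vendor: "Alibaba", downloadMB: 350 },
  default: { name: "Llama 3.2 1B", vendor: "Meta", downloadMB: 880 },
  "long-documents": { name: "Llama 3.2 3B", vendor: "Meta", downloadMB: 2100 },
  coding: { name: "Phi 3.5 Mini", vendor: "Microsoft" },
  "fluent-prose": { name: "Gemma 2 2B", vendor: "Google DeepMind", downloadMB: 1500 },
};

// Falls back to the recommended default when no priority is given.
function recommendModel(priority: Priority = "default"): ModelInfo {
  return RECOMMENDATIONS[priority];
}

console.log(recommendModel().name); // prints Llama 3.2 1B
console.log(recommendModel("speed").name); // prints Qwen 2.5 0.5B
```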
Yes, completely free. No account, no subscription, no watermark. One of the free browser-based tools at jasperbernaers.com.