No. Everything runs in your browser tab. The only network request is the one-time download of the model weights from the Hugging Face CDN. After that, you could disconnect your internet and this would still work.
The model weights range from hundreds of megabytes to a few gigabytes, depending on which model you pick. They are downloaded once and then cached in your browser (via the Cache API), so on your next visit the model loads in seconds.
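The download-once-then-cache flow can be sketched as follows. The real tool stores responses with the browser Cache API; here the storage is abstracted behind a small interface so the same logic is visible in any runtime. All names (`WeightStore`, `loadWeights`) are illustrative, not the tool's actual API.

```typescript
// Illustrative storage interface: the browser Cache API plays this role
// in practice; any key-value store with the same shape works for the sketch.
interface WeightStore {
  get(url: string): Promise<Uint8Array | undefined>;
  put(url: string, data: Uint8Array): Promise<void>;
}

// Download the weights on the first visit, serve from cache afterwards.
async function loadWeights(
  url: string,
  store: WeightStore,
  download: (url: string) => Promise<Uint8Array>,
): Promise<Uint8Array> {
  const cached = await store.get(url);
  if (cached) return cached;        // later visits: no network needed
  const data = await download(url); // first visit: one-time CDN download
  await store.put(url, data);       // persist for future visits
  return data;
}
```

Because the weights are keyed by URL, a second call for the same model never touches the network, which is why the tool keeps working offline once the first download completes.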
WebGPU is still rolling out. It works well on recent desktop versions of Chrome, Edge, and Brave. On Safari and Firefox you may need to enable it manually, and on most mobile browsers it is not yet reliable. See caniuse.com/webgpu for current support.
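A page can check for WebGPU before trying to load a model. This is a minimal sketch using the standard WebGPU entry point: `navigator.gpu` exists only where WebGPU is exposed, and `requestAdapter()` resolves to `null` when no usable GPU is found.

```typescript
// Returns true only when the current runtime exposes WebGPU
// and can actually hand back a GPU adapter.
async function hasWebGPU(): Promise<boolean> {
  if (typeof navigator === "undefined" || !("gpu" in navigator)) {
    return false; // runtime does not expose WebGPU at all
  }
  try {
    const adapter = await (navigator as any).gpu.requestAdapter();
    return adapter !== null; // null means no suitable GPU was found
  } catch {
    return false; // the adapter request itself failed
  }
}
```

A tool can run this check at startup and, when it returns false, show a message pointing users at caniuse.com/webgpu instead of failing mid-download.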
Because it is hundreds of times smaller. Frontier models have hundreds of billions of parameters; a 1B model running in your browser is roughly 0.3% of that size. It's impressive it works at all. Use it for quick tasks (rewrite, summarize, extract), not for reasoning or factual lookups.
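As a quick sanity check on the percentage above (assuming a 300B-parameter frontier model, an illustrative figure since exact sizes are not public):

```typescript
// Rough ratio of a 1B local model to a ~300B frontier model.
const frontier = 300e9; // assumed parameter count, for illustration
const local = 1e9;
const share = ((local / frontier) * 100).toFixed(1) + "%"; // "0.3%"
```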
Qwen 0.5B for instant demos on any device. Llama 3.2 1B for a good balance. Llama 3.2 3B or Phi 3.5 if you have a solid GPU and the patience for a ~2 GB download — the quality jump is noticeable.
Yes, completely free. No account, no subscription, no watermark. One of the free browser-based tools at jasperbernaers.com.