Free Browser Based AI LLM
A real LLM running 100% in your browser · no login · no server · no data leaves your device
Built for demonstration purposes — not tested in heavy-use scenarios. This is a showcase of local small-model capabilities running entirely on your device.
First load downloads the weights once (cached in your browser). Subsequent visits start instantly.
▸ Chat
Pick a model and hit "Load model" to begin. Once loaded, this page works fully offline.
💡 Writing scripts or long code? Increase max tokens to 2048, 4096, or 8192 above — otherwise the model will cut off mid-response.
How this works
Runtime: WebLLM + WebGPU
Where the model runs: Your GPU. Your tab.
Data leaving your device: Zero
After first download: Works offline
▸ Terminal
jasper@macbook:~/free-browser-based-ai $ inspect --privacy
# checking data flows...
✓ no telemetry
✓ no API calls to inference servers
✓ no cookies, no tracking, no accounts
→ model weights fetched from CDN, then cached locally
→ inference runs on your GPU via WebGPU
⚠ small models = limited reasoning. Don't trust it with facts.
FAQ — Frequently Asked Questions about Free Browser Based AI

Does my data get sent anywhere?

No. Everything runs in your browser tab. The only network request is the one-time download of the model weights from the Hugging Face CDN. After that, you can disconnect from the internet and it will still work.

Why does the first load take so long?

The model weights are hundreds of megabytes to a few gigabytes depending on which one you pick. They need to be downloaded once and are then cached in your browser (via the Cache API). Next time you visit, it loads in seconds.
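
The caching described above can be inspected from a page script. The following is a minimal sketch, assuming the weights are stored through the browser's Cache API as stated; the `CacheLike` and `CacheStorageLike` interfaces and the `summarizeCaches` name are illustrative and not part of this page's code. It makes no assumption about cache names and simply counts the entries in every cache the origin has.

```typescript
// Minimal structural types so the sketch also runs outside a browser.
// In a browser, the global `caches` object (CacheStorage) has this shape.
interface CacheLike {
  keys(): Promise<ReadonlyArray<{ url: string }>>;
}

interface CacheStorageLike {
  keys(): Promise<string[]>;
  open(name: string): Promise<CacheLike>;
}

// Count the cached entries in each named cache for this origin.
// A cache holding many entries after the first visit suggests the
// weights were stored and will not be downloaded again.
async function summarizeCaches(
  storage: CacheStorageLike
): Promise<Record<string, number>> {
  const counts: Record<string, number> = {};
  for (const name of await storage.keys()) {
    const cache = await storage.open(name);
    counts[name] = (await cache.keys()).length;
  }
  return counts;
}
```

In a browser console you would pass the global `caches` object, for example `summarizeCaches(caches).then(console.log)`.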

Why does my browser say "WebGPU not available"?

WebGPU is still rolling out. It works well in recent desktop versions of Chrome, Edge, and Brave. On Safari and Firefox you may need to enable it manually. On most mobile browsers it's not yet reliable. See caniuse.com/webgpu.
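
A page can tell these cases apart before trying to load a model. The sketch below is one plausible way to do it, not this page's actual check: it looks for `navigator.gpu` and then asks for an adapter with `requestAdapter()`, which resolves to `null` when no suitable GPU is available. The `NavigatorLike` type, the status strings, and the `checkWebGpu` name are assumptions made for illustration.

```typescript
// Minimal structural types so the sketch can run without browser typings.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

type WebGpuStatus = "ok" | "no-api" | "no-adapter";

// "no-api":     the browser does not expose WebGPU at all.
// "no-adapter": the API exists, but no usable GPU adapter was returned.
// "ok":         an adapter is available, so loading a model can proceed.
async function checkWebGpu(nav: NavigatorLike): Promise<WebGpuStatus> {
  if (!nav.gpu) return "no-api";
  const adapter = await nav.gpu.requestAdapter();
  return adapter ? "ok" : "no-adapter";
}
```

In a browser you would call `checkWebGpu(navigator)`; a cast may be needed if your TypeScript DOM typings do not declare `gpu`.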

Why is the model so... dumb compared to ChatGPT or Claude?

Because it is hundreds of times smaller. Frontier models have hundreds of billions of parameters; a 1B model running in your browser is about 0.3% of a 300-billion-parameter model. It's impressive it works at all. Use it for quick tasks (rewrite, summarize, extract), not for reasoning or factual lookups.
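
The size comparison is simple arithmetic. The sketch below assumes a frontier model of 300 billion parameters purely for illustration, since the answer above only says "hundreds of billions"; the constant and function names are made up for this example.

```typescript
// Express one parameter count as a percentage of another.
function percentOf(smallParams: number, largeParams: number): number {
  if (largeParams <= 0) throw new RangeError("largeParams must be positive");
  return (smallParams / largeParams) * 100;
}

const BROWSER_MODEL_PARAMS = 1e9; // a 1B model
const ASSUMED_FRONTIER_PARAMS = 300e9; // illustrative assumption, not a published figure

// Roughly 0.33 percent of the assumed frontier size.
const percent = percentOf(BROWSER_MODEL_PARAMS, ASSUMED_FRONTIER_PARAMS);

// Equivalently, the assumed frontier model is 300 times larger.
const timesLarger = ASSUMED_FRONTIER_PARAMS / BROWSER_MODEL_PARAMS;
```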

Which model should I pick?

Qwen 2.5 0.5B (Alibaba) — Pick this if speed is everything. At ~350 MB it loads fastest and works well on low-end hardware. Great for quick rewrites, simple Q&A, and instant demos. Don't expect deep reasoning.

Llama 3.2 1B (Meta) — The recommended default for most people. At ~880 MB it balances download size and quality well. Strong instruction-following, handles summarization, rewrites, and everyday text tasks reliably. Start here if you're unsure.

Llama 3.2 3B (Meta) — A clear step up in coherence and reasoning depth. Handles longer documents, light coding tasks, and more structured outputs. Worth it if you have a modern GPU and don't mind the ~2.1 GB download.

Phi 3.5 Mini (Microsoft) — The highest-quality model in this list. Trained on Microsoft Research's curated synthetic datasets, it outperforms much larger models on reasoning, coding, and math benchmarks. Best pick for coding assistance, KQL writing, structured JSON extraction, and logical step-by-step tasks. Supports 20+ languages and a 128K token context window. Choose this when quality matters more than speed.

Gemma 2 2B (Google DeepMind) — Best for natural-sounding, fluent text. Uses knowledge distillation from larger Gemma models, making its prose unusually smooth and readable for its size. Great for creative writing, polished summaries, and natural rewrites. At ~1.5 GB it's a good middle ground between size and quality.
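
If download size is the main constraint, the choice can be sketched as a lookup over the sizes quoted above. This is an illustrative helper, not part of the page: the `ModelOption` type and `largestModelWithin` function are hypothetical, sizes are the approximate figures from the descriptions converted to megabytes, and Phi 3.5 Mini is left without a size because none is stated. Treating a larger download as the better pick is an assumption of this sketch; the descriptions give each model different strengths.

```typescript
interface ModelOption {
  name: string;
  downloadMb?: number; // approximate; undefined when no size is quoted
}

const MODELS: ModelOption[] = [
  { name: "Qwen 2.5 0.5B", downloadMb: 350 },
  { name: "Llama 3.2 1B", downloadMb: 880 },
  { name: "Gemma 2 2B", downloadMb: 1500 },
  { name: "Llama 3.2 3B", downloadMb: 2100 },
  { name: "Phi 3.5 Mini" }, // size not given in the text, so it is never selected
];

// Return the largest model whose quoted download fits the budget,
// or undefined when nothing fits.
function largestModelWithin(
  budgetMb: number,
  models: ModelOption[] = MODELS
): ModelOption | undefined {
  return models
    .filter(
      (m): m is ModelOption & { downloadMb: number } =>
        m.downloadMb !== undefined && m.downloadMb <= budgetMb
    )
    .sort((a, b) => b.downloadMb - a.downloadMb)[0];
}
```

For example, a budget of 1000 MB selects Llama 3.2 1B, which matches the recommended default above.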

Is this tool free to use?

Yes, completely free. No account, no subscription, no watermark. It is one of the free browser-based tools at jasperbernaers.com.

What tech powers this?

WebLLM by MLC-AI, which runs open-weights models (Llama, Qwen, Phi, Gemma) compiled for WebGPU. Everything else is vanilla JavaScript and CSS in a single HTML file.
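
A single chat turn through such a runtime might look like the sketch below. It is written against a small `EngineLike` interface modeled on the OpenAI-style `chat.completions.create` call that WebLLM exposes, so it does not show this page's actual code. The `ask` helper, the 0.7 temperature, and the 512-token default are assumptions for illustration. It also reports when a reply stopped because the token limit was reached, which is why long code output needs a higher max tokens setting.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  messages: ChatMessage[];
  temperature: number;
  max_tokens: number;
}

interface ChatResponse {
  choices: Array<{
    message: { content: string | null };
    finish_reason: string | null;
  }>;
}

// Hypothetical minimal shape of an engine; a real WebLLM engine offers more
// (for example streaming) and may need a cast to fit this interface.
interface EngineLike {
  chat: { completions: { create(req: ChatRequest): Promise<ChatResponse> } };
}

// Send one user prompt and report whether the reply hit the token limit.
async function ask(
  engine: EngineLike,
  prompt: string,
  maxTokens = 512
): Promise<{ text: string; truncated: boolean }> {
  const res = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
    temperature: 0.7, // assumed sampling temperature
    max_tokens: maxTokens,
  });
  const choice = res.choices[0];
  return {
    text: choice?.message.content ?? "",
    // "length" means generation stopped at max_tokens, not at a natural end.
    truncated: choice?.finish_reason === "length",
  };
}
```

When `truncated` is true, calling `ask` again with a larger `maxTokens` is the programmatic equivalent of raising the max tokens setting.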