Free Browser Based AI LLM
A real LLM running 100% in your browser · no login · no server · no data leaves your device
Built for demonstration purposes — not tested in heavy-use scenarios. This is a showcase of local small-model capabilities running entirely on your device.
The first load downloads the weights once and caches them in your browser. Subsequent visits skip the download and start in seconds.
Pick a model and hit "Load model" to begin. Once loaded, this page works fully offline.
💡 Writing scripts or long code? Raise the max tokens setting to 2048, 4096, or 8192; otherwise the model will cut off mid-response.
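A reply that stops mid-sentence has usually run out of token budget rather than finished. WebLLM follows the OpenAI chat completions format, in which each choice carries a `finish_reason`; a value of `"length"` signals that generation stopped at `max_tokens`. The sketch below checks for that, assuming the OpenAI-style response shape; `ChatReply`, `wasCutOff` and `nextMaxTokens` are hypothetical names, not part of the WebLLM API.

```typescript
// Minimal subset of an OpenAI-style chat completion response (assumed shape).
interface ChatReply {
  choices: { message: { content: string | null }; finish_reason: string | null }[];
}

// True when generation stopped because it hit the token limit,
// i.e. the reply was cut off rather than finished naturally.
function wasCutOff(reply: ChatReply): boolean {
  const choice = reply.choices[0];
  return choice !== undefined && choice.finish_reason === "length";
}

// Hypothetical helper: next step on the 2048 / 4096 / 8192 ladder,
// or undefined if the current budget is already at the top.
function nextMaxTokens(current: number): number | undefined {
  return [2048, 4096, 8192].find((n) => n > current);
}
```

In the page, `reply` would be the result of something like `engine.chat.completions.create({ messages, max_tokens: 2048 })`; if `wasCutOff(reply)` is true, retrying with `nextMaxTokens` gives the model room to finish.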
How this works
Runtime: WebLLM + WebGPU
Where the model runs: Your GPU. Your tab.
Data leaving your device: Zero
After first download: Works offline
▸ Terminal
jasper@macbook:~/free-browser-based-ai $ inspect --privacy
# checking data flows...
✓ no telemetry
✓ no API calls to inference servers
✓ no cookies, no tracking, no accounts
→ model weights fetched from CDN, then cached locally
→ inference runs on your GPU via WebGPU
⚠ small models = limited reasoning. Don't trust it with facts.
Model comparison
Model | Maker | Size | Mode | Best for
--- | --- | --- | --- | ---
Qwen 2.5 0.5B | Alibaba | ~350 MB | WebGPU | Fastest load, low-end GPUs
TinyLlama 1.1B | StatNLP | ~650 MB | WebGPU | Very light, basic rewrites
Llama 3.2 1B | Meta | ~880 MB | WebGPU | Best all-round default
Qwen 2.5 Coder 1.5B | Alibaba | ~950 MB | WebGPU | Coding help
Gemma 2 2B | Google | ~1.5 GB | WebGPU | Natural, fluent prose
Phi 3.5 Mini | Microsoft | ~2.2 GB | WebGPU | Highest quality, reasoning & code
Mistral 7B v0.3 | Mistral AI | ~4.3 GB | WebGPU | Powerful GPUs only
SmolLM2 135M / 360M | HuggingFace | ~70–200 MB | WASM (mobile) | Phones, universal CPU
Sizes are approximate quantised (q4) download sizes. WebGPU models need Chrome/Edge/Brave on desktop; WASM models run on any device including iOS and Android.
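The q4 sizes follow from simple arithmetic: at 4 bits per weight, each parameter costs half a byte, so the parameter count times 0.5 bytes is a rough lower bound on the download. A minimal sketch of that estimate; `estimateWeightBytes` and `toMB` are hypothetical helpers, and real downloads come out larger because quantisation scales, any layers kept at higher precision, and tokenizer files add overhead.

```typescript
// Rough lower bound on a quantised model's weight size in bytes.
// bitsPerWeight = 4 for q4; overhead (scales, higher-precision layers,
// tokenizer/config files) is deliberately ignored.
function estimateWeightBytes(params: number, bitsPerWeight = 4): number {
  return (params * bitsPerWeight) / 8;
}

// Convert bytes to (decimal) megabytes, rounded.
function toMB(bytes: number): number {
  return Math.round(bytes / 1e6);
}

// Nominal 0.5B, 1B and 7B parameter counts at 4 bits per weight.
for (const params of [0.5e9, 1e9, 7e9]) {
  console.log(`${params / 1e9}B params -> ~${toMB(estimateWeightBytes(params))} MB`);
}
```

This prints ~250 MB, ~500 MB and ~3500 MB, which sit below the ~350 MB, ~880 MB and ~4.3 GB in the table, as a lower bound should (the parameter counts in model names are also rounded).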
FAQ — Frequently Asked Questions about Free Browser Based AI

Does my data get sent anywhere?

No. Everything runs in your browser tab. The only network request is the one-time download of the model weights from the Hugging Face CDN. After that, you could disconnect from the internet and the chat would still work.

Why does the first load take so long?

The model weights range from tens of megabytes to a few gigabytes, depending on which model you pick. They are downloaded once and then cached in your browser (via the Cache API), so on your next visit the model loads in seconds.
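The download-once behaviour is a cache-first lookup: check local storage for a weight file, and only hit the network on a miss. A minimal sketch of that pattern, with a generic `WeightStore` interface standing in for the browser's Cache API so the logic runs outside a browser; `WeightStore`, `loadWeightFile` and `MemoryStore` are hypothetical names, and WebLLM's own caching code may differ.

```typescript
// Hypothetical storage interface; in a browser the Cache API plays this role.
interface WeightStore {
  get(url: string): Promise<Uint8Array | undefined>;
  put(url: string, data: Uint8Array): Promise<void>;
}

type Fetcher = (url: string) => Promise<Uint8Array>;

// Cache-first: return stored bytes if present, otherwise download once and store them.
async function loadWeightFile(url: string, store: WeightStore, fetchBytes: Fetcher): Promise<Uint8Array> {
  const cached = await store.get(url);
  if (cached !== undefined) return cached; // cache hit: no network request
  const fresh = await fetchBytes(url); // cache miss: the one-time download
  await store.put(url, fresh);
  return fresh;
}

// In-memory store for demonstration only (not persistent across reloads).
class MemoryStore implements WeightStore {
  private readonly entries = new Map<string, Uint8Array>();
  async get(url: string): Promise<Uint8Array | undefined> {
    return this.entries.get(url);
  }
  async put(url: string, data: Uint8Array): Promise<void> {
    this.entries.set(url, data);
  }
}
```

In a browser, `get` and `put` would wrap `cache.match(url)` and `cache.put(url, response)` on a cache obtained from `caches.open(...)`, which persists across visits; the in-memory store here forgets everything on reload.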

Why does my browser say "WebGPU not available"?

WebGPU is still rolling out. It works well in recent versions of Chrome, Edge, and Brave on desktop. In Safari and Firefox you may need to enable it manually, and on most mobile browsers it is not yet reliable. See caniuse.com/webgpu for current support.
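A common way for a page to detect WebGPU is to check for `navigator.gpu` and then call `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable GPU is available. A minimal sketch, assuming the page falls back to a WASM model when the check fails (as the model table suggests); `detectBackend`, `NavigatorLike` and `GpuLike` are hypothetical names, with the navigator passed in so the logic can be tested without a browser.

```typescript
// Minimal subset of the WebGPU entry point used here.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type Backend = "webgpu" | "wasm";

// WebGPU is usable only if navigator.gpu exists AND an adapter is returned.
async function detectBackend(nav: NavigatorLike): Promise<Backend> {
  if (!nav.gpu) return "wasm"; // API not exposed by this browser
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm"; // null means no suitable GPU
  } catch {
    return "wasm"; // defensive: treat any failure as "no WebGPU"
  }
}
```

In the page itself you would call `detectBackend(navigator)`.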

Why is the model so... dumb compared to ChatGPT or Claude?

Because it is hundreds of times smaller. Frontier models have hundreds of billions of parameters; a 1B model running in your browser is about 0.3% of that. It's impressive it works at all. Use it for quick tasks (rewrite, summarize, extract), not for reasoning or factual lookups.
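The 0.3% figure is a plain ratio. Taking 300 billion parameters as a representative frontier size (an assumption; most vendors do not publish exact counts):

```latex
\frac{1 \times 10^{9}}{3 \times 10^{11}} = \frac{1}{300} \approx 0.33\%
```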

Which model should I pick?

Qwen 2.5 0.5B (Alibaba) — Pick this if speed is everything. At ~350 MB it loads fastest and works well on low-end hardware. Great for quick rewrites, simple Q&A, and instant demos. Don't expect deep reasoning.

Llama 3.2 1B (Meta) — The recommended default for most people. At ~880 MB it balances download size and quality well. Strong instruction-following, handles summarization, rewrites, and everyday text tasks reliably. Start here if you're unsure.

Llama 3.2 3B (Meta) — A clear step up in coherence and reasoning depth. Handles longer documents, light coding tasks, and more structured outputs. Worth it if you have a modern GPU and don't mind the ~2.1 GB download.

Phi 3.5 Mini (Microsoft) — The highest-quality model in this list. Trained on Microsoft Research's curated synthetic datasets, it outperforms much larger models on reasoning, coding, and math benchmarks. Best pick for coding assistance, KQL writing, structured JSON extraction, and logical step-by-step tasks. Supports 20+ languages and a 128K token context window. Choose this when quality matters more than speed.

Gemma 2 2B (Google DeepMind) — Best for natural-sounding, fluent text. Uses knowledge distillation from larger Gemma models, making its prose unusually smooth and readable for its size. Great for creative writing, polished summaries, and natural rewrites. At ~1.5 GB it's a good middle ground between size and quality.
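The guidance above mostly trades download size against quality. A minimal sketch of that trade-off as a rule, using the approximate sizes quoted on this page; `suggestModel` and its "largest model that fits the budget" heuristic are hypothetical and illustrative only, since the page leaves the choice to you, and the advice to start with Llama 3.2 1B when unsure still applies.

```typescript
interface ModelOption {
  name: string;
  approxMB: number; // approximate q4 download size quoted on this page
}

// WebGPU models from the FAQ, ordered smallest download first.
const WEBGPU_MODELS: ModelOption[] = [
  { name: "Qwen 2.5 0.5B", approxMB: 350 },
  { name: "Llama 3.2 1B", approxMB: 880 },
  { name: "Gemma 2 2B", approxMB: 1500 },
  { name: "Llama 3.2 3B", approxMB: 2100 },
  { name: "Phi 3.5 Mini", approxMB: 2200 },
];

// Hypothetical heuristic: without WebGPU, fall back to the WASM option;
// otherwise pick the largest model whose download fits the budget,
// defaulting to the smallest model if nothing fits.
function suggestModel(hasWebGPU: boolean, budgetMB: number): string {
  if (!hasWebGPU) return "SmolLM2 (WASM)";
  let choice = WEBGPU_MODELS[0];
  for (const m of WEBGPU_MODELS) {
    if (m.approxMB <= budgetMB) choice = m;
  }
  return choice.name;
}
```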

Is this tool free to use?

Yes, completely free. No account, no subscription, no watermark. It's one of the free browser-based tools at jasperbernaers.com.

What tech powers this?

WebLLM by MLC-AI, which compiles open-weights models (Llama, Qwen, Phi, Gemma) to run on WebGPU. Everything else is vanilla JavaScript and CSS in a single HTML file.

Is browser-based AI actually private?

Yes. Unlike cloud chatbots, the model itself runs inside your browser tab on your own hardware. Your prompts and the model's replies never leave your device — there are no API calls to an inference server, no logging, and no account. The only network request is the one-time model download.

Can I run an LLM offline with no signup?

Yes — that's the whole point. After the model weights download once and cache, you can switch off your internet connection and keep chatting. No login, no subscription, no email required.

Can I change the AI's behaviour or persona?

Yes. Open the System prompt (persona) box under the model picker and describe how you want it to respond — e.g. "You are a terse senior engineer" or "Always answer in Dutch". It's applied to every message in the conversation.
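Chat APIs in the OpenAI style, which WebLLM's `engine.chat.completions.create` follows, usually take a persona as a leading `system` message in the messages list. A minimal sketch, assuming the page rebuilds the list and resends it on every turn (which is what makes the persona apply to the whole conversation); `buildMessages` and `ChatMessage` are hypothetical names.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Prepend the persona as a system message; an empty persona adds nothing.
// Because the full list is resent each turn, the persona covers every reply.
function buildMessages(persona: string, history: ChatMessage[]): ChatMessage[] {
  const trimmed = persona.trim();
  if (!trimmed) return [...history];
  const system: ChatMessage = { role: "system", content: trimmed };
  return [system, ...history];
}
```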