Gemma 4 Ollama setup: install it, pull the right tag, and test the API.
If you want the fastest realistic path to running Gemma 4 locally, use Ollama first. The basic flow is simple: install Ollama, pull gemma4 or a specific tag, verify the model with ollama list, run it, then confirm the local API on port 11434.
Part of the hub: this setup page belongs to our broader Gemma 4 guide, which also covers model tiers, VRAM planning, Mac usage, and comparison pages.
Quick setup steps
- Install Ollama.
- Pull gemma4 or a specific Gemma 4 tag.
- Use ollama list to confirm the model is available.
- Run the model in the CLI.
- Use the local API and ollama ps to verify it is serving correctly.
Recommended first move: start with gemma4 or gemma4:e4b, confirm the workflow, then decide whether larger tags are worth the extra memory cost.
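The checklist above can be sketched as a small Python helper. This is a minimal sketch, not an official workflow: the pull_and_check name is made up here, and it assumes ollama is on your PATH (it exits gracefully if not).

```python
import shutil
import subprocess

MODEL = "gemma4"  # assumption: swap in a specific tag such as gemma4:e4b if you prefer

def pull_and_check(model: str = MODEL) -> bool:
    """Pull the model, then confirm it shows up in `ollama list`."""
    if shutil.which("ollama") is None:
        print("ollama not found on PATH; install it first")
        return False
    # check=False: a failed pull just means the listing check below will come back False
    subprocess.run(["ollama", "pull", model], check=False)
    listing = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    )
    return model in listing.stdout

if __name__ == "__main__":
    print("model available:", pull_and_check())
```

If this prints True, the CLI steps below should work unchanged.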
Install Ollama
Use the official installer from ollama.com. On Linux, the quick shell install is common. On Mac, the desktop app or Homebrew works. On Windows, the official installer is the simplest route.
curl -fsSL https://ollama.com/install.sh | sh
brew install --cask ollama
Pull Gemma 4 and pick the right tag
Ollama's official gemma4 model page shows the default run path first, and that is still the right starting point for most users.
ollama pull gemma4
ollama list
ollama run gemma4
If you want to control the size explicitly, pull a specific tag:
ollama pull gemma4:e2b
ollama pull gemma4:e4b
ollama pull gemma4:26b
ollama pull gemma4:31b
Once the model is installed, these commands are the most useful first checks:
ollama list
ollama ps
ollama run gemma4 "roses are red"
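If you are scripting these checks, ollama list output can be parsed with a few lines of Python. The helper and sample below are a sketch: the column layout matches the NAME/ID/SIZE/MODIFIED header ollama list prints, but the IDs and sizes in the sample are illustrative, not real values.

```python
def installed_models(list_output: str) -> list[str]:
    """Parse the NAME column out of `ollama list` output."""
    lines = list_output.strip().splitlines()
    # Skip the header row; the model name is the first whitespace-separated field
    return [line.split()[0] for line in lines[1:] if line.split()]

# Illustrative output shape (IDs and sizes are made up; widths vary by version):
sample = """NAME            ID              SIZE      MODIFIED
gemma4:latest   a1b2c3d4e5f6    5.4 GB    2 days ago
gemma4:e2b      f6e5d4c3b2a1    2.9 GB    3 hours ago"""

print(installed_models(sample))  # → ['gemma4:latest', 'gemma4:e2b']
```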
Use the local API
After the CLI works, test the API immediately. That is the fastest way to know whether your local setup is ready for scripts, tools, or a small app.
curl http://localhost:11434/api/generate \
-d '{"model":"gemma4","prompt":"Summarize why local AI matters."}'
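By default, /api/generate streams its answer as newline-delimited JSON: each line is an object carrying a "response" fragment, and the final line has "done": true. (Pass "stream": false in the request body if you want one JSON object instead.) Here is a small sketch that stitches the fragments back together; the sample stream is hand-written to show the shape, with most fields trimmed for clarity.

```python
import json

def assemble_stream(ndjson_lines) -> str:
    """Join the 'response' fragments from Ollama's streaming /api/generate output."""
    parts = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Shape of the stream (illustrative; real chunks carry more fields):
stream = [
    '{"model":"gemma4","response":"Local ","done":false}',
    '{"model":"gemma4","response":"AI matters.","done":false}',
    '{"model":"gemma4","response":"","done":true}',
]
print(assemble_stream(stream))  # → Local AI matters.
```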
What about Mac, Windows, and Linux?
- Mac: easiest if you already use Homebrew or the official desktop installer.
- Windows: use the official installer, then run the same pull and run commands in PowerShell or Terminal.
- Linux: the shell installer is usually the fastest first step.
Common issues
The model downloads, but inference is painfully slow
Your hardware is probably mismatched to the tag you chose, or your prompt/context size is larger than your machine can handle comfortably. Step down to a smaller tag or trim the context before assuming a deeper problem.
The model runs, but the system becomes unusable
That is usually memory pressure, not a mysterious Ollama failure. Step down the model size or close other heavy apps.
The API fails even though the CLI works
Check whether the model is still active with ollama ps, then verify your client timeout and request size before debugging anything more complex.
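Before debugging a client, it helps to confirm the server is reachable at all: Ollama answers plain GET requests on its root endpoint when it is up. This is a minimal reachability sketch (the ollama_reachable name is made up here), with a short timeout so a dead server fails fast instead of hanging your script.

```python
import urllib.error
import urllib.request

def ollama_reachable(base_url: str = "http://localhost:11434",
                     timeout: float = 2.0) -> bool:
    """Return True if the Ollama server answers a GET on its root endpoint."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: server is not serving
        return False

if __name__ == "__main__":
    print("server up:", ollama_reachable())
```

If this returns False while the CLI works, check that you are pointing at the right host and port before touching timeouts or request sizes.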