Setup (updated 2026-04-03)

Gemma 4 Ollama setup: install it, pull the right tag, and test the API.

If you want the fastest realistic path to running Gemma 4 locally, use Ollama first. The basic flow is simple: install Ollama, pull gemma4 or a specific tag, verify the model with ollama list, run it, then confirm the local API on port 11434.

Part of the hub: this setup page belongs to our broader Gemma 4 guide, which also covers model tiers, VRAM planning, Mac usage, and comparison pages.

Quick setup steps

  1. Install Ollama.
  2. Pull gemma4 or a specific Gemma 4 tag.
  3. Use ollama list to confirm the model is available.
  4. Run the model in the CLI.
  5. Use the local API and ollama ps to verify it is serving correctly.

Recommended first move: start with gemma4 or gemma4:e4b, confirm the workflow, then decide whether larger tags are worth the extra memory cost.

Install Ollama

Use the official installer from ollama.com. On Linux, the quick shell install is common. On Mac, the desktop app or Homebrew works. On Windows, the official installer is the simplest route.

# Linux
curl -fsSL https://ollama.com/install.sh | sh
# macOS (Homebrew)
brew install --cask ollama

Pull Gemma 4 and pick the right tag

Ollama's official gemma4 model page shows the default run path first, and that is still the right starting point for most users.

ollama pull gemma4
ollama list
ollama run gemma4

If you want to control the size explicitly, pull a specific tag:

ollama pull gemma4:e2b
ollama pull gemma4:e4b
ollama pull gemma4:26b
ollama pull gemma4:31b

Once the model is installed, these commands are the most useful first checks:

ollama list
ollama ps
ollama run gemma4 "roses are red"

Use the local API

After the CLI works, test the API immediately. That is the fastest way to know whether your local setup is ready for scripts, tools, or a small app.

curl http://localhost:11434/api/generate \
  -d '{"model":"gemma4","prompt":"Summarize why local AI matters."}'
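Note that by default `/api/generate` streams its answer as a series of newline-delimited JSON chunks, which is awkward to read in a terminal. Setting `"stream": false` returns one JSON object whose `response` field holds the full text. The sketch below uses `python3` for parsing so no extra tools are needed:

```shell
# Ask for a single JSON object instead of a stream, then extract the text.
curl -s http://localhost:11434/api/generate \
  -d '{"model":"gemma4","prompt":"Summarize why local AI matters.","stream":false}' \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["response"])'
```

If this prints a clean paragraph, your setup is ready for scripts and tools.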

What about Mac, Windows, and Linux?

  • Mac: easiest if you already use Homebrew or the official desktop installer.
  • Windows: use the official installer, then run the same pull and run commands in PowerShell or Terminal.
  • Linux: the shell installer is usually the fastest first step.

Common issues

The model downloads, but inference is painfully slow

The tag you pulled is probably too large for your hardware, or your prompt/context size is bigger than your machine can handle comfortably. Step down to a smaller tag (for example gemma4:e2b) or shorten the prompt.

The model runs, but the system becomes unusable

That is usually memory pressure, not a mysterious Ollama failure. Step down the model size or close other heavy apps.

The API fails even though the CLI works

Check whether the model is still active with ollama ps, then verify your client timeout and request size before debugging anything more complex.
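Two quick checks separate "server is down" from "client is misconfigured". The `GET /api/tags` endpoint returns the same model inventory as `ollama list`, so it doubles as a health probe; the parsing step below assumes `python3` is available:

```shell
# 1. Is the server answering at all on the default port?
curl -s http://localhost:11434/api/tags >/dev/null \
  && echo "server up" || echo "server unreachable"

# 2. Which models does the API see? (mirrors `ollama list`)
curl -s http://localhost:11434/api/tags \
  | python3 -c 'import json,sys; [print(m["name"]) for m in json.load(sys.stdin)["models"]]'
```

If the server is up and the model is listed but requests still fail, the problem is almost always on the client side: timeouts, request size, or a wrong model name in the payload.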

Related guides