Setup | Updated 2026-04-11 | 6 min read

Gemma 4 Ollama setup: install it, pull the right tag, and test the API.

If you want the fastest realistic path to running Gemma 4 locally, start with Ollama. The flow is simple: install Ollama, pull the model with ollama pull gemma4 (or a specific tag such as gemma4:e4b, gemma4:26b, or gemma4:31b), confirm it with ollama list, run it, then check the local API on port 11434.

Part of the hub: this setup page belongs to our broader Gemma 4 guide, which also covers model tiers, VRAM planning, Mac usage, and comparison pages.

Quick commands people actually search for

If you searched for ollama pull gemma4, ollama pull gemma4:26b, or how to run Gemma 4 in Ollama, these are the commands you want first:

ollama pull gemma4
ollama pull gemma4:e4b
ollama pull gemma4:26b
ollama pull gemma4:31b
ollama list
ollama ps
ollama run gemma4:e4b

Best default: if you are not sure which tag to pull first, use gemma4:e4b. It is the lowest-risk way to confirm the install, the CLI flow, and the API before you spend time downloading larger tags.

Quick setup steps

  1. Install Ollama.
  2. Pull gemma4 or a specific Gemma 4 tag.
  3. Use ollama list to confirm the model is available.
  4. Run the model in the CLI.
  5. Use the local API and ollama ps to verify it is serving correctly.

Recommended first move: start with gemma4 or gemma4:e4b, confirm the workflow, then decide whether larger tags are worth the extra memory cost.
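
The steps above can be sketched as one small script. This is a minimal sketch, not an official workflow: the function names have_ollama and setup_gemma and the TAG variable are made up for illustration, and the script assumes Ollama is already installed and its server is running.

```shell
#!/bin/sh
# TAG is a hypothetical variable; default to the low-risk tag recommended above.
TAG="${TAG:-gemma4:e4b}"

# Step 1 check: is the ollama binary on PATH?
have_ollama() {
  command -v ollama >/dev/null 2>&1
}

# Steps 2 to 5: pull, confirm, run once, then show what is loaded.
setup_gemma() {
  tag="$1"
  if ! have_ollama; then
    echo "ollama not found on PATH; install it first" >&2
    return 1
  fi
  ollama pull "$tag" || return 1                              # step 2
  ollama list | grep -F "$tag" || return 1                    # step 3
  ollama run "$tag" "Say hello in one sentence." || return 1  # step 4
  ollama ps                                                   # step 5
}

# Usage: setup_gemma "$TAG"
```

Each step returns early on failure, so the first error you see points at the step that needs attention.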

Install Ollama

Use the official installer from ollama.com. On Linux, the quick shell install is common. On Mac, the desktop app or Homebrew works. On Windows, the official installer is the simplest route.

# Linux: official install script
curl -fsSL https://ollama.com/install.sh | sh
# macOS: Homebrew
brew install --cask ollama

Pull Gemma 4 and pick the right tag

Ollama's official gemma4 model page shows the default run path first, and that is still the right starting point for most users.

ollama pull gemma4
ollama list
ollama run gemma4

If you want to control the size explicitly, pull a specific tag:

ollama pull gemma4:e2b
ollama pull gemma4:e4b
ollama pull gemma4:26b
ollama pull gemma4:31b

Which Gemma 4 tag should you pull first?

Most users do not really need a full model catalogue. They need one sane starting tag. This table turns the common tag searches into a direct decision.

Your hardware or goal | Start with | Why this is the right first pull
CPU-only, tiny GPU, or first workflow test | ollama pull gemma4:e2b | Fastest way to prove the install path works on a constrained machine.
Most laptops, 8 to 12 GB GPU, or Apple Silicon 16 GB+ | ollama pull gemma4:e4b | Best balance of model quality, speed, and low failure risk.
16 GB GPU or strong Apple Silicon memory | ollama pull gemma4:26b | Worth trying when you already know the machine can carry a larger tag.
24 GB GPU or high-memory workstation / Mac | ollama pull gemma4:31b | This is where the largest local tier starts feeling realistic.

Use the default gemma4 tag if you want the simplest “just run it” path. Use explicit tags like gemma4:e4b or gemma4:26b when you want repeatable behavior across machines and future installs.
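
If you script installs across several machines, the table above can be sketched as a small helper. The function name pick_tag is hypothetical, and the thresholds are simply the GPU memory figures from the table read as whole gigabytes. Treat it as a starting guess, not a guarantee that a tag will fit.

```shell
# pick_tag GPU_MEM_GB: print the tag to pull first for a given amount of
# GPU memory in whole GB. Thresholds mirror the table above.
pick_tag() {
  mem_gb="$1"
  if [ "$mem_gb" -ge 24 ]; then
    echo "gemma4:31b"
  elif [ "$mem_gb" -ge 16 ]; then
    echo "gemma4:26b"
  elif [ "$mem_gb" -ge 8 ]; then
    echo "gemma4:e4b"
  else
    echo "gemma4:e2b"   # CPU-only or tiny GPU
  fi
}

# Usage: ollama pull "$(pick_tag 12)"
```

Apple Silicon shares memory between CPU and GPU, so the table suggests gemma4:e4b for a 16 GB Mac even though a 16 GB discrete GPU maps to gemma4:26b here.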

Once the model is installed, these commands are the most useful first checks:

ollama list
ollama ps
ollama run gemma4 "roses are red"

Use the local API

After the CLI works, test the API immediately. That is the fastest way to know whether your local setup is ready for scripts, tools, or a small app.

curl http://localhost:11434/api/generate \
  -d '{"model":"gemma4","prompt":"Summarize why local AI matters."}'

If you are following a tag-specific workflow, replace gemma4 with the exact tag you pulled. For example, if you pulled gemma4:26b, send gemma4:26b in the API request too, so you do not accidentally validate a different model.
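
A tag-specific API test might look like the sketch below. The helper names build_payload and generate are hypothetical. The "stream": false field asks Ollama to return one JSON object instead of a stream of chunks, which is easier to read in a quick check. The payload builder assumes the model name and prompt contain no double quotes or backslashes.

```shell
# build_payload MODEL PROMPT: print a JSON body for /api/generate.
# Assumption: neither argument contains double quotes or backslashes.
build_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# generate MODEL PROMPT: send one non-streaming request to the local API.
generate() {
  curl -s http://localhost:11434/api/generate -d "$(build_payload "$1" "$2")"
}

# Usage: generate gemma4:26b "Summarize why local AI matters."
```

The generated text sits in the "response" field of the returned JSON object.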

What about Mac, Windows, and Linux?

  • Mac: easiest if you already use Homebrew or the official desktop installer.
  • Windows: use the official installer, then run the same pull and run commands in PowerShell or Terminal.
  • Linux: the shell installer is usually the fastest first step.

Common issues

The model downloads, but inference is painfully slow

The tag you chose is probably too large for your hardware, or your prompt and context size are larger than the machine can handle comfortably. Step down one tag or shorten the context before investigating anything else.
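
One way to test the context-size theory is to send a request with a smaller context window. The sketch below uses the num_ctx option of the Ollama API. The helper name build_payload_ctx and the value 2048 are illustrative assumptions, not recommendations from the Gemma 4 model page. On current Ollama releases, the PROCESSOR column of ollama ps should also show whether the model is running on GPU, CPU, or a split of both.

```shell
# build_payload_ctx MODEL PROMPT NUM_CTX: JSON body with a context-size limit.
# Assumption: MODEL and PROMPT contain no double quotes or backslashes.
build_payload_ctx() {
  printf '{"model":"%s","prompt":"%s","stream":false,"options":{"num_ctx":%d}}' \
    "$1" "$2" "$3"
}

# Usage:
#   curl -s http://localhost:11434/api/generate \
#     -d "$(build_payload_ctx gemma4:e4b "Hello" 2048)"
#   ollama ps   # check where the model is running
```

If the smaller context is clearly faster, the original slowdown was likely memory related and not a fault in the install.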

The model runs, but the system becomes unusable

That is usually memory pressure, not a mysterious Ollama failure. Step down the model size or close other heavy apps.
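
Stepping down usually means unloading the large model first so its memory is freed. A minimal sketch, assuming the default local port: sending keep_alive set to 0 with no prompt asks Ollama to unload the model immediately, and ollama stop does the same from the CLI. The helper names unload_payload and unload are hypothetical.

```shell
# unload_payload MODEL: JSON body asking Ollama to unload MODEL right away.
unload_payload() {
  printf '{"model":"%s","keep_alive":0}' "$1"
}

# unload MODEL: send the unload request to the local API.
unload() {
  curl -s http://localhost:11434/api/generate -d "$(unload_payload "$1")"
}

# Usage:
#   ollama ps               # see which models are loaded
#   unload gemma4:26b       # or: ollama stop gemma4:26b
#   ollama run gemma4:e4b   # step down to a smaller tag
```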

The API fails even though the CLI works

Check whether the model is still active with ollama ps, then verify your client timeout and request size before debugging anything more complex.
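
Before changing client code, it can help to confirm that the server answers at all and that it knows the tag you are requesting. The sketch below is one way to do that. OLLAMA_URL, api_base, and check_api are hypothetical names, the 5-second timeout is arbitrary, and the grep is a plain text match against the /api/tags response, not real JSON parsing.

```shell
# api_base: print the base URL to test. OLLAMA_URL is a hypothetical override.
api_base() {
  echo "${OLLAMA_URL:-http://localhost:11434}"
}

# check_api TAG: fail if the server is unreachable or TAG is not listed.
check_api() {
  base="$(api_base)"
  tags="$(curl -s --max-time 5 "$base/api/tags")" || {
    echo "server not reachable at $base" >&2
    return 1
  }
  printf '%s' "$tags" | grep -qF "$1" || {
    echo "tag $1 not found on server" >&2
    return 1
  }
}

# Usage: check_api gemma4:e4b && echo "API is up and the tag is installed"
```

If this check passes and your client still fails, the problem is more likely the client's timeout or request size than the server.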

Should you use gemma4 or a specific tag like gemma4:e4b?

Use gemma4 when you want the simplest install path and do not care which exact default tag Ollama points to later. Use a specific tag when you want predictable downloads, repeatable tests, or you are matching the model to hardware intentionally.
