How to use Gemma 4 in Android Studio as your local coding model
Gemma 4 is Google's officially recommended local model for Android Studio's Agent Mode. Your code never leaves your machine, there's no API key required, and it works offline. This guide covers the exact steps to get it running, the hardware you actually need, and what Agent Mode can do once it's set up.
Jump to
Why Gemma 4 specifically for Android Studio
Gemma 4 isn't just a generic local model that happens to work in Android Studio — it was explicitly trained on Android development patterns. That means it understands Kotlin idioms, Jetpack Compose, and Android-specific code structures at a deeper level than a general-purpose model would.
Three practical reasons to use it instead of a cloud model:
- Privacy: proprietary code stays on your machine. No prompts are sent to Google or any API endpoint during inference.
- No quota limits: run as many Agent Mode requests as you want without hitting rate limits or unexpected bills.
- Offline use: works without internet once the model is downloaded. Useful in restricted corporate environments or on flights.
Hardware requirements
Google's official recommendation is the 26B MoE model for the best Agent Mode experience. The E4B is also supported for lower-spec machines.
| Model | Minimum RAM | Storage needed | Best for |
|---|---|---|---|
| Gemma 4 E4B | 12 GB RAM total | ~4 GB | Laptops, lower-spec dev machines |
| Gemma 4 26B A4B ✅ recommended | 24 GB RAM total | ~17 GB | Workstations, high-spec dev machines |
Important: these RAM numbers include both Android Studio's own memory usage and the model. Android Studio typically needs 4–8 GB on its own. If you have exactly 16 GB total RAM, the 26B model will be tight — use E4B instead and you'll get a better experience than a constantly-swapping 26B.
GPU is not strictly required — both models can run on CPU — but a discrete GPU or Apple Silicon with enough unified memory makes generation noticeably faster.
Setup steps
The setup has four parts: install Android Studio, install an LLM provider, pull the model, and connect them. The whole process takes about 20–30 minutes depending on your download speed.
Step 1 — Install the latest Android Studio
You need a recent version of Android Studio (Meerkat or newer) that includes the local model provider integration. Download from developer.android.com/studio.
If you already have Android Studio installed, check for updates: Help → Check for Updates.
Step 2 — Install an LLM provider
Android Studio doesn't run the model directly — it connects to a local inference server. You need one of these two:
Option A: Ollama (recommended — command-line, fastest setup)
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull the recommended model
ollama pull gemma4:26b
# Start the server (runs automatically after install on Mac/Linux)
ollama serve
Option B: LM Studio (GUI — easier if you prefer no terminal)
- Download from lmstudio.ai
- Open the app, search "gemma4" in the model browser
- Download Gemma 4 E4B or 26B A4B (pre-quantized GGUF files available)
- Click "Start Server" in the Local Server tab
Step 3 — Connect the provider to Android Studio
- Open Android Studio
- Go to Settings → Tools → AI → Model Providers
- Click Add and select Ollama or LM Studio
- Enter the server address:
- Ollama default:
http://localhost:11434 - LM Studio default:
http://localhost:1234
- Ollama default:
- Click Test Connection — you should see a green checkmark
- Click Apply
Step 4 — Select Gemma 4 in Agent Mode
- Open the Gemini panel in Android Studio (the star icon in the toolbar)
- Switch to Agent Mode
- Click the model selector dropdown at the bottom of the panel
- Select Gemma 4 from the available local models
You're ready. Type a prompt and Gemma 4 will handle it entirely on your machine.
What Agent Mode can actually do with Gemma 4
Agent Mode with Gemma 4 goes beyond autocomplete. It can execute multi-step tasks across your codebase. Practical examples that work well:
| Task | Example prompt | What Gemma 4 does |
|---|---|---|
| Build a feature | "Build a settings screen with dark mode toggle" | Generates Kotlin + Compose UI, wires ViewModel, follows Material 3 patterns |
| Refactor legacy code | "Migrate all XML layouts in this activity to Compose" | Reads existing files, rewrites them, preserves logic |
| Fix build errors | "Build my project and fix any errors" | Runs build, reads error output, applies fixes iteratively |
| Externalize strings | "Extract all hardcoded strings and add them to strings.xml" | Scans codebase, applies changes across multiple files |
| Write tests | "Write unit tests for the UserRepository class" | Generates JUnit tests with mocking, follows Android testing conventions |
Gemma 4's native function calling is what makes these multi-step tasks reliable. It can call tools (read file, write file, run build) and chain them sequentially without losing context between steps.
Troubleshooting
Android Studio can't connect to Ollama
First confirm Ollama is running: open a terminal and run ollama ps. If nothing shows, run ollama serve. On Windows, check that Ollama appears in the system tray. Then verify the URL in Android Studio settings is exactly http://localhost:11434 with no trailing slash.
Generation is very slow
The 26B model needs 24 GB RAM headroom to avoid swapping. If Android Studio, Chrome, and other apps are open alongside, you may not have enough. Close memory-heavy apps, or switch to the E4B model which runs comfortably in 12 GB.
Model downloaded but doesn't appear in Agent Mode selector
Try restarting Android Studio after connecting the provider. The model list is populated when the IDE starts. Also confirm the Ollama server is running before you open Android Studio, not after.
Agent Mode starts but stops mid-task
This is usually a timeout issue. Ollama's default request timeout may be too short for complex multi-step tasks. Set a longer timeout by adding OLLAMA_KEEP_ALIVE=30m to your environment before starting Ollama.