Why On-Device AI Matters for Android Developers
Every AI feature you ship today probably depends on a network call. The user types something, your app hits an API, waits for a response, and displays the result. It works, but it comes with latency, server costs, and the uncomfortable fact that your app is useless without a connection. Gemini Nano changes that equation entirely.
Gemini Nano is Google’s smallest model in the Gemini family, designed specifically to run on-device on Android phones. Through the AICore system service and its client SDK, you can run inference directly on the user’s hardware: no API key, no server, no round trips. In this post, you’ll set up Gemini Nano in an Android project and build a simple text summarization feature that works entirely offline. This is a key piece of the AI-assisted Android development workflow that puts intelligence directly on the device.
Prerequisites and Device Support
Before you write a single line of code, there are hardware and software requirements to be aware of. Gemini Nano currently runs on the Pixel 8, Pixel 8 Pro, Pixel 9 series, and Samsung Galaxy S24 and newer devices. The model is delivered through AICore, a system-level APK that manages model downloads and inference.
Your project needs a minimum SDK of 31 (Android 12), and you’ll want to target SDK 35. You also need to enroll in the Android AI Early Access program if you haven’t already, though this requirement may be lifted by the time you read this.
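In Gradle terms, those SDK levels translate to something like the following in your module-level build.gradle.kts. This is a minimal sketch; the compileSdk value is an assumption chosen to match the target:

android {
    compileSdk = 35  // assumed to match the target SDK

    defaultConfig {
        minSdk = 31   // Android 12, the floor for Gemini Nano
        targetSdk = 35
    }
}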
Adding the AICore Dependency
Start by adding the AICore dependency to your module-level build.gradle.kts:
dependencies {
    implementation("com.google.ai.edge.aicore:aicore:0.0.1-exp02")
}
Sync the project. The library itself is lightweight; the heavy lifting happens inside the AICore system service, which manages model downloads separately from your APK, so your app bundle stays small.
Next, add the required permission to your AndroidManifest.xml:
<uses-permission android:name="com.google.android.providers.gsf.permission.READ_GSERVICES" />
Checking Model Availability at Runtime
Not every device has the model downloaded yet. AICore downloads Gemini Nano in the background, and the user can control this in system settings. You need to check availability before attempting inference:
import com.google.ai.edge.aicore.AICore
import com.google.ai.edge.aicore.ModelAvailability

suspend fun isGeminiNanoAvailable(): Boolean {
    return try {
        val availability = AICore.getOnDeviceModel(
            modelName = "gemini-nano"
        ).availability
        availability == ModelAvailability.READY
    } catch (e: Exception) {
        // Treat any failure (service missing, query error) as unavailable
        // so callers can fall back gracefully.
        false
    }
}
If the model isn’t available, you can request a download or fall back to a cloud-based solution. This graceful degradation pattern is important: you should never assume the model is present.
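One way to structure that is a small dispatcher that prefers the on-device path and hands off to your existing cloud endpoint otherwise. In this sketch, summarizeOnDevice and summarizeViaCloud are hypothetical stand-ins for your own implementations:

suspend fun summarizeWithFallback(articleText: String): String {
    return if (isGeminiNanoAvailable()) {
        // Fast, private, and works offline.
        summarizeOnDevice(articleText)
    } else {
        // Model missing or still downloading: use the cloud path, or surface
        // a message that lets the user trigger the model download instead.
        summarizeViaCloud(articleText)
    }
}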
Building a Simple Summarizer
Let’s build something practical. Imagine you have an app that displays articles, and you want to offer a “Summarize” button that works instantly, even offline. Here’s the ViewModel:
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import com.google.ai.edge.aicore.AICore
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.asStateFlow
import kotlinx.coroutines.launch

class SummarizerViewModel : ViewModel() {

    private val _summary = MutableStateFlow("")
    val summary: StateFlow<String> = _summary.asStateFlow()

    private val _isLoading = MutableStateFlow(false)
    val isLoading: StateFlow<Boolean> = _isLoading.asStateFlow()

    // Fetch the model handle lazily and reuse it across requests;
    // acquiring it repeatedly is wasted work.
    private val model by lazy {
        AICore.getOnDeviceModel(modelName = "gemini-nano")
    }

    fun summarize(articleText: String) {
        viewModelScope.launch {
            _isLoading.value = true
            try {
                val prompt = "Summarize this article in 2-3 sentences: $articleText"
                val response = model.generateContent(userMessage = prompt)
                _summary.value = response.text ?: "Could not generate a summary."
            } catch (e: Exception) {
                _summary.value = "Summarization failed: ${e.message}"
            } finally {
                _isLoading.value = false
            }
        }
    }
}
The approach here keeps the prompt simple and direct: pass the instruction and the article together. For summarization tasks, you want the model to stick close to the source material rather than get creative, and a direct prompt guides that behavior without any additional configuration parameters.
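One refinement worth making is trimming long articles before building the prompt, since on-device context is limited (more on that in the tips below). A minimal sketch; buildSummaryPrompt is a hypothetical helper, and the 2,000-character cap is the rule of thumb used later in this post, not a hard limit of the model:

// Hypothetical helper: caps input length before prompting.
fun buildSummaryPrompt(articleText: String, maxChars: Int = 2_000): String {
    val trimmed = articleText.take(maxChars)
    return "Summarize this article in 2-3 sentences: $trimmed"
}

Swap this in for the inline prompt string in summarize() if your articles run long.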
Wiring It Up in Compose
The Compose UI is straightforward. Collect the state and show a loading indicator while inference runs:
import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.Spacer
import androidx.compose.foundation.layout.height
import androidx.compose.foundation.layout.padding
import androidx.compose.material3.Button
import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.getValue
import androidx.compose.ui.Modifier
import androidx.compose.ui.unit.dp
import androidx.lifecycle.compose.collectAsStateWithLifecycle
import androidx.lifecycle.viewmodel.compose.viewModel

@Composable
fun SummarizerScreen(
    articleText: String,
    viewModel: SummarizerViewModel = viewModel()
) {
    val summary by viewModel.summary.collectAsStateWithLifecycle()
    val isLoading by viewModel.isLoading.collectAsStateWithLifecycle()

    Column(modifier = Modifier.padding(16.dp)) {
        // Disable the button while inference is running.
        Button(
            onClick = { viewModel.summarize(articleText) },
            enabled = !isLoading
        ) {
            Text(if (isLoading) "Summarizing..." else "Summarize")
        }
        if (summary.isNotEmpty()) {
            Spacer(modifier = Modifier.height(16.dp))
            Text(
                text = summary,
                style = MaterialTheme.typography.bodyLarge
            )
        }
    }
}
On a Pixel 9 Pro, inference for a 500-word article typically completes in under two seconds. That’s faster than most API round trips, and it works in airplane mode.
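Numbers like these are worth verifying on your own hardware. Here is a rough sketch of a timing method you could drop into the ViewModel above; timedGenerate is a hypothetical helper, and the log tag is arbitrary:

import android.util.Log
import kotlin.system.measureTimeMillis

// Hypothetical timing helper for SummarizerViewModel: wraps the same
// generateContent call and logs how long on-device inference takes.
private suspend fun timedGenerate(prompt: String): String {
    var text = ""
    val elapsed = measureTimeMillis {
        text = model.generateContent(userMessage = prompt).text.orEmpty()
    }
    Log.d("GeminiNano", "On-device inference took ${elapsed} ms")
    return text
}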
Performance Tips and Gotchas
After building a few features with Gemini Nano, here are the patterns that work well and the traps to watch out for:
Keep prompts short. Gemini Nano has a smaller context window than its cloud siblings, and long prompts eat into your output budget. For summarization, trim the input to the first 2,000 characters if needed, as the prompt helper sketched earlier does.
Reuse the model instance. Fetching the model is not free; the lazy initialization in the ViewModel above ensures you only pay that cost once. Don’t create a new instance per request.
Handle the download state. Some users will have AICore installed but the model not yet downloaded. Show a clear message and offer a way to trigger the download rather than silently failing.
Test on real hardware. The emulator doesn’t support AICore, so you need a physical device with the model available. If you’ve been following our guide on testing and debugging workflows for Android, add a device check to your test setup so Gemini Nano tests are skipped on unsupported hardware, as sketched below.
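A minimal guard for instrumented tests, reusing the isGeminiNanoAvailable() helper from earlier; JUnit’s assumeTrue() skips the test instead of failing it when the model is absent:

import kotlinx.coroutines.runBlocking
import org.junit.Assume.assumeTrue
import org.junit.Before

class GeminiNanoSummarizerTest {

    @Before
    fun requireGeminiNano() {
        // Skips every test in this class on devices without the model,
        // rather than reporting them as failures.
        assumeTrue(runBlocking { isGeminiNanoAvailable() })
    }

    // Gemini Nano tests go here.
}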
When to Use On-Device vs Cloud AI
Gemini Nano isn’t a replacement for the full Gemini API. It’s a complement. Use on-device inference for features that need to be fast, private, or work offline: text summarization, smart replies, simple classification, or content suggestions. Use cloud models when you need larger context windows, image understanding, or complex multi-turn reasoning.
The best apps will use both. Check if the model is available, run the inference on-device when you can, and fall back to the cloud when you need more power. Your users won’t know or care where the inference happens — they’ll just notice that your app feels faster and works everywhere.
On-device AI is moving from experiment to expectation. Gemini Nano gives you a clean, well-supported way to start shipping these features today. Pick one feature in your app that could benefit from instant, offline intelligence, and give it a try.
This post was written by a human with the help of Claude, an AI assistant by Anthropic.
