Problem: Your Android App Needs AI Without the Cloud
You want to add text summarization, smart replies, or content classification to your Android app — but cloud API calls mean latency, cost, and privacy concerns.
Android 16 ships with Gemini Nano baked in. Here's how to use it directly from Kotlin.
You'll learn:
- How to check if Gemini Nano is available on a device
- How to run inference with the Android AI Core API
- How to stream responses for a responsive UX
Time: 25 min | Level: Intermediate
Why This Happens
Google introduced the AICore system service in Android 14 QPR1 and expanded it significantly for Android 16. The model runs on-device via a system-level singleton — your app doesn't bundle the model weights, it requests access through a structured API.
This means:
- Gemini Nano must be downloaded on the device (happens automatically on supported hardware)
- Pixel 9+ and devices with 8GB+ RAM are primary targets
- The API is part of Google Play Services, not AOSP
Common blockers:
- Device doesn't meet hardware requirements (API returns FEATURE_NOT_SUPPORTED)
- Model not yet downloaded (API returns DOWNLOADING)
- Requesting too large a context window for on-device limits
Your app talks to AICore as a system service — the model never lives in your APK
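The blockers above can be modeled explicitly before touching the SDK. A minimal sketch of the decision logic — the `AiState` names here are illustrative, not the SDK's actual status type:

```kotlin
// Illustrative model of the availability states described above.
// These names are hypothetical; the real SDK exposes its own status type.
sealed class AiState {
    object Available : AiState()
    object Downloading : AiState()
    object FeatureNotSupported : AiState()
}

// Pure decision function: what should the app do in each state?
fun nextAction(state: AiState): String = when (state) {
    AiState.Available -> "run-inference"
    AiState.Downloading -> "wait-for-download"
    AiState.FeatureNotSupported -> "fall-back-to-cloud"
}
```

Keeping this decision in one place means every feature entry point (summarize, classify, smart reply) handles unavailability the same way.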
Solution
Step 1: Add Dependencies
// build.gradle.kts (app)
dependencies {
    // ML Kit language identification (optional, for detecting input language)
    implementation("com.google.android.gms:play-services-mlkit-language-id:17.0.0")
    // Google AI Edge SDK - on-device Gemini Nano inference
    implementation("com.google.ai.edge.aicore:aicore:0.0.1-exp01")
}
Also add the required manifest declaration:
<!-- AndroidManifest.xml -->
<uses-feature
    android:name="android.software.ai_capabilities"
    android:required="false" />
Setting required="false" lets your app install on all devices — you'll handle availability gracefully in code.
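Since the feature is optional at install time, you can gate AI UI at runtime with the standard `PackageManager` API. A sketch, using the same feature string as the manifest entry above:

```kotlin
import android.content.Context

// Returns true if the device declares the AI capabilities system feature.
// Note: feature presence does not guarantee the model is downloaded --
// the availability check in Step 2 still applies.
fun hasAiCapabilities(context: Context): Boolean =
    context.packageManager.hasSystemFeature("android.software.ai_capabilities")
```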
Expected: Gradle sync completes, no dependency conflicts.
If it fails:
- Unresolved reference aicore: Confirm you're on AGP 8.5+ and targeting compileSdk = 36
- Manifest merge conflict: Remove duplicate <uses-feature> from a library
Step 2: Check Availability Before Use
Never assume Gemini Nano is ready. Always check first.
import android.content.Context
import com.google.ai.edge.aicore.GenerativeModel
import com.google.ai.edge.aicore.DownloadCallback
import com.google.ai.edge.aicore.generationConfig
import kotlin.coroutines.resume
import kotlin.coroutines.suspendCoroutine

class AiAvailabilityChecker(private val context: Context) {

    suspend fun checkAndPrepare(): Result<GenerativeModel> {
        val config = generationConfig {
            context = this@AiAvailabilityChecker.context
        }
        return try {
            val model = GenerativeModel(generationConfig = config)
            // checkAvailability() reports whether the model is ready,
            // still downloading, or unsupported on this device
            when (val state = model.checkAvailability()) {
                AvailabilityStatus.AVAILABLE -> Result.success(model)
                AvailabilityStatus.DOWNLOADING -> {
                    // Wait for the in-progress download to finish
                    awaitDownload(model)
                }
                else -> Result.failure(
                    UnsupportedOperationException("Gemini Nano not supported: $state")
                )
            }
        } catch (e: Exception) {
            Result.failure(e)
        }
    }

    private suspend fun awaitDownload(model: GenerativeModel): Result<GenerativeModel> =
        suspendCoroutine { cont ->
            model.downloadModel(object : DownloadCallback {
                override fun onDownloadCompleted() {
                    cont.resume(Result.success(model))
                }

                override fun onDownloadFailed(e: Exception) {
                    cont.resume(Result.failure(e))
                }
            })
        }
}
Why checkAvailability() first: Calling generateContent() on an unavailable model throws an exception at runtime. Explicit state checking makes your error handling cleaner and your UX more informative.
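A sketch of how a caller might consume checkAndPrepare() — the ViewModel wiring here assumes a standard AndroidX setup and is illustrative:

```kotlin
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.launch

class SummaryViewModel(
    private val checker: AiAvailabilityChecker
) : ViewModel() {
    fun initAi() {
        viewModelScope.launch {
            checker.checkAndPrepare()
                .onSuccess { model -> /* cache the model, enable AI UI */ }
                .onFailure { e -> /* hide AI features or route to a cloud fallback */ }
        }
    }
}
```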
Step 3: Run Inference with Streaming
Streaming matters here — on-device models can be slower than cloud APIs on mid-range hardware.
import android.content.Context
import com.google.ai.edge.aicore.GenerativeModel
import com.google.ai.edge.aicore.generationConfig
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.map

class OnDeviceAiService(private val context: Context) {

    private val model by lazy {
        GenerativeModel(
            generationConfig = generationConfig {
                context = this@OnDeviceAiService.context
                // Keep temperature low for factual tasks
                temperature = 0.2f
                // On-device limit: typically 512-1024 tokens of output
                maxOutputTokens = 512
            }
        )
    }

    // Streaming: emit tokens as they're generated
    fun summarize(text: String): Flow<String> {
        val prompt = buildPrompt(text)
        return model.generateContentStream(prompt)
            .map { chunk -> chunk.text ?: "" }
    }

    // Non-streaming: wait for the full response
    suspend fun classify(text: String): String {
        val prompt = "Classify this text into one category " +
            "(positive/negative/neutral): $text"
        val response = model.generateContent(prompt)
        return response.text?.trim() ?: "unknown"
    }

    private fun buildPrompt(input: String): String =
        """
        Summarize the following in 2-3 sentences. Be concise.

        Text: $input

        Summary:
        """.trimIndent()
}
Expected: summarize() returns a Flow<String> that emits tokens progressively.
If it fails:
- maxOutputTokens exception: On-device models cap at 512 on most Pixel 9 devices — reduce if you see TOKEN_LIMIT_EXCEEDED
- Empty chunk.text: Some chunks carry metadata only; the ?: "" fallback handles this
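Because the on-device context window is limited, long inputs should be split before summarizing. A minimal chunker that breaks on whitespace near a character budget — the character budget is a stand-in for a token budget (the common ~4 characters per token rule of thumb is a heuristic, not an SDK guarantee):

```kotlin
// Split text into chunks of at most maxChars characters, preferring
// to break at whitespace so words stay intact.
fun chunkText(text: String, maxChars: Int = 2000): List<String> {
    require(maxChars > 0) { "maxChars must be positive" }
    val chunks = mutableListOf<String>()
    var start = 0
    while (start < text.length) {
        var end = minOf(start + maxChars, text.length)
        if (end < text.length) {
            // Back up to the last whitespace inside the window, if any
            val lastSpace = text.lastIndexOf(' ', end - 1)
            if (lastSpace > start) end = lastSpace + 1
        }
        chunks += text.substring(start, end)
        start = end
    }
    return chunks
}
```

Summarize each chunk separately, then (if needed) summarize the concatenated chunk summaries in a second pass.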
Step 4: Wire Up to Your UI (Compose)
@Composable
fun SummarizerScreen() {
    // LocalContext.current is a composable read, so capture it here
    // rather than inside the remember { } lambda
    val context = LocalContext.current
    val aiService = remember { OnDeviceAiService(context) }

    var inputText by remember { mutableStateOf("") }
    var summary by remember { mutableStateOf("") }
    var isLoading by remember { mutableStateOf(false) }
    val scope = rememberCoroutineScope()

    Column(modifier = Modifier.padding(16.dp)) {
        OutlinedTextField(
            value = inputText,
            onValueChange = { inputText = it },
            label = { Text("Paste text to summarize") },
            modifier = Modifier.fillMaxWidth()
        )
        Spacer(modifier = Modifier.height(8.dp))
        Button(
            onClick = {
                isLoading = true
                summary = ""
                scope.launch {
                    try {
                        aiService.summarize(inputText)
                            .collect { token ->
                                summary += token // Stream tokens into the UI
                            }
                    } finally {
                        isLoading = false // Reset even if inference fails
                    }
                }
            },
            enabled = inputText.isNotBlank() && !isLoading
        ) {
            Text(if (isLoading) "Summarizing..." else "Summarize")
        }
        Spacer(modifier = Modifier.height(16.dp))
        if (summary.isNotBlank()) {
            Text(
                text = summary,
                style = MaterialTheme.typography.bodyMedium
            )
        }
    }
}
Why stream into UI: First token appears in ~300ms on Pixel 9. Without streaming, the user sees a blank screen for 3-8 seconds. The += pattern gives the typing effect for free.
Tokens stream in progressively — no spinner needed
Verification
# Run on a physical device (emulator doesn't support AICore)
adb shell cmd aicore status
You should see:
AICore status: AVAILABLE
Gemini Nano: DOWNLOADED (v1.x.x)
Then run your app and paste a paragraph of text. Summary should begin appearing within 1 second.
Logcat showing inference completing with token count and latency
What You Learned
- Gemini Nano lives in the system, not your APK — always check availability before calling inference
- Streaming (generateContentStream) is essential for on-device AI UX; latency is real
- Token limits are hardware-dependent — design prompts to stay under 512 output tokens
Limitations to know:
- No on-device fine-tuning — you work with the base model only
- Context window is smaller than cloud Gemini; chunk long documents before sending
- Emulators don't support AICore — test on physical hardware only
When NOT to use this:
- Tasks requiring >1000 output tokens (summarizing entire books, long-form generation) — fall back to cloud
- Devices running Android <14 QPR1 — add a cloud fallback path
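The cloud fallback can be expressed as a small router in front of both paths. A sketch with a hypothetical `Summarizer` abstraction (the 4-characters-per-token estimate is a heuristic, and the interface names are illustrative):

```kotlin
// Hypothetical abstraction: the on-device path and a cloud path
// both implement the same interface.
fun interface Summarizer {
    fun summarize(text: String): String
}

class FallbackRouter(
    private val onDevice: Summarizer?,  // null when AICore is unavailable
    private val cloud: Summarizer,
    private val maxOnDeviceTokens: Int = 512
) {
    fun summarize(text: String): String {
        // Rough token estimate: ~4 characters per token
        val estimatedTokens = text.length / 4
        val local = onDevice
        return if (local != null && estimatedTokens <= maxOnDeviceTokens) {
            local.summarize(text)
        } else {
            cloud.summarize(text)
        }
    }
}
```

The same router covers both failure modes above: pass `onDevice = null` on unsupported or pre-Android 14 QPR1 devices, and oversized inputs route to the cloud automatically.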
Tested on Android 16 DP2, Pixel 9 Pro, Gemini Nano 1.0, Kotlin 2.0, Compose BOM 2025.04