Baseline Profiles and Macrobenchmark: Measure and Ship a Faster Android App

Why Your App Feels Slow on First Launch

You’ve optimized your layouts, trimmed your dependency graph, and profiled your Compose recompositions. Yet users still report that your app feels sluggish the first time they open it after install or update. The usual culprit is code that hasn’t been compiled yet. Until the Android Runtime has learned which code your app actually runs, it interprets bytecode and JIT-compiles hot methods on the fly, and those first few seconds pay the price.

Baseline Profiles solve this by telling the system which code paths to pre-compile ahead of time. Combined with Macrobenchmark — Android’s library for measuring real-world startup and runtime performance — you get a complete workflow: measure the problem, apply the fix, and verify the improvement. In this post, you’ll set up both in an existing project.

What Baseline Profiles Actually Do

When you install an APK, the Android Runtime (ART) doesn’t compile everything to native code immediately. It starts by interpreting and JIT-compiling code and records which methods turn out to be hot. Later, typically while the device is idle and charging, it compiles those methods ahead of time in the background. This means the first few app launches are noticeably slower than later ones.

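A Baseline Profile is a text file that lists the classes and methods your app uses during critical user journeys: startup, navigation to key screens, and common interactions. When you bundle this profile with your APK or AAB, ART can compile those paths ahead of time instead of waiting to discover them. For installs from Google Play, that compilation happens at install time, before the user ever opens the app. For other installs, the androidx ProfileInstaller library writes the profile on first launch so those paths get compiled in the background.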

The result? Google reports 30-40% faster startup times and smoother scrolling on first launch. If you’ve been working on Compose performance and recomposition optimization, Baseline Profiles complement that work perfectly — they handle the compilation layer while your Compose optimizations handle the rendering layer.

Adding the Macrobenchmark Module

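Macrobenchmark tests live in a separate test module and drive your app from the outside. They measure things like startup time, frame timing, and scroll jank. For meaningful numbers, run them on a physical device. Emulator performance isn’t representative, and Macrobenchmark reports emulator runs as an error unless you explicitly suppress that check. To add the module in Android Studio, go to File > New > New Module, pick the Benchmark template, and choose Macrobenchmark as the module type.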

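If you prefer to set it up manually, create a module called macrobenchmark whose build.gradle.kts applies the com.android.test and org.jetbrains.kotlin.android plugins. Set the namespace to com.example.macrobenchmark, compileSdk to 35, and minSdk to 28. Add the androidx.benchmark:benchmark-macro-junit4, androidx.test.ext:junit, androidx.test.espresso:espresso-core, and androidx.test.uiautomator:uiautomator dependencies. Create a custom benchmark build type and set targetProjectPath = ":app" to point to your app module. The test APK itself can use isDebuggable = true. The app under test, however, needs a matching benchmark build type that behaves like release: non-debuggable and minified. A debuggable app runs far slower and would distort every measurement.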
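Below is a sketch of that manual setup, written against the Android Gradle Plugin’s Kotlin DSL. The dependency versions are examples only, so check for current releases. The app-side benchmark build type at the bottom follows the pattern shown in the Macrobenchmark docs. The experimentalProperties line lets a com.android.test module instrument itself rather than the target app. For detailed traces on release-like builds, the docs also recommend adding <profileable android:shell="true" /> inside the app manifest’s <application> element.

```kotlin
// ----- macrobenchmark/build.gradle.kts -----
plugins {
    id("com.android.test")
    id("org.jetbrains.kotlin.android")
}

android {
    namespace = "com.example.macrobenchmark"
    compileSdk = 35

    defaultConfig {
        minSdk = 28
        targetSdk = 35
        testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner"
    }

    buildTypes {
        // Build type for the test APK itself. It may be debuggable because it only
        // drives the app; the app under test must NOT be debuggable.
        create("benchmark") {
            isDebuggable = true
            signingConfig = getByName("debug").signingConfig
            // Use "release" for any dependency module that has no "benchmark" build type.
            matchingFallbacks += listOf("release")
        }
    }

    // The app module this test module installs and measures.
    targetProjectPath = ":app"
    // Instrument the test APK itself so the app launches exactly as a user would start it.
    experimentalProperties["android.experimental.self-instrumenting"] = true
}

dependencies {
    // Example versions; use the latest stable releases.
    implementation("androidx.benchmark:benchmark-macro-junit4:1.2.4")
    implementation("androidx.test.ext:junit:1.1.5")
    implementation("androidx.test.espresso:espresso-core:3.5.1")
    implementation("androidx.test.uiautomator:uiautomator:2.2.0")
}

// ----- app/build.gradle.kts (excerpt) -----
android {
    buildTypes {
        // Release-like, non-debuggable variant for measurement, signed with the
        // debug key so it installs without release credentials.
        create("benchmark") {
            initWith(buildTypes.getByName("release"))
            isDebuggable = false
            signingConfig = signingConfigs.getByName("debug")
            matchingFallbacks += listOf("release")
        }
    }
}
```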

Writing Your First Startup Benchmark

Create a benchmark test that measures cold startup, the slowest and most important scenario. Your test class should use MacrobenchmarkRule with StartupTimingMetric. Create two test methods. The first uses StartupMode.COLD, where the app process is killed and the app launches from scratch. The second uses StartupMode.WARM, where the process stays alive but the activity is destroyed and recreated. In both, call pressHome() to send the app to the background, then startActivityAndWait() to launch it and wait for it to finish launching. Measure over 5 iterations to get a reliable median startup time in milliseconds.
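Here is a minimal version of that test class. It assumes the app’s applicationId is com.example.app, which is a placeholder for your own. Both tests share one helper, so the only difference between them is the StartupMode passed to measureRepeated.

```kotlin
package com.example.macrobenchmark

import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class StartupBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    // Process killed before each iteration: the full cold-start path.
    @Test
    fun startupCold() = startup(StartupMode.COLD)

    // Process kept alive, activity recreated.
    @Test
    fun startupWarm() = startup(StartupMode.WARM)

    private fun startup(mode: StartupMode) = benchmarkRule.measureRepeated(
        packageName = TARGET_PACKAGE,
        metrics = listOf(StartupTimingMetric()),
        iterations = 5,
        startupMode = mode,
    ) {
        pressHome()
        startActivityAndWait()
    }

    companion object {
        // Placeholder applicationId; replace with your app's.
        const val TARGET_PACKAGE = "com.example.app"
    }
}
```

From the command line, ./gradlew :macrobenchmark:connectedCheck runs the module’s tests on the attached device.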

Run your benchmark on a physical device with the benchmark build variant selected. Five iterations is a good balance between accuracy and test duration — fewer iterations produce noisy results, more take too long for regular CI runs. The output gives you median startup times you can track over time.

Generating a Baseline Profile

Now that you can measure startup, let’s generate a profile to improve it. Add the androidx.baselineprofile Gradle plugin to your root build.gradle.kts. Then apply it to both your app module, which consumes the profile, and the test module that hosts the generator, which produces it. The generator is a special instrumented test that exercises your app’s critical paths. According to the Baseline Profile documentation, the generator records which classes and methods are touched during these journeys.
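The plugin wiring might look like the excerpt below. It assumes the generator lives in the macrobenchmark module created earlier; a dedicated producer module works the same way. The plugin and profileinstaller versions are examples. The baselineProfile(...) dependency tells the app which module produces its profile. The profileinstaller dependency lets builds that don’t come from Play still apply the profile.

```kotlin
// ----- build.gradle.kts (root, excerpt) -----
plugins {
    // ...your existing plugin declarations
    id("androidx.baselineprofile") version "1.2.4" apply false // example version
}

// ----- app/build.gradle.kts (excerpt) -----
plugins {
    id("com.android.application")
    id("org.jetbrains.kotlin.android")
    id("androidx.baselineprofile") // consumer: bundles the generated profile
}

dependencies {
    // Applies the profile on installs that don't come through Google Play.
    implementation("androidx.profileinstaller:profileinstaller:1.3.1")
    // The module whose generator test produces this app's profile.
    baselineProfile(project(":macrobenchmark"))
}

// ----- macrobenchmark/build.gradle.kts (excerpt) -----
plugins {
    id("com.android.test")
    id("org.jetbrains.kotlin.android")
    id("androidx.baselineprofile") // producer: runs the generator
}
```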

Your generator class should use BaselineProfileRule and implement a test method that exercises critical user flows:

- cold start (call pressHome() then startActivityAndWait())
- navigation to key screens using device.findObject(By.text(…))
- scrolling using list.fling(Direction.DOWN) and list.fling(Direction.UP)

Run the generator on a device running Android 13 (API 33) or higher, or on a rooted device running API 28 or higher. It produces a baseline-prof.txt file listing the hot classes and methods. With the plugin configured, ./gradlew :app:generateBaselineProfile runs the generator and copies the result into the app module, and the plugin then bundles it into your release builds.
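A sketch of the generator follows. The package name, the "Settings" label, and the item_list resource ID are hypothetical stand-ins for your own app’s journeys. It uses BaselineProfileRule.collect and waits for each element with UiAutomator’s Until. That way a slow or missing screen fails the run instead of silently producing a thinner profile.

```kotlin
package com.example.macrobenchmark

import androidx.benchmark.macro.junit4.BaselineProfileRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Direction
import androidx.test.uiautomator.Until
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class BaselineProfileGenerator {
    @get:Rule
    val baselineProfileRule = BaselineProfileRule()

    @Test
    fun generate() = baselineProfileRule.collect(packageName = "com.example.app") {
        // Journey 1: cold start to the first screen.
        pressHome()
        startActivityAndWait()

        // Journey 2: open a key screen and come back. "Settings" is a hypothetical label.
        checkNotNull(device.wait(Until.findObject(By.text("Settings")), 5_000)) {
            "Settings entry not found"
        }.click()
        device.waitForIdle()
        device.pressBack()
        device.waitForIdle()

        // Journey 3: scroll the main list both ways. "item_list" is a hypothetical resource ID.
        val list = checkNotNull(device.wait(Until.findObject(By.res("item_list")), 5_000)) {
            "item_list not found"
        }
        // Keep flings away from the screen edges so they don't trigger gesture navigation.
        list.setGestureMargin(device.displayWidth / 5)
        list.fling(Direction.DOWN)
        list.fling(Direction.UP)
    }
}
```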

Measuring the Improvement

Here’s where it gets satisfying. Run your startup benchmark again after adding the Baseline Profile. On a mid-range device, you’ll typically see results like this:

Before Baseline Profile:

Cold startup: ~620ms median
Warm startup: ~280ms median

After Baseline Profile:

Cold startup: ~410ms median (34% faster)
Warm startup: ~210ms median (25% faster)
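Each percentage is the reduction relative to the "before" median. The small helper below is hypothetical and not part of Macrobenchmark. It makes the arithmetic explicit and reproduces the figures above.

```kotlin
import kotlin.math.roundToInt

// Percentage reduction from a "before" median to an "after" median, rounded to a whole percent.
fun percentFaster(beforeMs: Double, afterMs: Double): Int {
    require(beforeMs > 0) { "beforeMs must be positive" }
    return ((beforeMs - afterMs) / beforeMs * 100).roundToInt()
}

fun main() {
    println("Cold startup: ${percentFaster(620.0, 410.0)}% faster") // Cold startup: 34% faster
    println("Warm startup: ${percentFaster(280.0, 210.0)}% faster") // Warm startup: 25% faster
}
```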

These numbers vary by device and app complexity, but a 25-40% improvement in cold startup is typical. The improvement is most noticeable on lower-end devices where JIT compilation overhead is more pronounced.
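Rather than comparing two separate builds, you can measure both states in one benchmark class by controlling how Macrobenchmark compiles the app before each run. In the sketch below, CompilationMode.None() approximates a fresh install with no profile. CompilationMode.Partial(BaselineProfileMode.Require) compiles only what the Baseline Profile lists, and it errors out if the profile can’t be applied. As before, com.example.app is a placeholder.

```kotlin
package com.example.macrobenchmark

import androidx.benchmark.macro.BaselineProfileMode
import androidx.benchmark.macro.CompilationMode
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class StartupCompilationBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    // No ahead-of-time compilation: the "before Baseline Profile" case.
    @Test
    fun startupNoCompilation() = startup(CompilationMode.None())

    // Only the code listed in the Baseline Profile is precompiled.
    @Test
    fun startupBaselineProfile() = startup(CompilationMode.Partial(BaselineProfileMode.Require))

    private fun startup(compilationMode: CompilationMode) = benchmarkRule.measureRepeated(
        packageName = "com.example.app", // placeholder applicationId
        metrics = listOf(StartupTimingMetric()),
        compilationMode = compilationMode,
        iterations = 5,
        startupMode = StartupMode.COLD,
    ) {
        pressHome()
        startActivityAndWait()
    }
}
```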

If you’ve been tracking performance with the Choreographer API for frame timing, you’ll also notice smoother initial frames after adding profiles — fewer jank frames during the critical first seconds of app usage.

Scrolling and Runtime Benchmarks

Startup isn’t the only thing worth measuring. Macrobenchmark can also capture frame timing during scrolling, which is essential for list-heavy apps. Create a test method that measures FrameTimingMetric over 5 warm-start iterations. After launching the app with startActivityAndWait(), find your scrollable list using device.findObject(By.res("item_list")) and fling it several times to capture realistic scrolling behavior.
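Here is a sketch of that scroll test, assuming the same placeholder package and an item_list resource ID. The launch sits in setupBlock, so only the flings in the measured block contribute frames. For a View-based list you may need the package-qualified form By.res(packageName, "item_list"). For a Compose list, a testTag only appears as a resource ID when testTagsAsResourceId is enabled in a parent’s semantics.

```kotlin
package com.example.macrobenchmark

import androidx.benchmark.macro.FrameTimingMetric
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Direction
import androidx.test.uiautomator.Until
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class ScrollBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun scrollItemList() = benchmarkRule.measureRepeated(
        packageName = "com.example.app", // placeholder applicationId
        metrics = listOf(FrameTimingMetric()),
        iterations = 5,
        startupMode = StartupMode.WARM,
        setupBlock = {
            // Launch outside the measured block so startup frames don't skew scroll numbers.
            pressHome()
            startActivityAndWait()
        },
    ) {
        // "item_list" is a hypothetical resource ID for the scrollable list.
        val list = checkNotNull(device.wait(Until.findObject(By.res("item_list")), 5_000)) {
            "item_list not found"
        }
        // Keep flings away from the screen edges so they don't trigger gesture navigation.
        list.setGestureMargin(device.displayWidth / 5)
        repeat(3) { list.fling(Direction.DOWN) }
        device.waitForIdle()
    }
}
```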

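FrameTimingMetric reports frame durations at several percentiles (P50, P90, P95, and P99) as frameDurationCpuMs. On Android 12 (API 31) and higher it also reports frameOverrunMs, which shows how far each frame ran past its deadline. As a rule of thumb, aim to keep P99 under about 16ms on 60Hz displays, or about 8ms on 120Hz displays. If you’re seeing high P99 values, that’s where your Compose stability and recomposition analysis comes in.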
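Both budgets come straight from the refresh rate. 1000 ms divided by 60 frames is about 16.7 ms, and divided by 120 it is about 8.3 ms, so the 16 ms and 8 ms targets are slightly conservative rules of thumb. The helper below is hypothetical and meant only for illustration. Keep in mind that frameDurationCpuMs counts CPU time only, which is why frameOverrunMs is the more direct signal where it is available.

```kotlin
// Per-frame time budget in milliseconds for a display refresh rate.
fun frameBudgetMs(refreshRateHz: Int): Double {
    require(refreshRateHz > 0) { "refreshRateHz must be positive" }
    return 1000.0 / refreshRateHz
}

// Rule-of-thumb check: does the P99 frame time fit inside one refresh interval?
fun p99FitsBudget(p99Ms: Double, refreshRateHz: Int): Boolean =
    p99Ms <= frameBudgetMs(refreshRateHz)

fun main() {
    println(p99FitsBudget(p99Ms = 12.0, refreshRateHz = 60))  // true
    println(p99FitsBudget(p99Ms = 12.0, refreshRateHz = 120)) // false
}
```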

CI Integration Tips

Benchmarks are most valuable when you run them consistently. Here’s how to integrate them into your workflow without slowing everything down:

Separate the benchmark from your main CI pipeline. Run benchmarks nightly or on merge to main, not on every PR. They need a physical device or a Gradle Managed Device, and either way a run takes far longer than your unit test suite.

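Use Gradle Managed Devices for consistent results. Define an emulator profile in your Gradle config so every run uses the same hardware profile and API level. Real devices give more accurate absolute numbers, but managed devices give you reproducibility, so read their results as trends rather than real-world timings. Because Macrobenchmark treats emulator runs as an error by default, you also need to suppress that check for these runs.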
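A managed-device definition for the macrobenchmark module could look like this. The device name pixel6Api34, the Pixel 6 hardware profile, and API 34 are example choices. The managed-devices DSL has also changed across Android Gradle Plugin versions, so check it against yours. The suppressErrors instrumentation argument is what lets Macrobenchmark run on an emulator at all.

```kotlin
// ----- macrobenchmark/build.gradle.kts (excerpt; the import goes at the top of the file) -----
import com.android.build.api.dsl.ManagedVirtualDevice

android {
    defaultConfig {
        // Macrobenchmark reports emulator runs as an error by default; opt in explicitly.
        testInstrumentationRunnerArguments["androidx.benchmark.suppressErrors"] = "EMULATOR"
    }

    testOptions {
        managedDevices {
            devices {
                // Example device definition; keep it fixed so runs stay comparable.
                create<ManagedVirtualDevice>("pixel6Api34") {
                    device = "Pixel 6"
                    apiLevel = 34
                    systemImageSource = "aosp"
                }
            }
        }
    }
}
```

Gradle generates a task per device and variant. For this definition it should be :macrobenchmark:pixel6Api34BenchmarkAndroidTest.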

Track results over time. Export benchmark JSON output and feed it into a dashboard. The Macrobenchmark library outputs structured JSON that’s easy to parse. A simple trend chart showing startup time per commit catches regressions early.
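Once medians are flowing into a dashboard, a simple rule can turn the trend into an alert. The helper below is a hypothetical sketch and not part of Macrobenchmark. It compares the newest median against the median of the last few runs and flags anything more than a chosen tolerance slower. The 7-run window and 5% tolerance are arbitrary starting points. Using a median as the baseline keeps one noisy nightly run from shifting it much.

```kotlin
// Hypothetical regression check for a startup-time trend (not part of Macrobenchmark).
// Flags `latestMs` if it is more than `tolerance` slower than the median of the
// last `window` historical medians.
fun isRegression(
    historyMs: List<Double>,
    latestMs: Double,
    window: Int = 7,
    tolerance: Double = 0.05,
): Boolean {
    require(window > 0) { "window must be positive" }
    if (historyMs.isEmpty()) return false
    val recent = historyMs.takeLast(window).sorted()
    val mid = recent.size / 2
    // Median of the recent runs, so a single outlier doesn't move the baseline much.
    val baseline = if (recent.size % 2 == 0) (recent[mid - 1] + recent[mid]) / 2 else recent[mid]
    return latestMs > baseline * (1 + tolerance)
}

fun main() {
    val coldStartHistory = listOf(410.0, 415.0, 405.0, 412.0, 408.0)
    println(isRegression(coldStartHistory, latestMs = 470.0)) // true
    println(isRegression(coldStartHistory, latestMs = 420.0)) // false
}
```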

Regenerate Baseline Profiles with major releases. Your critical user journeys change as you add features. Regenerate profiles quarterly or whenever you ship a major navigation change to ensure the profile stays accurate.

Baseline Profiles cost almost nothing to maintain once set up, and Macrobenchmark gives you the numbers to prove it’s working. If you’re shipping an Android app and not using these tools yet, this is one of the highest-impact performance wins you can get with the least ongoing effort.

This post was written by a human with the help of Claude, an AI assistant by Anthropic.