The Stack Trace Triage Problem
Every app that ships crashes. Users hit unexpected states, device configurations trigger edge cases, third-party libraries fail in mysterious ways. When a crash report lands in Firebase Crashlytics, you’re looking at a stack trace, maybe some breadcrumbs or custom logs, and you need to decide: Is this a high-priority bug? Can I reproduce it? How do I even start debugging?
The traditional approach is manual: you read the stack trace, search for the failing line, try to mentally reconstruct the call graph, and maybe reproduce it locally. This process is slow, error-prone, and wastes developer time on pattern matching that a machine could handle instantly. When you have dozens or hundreds of crash variants across your user base, this approach doesn’t scale.

AI-powered crash triage changes this equation. Instead of manually reading each stack trace, you can pipe crash data directly into an LLM like Claude, ask it to analyze the pattern, suggest a root cause, and propose a fix — all in seconds. For your most critical crashes, this turns the triage workflow from hours of detective work into a structured, rapid analysis pipeline. In this post, you’ll learn how to build this workflow, what prompts work best for stack trace analysis, and how to integrate it into your crash monitoring system.
Why LLMs Excel at Crash Analysis
Stack traces are inherently narrative. They tell a story of where execution went wrong, but that story is often obscured by framework layers, asynchronous complexity, and non-obvious state dependencies. Humans have to hold multiple files and concepts in their head to understand the full picture. LLMs, with their ability to process large contexts and reason across multiple code files simultaneously, excel at this exact task.
An LLM can:
- Read a full stack trace and identify the most meaningful frames (not the noisy framework internals)
- Look at your actual source code and understand what the code does at that point
- Connect the crash to similar patterns in your codebase — maybe you have the same bug in three places
- Suggest specific fixes with trade-offs, rather than generic advice
- Process dozens of crash variants and identify common root causes you might miss
The result is that your best developers spend their time on complex problems that need human judgment, while routine crash analysis happens automatically.
A Practical Crash Analysis Workflow
Here’s a workflow that holds up at scale. You write a function that runs periodically (or on demand) to fetch recent crashes from Crashlytics, pipe them to an LLM, and categorize them:
import anthropic
import json
from typing import TypedDict


class CrashAnalysis(TypedDict):
    crash_id: str
    summary: str
    root_cause: str
    severity: str  # critical, high, medium, low
    affected_files: list[str]
    suggested_fix: str
    similar_crashes: list[str]


def analyze_crash(crash_id: str, stack_trace: str, relevant_source: str) -> CrashAnalysis:
    """
    Feed a crash and source code to Claude for analysis.
    """
    client = anthropic.Anthropic()

    prompt = f"""You are an expert Android crash analyst. You will analyze a crash from Firebase Crashlytics and provide an actionable diagnosis.
Crash ID: {crash_id}
Stack Trace:
{stack_trace}
Relevant Source Code:
{relevant_source}
Analyze this crash and provide:
1. Summary: One-line summary of what crashed
2. Root Cause: The most likely root cause, explained clearly
3. Severity: Critical (blocks main flow), High (feature broken), Medium (edge case), Low (rare condition)
4. Affected Files: List of source files that likely need changes
5. Suggested Fix: Concrete code change to fix this issue
6. Similar Patterns: If you see this pattern elsewhere in the code, mention it
Respond in JSON format with these exact keys: summary, root_cause, severity, affected_files, suggested_fix, similar_patterns
"""

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt}],
    )

    # Parse Claude's JSON response
    analysis_text = response.content[0].text
    analysis_data = json.loads(analysis_text)

    return CrashAnalysis(
        crash_id=crash_id,
        summary=analysis_data["summary"],
        root_cause=analysis_data["root_cause"],
        severity=analysis_data["severity"],
        affected_files=analysis_data["affected_files"],
        suggested_fix=analysis_data["suggested_fix"],
        similar_crashes=analysis_data.get("similar_patterns", []),
    )
The key insight here is that you’re feeding both the stack trace and the relevant source code. A stack trace alone is context-poor; stack trace plus source code gives the LLM everything it needs to make a diagnosis. If your crash involves multiple files, include them all. Claude can handle 200K tokens of context, so you have room for substantial code samples. And you don’t need to copy this snippet verbatim: your own coding assistant can generate an equivalent tailored to your stack and requirements.
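One practical question is how to assemble relevant_source automatically. A minimal sketch: pull file names out of the stack trace frames and read the matching files from your repository. The helper name, the regex, and the assumption that file names are unique enough to locate by glob are all illustrative:

import re
from pathlib import Path

# Frames look like: "at com.example.repository.UserRepository.cacheUser(UserRepository.kt:87)"
FRAME_FILE_RE = re.compile(r"\(([\w.]+\.(?:kt|java)):\d+\)")

def collect_relevant_source(stack_trace: str, repo_root: str) -> str:
    """Read the source files referenced by a stack trace into one prompt-ready string."""
    file_names = {m.group(1) for m in FRAME_FILE_RE.finditer(stack_trace)}
    sections = []
    for name in sorted(file_names):
        # Search the whole repo; framework files that aren't in your tree are simply skipped
        for path in Path(repo_root).rglob(name):
            sections.append(f"// File: {path}\n{path.read_text()}")
    return "\n\n".join(sections)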
Extracting Stack Traces From Crashlytics
Firebase gives you programmatic access to Crashlytics data; the officially supported route is the BigQuery export, which you can query for issues and stack traces. The sketch below shows the general shape of fetching crashes and preparing them for analysis. Treat the endpoint details as placeholders to adapt to however you pull your crash data:
import requests


def fetch_recent_crashes(project_id: str, hours: int = 24) -> list[dict]:
    """
    Fetch recent crashes from Firebase Crashlytics.
    Returns raw crash data with stack traces.
    """
    # Crashlytics data is most commonly pulled via the BigQuery export;
    # the REST-style endpoint below is illustrative. Adapt it to however
    # you export crash data, and apply the `hours` window there.
    url = f"https://firebase.googleapis.com/v1/projects/{project_id}/issues"
    headers = {
        "Authorization": f"Bearer {get_firebase_token()}",  # see helper below
    }
    response = requests.get(
        url,
        headers=headers,
        params={"pageSize": 100},  # Get top 100 crashes
    )
    crashes = []
    for issue in response.json().get("issues", []):
        # Fetch full crash details including stack trace
        crash_detail = fetch_crash_detail(project_id, issue["id"])
        crashes.append(crash_detail)
    return crashes


def fetch_crash_detail(project_id: str, issue_id: str) -> dict:
    """
    Fetch the full stack trace for a specific crash.
    """
    url = f"https://firebase.googleapis.com/v1/projects/{project_id}/issues/{issue_id}"
    headers = {
        "Authorization": f"Bearer {get_firebase_token()}",
    }
    response = requests.get(url, headers=headers)
    return response.json()
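The snippet above assumes a get_firebase_token() helper. One way to implement it, assuming you authenticate with a service account through Application Default Credentials and the google-auth package, is a short sketch like this:

import google.auth
import google.auth.transport.requests

def get_firebase_token() -> str:
    """Return an OAuth2 access token from Application Default Credentials."""
    # Assumes GOOGLE_APPLICATION_CREDENTIALS points at a service account key,
    # or the code runs in an environment with default credentials available.
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    credentials.refresh(google.auth.transport.requests.Request())
    return credentials.token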
Once you have the crashes, you can batch them for analysis. Process the highest-impact crashes first — those affecting the most users or blocking critical flows.
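Prioritization can be as simple as sorting by impact before you spend tokens on analysis. A quick sketch, assuming each crash record carries user_count and event_count fields (the exact field names depend on how you export the data):

def top_crashes(crashes: list[dict], limit: int = 10) -> list[dict]:
    """Rank crashes by how many users and events they affect, most impactful first."""
    return sorted(
        crashes,
        key=lambda c: (c.get("user_count", 0), c.get("event_count", 0)),
        reverse=True,
    )[:limit]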
Prompt Template for Stack Trace Analysis
The prompt you use dramatically affects the quality of the analysis. Here’s a template that works well for Android crashes:
You are an expert Android crash analyst with deep knowledge of the Android framework,
common crash patterns, and memory/threading issues.
I'm providing a crash from Firebase Crashlytics along with the relevant source code.
Analyze the crash and answer the following questions:
1. What is the root cause of this crash? Be specific — point to the exact line or condition.
2. Is this a symptom of a deeper architectural issue, or an isolated bug?
3. What are the three most likely ways to fix this, with trade-offs for each?
4. If you had to prioritize this for immediate fix vs. monitoring, what would you recommend?
5. Are there similar patterns elsewhere in the code that might have the same bug?
Stack Trace:
---
{STACK_TRACE}
---
Relevant Source Files:
---
{SOURCE_CODE}
---
Context:
- App targets Android {MIN_SDK} - {TARGET_SDK}
- Major libraries: {LIBRARIES}
- Known issues or recent changes: {RECENT_CHANGES}
Respond with clear, actionable analysis. Focus on the most likely root cause first.
The context fields (libraries, SDK levels, recent changes) are crucial. If you recently migrated to a new networking library, updated your coroutine scopes, or changed how you manage lifecycle, mention it. The LLM can then weight hypotheses accordingly — if you just updated Coroutines, a cancellation issue is more likely than if nothing has changed in months.
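Keeping the template as a plain string with named placeholders makes it easy to fill programmatically. A minimal sketch (ANALYSIS_TEMPLATE holds the text above; the context values are whatever you already track for your app):

ANALYSIS_TEMPLATE = "..."  # the template shown above, with {STACK_TRACE}, {SOURCE_CODE}, etc.

def build_analysis_prompt(stack_trace: str, source_code: str, context: dict) -> str:
    """Substitute crash data and app context into the analysis template."""
    return ANALYSIS_TEMPLATE.format(
        STACK_TRACE=stack_trace,
        SOURCE_CODE=source_code,
        MIN_SDK=context["min_sdk"],
        TARGET_SDK=context["target_sdk"],
        LIBRARIES=", ".join(context["libraries"]),
        RECENT_CHANGES=context.get("recent_changes", "none"),
    )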
Real Example: NullPointerException in Repository
Let’s walk through a concrete example. Your Crashlytics dashboard shows this crash:
java.lang.NullPointerException: Attempt to invoke virtual method 'java.lang.String com.example.User.getId()' on a null object reference
    at com.example.repository.UserRepository.cacheUser(UserRepository.kt:87)
    at com.example.repository.UserRepository.access$cacheUser(UserRepository.kt:1)
    at com.example.repository.UserRepository$fetchUser$2.invokeSuspend(UserRepository.kt:45)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(Dispatchers.kt:106)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
You paste this and the UserRepository source into your prompt:
class UserRepository(
    private val api: UserApi,
    private val userCache: UserCache
) {
    suspend fun fetchUser(id: String): User = withContext(Dispatchers.IO) {
        // Line 45
        val apiResponse = api.getUser(id)
        cacheUser(apiResponse)
        apiResponse
    }

    private fun cacheUser(user: User) {
        // Line 87
        val cachedKey = "user_${user.getId()}" // NPE here
        userCache.put(cachedKey, user)
    }
}
Claude’s analysis would identify that apiResponse can be null even though the type says User (not User?). The API might be returning null in some cases, or there’s a deserialization issue. The fix options are: (1) change the API contract to explicitly allow null, (2) add a null check before caching, (3) investigate why the API is returning null in the first place and fix it upstream. Claude will weigh each option and rank them.
Batch Analysis and Root Cause Clustering
For large crash volumes, instead of analyzing each crash individually, cluster them first:
def cluster_similar_crashes(crashes: list[dict]) -> dict[str, list[dict]]:
    """
    Group crashes that share the same top application stack frames.
    This reduces redundant analysis when many users hit the same bug.
    """
    clusters = {}
    for crash in crashes:
        # Extract the top 3 frames (most specific to your code)
        key_frames = extract_key_frames(crash["stack_trace"])
        cluster_key = "_".join(key_frames)
        if cluster_key not in clusters:
            clusters[cluster_key] = []
        clusters[cluster_key].append(crash)
    return clusters


def analyze_crash_cluster(
    cluster_key: str,
    crashes: list[dict],
    source_code: str
) -> dict:
    """
    Analyze a cluster of similar crashes as a single issue.
    This is more efficient than analyzing each crash separately.
    """
    client = anthropic.Anthropic()

    # Summarize the cluster
    crash_summary = f"""
This cluster represents {len(crashes)} crash instances affecting
{len(set(c['user_id'] for c in crashes))} unique users.
Common pattern in stack trace:
{crashes[0]['stack_trace']}
Affected versions: {set(c['app_version'] for c in crashes)}
Device breakdown: {summarize_device_types(crashes)}
"""

    prompt = f"""{crash_summary}
Source code context:
{source_code}
Analyze this cluster of crashes and provide:
1. A single root cause that explains all instances
2. Priority (critical if it affects many users or blocks core flow)
3. Recommended fix
4. Test case to verify the fix
"""

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt}],
    )

    return {
        "cluster_key": cluster_key,
        "crash_count": len(crashes),
        "analysis": response.content[0].text,
    }
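The clustering sketch leans on two helpers it doesn't define. Minimal versions might look like this, assuming your application package is com.example and each crash record has a device_model field (both are assumptions to adapt):

import re
from collections import Counter

APP_PACKAGE = "com.example"  # your app's package prefix

def extract_key_frames(stack_trace: str, limit: int = 3) -> list[str]:
    """Return the first few frames that point at your own code, skipping framework internals."""
    frames = re.findall(r"at ([\w.$]+)\(", stack_trace)
    own_frames = [f for f in frames if f.startswith(APP_PACKAGE)]
    return own_frames[:limit]

def summarize_device_types(crashes: list[dict]) -> str:
    """Summarize the device models in a cluster, e.g. 'Pixel 7: 12, SM-G991B: 4'."""
    counts = Counter(c.get("device_model", "unknown") for c in crashes)
    return ", ".join(f"{model}: {n}" for model, n in counts.most_common(5))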
This batching approach is crucial for scaling. Instead of analyzing 500 crashes, you analyze 20-30 root cause clusters. Your analysis time drops from hours to minutes.
Integrating With Your Development Workflow
Once you have crash analyses, the natural next step is to integrate them into your development workflow. You can:
- Auto-create GitHub/Jira issues: For high-priority crashes, create a ticket automatically with the crash analysis in the description
- Notify on Slack: Post analysis summaries to your team’s crash channel with severity and suggested fix
- Add to your monitoring dashboard: Show crash trends and LLM-generated insights alongside raw metrics
- Close the loop: When you deploy a fix, query Crashlytics to verify the crash rate dropped and report success back to the team
The key pattern here is closing the feedback loop. You analyze a crash, deploy a fix, and verify it worked. Over time, this turns crash triage from a reactive firefighting exercise into a structured, measurable process.
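As a concrete example, posting a high-severity analysis to Slack takes little more than a webhook call. A sketch, assuming an incoming-webhook integration (the URL is a placeholder):

import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # your incoming webhook

def notify_slack(analysis: CrashAnalysis) -> None:
    """Post a short crash summary to the team's crash channel."""
    if analysis["severity"] not in ("critical", "high"):
        return
    text = (
        f"*{analysis['severity'].upper()}* crash {analysis['crash_id']}\n"
        f"{analysis['summary']}\n"
        f"Root cause: {analysis['root_cause']}\n"
        f"Suggested fix: {analysis['suggested_fix']}"
    )
    requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10)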
When to Use LLM Analysis and When Not To
LLM crash analysis isn’t a magic bullet. It works best for:
- Android framework exceptions: NullPointerException, IllegalStateException, and similar runtime errors where the code context matters
- Repository and service-layer bugs: State management, threading, and data flow issues that benefit from reasoning about code flow
- Crashes affecting multiple users: If it’s a one-off user issue, manual investigation might be faster
- Complex stack traces: When the crash involves multiple layers and you need to hold a lot of context in your head
It’s less helpful for:
- Out-of-memory errors: These usually need heap dump analysis, which is a separate tool
- Third-party library crashes: If the crash is deep inside a library you don’t control, LLM analysis has less to offer
- Device-specific issues: If the crash only happens on one obscure device model, it’s likely a hardware/driver issue, not a code bug
The best approach is hybrid: use LLM analysis to handle the bulk of crashes quickly, then dedicate human time to the 10-20% that need deeper investigation.
Connecting Crash Analysis to Your Broader Development Workflow
Crash triage with LLMs is particularly powerful when it’s part of your broader AI-assisted Android development workflow. You can feed crash analysis results into your testing and debugging processes:
- When LLM analysis suggests a fix, immediately ask it to write a refactoring prompt for that section of code
- Use crash patterns to guide your testing strategy — if crashes cluster around a specific feature, write more tests for that area
- Leverage the rubber duck debugging approach to think through complex interactions before they become crashes
Over time, you’re building a feedback loop: crashes inform analysis, analysis informs testing, better testing prevents future crashes.
Beyond Basic Triage: Predictive Insights
Once you have historical crash analyses, you can ask Claude meta-questions about trends and patterns:
Here are crash analyses from the last 30 days for our app:
{CRASH_ANALYSES_JSON}
Identify:
1. The top 3 root causes accounting for the most crashes
2. Areas of the codebase with the highest crash frequency (architectural weaknesses?)
3. Patterns that predict crashes (e.g., crashes spike after specific code changes)
4. Architectural recommendations to reduce future crash rates
5. Testing gaps revealed by the crash patterns
This meta-analysis can highlight systemic issues that would take humans weeks to identify manually. Maybe your crash data reveals that you have a concurrency issue in your repository pattern, or that a specific third-party library tends to fail under certain conditions.
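Running the meta-analysis is just another API call over your stored results. A rough sketch, assuming you append each analysis to a local JSON file and keep the prompt above in a META_PROMPT constant with a {CRASH_ANALYSES_JSON} placeholder:

import json
import anthropic

META_PROMPT = "..."  # the meta-analysis prompt shown above

def analyze_crash_trends(history_path: str = "crash_analyses.json") -> str:
    """Ask Claude for trends across stored crash analyses."""
    with open(history_path) as f:
        analyses = json.load(f)
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": META_PROMPT.format(CRASH_ANALYSES_JSON=json.dumps(analyses, indent=2)),
        }],
    )
    return response.content[0].text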
Getting Started: A Minimal Implementation
You don’t need a complex system to start. A minimal implementation:
1. Set up a Crashlytics API client to fetch crashes periodically (hourly or daily)
2. For your top 10 most-impactful crashes, extract the stack trace and relevant source code
3. Send them to Claude with a simple prompt asking for root cause and suggested fix
4. Collect the analyses in a document or spreadsheet
5. Review them and fix the bugs
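Wired together with the functions sketched earlier, the whole loop fits in one short script. This is a sketch: it assumes crash records expose id and stack_trace fields, and PROJECT_ID and the output path are placeholders you replace:

import json

PROJECT_ID = "your-firebase-project"  # placeholder

def run_triage() -> None:
    """Fetch recent crashes, analyze the most impactful ones, and save the results for review."""
    crashes = fetch_recent_crashes(PROJECT_ID)
    results = []
    for crash in top_crashes(crashes, limit=10):
        source = collect_relevant_source(crash["stack_trace"], repo_root=".")
        results.append(analyze_crash(crash["id"], crash["stack_trace"], source))
    with open("crash_triage_report.json", "w") as f:
        json.dump(results, f, indent=2)

if __name__ == "__main__":
    run_triage()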
Once that’s working, add automation: create GitHub issues automatically, send Slack notifications, batch similar crashes, run the analysis hourly. But start simple. You’ll learn what works for your team and refine from there.
Conclusion
AI-powered crash triage transforms how you respond to bugs. Instead of manually reading stack traces and guessing at root causes, you pipe crash data to an LLM, get structured analysis in seconds, and focus your human effort on the crashes that genuinely need it. Combined with proper testing, refactoring practices, and a systematic development workflow, this approach reduces time-to-fix, improves code quality, and frees your best developers to work on harder problems. Start with your highest-impact crashes and expand from there.
This post was written by a human with the help of Claude, an AI assistant by Anthropic.
