
When your FastAPI app talks to five different services and something breaks, you know the drill: SSH into servers, grep through logs, try to piece together what happened when. Service A says "Request received ✓", Service B logs "Processing ✓", Service C shows "Query executed ✓" – but which request? What was the timing? Where did it actually fail?
The key insight: OpenTelemetry's distributed tracing transforms scattered logs into connected visual flows, automatically correlating every request as it moves through your system. Instead of detective work, you get a timeline.
Why Log Grepping Fails in Microservices
Traditional logging breaks down with distributed systems because each service operates in isolation. You might see:
```
# Service A logs
2024-01-15 10:23:45 INFO: Processing user request for /api/users/123

# Service B logs
2024-01-15 10:23:47 ERROR: Database timeout after 5000ms

# Service C logs
2024-01-15 10:23:46 INFO: Cache miss for user:123
```
But which Service B error corresponds to which Service A request? In high-traffic systems, this correlation becomes impossible. You're left guessing based on timestamps, hoping you've found the right needle in the haystack.
The fundamental problem isn't the logging – it's the lack of request correlation across service boundaries.
How OpenTelemetry Changes the Game
OpenTelemetry solves this by creating traces – unique identifiers that follow requests through your entire system. Every operation becomes a span within that trace, creating a hierarchical view of what happened, when, and how long it took.
Here's the minimal FastAPI setup:
```python
from fastapi import FastAPI
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Configure tracing: batch spans and export them over OTLP
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

app = FastAPI()
FastAPIInstrumentor.instrument_app(app)
```
That single instrument_app() call automatically captures:
- HTTP method, path, status codes, response times
- Headers and query parameters
- Route handler execution
- Database queries (with additional DB instrumentation; see the sketch after this list)
- External API calls
- Error details and stack traces
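For the database bullet, you opt in by installing the instrumentation package for your driver or ORM. A minimal sketch, assuming SQLAlchemy and the opentelemetry-instrumentation-sqlalchemy package (the connection URL is only a placeholder):
```python
from sqlalchemy import create_engine
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor

engine = create_engine("postgresql://localhost/app")  # placeholder URL
SQLAlchemyInstrumentor().instrument(engine=engine)    # DB queries now appear as spans
```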
Making Traces Flow Across Services
The magic happens when traces cross service boundaries. OpenTelemetry uses W3C standard headers (traceparent, tracestate) to propagate context:
```python
import httpx
from opentelemetry.propagate import inject

async def call_user_service(user_id: int):
    headers = {}
    # Inject trace context into outgoing request headers
    inject(headers)

    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"http://user-service/users/{user_id}",
            headers=headers,  # This carries the trace context
        )
        return response.json()
```
Now when Service A calls Service B, Service B's spans become children of Service A's span. You get a complete request tree instead of isolated log entries.
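If you want to see what actually travels between services, print the injected headers from inside an active span; the traceparent value encodes the W3C version, trace ID, span ID, and sampling flags:
```python
# Run inside an active span to inspect the propagated context
headers = {}
inject(headers)
print(headers)
# e.g. {'traceparent': '00-<32-hex-trace-id>-<16-hex-span-id>-01'}
```
If you'd rather not call inject() by hand, the opentelemetry-instrumentation-httpx package can instrument the client and propagate context automatically.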
Custom Spans for Business Logic
While automatic instrumentation covers HTTP and database operations, you'll want custom spans for business logic:
```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

@app.post("/orders")
async def create_order(order_data: dict):
    with tracer.start_as_current_span("validate_order") as span:
        # Add custom attributes to spans
        span.set_attribute("order.item_count", len(order_data.get("items", [])))
        validate_order(order_data)  # your validation logic goes here

    with tracer.start_as_current_span("process_payment"):
        await process_payment(order_data)  # your payment logic goes here
```
This creates spans for each business operation, with custom attributes that help during debugging. When something fails, you'll see exactly which step broke and why.
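To make "which step broke and why" concrete, you can record the exception and an error status on the span before re-raising. A sketch, where charge_payment() and PaymentError stand in for your own code:
```python
from opentelemetry.trace import Status, StatusCode

@app.post("/payments")
async def create_payment(payment_data: dict):
    with tracer.start_as_current_span("charge_payment") as span:
        try:
            await charge_payment(payment_data)   # placeholder payment call
        except PaymentError as exc:              # placeholder exception type
            span.record_exception(exc)
            span.set_status(Status(StatusCode.ERROR, str(exc)))
            raise
```
Note that start_as_current_span records uncaught exceptions and sets an error status by default; the explicit form is useful when you want a custom description or need to handle the error yourself.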
The FastAPI Dependency Injection Advantage
FastAPI's dependency system works beautifully with tracing:
```python
from fastapi import Depends

async def get_traced_database():
    with tracer.start_as_current_span("database_connection"):
        # Database setup with automatic span creation
        db = await get_database_connection()
    try:
        yield db
    finally:
        await db.close()  # release the connection once the request finishes
```
Every dependency injection creates its own span, giving you detailed visibility into where time is spent.
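Wiring the dependency into a route is the usual Depends pattern; the fetch_user call below is a placeholder for whatever your connection object exposes:
```python
@app.get("/users/{user_id}")
async def read_user(user_id: int, db=Depends(get_traced_database)):
    # The dependency span plus FastAPI's automatic spans show how much
    # of the request went to acquiring the connection versus querying
    return await db.fetch_user(user_id)  # placeholder query method
```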
Production Setup Considerations
For production environments, you'll want proper trace backends. Here's a Docker Compose setup with Jaeger:
```yaml
version: '3.8'
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # Jaeger UI
      - "4317:4317"    # OTLP gRPC
      - "4318:4318"    # OTLP HTTP
```
Critical environment variables:
- OTEL_SERVICE_NAME: Identifies your service in traces
- OTEL_TRACES_SAMPLER: Controls trace sampling (use parentbased_traceidratio, with the ratio set via OTEL_TRACES_SAMPLER_ARG, e.g. 0.1 for high-traffic systems)
- OTEL_EXPORTER_OTLP_ENDPOINT: Where to send trace data
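The same sampling policy can be set in code when you construct the tracer provider; a minimal sketch that samples roughly 10% of new traces while honoring the caller's sampling decision:
```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Sample ~10% of root traces; child services follow the upstream decision
provider = TracerProvider(sampler=ParentBased(TraceIdRatioBased(0.1)))
```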
Why This Matters
Distributed tracing isn't just about replacing grep – it fundamentally changes how you understand system behavior. Instead of reactive debugging after problems occur, you get:
- Performance insights: See exactly which service or operation is the bottleneck
- Error correlation: When Service C fails, immediately see the upstream request from Service A that caused it
- Scaling decisions: Identify which services need optimization based on real request patterns
- Team productivity: Junior developers can debug complex distributed issues without tribal knowledge
The investment in OpenTelemetry setup pays dividends as your system grows. What starts as a simple FastAPI app inevitably becomes a distributed system – traces scale with that complexity while logs become increasingly unwieldy.
Start with the basic FastAPIInstrumentor.instrument_app(app) setup and a local Jaeger instance. As you see the value, add custom spans for business logic and expand to your other services. Your future debugging self will thank you.

