
The real problem isn't connectivity — it's recovery
Most WebSocket tutorials stop at "send a ping every 30 seconds." That's fine for a chat app. For financial market data — where you're streaming tens of thousands of price updates per second and a 200ms stall means a missed trade signal — that advice is dangerously incomplete.
A production financial feed system needs three things working in concert: heartbeats that detect silent link failures fast, reconnection logic that restores subscription state (not just the TCP connection), and flow control that prevents a burst of market activity from ballooning your client's memory.
Let's break down what each of these actually looks like in practice.
---
Heartbeats: detecting the "half-open" connection
The most insidious failure mode in long-lived WebSocket connections isn't a clean disconnect; it's the half-open connection. Your socket thinks it's alive. The server thinks it's alive. But packets are being silently dropped by a firewall, an expired NAT table entry, or a flaky middlebox.
Ping/Pong at the commonly recommended 30-second interval means you might not notice this failure for half a minute or longer. In equity markets, that's entire candles' worth of missed data.
The goal isn't just to keep the connection alive; it's to know within seconds when it isn't.
A better approach combines protocol-level Ping/Pong with application-level heartbeats that carry a timestamp. This way you get both liveness detection and latency measurement in the same message:
```python
import asyncio
import time
import websockets

class FinancialFeedClient:
    def __init__(self, uri: str, heartbeat_interval: float = 5.0, timeout: float = 10.0):
        self.uri = uri
        self.heartbeat_interval = heartbeat_interval
        # ...
```

Key details: a 5-second heartbeat interval (not 30), and a 10-second timeout threshold. For equities, you want to know about a dead link within one to two heartbeat cycles, not after you've missed an entire minute-bar.
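As a minimal sketch of the detection logic alone (no networking), the class below records when the last heartbeat reply arrived and reports the link as dead once the timeout has elapsed. `HeartbeatMonitor`, its method names, the message fields, and the injectable `clock` are hypothetical, chosen for illustration; they are not part of the `websockets` library, and the sketch assumes the server echoes the `sent_at` field back unchanged.

```python
import time
from typing import Callable, Optional


class HeartbeatMonitor:
    """Tracks application-level heartbeats. All names here are illustrative."""

    def __init__(self, interval: float = 5.0, timeout: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self.interval = interval      # how often the caller should send a ping
        self.timeout = timeout        # silence longer than this means "dead"
        self._clock = clock
        self._last_seen = clock()
        self.last_rtt: Optional[float] = None

    def make_ping(self) -> dict:
        # The payload carries the send time so the reply doubles as a latency probe.
        return {"type": "ping", "sent_at": self._clock()}

    def on_pong(self, pong: dict) -> None:
        # Assumption: the server echoes "sent_at" back unchanged.
        now = self._clock()
        self.last_rtt = now - pong["sent_at"]
        self._last_seen = now

    def is_dead(self) -> bool:
        # Dead once no reply has arrived within the timeout window.
        return self._clock() - self._last_seen > self.timeout
```

A client loop would send `make_ping()` every `interval` seconds, call `on_pong()` from its receive path, and tear the socket down as soon as `is_dead()` returns true. With the 5-second interval and 10-second timeout above, that is two missed heartbeats.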
---
Reconnection: state restoration, not just socket restart
The classic mistake is treating reconnection as purely a transport concern. You catch the disconnect, wait a bit, open a new socket, done. But in financial systems, that new socket is useless unless it immediately re-subscribes to the same instruments with the same parameters.
Exponential backoff with jitter is table stakes. The important part is what happens after the connection is re-established:
```python
import random

async def connect_with_backoff(client, max_retries=10):
    base_delay = 0.5  # seconds
    cap = 30.0  # max delay

    for attempt in range(max_retries):
        try:
            # ...
```

The restore_subscriptions call is doing the heavy lifting here. You need to maintain a local record of what you've subscribed to (instrument IDs, tick types, depth levels) and replay those subscriptions on every reconnect. Without this, you have a working connection that's delivering nothing.
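One way to keep that local record is a small registry that is updated on every subscribe and unsubscribe, then replayed over the new socket. The sketch below is an assumption about how this could look, not the article's actual implementation: `Subscription`, `SubscriptionRegistry`, and the shape of the subscribe message are all hypothetical, and a real feed defines its own wire format.

```python
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Tuple


@dataclass(frozen=True)
class Subscription:
    instrument_id: str
    tick_type: str   # e.g. "trade" or "quote" (illustrative values)
    depth: int = 1   # order-book depth levels


class SubscriptionRegistry:
    """Local record of active subscriptions, replayed after every reconnect."""

    def __init__(self) -> None:
        self._subs: Dict[Tuple[str, str], Subscription] = {}

    def add(self, sub: Subscription) -> None:
        # Keyed by instrument and tick type, so re-subscribing overwrites
        # the existing entry instead of creating a duplicate.
        self._subs[(sub.instrument_id, sub.tick_type)] = sub

    def remove(self, instrument_id: str, tick_type: str) -> None:
        self._subs.pop((instrument_id, tick_type), None)

    async def restore(self, send: Callable[[dict], Awaitable[None]]) -> int:
        # `send` is any coroutine that writes one message to the new socket.
        # The message layout below is hypothetical.
        for sub in list(self._subs.values()):
            await send({"op": "subscribe", "instrument": sub.instrument_id,
                        "type": sub.tick_type, "depth": sub.depth})
        return len(self._subs)
```

Because the registry lives outside the connection object, it survives any number of socket restarts, and the count returned by `restore` gives you something to log and alert on.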
The jitter component (±30% of the delay) is critical in scenarios where a server restart causes hundreds of clients to disconnect simultaneously. Without jitter, they all retry at the same intervals, creating a thundering herd that can take down a freshly restarted server.
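The delay calculation itself fits in a few lines. This is a sketch using the figures from the text (0.5-second base, 30-second cap, ±30% jitter); the function name `backoff_delay` and its parameters are illustrative, and the optional `rng` argument exists only to make the behaviour reproducible in tests.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  jitter: float = 0.3,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    uniform = (rng or random).uniform
    # Exponential growth, capped so a long outage does not produce huge waits.
    nominal = min(cap, base * (2 ** attempt))
    # Spread clients uniformly within +/-30% of the nominal delay.
    # Note that the jittered value can exceed the cap by up to 30%.
    return nominal * uniform(1.0 - jitter, 1.0 + jitter)
```

Two clients that disconnect at the same instant draw different multipliers on every attempt, so their retries drift apart instead of arriving in lockstep.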
---
Flow control: backpressure is not optional
Market open, major economic data release, circuit breaker events — these create message bursts that can be 10-100x normal throughput. If your client processes messages synchronously, or dumps everything into an unbounded queue, you will eventually OOM or introduce latency spikes that defeat the entire purpose of the low-latency feed.
The practical solution for Python is an asyncio.Queue with a bounded size, combined with a consumer that can shed stale ticks when under pressure:
```python
from asyncio import Queue
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tick:
    symbol: str
    price: float
    # ...
```

This is a deliberate design decision worth internalizing: for market data, dropping an old price is better than processing it late. A price from 2 seconds ago is worse than no price at all, because it will cause your system to act on stale information. Recency beats completeness in tick data.
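A bounded queue with a drop policy might look like the following. This is a sketch under stated assumptions: the `ts` field on `Tick`, the `DropOldestQueue` class, and its `offer` and `next_fresh` methods are hypothetical names, and the 2-second staleness limit is borrowed from the example in the text rather than from any recommendation.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tick:
    symbol: str
    price: float
    ts: float  # receive time, same clock as the queue (field assumed for this sketch)


class DropOldestQueue:
    """Bounded tick buffer: when full, the oldest tick makes room for the newest."""

    def __init__(self, maxsize: int = 10_000, max_age: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._queue = asyncio.Queue(maxsize=maxsize)
        self._max_age = max_age
        self._clock = clock
        self.dropped = 0  # expose this as a metric

    def offer(self, tick: Tick) -> None:
        # Never blocks the socket reader: evict the oldest tick if the buffer is full.
        if self._queue.full():
            self._queue.get_nowait()
            self.dropped += 1
        self._queue.put_nowait(tick)

    async def next_fresh(self) -> Tick:
        # Skip ticks that have already aged past the staleness limit.
        while True:
            tick = await self._queue.get()
            if self._clock() - tick.ts <= self._max_age:
                return tick
            self.dropped += 1
```

Memory stays bounded by `maxsize` no matter how large the burst is, and the `dropped` counter tells you how often the consumer is falling behind.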
For true high-throughput scenarios (futures, FX), consider batching: instead of processing tick-by-tick, accumulate for 1-5ms and process the batch, taking only the latest price per symbol. This collapses update storms into a single state snapshot.
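The batching idea can be sketched as two small functions: one that waits for the first update and then lets a short window of updates accumulate, and one that collapses the batch to the latest price per symbol. The names `next_batch` and `conflate`, the `(symbol, price)` tuple format, and the 2 ms default window are assumptions made for illustration.

```python
import asyncio
from typing import Dict, Iterable, List, Tuple

Update = Tuple[str, float]  # (symbol, price); format assumed for this sketch


async def next_batch(queue: asyncio.Queue, window: float = 0.002) -> List[Update]:
    """Wait for one update, let the burst accumulate for `window` seconds, then drain."""
    batch = [await queue.get()]
    await asyncio.sleep(window)
    while True:
        try:
            batch.append(queue.get_nowait())
        except asyncio.QueueEmpty:
            return batch


def conflate(batch: Iterable[Update]) -> Dict[str, float]:
    """Keep only the most recent price per symbol."""
    latest: Dict[str, float] = {}
    for symbol, price in batch:
        latest[symbol] = price  # later updates overwrite earlier ones
    return latest
```

Downstream logic then runs once per snapshot rather than once per tick, so its cost scales with the number of active symbols instead of the raw message rate.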
---
Why this matters beyond finance
The patterns here — fast failure detection, state-aware reconnection, bounded queues with drop policies — are directly applicable to any system where WebSocket connections carry high-value, time-sensitive data: IoT sensor streams, live sports data, real-time collaborative editing, gaming state synchronization.
The key mental shift is treating abnormal disconnection as the expected code path, not the exception. In a long-running financial feed client, you will disconnect. The question is how quickly you detect it, how gracefully you recover, and whether your client is still processing meaningful data when you come back up.
Building that resilience in from day one — with instrumented heartbeats, subscription-aware reconnection, and bounded ingestion queues — is the difference between a prototype and a system you can run in production during market hours.
