Circuit Breaker Implementation Guide

Prevent Cascade Failures & Improve AI API Reliability

Difficulty: Intermediate
Time Required: 2-3 hours
Potential Savings: Prevents catastrophic cost overruns ($10K or more in an outage scenario)
Best For: All production applications using AI APIs

What is a Circuit Breaker?

A Circuit Breaker is a design pattern that prevents your application from repeatedly calling a failing service, protecting your system from:

Cascade failures (one failing API causes your entire app to fail)
Cost overruns (retrying failed requests thousands of times)
Poor user experience (long timeouts instead of fast failures)
Resource exhaustion (threads/connections stuck waiting)

The Problem Without Circuit Breakers:

Your App → [Call OpenAI] → 500 Error
Your App → [Retry] → 500 Error
Your App → [Retry] → 500 Error
Your App → [Retry] → 500 Error  (repeats 1000x)

Result: 
- App becomes unresponsive
- $5,000 in wasted API calls
- 30 minute recovery time

The Solution With Circuit Breakers:

Your App → [Call OpenAI] → 500 Error
Your App → [Retry] → 500 Error
Your App → [Retry] → 500 Error

Circuit Breaker: "Provider is down, stop trying!"
Circuit Status: OPEN (requests blocked)

Your App → [Fast fail] → Use fallback provider
Your App → [Fast fail] → Use cached response
Your App → [Fast fail] → Show user friendly error

After 30 seconds:
Circuit Status: HALF-OPEN (test if provider recovered)
Your App → [Test call] → Success! → Circuit CLOSED

Result:

- App remains responsive
- Only 3 failed calls instead of 1,000
- Automatic recovery
- About $4,985 saved (3 failed calls at the same $5 per call)
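
The fast-fail path above (fallback provider, then cached response, then a user-friendly error) can be sketched as an ordered list of handlers. This is only an illustration: `first_available`, the handler names, and the in-memory `cache` dict are hypothetical and not part of any library used in this guide.

```python
def first_available(handlers, friendly_error="Service temporarily unavailable. Please try again shortly."):
    """Try each (name, handler) pair in order and return the first success."""
    for name, handler in handlers:
        try:
            return {"source": name, "content": handler()}
        except Exception:
            continue  # this option is unavailable, move on to the next one
    # Nothing worked: return a friendly message instead of hanging or crashing
    return {"source": "error", "content": friendly_error}


def primary():
    # Stands in for a call that fails fast because its circuit is open
    raise RuntimeError("circuit open")


cache = {"Hello": "Hi there!"}  # hypothetical response cache

result = first_available([
    ("primary", primary),
    ("cache", lambda: cache["Hello"]),
])
```

Each handler fails quickly, so the user waits only as long as the first option that works.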

How Circuit Breakers Work

Three States:

CLOSED (Normal operation)
- All requests pass through
- Monitoring for failures
OPEN (Provider is down)
- All requests immediately fail
- No calls to failing provider
- Save time and money
HALF-OPEN (Testing recovery)
- Allow one test request
- If success → CLOSED
- If failure → OPEN again

State Transitions:

      [Normal]
        ↓
      CLOSED ←─────────────┐
        ↓                   │
   [Too many               │
    failures]           [Success]
        ↓                   │
       OPEN ────────→  HALF-OPEN
    [Wait 30s]         [Test call]
                           │
                      [Failure]
                           ↓
                         OPEN
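
The three states and their transitions can be sketched in a few lines of plain Python. This is a minimal illustration rather than a substitute for a tested library: the class name `SimpleCircuitBreaker`, the injectable `clock`, and the use of `RuntimeError` for fast failures are assumptions made for the example.

```python
import time


class SimpleCircuitBreaker:
    """Illustrative three-state breaker (closed, open, half-open)."""

    def __init__(self, fail_max=5, reset_timeout=30.0, clock=time.monotonic):
        self.fail_max = fail_max            # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds to stay open before a test call
        self.clock = clock                  # injectable so tests do not need to sleep
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.state = "half-open"        # timeout elapsed, allow one test call

        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed test call, or too many failures, (re)opens the circuit
            if self.state == "half-open" or self.failures >= self.fail_max:
                self.state = "open"
                self.opened_at = self.clock()
            raise

        # Any success closes the circuit and clears the failure count
        self.state = "closed"
        self.failures = 0
        return result
```

Passing a fake clock (for example `clock=lambda: now[0]`) lets you step through OPEN, HALF-OPEN, and CLOSED in a unit test without waiting 30 seconds.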

Prerequisites

Before implementing:

An Edge Proxy deployed (see Edge Proxy Guide), or a direct AI API integration in your application
Python 3.8+ (for code examples) or your language of choice
Basic understanding of error handling and retries

Implementation Steps

Step 1: Choose Your Implementation Approach

Option A: Use LiteLLM Proxy (Recommended if you have Edge Proxy)

Circuit breaking is built into the proxy, so no application code is needed.

Option B: Application-Level Circuit Breaker (For direct API calls)

Add a circuit breaker to your existing code using the pybreaker library.

Option C: Service Mesh (Advanced/Enterprise)

Use Istio or Linkerd for circuit breakers at the infrastructure level.

We'll cover Options A and B in this guide.

Option A: Circuit Breakers in LiteLLM Proxy

If you already have an Edge Proxy (from the Edge Proxy Implementation Guide), circuit breakers are built-in.

Step 1: Enable Circuit Breakers in Config

Update your litellm_config.yaml:

model_list:
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
  
  - model_name: claude-3-5-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

# Circuit breaker configuration
router_settings:
  routing_strategy: cost-based-routing
  
  # Circuit breaker settings
  circuit_breaker:
    enabled: true
    
    # Failure threshold
    failure_threshold: 5  # Open circuit after 5 failures
    
    # Time window for counting failures
    window_size: 60  # Count failures in last 60 seconds
    
    # Recovery timeout
    recovery_timeout: 30  # Wait 30 seconds before testing recovery
    
    # Success threshold for recovery
    success_threshold: 2  # Need 2 successes to close circuit
    
    # What counts as a failure?
    failure_conditions:
      - status_code: 500
      - status_code: 502
      - status_code: 503
      - status_code: 504
      - timeout: true
      - rate_limit: true  # Treat rate limits as failures
    
  # Fallback when circuit is open
  fallbacks:
    - claude-3-5-sonnet  # Use Claude if OpenAI is down
    - gpt-4o-azure       # Then try Azure (requires a gpt-4o-azure entry in model_list)

# Alert when circuit opens
alerting:
  webhook_url: https://your-app.com/webhooks/circuit-breaker
  slack_webhook: https://hooks.slack.com/services/YOUR/WEBHOOK
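
The comments on `failure_threshold`, `window_size`, and `failure_conditions` describe a sliding-window count of qualifying failures. The sketch below shows that logic in plain Python under those stated semantics. It does not show how the proxy implements it internally, and `FailureWindow` and `counts_as_failure` are hypothetical names.

```python
from collections import deque

FAILURE_STATUS_CODES = {500, 502, 503, 504}


def counts_as_failure(status_code=None, timed_out=False, rate_limited=False):
    """Mirror of the failure_conditions list in the config above."""
    return timed_out or rate_limited or status_code in FAILURE_STATUS_CODES


class FailureWindow:
    """Counts failures inside a sliding time window."""

    def __init__(self, failure_threshold=5, window_size=60.0):
        self.failure_threshold = failure_threshold
        self.window_size = window_size
        self._timestamps = deque()

    def record_failure(self, now):
        """Record a failure at time `now` (seconds); return True if the circuit should open."""
        self._timestamps.append(now)
        # Drop failures that have aged out of the window
        while self._timestamps and now - self._timestamps[0] > self.window_size:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.failure_threshold
```

With a window, five failures spread over ten minutes do not open the circuit, while five failures within one minute do.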

Step 2: Restart Proxy

docker restart litellm-proxy

# Or if using docker-compose
docker-compose restart

Step 3: Test Circuit Breaker

Simulate provider failure:

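# Simulate an outage first, e.g. temporarily point the gpt-4o-mini entry's api_base
# at an unreachable host so upstream calls time out. The Bearer token below is the
# proxy key and must stay valid; an invalid one is rejected before reaching OpenAI.
# Then send 6 requests (each should fail upstream)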
for i in {1..6}; do
  curl -X POST http://localhost:4000/chat/completions \
    -H "Authorization: Bearer sk-1234567890abcdef" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gpt-4o-mini",
      "messages": [{"role": "user", "content": "Hello"}]
    }'
  sleep 1
done

# Check circuit status
curl http://localhost:4000/metrics | grep circuit_breaker
# Should show: circuit_breaker{provider="openai"} = 1  (1 = OPEN)

Verify fallback:

# This request should now go to Claude (fallback)
curl -X POST http://localhost:4000/chat/completions \
  -H "Authorization: Bearer sk-1234567890abcdef" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

# Check response headers
# X-LiteLLM-Provider: anthropic  (routed to fallback)

Wait for recovery test:

# Wait 30 seconds, then send request
sleep 30

# This will test if OpenAI recovered
curl -X POST http://localhost:4000/chat/completions \
  -H "Authorization: Bearer sk-1234567890abcdef" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

# If successful, circuit closes automatically

Option B: Application-Level Circuit Breaker

If you're making direct API calls (no proxy), implement circuit breakers in your application code.

Step 1: Install Circuit Breaker Library

pip install pybreaker openai anthropic

Step 2: Create Circuit Breaker Wrapper

Create ai_circuit_breaker.py:

import logging
import os
from functools import wraps
from pybreaker import CircuitBreaker, CircuitBreakerError
import openai
from anthropic import Anthropic

logger = logging.getLogger(__name__)

# Create circuit breakers for each provider
openai_breaker = CircuitBreaker(
    fail_max=5,              # Open after 5 failures
    reset_timeout=30,        # Wait 30 seconds before testing recovery
    exclude=[                # Don't count these as failures:
        openai.RateLimitError,  # Rate limits are expected
    ],
    name="OpenAI"
)

anthropic_breaker = CircuitBreaker(
    fail_max=5,
    reset_timeout=30,
    exclude=[],
    name="Anthropic"
)

azure_breaker = CircuitBreaker(
    fail_max=5,
    reset_timeout=30,
    exclude=[],
    name="Azure"
)


class AIProviderWithCircuitBreaker:
    """Wrapper that adds circuit breaker to AI provider calls"""
    
    def __init__(self):
        self.openai_client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
        self.anthropic_client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
        
    def chat_completion(self, messages, model="gpt-4o-mini", **kwargs):
        """
        Make chat completion with automatic fallback if provider is down.
        
        Tries providers in order:
        1. OpenAI (primary)
        2. Anthropic (fallback)
        3. Azure (last resort)
        """
        
        # Try OpenAI first. Always call through the breaker: pybreaker raises
        # CircuitBreakerError while the circuit is open, and only a call made
        # after reset_timeout can move it to half-open and test recovery.
        try:
            return self._openai_chat(messages, model, **kwargs)
        except CircuitBreakerError:
            logger.warning("OpenAI circuit breaker is OPEN, trying fallback")
        except Exception as e:
            logger.error(f"OpenAI call failed: {e}")
            # The breaker has already recorded this failure

        # Fallback to Anthropic
        try:
            return self._anthropic_chat(messages, **kwargs)
        except CircuitBreakerError:
            logger.warning("Anthropic circuit breaker is OPEN, trying last resort")
        except Exception as e:
            logger.error(f"Anthropic call failed: {e}")

        # Last resort: Azure
        try:
            return self._azure_chat(messages, model, **kwargs)
        except CircuitBreakerError:
            logger.error("All providers have open circuit breakers!")
            raise Exception("All AI providers are currently unavailable")
        except Exception as e:
            logger.error(f"Azure call failed: {e}")
            raise
    
    @openai_breaker
    def _openai_chat(self, messages, model, **kwargs):
        """OpenAI API call wrapped with circuit breaker"""
        logger.info(f"Calling OpenAI with model {model}")
        
        response = self.openai_client.chat.completions.create(
            model=model,
            messages=messages,
            **kwargs
        )
        
        return {
            'provider': 'openai',
            'model': model,
            'content': response.choices[0].message.content,
            'usage': {
                'input_tokens': response.usage.prompt_tokens,
                'output_tokens': response.usage.completion_tokens,
            }
        }
    
    @anthropic_breaker
    def _anthropic_chat(self, messages, **kwargs):
        """Anthropic API call wrapped with circuit breaker"""
        logger.info("Calling Anthropic with Claude 3.5 Sonnet")
        
        # Convert OpenAI message format to Anthropic format
        anthropic_messages = []
        system_prompt = None
        
        for msg in messages:
            if msg['role'] == 'system':
                system_prompt = msg['content']
            else:
                anthropic_messages.append({
                    'role': msg['role'],
                    'content': msg['content']
                })
        
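        # Pass `system` only when a system prompt exists; sending None can be rejected
        request = {
            'model': "claude-3-5-sonnet-20241022",
            'max_tokens': kwargs.get('max_tokens', 4096),
            'messages': anthropic_messages,
        }
        if system_prompt is not None:
            request['system'] = system_prompt

        response = self.anthropic_client.messages.create(**request)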
        
        return {
            'provider': 'anthropic',
            'model': 'claude-3-5-sonnet-20241022',
            'content': response.content[0].text,
            'usage': {
                'input_tokens': response.usage.input_tokens,
                'output_tokens': response.usage.output_tokens,
            }
        }
    
    @azure_breaker
    def _azure_chat(self, messages, model, **kwargs):
        """Azure OpenAI API call wrapped with circuit breaker"""
        logger.info(f"Calling Azure OpenAI with model {model}")
        
        azure_client = openai.AzureOpenAI(
            api_key=os.environ["AZURE_API_KEY"],
            api_version="2024-02-15-preview",
            azure_endpoint=os.environ["AZURE_API_BASE"]
        )
        
        response = azure_client.chat.completions.create(
            model=model,
            messages=messages,
            **kwargs
        )
        
        return {
            'provider': 'azure',
            'model': model,
            'content': response.choices[0].message.content,
            'usage': {
                'input_tokens': response.usage.prompt_tokens,
                'output_tokens': response.usage.completion_tokens,
            }
        }
    
    def get_circuit_status(self):
        """Get current status of all circuit breakers"""
        return {
            'openai': openai_breaker.current_state,
            'anthropic': anthropic_breaker.current_state,
            'azure': azure_breaker.current_state,
        }


# Singleton instance
ai_provider = AIProviderWithCircuitBreaker()

Step 3: Update Your Application Code

Before (Direct calls with no protection):

import openai

client = openai.OpenAI(api_key="sk-...")

# No protection: if OpenAI is down, every call waits out the SDK's built-in retries, then raises
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)

After (With circuit breaker + fallback):

from ai_circuit_breaker import ai_provider

# Automatically fails fast and uses fallback if OpenAI is down
response = ai_provider.chat_completion(
    messages=[{"role": "user", "content": "Hello"}],
    model="gpt-4o-mini"
)

print(f"Response from {response['provider']}: {response['content']}")
print(f"Tokens used: {response['usage']}")

Step 4: Add Circuit Status Endpoint (Optional)

Monitor circuit breaker status in your application:

from flask import Flask, jsonify
from ai_circuit_breaker import ai_provider

app = Flask(__name__)

@app.route('/health/circuit-breakers')
def circuit_breaker_status():
    """Endpoint to check circuit breaker status"""
    status = ai_provider.get_circuit_status()
    
    # Determine overall health
    all_open = all(state == 'open' for state in status.values())
    some_open = any(state == 'open' for state in status.values())
    
    return jsonify({
        'providers': status,
        'overall_health': 'critical' if all_open else 'degraded' if some_open else 'healthy'
    })
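
Pulling the health roll-up into a plain function makes it easy to unit-test without starting Flask. The sketch below is one way to do that. `overall_health` is a hypothetical helper name, and treating an empty status dict as healthy is an assumption (the inline expression above would report 'critical' for it, because `all()` of an empty sequence is True).

```python
def overall_health(status):
    """Map {'provider': state} to 'healthy', 'degraded', or 'critical'."""
    states = list(status.values())
    if states and all(state == "open" for state in states):
        return "critical"   # every provider is unavailable
    if any(state == "open" for state in states):
        return "degraded"   # at least one provider is unavailable
    return "healthy"


example = overall_health({"openai": "open", "anthropic": "closed", "azure": "closed"})
```

As in the endpoint above, a half-open circuit is not counted as open here.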

Step 5: Configure Alerting

Get notified when circuit breakers trip:

# Add to ai_circuit_breaker.py

def on_circuit_open(breaker):
    """Called when circuit breaker opens"""
    logger.critical(f"Circuit breaker OPENED for {breaker.name}!")
    
    # Send alert to Slack
    import requests
    requests.post(
        "https://hooks.slack.com/services/YOUR/WEBHOOK",
        json={
            "text": f"🚨 Circuit Breaker OPEN: {breaker.name} provider is down",
            "attachments": [{
                "color": "danger",
                "fields": [
                    {"title": "Provider", "value": breaker.name, "short": True},
                    {"title": "Failures", "value": str(breaker.fail_counter), "short": True}
                ]
            }]
        }
    )

def on_circuit_close(breaker):
    """Called when circuit breaker closes (recovery)"""
    logger.info(f"Circuit breaker CLOSED for {breaker.name} - provider recovered")
    
    # Send recovery notification
    import requests
    requests.post(
        "https://hooks.slack.com/services/YOUR/WEBHOOK",
        json={
            "text": f"✅ Circuit Breaker CLOSED: {breaker.name} provider recovered",
            "attachments": [{
                "color": "good",
                "fields": [
                    {"title": "Provider", "value": breaker.name, "short": True},
                    {"title": "Status", "value": "Operational", "short": True}
                ]
            }]
        }
    )

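# pybreaker listeners are CircuitBreakerListener subclasses, not plain functions
from pybreaker import CircuitBreakerListener

class AlertListener(CircuitBreakerListener):
    def state_change(self, cb, old_state, new_state):
        if new_state.name == "open":
            on_circuit_open(cb)
        elif new_state.name == "closed":
            on_circuit_close(cb)

# Add the listener to each circuit breaker
for breaker in (openai_breaker, anthropic_breaker, azure_breaker):
    breaker.add_listener(AlertListener())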

Testing Your Circuit Breaker

Test 1: Simulate Provider Failure

# test_circuit_breaker.py

import os
os.environ["OPENAI_API_KEY"] = "sk-invalid"  # Use invalid key

from ai_circuit_breaker import ai_provider

# This should fail 5 times, then open circuit
for i in range(6):
    try:
        response = ai_provider.chat_completion(
            messages=[{"role": "user", "content": "Hello"}],
            model="gpt-4o-mini"
        )
        print(f"Call {i+1}: Success from {response['provider']}")
    except Exception as e:
        print(f"Call {i+1}: Failed - {e}")

# Check circuit status
status = ai_provider.get_circuit_status()
print(f"\nCircuit Status: {status}")
# Should show: {'openai': 'open', 'anthropic': 'closed', 'azure': 'closed'}

Test 2: Verify Fallback

# With OpenAI circuit open, requests should go to Anthropic
response = ai_provider.chat_completion(
    messages=[{"role": "user", "content": "Hello"}],
    model="gpt-4o-mini"
)

print(f"Provider used: {response['provider']}")
# Should print: Provider used: anthropic

Test 3: Verify Recovery

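import time
import openai

# Fix the OpenAI API key. The client was built at import time, so changing
# os.environ alone has no effect; rebuild the client with the correct key.
ai_provider.openai_client = openai.OpenAI(api_key="sk-correct-key")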

# Wait for recovery timeout (30 seconds)
print("Waiting 30 seconds for circuit to enter half-open state...")
time.sleep(30)

# Next request will test recovery
response = ai_provider.chat_completion(
    messages=[{"role": "user", "content": "Hello"}],
    model="gpt-4o-mini"
)

print(f"Provider used: {response['provider']}")
# Should print: Provider used: openai (circuit recovered!)

status = ai_provider.get_circuit_status()
print(f"Circuit Status: {status}")
# Should show: {'openai': 'closed', 'anthropic': 'closed', 'azure': 'closed'}

Advanced Configuration

Custom Failure Detection

Not all errors should trip the circuit breaker. Configure what counts as a failure:

import openai
from pybreaker import CircuitBreaker

def is_not_server_error(exception):
    """Exclusion predicate: return True for API errors that should NOT trip the circuit"""
    if isinstance(exception, openai.APIError):
        status_code = getattr(exception, 'status_code', None)
        # Only 5xx responses indicate a provider-side problem
        return status_code is not None and not (500 <= status_code < 600)
    return False

openai_breaker = CircuitBreaker(
    fail_max=5,
    reset_timeout=30,
    
    # Exclude these exceptions (don't count as failures)
    exclude=[
        openai.RateLimitError,      # Expected during high traffic
        openai.AuthenticationError, # Configuration issue, not provider issue
        ValueError,                 # App logic error, not provider issue
        is_not_server_error,        # Callable: excluded when it returns True
    ],
)

Adaptive Thresholds

Adjust failure threshold based on traffic volume:

import time
from pybreaker import CircuitBreaker

class AdaptiveCircuitBreaker:
    def __init__(self):
        self.base_fail_max = 5
        self.requests_per_minute = 0
        self.last_reset = time.time()
        
        self.breaker = CircuitBreaker(
            fail_max=self.base_fail_max,
            reset_timeout=30
        )
    
    def call(self, func, *args, **kwargs):
        # Track request rate
        self.requests_per_minute += 1
        if time.time() - self.last_reset > 60:
            self.requests_per_minute = 0
            self.last_reset = time.time()
        
        # Adjust threshold based on traffic
        # High traffic = more lenient (allow more failures)
        # fail_max is pybreaker's public, settable threshold property
        if self.requests_per_minute > 100:
            self.breaker.fail_max = 20
        elif self.requests_per_minute > 50:
            self.breaker.fail_max = 10
        else:
            self.breaker.fail_max = self.base_fail_max
        
        return self.breaker.call(func, *args, **kwargs)
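
One caveat with the counter above is that it drops to zero once a minute, so the threshold briefly falls back to the base value even under steady load. A sliding window avoids that. The sketch below is an assumption-laden illustration: `RequestRate` and `threshold_for_rate` are hypothetical helpers, and the tiers simply copy the ones above.

```python
from collections import deque


def threshold_for_rate(requests_per_minute, base_fail_max=5):
    """Same tiers as above: busier traffic tolerates more failures."""
    if requests_per_minute > 100:
        return 20
    if requests_per_minute > 50:
        return 10
    return base_fail_max


class RequestRate:
    """Requests seen in the trailing window, with no hard reset each minute."""

    def __init__(self, window=60.0):
        self.window = window
        self._times = deque()

    def record(self, now):
        """Record a request at time `now` (seconds); return the count in the window."""
        self._times.append(now)
        while self._times and now - self._times[0] > self.window:
            self._times.popleft()
        return len(self._times)
```

In `call`, the result of `record(time.time())` could be passed to `threshold_for_rate` and assigned to the breaker's `fail_max`.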

Per-Endpoint Circuit Breakers

Different endpoints have different reliability profiles:

from pybreaker import CircuitBreaker

# Separate circuit breakers for different operations
chat_breaker = CircuitBreaker(fail_max=5, reset_timeout=30, name="OpenAI-Chat")
embedding_breaker = CircuitBreaker(fail_max=10, reset_timeout=60, name="OpenAI-Embeddings")
image_breaker = CircuitBreaker(fail_max=3, reset_timeout=120, name="OpenAI-Images")

@chat_breaker
def chat_completion(*args, **kwargs):
    ...  # call the chat API here

@embedding_breaker
def create_embedding(*args, **kwargs):
    ...  # call the embeddings API here

@image_breaker
def generate_image(*args, **kwargs):
    ...  # call the image API here

Monitoring & Dashboards

Metrics to Track

Circuit State Changes:
- When did circuit open?
- How long was it open?
- How many times per day?
Failure Rate:
- Failures per minute
- Failure types (timeout, 500, etc)
- Which provider fails most?
Fallback Usage:
- % of requests using fallback
- Cost impact of fallbacks
Recovery Time:
- How quickly do circuits close?
- Are recovery tests succeeding?
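
Several of these metrics (how often the circuit opened, how long it stayed open) can be derived from a log of state changes. The sketch below assumes you record each transition as a `(timestamp_seconds, new_state)` tuple. That event format and the name `summarize_open_periods` are illustrative, not something pybreaker or the proxy emits in this shape.

```python
def summarize_open_periods(events):
    """Summarize outages from a chronological list of (timestamp, new_state) tuples."""
    open_count = 0
    total_open_seconds = 0.0
    opened_at = None
    for timestamp, state in events:
        if state == "open" and opened_at is None:
            opened_at = timestamp        # a new outage begins
            open_count += 1
        elif state == "closed" and opened_at is not None:
            total_open_seconds += timestamp - opened_at
            opened_at = None             # outage over
        # 'half-open' (and a re-open after a failed test call) stays in the same outage
    return {"open_count": open_count, "total_open_seconds": total_open_seconds}


summary = summarize_open_periods([
    (0, "closed"),
    (100, "open"),
    (130, "half-open"),
    (131, "open"),        # test call failed, same outage continues
    (161, "half-open"),
    (162, "closed"),
])
```

Here the outage is counted once and lasts from the first open (t=100) until the circuit closes (t=162).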

Prometheus Metrics

from prometheus_client import Counter, Gauge
from pybreaker import CircuitBreakerListener

# Circuit breaker metrics
circuit_state = Gauge('circuit_breaker_state', 'Circuit breaker state', ['provider'])
circuit_failures = Counter('circuit_breaker_failures', 'Failures counted', ['provider'])
circuit_state_changes = Counter('circuit_breaker_state_changes', 'State transitions', ['provider', 'from_state', 'to_state'])

# pybreaker state names: 'closed', 'half-open', 'open'
STATE_VALUES = {'closed': 0, 'half-open': 0.5, 'open': 1}

class MetricsListener(CircuitBreakerListener):
    def state_change(self, cb, old_state, new_state):
        circuit_state.labels(provider=cb.name).set(STATE_VALUES[new_state.name])
        circuit_state_changes.labels(
            provider=cb.name,
            from_state=old_state.name,
            to_state=new_state.name
        ).inc()

    def failure(self, cb, exc):
        circuit_failures.labels(provider=cb.name).inc()

# Register on each breaker, e.g. openai_breaker.add_listener(MetricsListener())

Grafana Dashboard

Create a dashboard with:

Circuit state over time (closed/open/half-open)
Failure rate by provider
Fallback usage percentage
Recovery time histogram

Production Checklist

Before deploying circuit breakers to production:

Expected Results

Without Circuit Breakers:

Provider outage scenario:
- 10,000 requests retry indefinitely
- App becomes unresponsive
- $50,000 in wasted API calls
- 2 hour recovery time
- Customer complaints

With Circuit Breakers:

Same outage scenario:
- 5 requests fail, circuit opens
- 9,995 requests fast-fail to fallback
- App remains responsive
- $25 in failed calls (5 calls at the same $5 per request), rest goes to fallback
- 30 second automatic recovery
- Users barely notice

Cost Savings in Outages:

Prevented: $49,975
Time Saved: 1h 59m 30s
User Impact: Minimal

Reliability Improvement:

99.9% → 99.99% uptime
Mean Time To Recovery: 2 hours → 30 seconds

Troubleshooting

Circuit breaker not opening

Possible causes:

Failure threshold too high
Errors being excluded
Window size too large

Fix:

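# Lower threshold for testing
CircuitBreaker(
    fail_max=3,        # Was 5
    exclude=[]         # Don't exclude any errors during testing
)
# Note: pybreaker has no window_size argument. It counts consecutive failures,
# and a successful call resets the counter, so intermittent successes can keep
# the circuit closed.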

Circuit breaker opens too easily

Possible causes:

Threshold too low for traffic volume
Counting expected errors as failures

Fix:

# Increase threshold or exclude expected errors
CircuitBreaker(
    fail_max=10,  # Was 5
    exclude=[
        openai.RateLimitError,
        openai.APITimeoutError,  # If timeouts are common
    ]
)

Recovery testing too aggressive

Symptom: Circuit repeatedly opens and closes

Fix:

# Increase recovery timeout
CircuitBreaker(
    reset_timeout=60,  # Was 30
    success_threshold=3  # Need 3 successes to fully recover (verify that your installed pybreaker version accepts this argument)
)
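
The idea behind a success threshold is that a half-open circuit closes only after several consecutive successful test calls, which dampens open/close flapping. A minimal sketch of that rule follows. `RecoveryGate` is a hypothetical class written for illustration and does not reflect any library's internals.

```python
class RecoveryGate:
    """Tracks consecutive successful test calls while a circuit is half-open."""

    def __init__(self, success_threshold=3):
        self.success_threshold = success_threshold
        self.successes = 0

    def record(self, succeeded):
        """Return the next state ('closed', 'open', or 'half-open') after one test call."""
        if not succeeded:
            self.successes = 0
            return "open"        # any failed test call reopens the circuit
        self.successes += 1
        if self.successes >= self.success_threshold:
            self.successes = 0
            return "closed"      # enough consecutive successes: recovered
        return "half-open"       # keep testing
```

A higher threshold trades slower recovery for fewer false recoveries against a provider that is still unstable.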

Next Steps

Once circuit breakers are working:

Add Caching (see Caching Implementation Guide)
- Serve cached responses when circuit is open
- Reduce dependency on live providers
Implement Retry with Backoff (see Retry Strategies Guide)
- Intelligent retries before opening circuit
- Exponential backoff
Set Up Comprehensive Monitoring (see Monitoring Guide)
- Track circuit state in real-time
- Alert on concerning patterns

Estimated Implementation Time: 2-3 hours
Difficulty: ⭐⭐⭐☆☆ (3/5)
Impact: 🚀🚀🚀🚀🚀 (5/5 - Prevents catastrophic failures)

Last Updated: January 26, 2026
Tested with: pybreaker 1.0.1, OpenAI SDK 1.12.0, Anthropic SDK 0.18.0