What this guide covers
This guide shows engineering teams how to replace multiple LLM integrations with one unified API. You’ll learn to implement automatic failover, standardized error handling, and intelligent retry logic without custom code.

The problem with multiple LLM providers
Engineering teams maintaining separate integrations for each LLM provider face several challenges.

- Different SDKs create complexity. Each provider requires its own library, authentication method, and error handling logic. Your codebase becomes fragmented with provider-specific code.
- Manual retry logic is error-prone. Teams write custom exponential backoff, track retry counts, and handle edge cases differently for each provider. This inconsistency leads to bugs.
- Provider outages affect availability. When OpenAI experiences downtime, your application fails even though Anthropic might be fully operational. There’s no automatic failover.
- Error responses vary wildly. OpenAI returns 429 for rate limits while Bedrock might return ThrottlingException. Your error handling becomes a maze of conditionals.
How Portkey solves these problems
Portkey acts as an intelligent gateway between your application and LLM providers.

- One SDK replaces many. Use the same client and methods regardless of whether you’re calling OpenAI, Anthropic, or Bedrock. The API signature remains consistent.
- Automatic retries handle transient failures. Configure retry attempts and Portkey manages exponential backoff automatically. No custom retry loops needed.
- Instant failover maintains availability. When one provider fails, requests automatically route to your backup providers in milliseconds. Your application stays online.
- Standardized errors simplify handling. All providers return consistent error codes through Portkey’s gateway. One error handler works for all providers.
Quick start: Your first unified request
Let’s start with a basic example that demonstrates the unified interface.

Step 1: Install the Portkey SDK
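If you’re using Python, install the SDK from PyPI (the package is published as portkey-ai; an npm package of the same name exists for Node):

```sh
pip install portkey-ai
```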
Step 2: Set up your AI Provider
Navigate to the Portkey dashboard and add your first AI Provider. This securely stores your API credentials.

1. Go to AI Providers. Click AI Providers in the sidebar, then Add Provider.
2. Select your service. Choose OpenAI, Anthropic, or AWS Bedrock from the list.
3. Add credentials. Enter your API key or AWS credentials. Portkey encrypts and stores them securely.
4. Name your provider. Give it a memorable slug like @openai-prod or @anthropic-dev. You’ll use this slug in your code.

AI Provider Setup Guide
Detailed instructions for setting up providers and managing credentials
Step 3: Make your first request
With your provider configured, make requests using the unified API.
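Here’s a minimal Python sketch, assuming the portkey-ai SDK and the @openai-prod slug from Step 2; the model name is illustrative:

```python
from portkey_ai import Portkey

# One client and one method signature for every provider behind the gateway.
portkey = Portkey(api_key="PORTKEY_API_KEY")

response = portkey.chat.completions.create(
    model="@openai-prod/gpt-4o",  # @provider-slug/model-name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```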
Use the @provider-slug/model-name format to specify both the provider and model. This keeps your code explicit about which provider serves each request.
Automatic failover between providers
Failover is Portkey’s most powerful feature for production applications. Configure multiple providers and Portkey automatically switches between them when failures occur.

Understanding failover strategy
Failover works through a configuration object that defines your provider hierarchy and trigger conditions.
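Here’s a sketch of such a config; the @anthropic-prod and @bedrock-prod slugs and all model names are illustrative placeholders for your own providers:

```json
{
  "strategy": {
    "mode": "fallback",
    "on_status_codes": [429, 500, 502, 503, 504]
  },
  "targets": [
    { "provider": "@openai-prod", "override_params": { "model": "gpt-4o" } },
    { "provider": "@anthropic-prod", "override_params": { "model": "claude-3-5-sonnet-latest" } },
    { "provider": "@bedrock-prod", "override_params": { "model": "anthropic.claude-3-5-sonnet-20240620-v1:0" } }
  ]
}
```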
Set the strategy mode to “fallback” and specify which status codes trigger failover. Common triggers include rate limits (429) and server errors (500-504).
Targets execute in order. Portkey tries OpenAI first. If it fails with a trigger status code, Portkey immediately tries Anthropic. If Anthropic fails, it moves to Bedrock.
Override params customize each target. Since different providers use different model names, override_params lets you specify the correct model for each provider.
Implementing failover in code
Save your configuration in Portkey’s dashboard to get a config ID. Then reference it in your code.
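A sketch in Python, where pc-fallback-xxx stands in for the config ID you copied from the dashboard:

```python
from portkey_ai import Portkey

# The gateway applies the fallback targets and trigger codes defined
# in the saved config; no routing logic lives in your application code.
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    config="pc-fallback-xxx",  # placeholder config ID
)

response = portkey.chat.completions.create(
    model="gpt-4o",  # each target’s override_params replaces this per provider
    messages=[{"role": "user", "content": "Ping"}],
)
```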
Monitoring failover behavior

Portkey’s observability dashboard shows exactly what happens during failover. You can see which providers were attempted, why they failed, and which one ultimately succeeded. Each attempt appears in the trace. Failed requests show their status codes and error messages. The successful request shows response time and tokens used.

Tracing Guide
Learn how to trace requests across multiple providers
Intelligent retry logic
Retries handle temporary failures without failing over to another provider. Configure automatic retries with exponential backoff.

Configuring retry behavior
Specify retry attempts and which status codes should trigger retries.
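A minimal retry config; the attempt count and status codes below are illustrative:

```json
{
  "retry": {
    "attempts": 3,
    "on_status_codes": [408, 429, 500, 502, 503, 504]
  }
}
```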
Exponential backoff timing

Portkey automatically implements exponential backoff between retries. Each retry waits longer than the previous one.

| Retry Attempt | Wait Time | Cumulative Time |
|---|---|---|
| 1st retry | 1 second | 1 second |
| 2nd retry | 2 seconds | 3 seconds |
| 3rd retry | 4 seconds | 7 seconds |
| 4th retry | 8 seconds | 15 seconds |
| 5th retry | 16 seconds | 31 seconds |
Combining retries with failover
Use both strategies together for maximum reliability. Retry transient failures on the primary provider, then fail over once retries are exhausted.
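One way to express this, assuming Portkey configs allow a retry block on an individual target inside a fallback strategy (slugs and counts are illustrative):

```json
{
  "strategy": { "mode": "fallback", "on_status_codes": [429, 500, 502, 503, 504] },
  "targets": [
    {
      "provider": "@openai-prod",
      "retry": { "attempts": 3, "on_status_codes": [429, 503] }
    },
    { "provider": "@anthropic-prod" }
  ]
}
```

The intent: a 429 on the primary is retried with backoff first, and only if those retries are exhausted does the request move to the second target.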
Unified error handling

Handle errors consistently regardless of which provider generated them.

Standard error codes
All providers return these standardized codes for most errors.

| Code | Description | Recommended Action |
|---|---|---|
| 400 | Bad Request | Fix request parameters |
| 401 | Unauthorized | Check API credentials |
| 403 | Forbidden | Verify permissions |
| 408 | Request Timeout | Retry with backoff |
| 412 | Budget Exhausted | Increase budget limits |
| 429 | Rate Limited | Retry with backoff or failover |
| 446 | Guardrail Failed | Review content filters |
| 500-504 | Server Error | Retry or failover |
Implementing error handlers
Write one error handler that works for all providers. For example, to implement a fallback when you get rate limited on one particular provider, simply attach a config to your request:
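Here’s a Python sketch, assuming the SDK accepts an inline config object (a saved config ID works the same way); the slugs are the providers from Step 2:

```python
from portkey_ai import Portkey

# Fallback on 429: if the first provider rate-limits, retry on the second.
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    config={
        "strategy": {"mode": "fallback", "on_status_codes": [429]},
        "targets": [
            {"provider": "@openai-prod"},
            {"provider": "@anthropic-prod"},
        ],
    },
)

try:
    response = portkey.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
except Exception as err:
    # Standardized status codes mean one handler covers every provider.
    print(f"Request failed after fallback: {err}")
```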
Streaming responses
Stream responses consistently across all providers. The streaming interface remains the same regardless of backend.
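A Python sketch; the gateway normalizes chunks to the OpenAI-style delta shape, so the loop below works whichever provider serves the request:

```python
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

# stream=True returns an iterator of incremental chunks.
stream = portkey.chat.completions.create(
    model="@openai-prod/gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about gateways."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta and delta.content:
        print(delta.content, end="", flush=True)
```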
Batch processing for scale

Process thousands of requests efficiently using Portkey’s unified batch API. This works across providers, even those without native batch support.

Understanding batch modes
Portkey offers two batch processing modes to fit different needs.

- Provider batch mode uses native endpoints. When available, Portkey uses the provider’s batch API (like OpenAI’s batch endpoint). This typically offers discounted pricing but has a 24-hour completion window.
- Portkey batch mode works universally. For immediate processing or providers without batch support, Portkey manages batching at the gateway level. Requests process in groups of 25 with 5-second intervals.

Batch Processing Guide
Complete documentation for batch inference at scale
Load balancing across keys
Distribute requests across multiple API keys or providers to maximize throughput and avoid rate limits.

Configuring load distribution
Set weights to control traffic distribution between targets.
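A sketch of a weighted config; the 90/10 split matches the gradual-migration example below:

```json
{
  "strategy": { "mode": "loadbalance" },
  "targets": [
    { "provider": "@openai-prod", "weight": 0.9 },
    { "provider": "@anthropic-prod", "weight": 0.1 }
  ]
}
```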
Dynamic weight adjustment

Adjust weights without changing code by updating the config in Portkey’s dashboard.

- Gradually migrate providers. Start with 90% OpenAI and 10% Anthropic, then shift traffic as you validate the new provider.
- Handle provider issues. If one provider experiences degraded performance, reduce its weight to minimize impact while maintaining some traffic for monitoring.

Monitoring and observability
Portkey provides comprehensive observability for all your LLM requests. Monitor performance, costs, and errors across all providers from a single dashboard.
Analytics Dashboard
Deep dive into analytics and monitoring capabilities
Dynamic configuration updates
Update your routing logic without touching code. Modify configs through Portkey’s dashboard and changes apply immediately.

When to update configs
- Add new providers. When you get access to a new model or provider, add it to your fallback chain without deployment.
- Adjust retry logic. If you’re seeing more transient errors, increase retry attempts. If latency is critical, reduce them.
- Shift traffic gradually. Use load balancing to gradually migrate from one provider to another while monitoring performance.
- Respond to incidents. If a provider experiences an outage, temporarily remove it from rotation or reduce its weight.
Config versioning
Portkey maintains version history for all configs, so you can roll back to previous versions if issues arise.

- Test changes safely. Create a new config version and test with a small percentage of traffic before full rollout.
- Audit changes. Every config change is logged with timestamp and author for compliance and debugging.

What you’ve built
By implementing this guide, your engineering team now has:

- Single API interface. One SDK and consistent methods for all LLM providers. No more provider-specific code scattered through your application.
- Automatic failover. When providers fail, requests seamlessly route to backups. Your application stays online even during provider outages.
- Unified error handling. Consistent error codes across all providers. One error handler works everywhere.
- Intelligent retries. Automatic exponential backoff for transient failures. No custom retry loops needed.
- Production observability. Complete visibility into requests, costs, and performance across all providers.
- Dynamic configuration. Update routing logic, add providers, or adjust limits without code changes or deployments.
Next steps
Explore these advanced capabilities to further enhance your LLM infrastructure.

Conditional Routing
Route requests based on metadata, user tiers, or custom rules
Guardrails
Add content filters and safety checks to all requests