Building Integrations That Don't Break
A practical guide to system integration patterns, error handling, and the decisions that determine whether your integrations help or hurt.
Patterns from production systems, not textbooks.
We could tell you to be consistent with your naming conventions and use proper HTTP status codes. You already know that. What we actually want to talk about is what happens when your API hits real traffic and things start breaking.
The stuff that matters at scale isn't in the tutorials. It's infrastructure decisions that determine your monthly bill. Caching strategies that turn a 5-second timeout into a 50ms response. Knowing when your database is about to become your bottleneck - ideally before it does.
And increasingly, it's about the fact that modern APIs don't exist alone. They orchestrate calls to AI providers, CRMs, payment processors, video generation services. Your reliability is now bounded by your least reliable dependency, and your costs are driven by services you don't control. That changes everything about how you architect.
Before you write any code, you need to decide how requests reach your backend. This is the decision that determines your monthly bill and your scaling ceiling - and it's almost never discussed in tutorials.
On AWS, the choice typically comes down to API Gateway versus Application Load Balancer. Here's what the numbers actually look like:
We've watched companies spend $3,000/month on API Gateway when an ALB would have cost $200. We've also seen the reverse - teams running ALBs with hand-rolled rate limiting code that took weeks to build, when API Gateway would have given them the same thing out of the box.
The break-even point is roughly 5.3 million requests per month. Below that, API Gateway wins. Above that, ALB wins - sometimes dramatically. Do the math for your traffic before you commit.
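A quick way to do that math is to model both cost curves side by side. The sketch below uses illustrative placeholder prices, not current AWS rates - substitute your region's API Gateway per-request price, ALB hourly charge, and LCU estimate before trusting the crossover point:

```typescript
// Back-of-the-envelope gateway cost model. All prices are assumed
// placeholders for illustration - plug in your region's real numbers.
const API_GW_PER_MILLION = 3.5;  // $ per 1M requests (assumed)
const ALB_FIXED_MONTHLY = 16.2;  // ALB hourly charge over a month (assumed)
const ALB_LCU_PER_MILLION = 0.4; // rough LCU cost per 1M requests (assumed)

function monthlyCost(requestsPerMonth: number) {
  const millions = requestsPerMonth / 1_000_000;
  return {
    apiGateway: millions * API_GW_PER_MILLION,
    alb: ALB_FIXED_MONTHLY + millions * ALB_LCU_PER_MILLION,
  };
}

// Around the crossover point the two curves meet; above it, the ALB's
// fixed cost is amortized and it pulls ahead.
console.log(monthlyCost(5_300_000)); // { apiGateway: ~18.6, alb: ~18.3 }
```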
Everyone knows caching improves performance. What surprises us is how few teams actually implement it correctly. The difference between a good caching strategy and a bad one isn't 10% - it's 10-100x in latency.
Redis with ElastiCache 7.1 delivers P99 latency under 1 millisecond. We've tested this across production workloads - it holds up. A single cluster handles 500+ million requests per second. The question isn't whether Redis is fast enough. It's whether you're actually using it, and using it correctly.
Here's where most teams fail: cache invalidation. The safe pattern we recommend: delete cache entries instead of updating them (deletes are idempotent), always set a TTL as a safety net, and invalidate as close to the database write as possible. For complex systems, tag-based invalidation lets you group related entries and clear them together.
The most common caching bug we see: updating the cache entry instead of deleting it. When your update fails halfway through, you've got a cache that lies to your users. Delete and let the next read repopulate. It's slower but it's correct.
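As a concrete illustration, here is a minimal read-through, delete-on-write sketch using ioredis. The User type and the readUserFromDb / writeUserToDb helpers are hypothetical stand-ins for your own data layer:

```typescript
import Redis from "ioredis";

const redis = new Redis();

interface User {
  id: string;
  name: string;
}

// Hypothetical data-access helpers - replace with your own.
declare function readUserFromDb(id: string): Promise<User>;
declare function writeUserToDb(id: string, patch: Partial<User>): Promise<void>;

const USER_TTL_SECONDS = 300; // safety-net TTL even if invalidation misfires

export async function getUser(id: string): Promise<User> {
  const cached = await redis.get(`user:${id}`);
  if (cached) return JSON.parse(cached);

  const user = await readUserFromDb(id);
  await redis.set(`user:${id}`, JSON.stringify(user), "EX", USER_TTL_SECONDS);
  return user;
}

export async function updateUser(id: string, patch: Partial<User>): Promise<void> {
  await writeUserToDb(id, patch);
  // Delete, don't update: deletes are idempotent, and the next read
  // repopulates the cache from the source of truth.
  await redis.del(`user:${id}`);
}
```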
We've watched teams skip rate limiting until their AI provider bills hit $10K in a weekend. One runaway client, one bug that retries infinitely, one malicious actor - and your margins are gone. Rate limiting isn't about being stingy with your API. It's about survival.
The algorithm you choose - fixed window, sliding window, or token bucket - affects both accuracy and what happens at the edges.
For distributed systems, we implement sliding window rate limiting in Redis with Lua scripts. The entire check-and-increment must be atomic - otherwise race conditions let traffic through. A sorted set with timestamps as scores handles the window trimming efficiently. If this sounds complicated, that's because it is. Get it from a library, not from scratch.
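For illustration only, here is a simplified version of that sorted-set approach with ioredis. In production, reach for a maintained library such as rate-limiter-flexible rather than this sketch:

```typescript
import Redis from "ioredis";

const redis = new Redis();

// The whole check-and-increment runs as one Lua script, so it is atomic:
// no race can slip extra requests through between the count and the add.
const SLIDING_WINDOW_LUA = `
  local key = KEYS[1]
  local now = tonumber(ARGV[1])
  local window_ms = tonumber(ARGV[2])
  local limit = tonumber(ARGV[3])

  -- Trim timestamps that have fallen out of the window
  redis.call('ZREMRANGEBYSCORE', key, 0, now - window_ms)

  local count = redis.call('ZCARD', key)
  if count >= limit then
    return 0
  end

  -- Record this request; the member only needs to be unique within the key
  redis.call('ZADD', key, now, now .. '-' .. count)
  redis.call('PEXPIRE', key, window_ms)
  return 1
`;

// Returns true if this request fits within `limit` requests per `windowMs`.
export async function allowRequest(
  key: string,
  limit: number,
  windowMs: number
): Promise<boolean> {
  const result = await redis.eval(
    SLIDING_WINDOW_LUA,
    1, // number of KEYS
    key,
    Date.now(),
    windowMs,
    limit
  );
  return result === 1;
}
```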
Scope your rate limits by dimension: per-IP for anonymous traffic (first line of defense), per-API-key for authenticated requests (primary control), and per-user for consumer apps (tiered plans). Most APIs need all three. If you only implement one, you'll regret it.
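Building on the allowRequest sketch above, layering the three dimensions is mostly a matter of key construction. The limits below are arbitrary examples, not recommendations:

```typescript
// allowRequest() is the sliding-window helper from the previous sketch.
// Every dimension that applies to the request must pass.
async function checkLimits(req: {
  ip: string;
  apiKey?: string;
  userId?: string;
}): Promise<boolean> {
  const checks = [
    allowRequest(`rl:ip:${req.ip}`, 100, 60_000), // anonymous floor
    req.apiKey ? allowRequest(`rl:key:${req.apiKey}`, 1_000, 60_000) : true, // primary control
    req.userId ? allowRequest(`rl:user:${req.userId}`, 300, 60_000) : true, // plan tier
  ];
  return (await Promise.all(checks)).every(Boolean);
}
```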
This isn't a prediction. It's a pattern. Your database will become your bottleneck before anything else. The patterns that prevent this aren't complicated, but they require decisions upfront that are painful to change later.
The N+1 query problem kills more APIs than any other issue. You fetch a list of 100 items, then query related data for each one individually. That's 101 queries instead of 2. At scale, this turns a 50ms endpoint into a 5-second timeout. The DataLoader pattern batches and deduplicates these requests within a single request cycle - but you have to know you need it.
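Here's a minimal sketch with the dataloader package; Post, Author, and the getAuthorsByIds query helper are hypothetical stand-ins for your own types and data layer:

```typescript
import DataLoader from "dataloader";

interface Post {
  id: string;
  authorId: string;
  title: string;
}
interface Author {
  id: string;
  name: string;
}

// Hypothetical helper: one round trip for many ids
// (e.g. SELECT * FROM authors WHERE id = ANY($1)).
declare function getAuthorsByIds(ids: string[]): Promise<Author[]>;

export async function attachAuthors(posts: Post[]) {
  // Scope the loader to one request so its cache never goes stale across requests.
  const authorLoader = new DataLoader<string, Author | null>(async (ids) => {
    const authors = await getAuthorsByIds([...ids]);
    const byId = new Map<string, Author>(authors.map((a) => [a.id, a]));
    // DataLoader requires results in the same order as the requested keys.
    return ids.map((id) => byId.get(id) ?? null);
  });

  // 100 posts -> 1 query for the posts + 1 batched query for authors, not 101.
  return Promise.all(
    posts.map(async (post) => ({
      ...post,
      author: await authorLoader.load(post.authorId),
    }))
  );
}
```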
Schema design for APIs requires thinking about your query patterns upfront. If you'll filter by status and sort by date on every list endpoint, that compound index should exist before you ship. If your JSON columns are frequently queried, consider JSONB with GIN indexes. The best index strategy comes from your API contract, not your entity relationships.
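As one way of encoding that contract in a migration, here's a hedged sketch using Knex against Postgres, with a hypothetical orders table whose list endpoint filters by status and sorts by created_at:

```typescript
import { Knex } from "knex";

export async function up(knex: Knex): Promise<void> {
  await knex.schema.alterTable("orders", (table) => {
    // List endpoints filter by status and sort by created_at - index that shape.
    table.index(["status", "created_at"], "orders_status_created_at_idx");
  });
  // Frequently queried JSONB attributes benefit from a GIN index (Postgres-specific).
  await knex.raw(
    "CREATE INDEX IF NOT EXISTS orders_metadata_gin_idx ON orders USING GIN (metadata)"
  );
}

export async function down(knex: Knex): Promise<void> {
  await knex.schema.alterTable("orders", (table) => {
    table.dropIndex(["status", "created_at"], "orders_status_created_at_idx");
  });
  await knex.raw("DROP INDEX IF EXISTS orders_metadata_gin_idx");
}
```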
Selective denormalization is not a sin. When P95/P99 latencies are dominated by joins across 3+ tables, store the pre-joined shape directly. We use the mantra: "Normalize until it hurts, denormalize until it works."
For write-heavy APIs (event ingestion, logging, IoT), the CQRS pattern separates read and write models. Writes go to an append-only event store optimized for inserts. A projection service builds read-optimized views asynchronously. This eliminates contention between read and write workloads at the database level.
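A stripped-down sketch of the idea using node-postgres, with hypothetical events and latest_readings tables. A real projection service would track a checkpoint instead of rescanning the event store on every run:

```typescript
import { Pool } from "pg";

interface DeviceReading {
  deviceId: string;
  temperature: number;
  recordedAt: string;
}

// Write side: append-only inserts, optimized for ingestion - no updates, no joins.
export async function recordEvent(db: Pool, event: DeviceReading): Promise<void> {
  await db.query(
    "INSERT INTO events (stream_id, type, payload, occurred_at) VALUES ($1, $2, $3, now())",
    [event.deviceId, "reading", JSON.stringify(event)]
  );
}

// Read side: an async projection folds events into a query-optimized table.
// Readers only ever hit latest_readings, never the event store.
export async function projectLatestReadings(db: Pool): Promise<void> {
  await db.query(`
    INSERT INTO latest_readings (device_id, payload, updated_at)
    SELECT DISTINCT ON (stream_id) stream_id, payload, occurred_at
    FROM events
    ORDER BY stream_id, occurred_at DESC
    ON CONFLICT (device_id) DO UPDATE
      SET payload = EXCLUDED.payload, updated_at = EXCLUDED.updated_at
  `);
}
```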
Average latency is a vanity metric. If your average is 100ms but your P99 is 3 seconds, 1% of your users are having a terrible experience. At a million requests per day, that's 10,000 frustrated users. And you'll never know from your average.
For error rate alerting, we use Google's SRE approach: page immediately if you burn 2% of your error budget in 1 hour; create a ticket if you burn 10% in 3 days. This balances urgency with alert fatigue. Nothing kills an on-call rotation faster than alerts that cry wolf.
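The arithmetic behind those thresholds, assuming a 30-day SLO window at 99.9% (adjust both numbers to your own SLO):

```typescript
// Multi-window burn-rate alerting math, Google SRE workbook style.
// Assumes a 30-day SLO window at 99.9% - both are illustrative.
const SLO = 0.999;
const ERROR_BUDGET = 1 - SLO; // 0.1% of requests may fail
const SLO_WINDOW_HOURS = 30 * 24;

// Burn rate needed to consume `budgetFraction` of the budget in `windowHours`.
function burnRateThreshold(budgetFraction: number, windowHours: number): number {
  return (budgetFraction * SLO_WINDOW_HOURS) / windowHours;
}

// Page: 2% of budget in 1 hour -> burn rate 14.4 -> error rate >= 1.44%
export const pageIfErrorRateAbove = burnRateThreshold(0.02, 1) * ERROR_BUDGET;

// Ticket: 10% of budget in 3 days (72h) -> burn rate 1.0 -> error rate >= 0.1%
export const ticketIfErrorRateAbove = burnRateThreshold(0.1, 72) * ERROR_BUDGET;
```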
OpenTelemetry has become the standard for distributed tracing. Start with auto-instrumentation, add manual spans for business-critical paths. Correlate logs with traces using SPAN_ID and TRACE_ID. The goal isn't to trace everything - it's to trace what matters when something breaks.
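A manual span around a business-critical path looks roughly like this with @opentelemetry/api. The checkout-service tracer name and the paymentProvider client are illustrative, not real APIs from your stack:

```typescript
import { trace, SpanStatusCode } from "@opentelemetry/api";

// Hypothetical payment client - stand-in for your real integration.
declare const paymentProvider: { charge(orderId: string): Promise<void> };

const tracer = trace.getTracer("checkout-service");

// Auto-instrumentation already covers the HTTP and database calls around
// this; the manual span ties the business operation together.
export async function chargeCustomer(orderId: string): Promise<void> {
  return tracer.startActiveSpan("charge-customer", async (span) => {
    try {
      span.setAttribute("order.id", orderId);
      await paymentProvider.charge(orderId);
      span.setStatus({ code: SpanStatusCode.OK });
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```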
Your API is only as reliable as the clients consuming it. If clients hammer you during outages, they make recovery harder. We've seen APIs stay down an extra hour because clients kept retrying aggressively the moment the server came back up.
Exponential backoff with jitter prevents thundering herds. When a thousand clients retry at exactly the same time, they create the same spike that caused the failure. Adding random jitter (0-500ms) spreads retries across time.
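A minimal retry helper with exponential backoff and jitter - the base delay, cap, and jitter range are example values; in practice you'd also check whether the error is actually retryable and honor any Retry-After header:

```typescript
// Retry with exponential backoff plus random jitter; a minimal sketch.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 200
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      // Exponential backoff: 200ms, 400ms, 800ms, ... capped at 10s,
      // plus 0-500ms of jitter so clients don't retry in lockstep.
      const backoff = Math.min(baseDelayMs * 2 ** (attempt - 1), 10_000);
      const jitter = Math.random() * 500;
      await new Promise((resolve) => setTimeout(resolve, backoff + jitter));
    }
  }
}

// Usage: await withRetries(() => fetch("https://api.example.com/orders"));
```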
Document your rate limits, retry expectations, and error formats. The best API documentation answers "what do I do when this fails?" not just "how do I call this endpoint?" If your docs don't cover failure modes, your clients will guess - usually wrong.
Most production APIs aren't standalone systems. They orchestrate calls to Salesforce, Stripe, AI providers, and dozens of other services. Your API's reliability is now bounded by your least reliable dependency - and you don't control any of them.
The patterns you use for your own rate limiting don't apply when you're the client. Third-party rate limits are non-negotiable, poorly documented, and often inconsistent. AI provider rate limits vary by model, account tier, and time of day. Salesforce's daily API limits reset at midnight in your org's timezone. HubSpot throttles at 100 requests per 10 seconds, then blocks for 11 seconds. Nobody tells you this until you hit the wall.
Third-party APIs deprecate endpoints constantly. Twilio sunset Authy. Salesforce removes API versions after ~3 years. Build deprecation monitoring into your CI - flag any API calls to endpoints with announced sunset dates. Future you will thank present you.
For multi-vendor orchestration, the saga pattern prevents partial failures. If creating a customer requires Salesforce (CRM), Stripe (billing), and SendGrid (welcome email), you need compensating actions. When Stripe fails after Salesforce succeeds, you need to either retry Stripe or roll back the Salesforce contact. Queued sagas with explicit compensation handlers are the only reliable approach.
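A bare-bones sketch of a compensating saga. The salesforce and sendgrid clients and the newCustomer object are hypothetical; the Stripe calls follow the stripe-node client, and a production saga would persist its state and push failed compensations to a dead-letter queue:

```typescript
import Stripe from "stripe";

// Hypothetical clients and input - stand-ins for your real integrations.
declare const salesforce: {
  createContact(c: { email: string }): Promise<{ id: string }>;
  deleteContact(id: string): Promise<void>;
};
declare const sendgrid: { sendWelcome(email: string): Promise<void> };
declare const stripe: Stripe; // an initialized stripe-node client
declare const newCustomer: { email: string };

type SagaStep<T> = {
  name: string;
  run: () => Promise<T>;
  compensate: (result: T) => Promise<void>;
};

// Run steps in order; on failure, compensate completed steps in reverse.
async function runSaga(steps: SagaStep<any>[]): Promise<void> {
  const completed: { step: SagaStep<any>; result: any }[] = [];
  for (const step of steps) {
    try {
      completed.push({ step, result: await step.run() });
    } catch (err) {
      for (const { step: done, result } of completed.reverse()) {
        await done.compensate(result);
      }
      throw err;
    }
  }
}

export async function onboardCustomer(): Promise<void> {
  await runSaga([
    {
      name: "salesforce-contact",
      run: () => salesforce.createContact({ email: newCustomer.email }),
      compensate: (contact) => salesforce.deleteContact(contact.id),
    },
    {
      name: "stripe-customer",
      run: () => stripe.customers.create({ email: newCustomer.email }),
      compensate: async (customer) => {
        await stripe.customers.del(customer.id);
      },
    },
    {
      name: "welcome-email",
      run: () => sendgrid.sendWelcome(newCustomer.email),
      compensate: async () => {}, // an email can't be unsent
    },
  ]);
}
```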
Not everything fits in a request-response cycle. AI video generation takes 2-5 minutes. Batch CRM imports process thousands of records. Report generation queries terabytes. We've seen teams try to shove all of this into synchronous endpoints. It never works.
The decision framework is simple: if P95 latency exceeds your frontend timeout (typically 30 seconds), it needs to be async. If it can fail partially and retry matters, it needs to be async. If users benefit from progress updates, it needs to be async.
For AI integrations specifically, the token-streaming pattern has become standard. Start returning tokens as they're generated rather than waiting for the full completion. Users see the response start in ~200ms instead of waiting 5-15 seconds. This requires SSE or WebSockets, but the UX improvement is dramatic.
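A minimal Express SSE handler wired to a hypothetical streamTokens generator - substitute your provider's actual streaming client:

```typescript
import express from "express";

// Hypothetical LLM client that yields tokens as they are generated.
declare function streamTokens(prompt: string): AsyncIterable<string>;

const app = express();

// Server-Sent Events endpoint: flush each token as it arrives instead of
// buffering the full completion.
app.get("/api/chat", async (req, res) => {
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");
  res.flushHeaders();

  try {
    for await (const token of streamTokens(String(req.query.prompt ?? ""))) {
      res.write(`data: ${JSON.stringify({ token })}\n\n`);
    }
    res.write("data: [DONE]\n\n");
  } catch (err) {
    res.write(`event: error\ndata: ${JSON.stringify({ message: "generation failed" })}\n\n`);
  } finally {
    res.end();
  }
});

app.listen(3000);
```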
Job cleanup matters. A HeyGen video generation job produces a 500MB file. Store the result URL, not the file, in your job status. Set aggressive TTLs on job metadata (24-48 hours) and implement archival for compliance needs. Orphaned job data is a common source of runaway storage costs - we've seen S3 bills double from forgotten temp files.
AI APIs changed the economics of API design. A single large language model call can cost $0.01-0.15 depending on provider and model. HeyGen video generation runs $0.10-1.00+ per minute. At scale, a poorly optimized AI feature costs more than your entire infrastructure.
The math is unforgiving: 100,000 users making 10 LLM calls per day at $0.05 average is $50,000 a day - $1.5 million a month - in AI costs alone. Without caching and optimization, AI features don't scale - they bankrupt. We've seen startups burn through their seed round on AI bills because nobody did the math.
The build-vs-buy calculation shifts with AI. Self-hosting open models on dedicated GPU instances costs $15,000-30,000/month but handles unlimited requests. At roughly 300,000-500,000 API calls per month, self-hosting breaks even. Below that, API costs are cheaper. Above that, dedicated infrastructure wins. OpenRouter and AWS Bedrock let you experiment with multiple providers before committing.
Track AI costs per feature, per user, and per customer segment. Some features have 10x the cost of others. Some customers generate 100x the AI spend. Without granular cost attribution, you can't price appropriately or identify optimization targets. We've helped teams cut AI costs by 70% just by finding the one feature that was hemorrhaging money.
For third-party integrations beyond AI, the same principles apply. Salesforce API calls are limited and precious - cache CRM data aggressively (1-5 minute TTL for frequently accessed records). Twilio charges per message - batch notifications into digests. Every API call has a cost, even if it's rate limits rather than dollars.
We've made these mistakes so you don't have to. Let's talk about your architecture before you discover the problems in production.
Start a Conversation → or email partner@greenfieldlabsai.com