You know that feeling when everyone's buzzing about a new tech, and you just have to try it? That was us, early 2025, with large language models. We were building a really cool 'smart assistant' for a client's internal delivery system – picture it making things run super smooth, spotting weird stuff, and answering tricky questions, all thanks to an LLM. When the whispers about Gemini 3 started, promising huge jumps in how smart it was and how well it 'got' things, we were all in.
Chasing the AI Dragon
My team had to plug this smart AI brain into our existing backend (that's the server side, running Node 20.9.0) and our React 18 frontend (that's what users see). Our current solution used an older LLM, but it was... okay. We saw how Gemini 3 could really change our system for the better. We'd been prototyping with earlier Gemini versions, seeing some decent results, but the promised leap with Gemini 3 felt like a game-changer.
Around mid-2025, the rumours became real. We planned how our system would work, totally thinking about Gemini 3, expecting its better speed and cool new stuff. This meant less of our own 'if-this-then-that' rules and more trust in the LLM's pure thinking power. It felt right at the time – like we were future-proofing. Honestly, one of my biggest mistakes was not building in solid backup plans from day one. We got a bit too comfy, totally wowed by the demos.
Fast forward to late October 2025. Gemini 3 launched. We got access, plugged it into our test setup, and the results were seriously amazing. Our internal tests showed P99 response times for complex queries dropping from around 1.5 seconds to just under 300ms. We were chuffed. We pushed it live for a few users, watching it like hawks. Everything looked solid.
The Day Everything Broke
Then came November 2025. It was a Tuesday. I remember it clearly because my sprint planning usually runs like clockwork. Around 10 AM UK time, our internal monitoring alerts started screaming. Specific parts of our app that leaned heavily on Gemini 3 started timing out. At first it was just here and there, then it totally blew up. Our P99 latency for LLM-powered responses shot from 300ms to over 5 seconds, then straight to 504 Gateway Timeouts.
Our support team was flooded with calls. Users couldn't get important info, couldn't okay deliveries, couldn't make routes better. We had a massive problem. My tech lead, Sarah, and I jumped onto a call with the rest of the backend team. We checked our own servers – CPU, memory, network – everything looked fine. Our PostgreSQL 15 database was humming along. The error messages consistently pointed to our calls to the external LLM.
We immediately suspected the Gemini 3 API. A quick check of Downdetector and Twitter confirmed it: a widespread outage. It wasn't just us. The first reports were a bit fuzzy, blaming Cloudflare for problems hitting services everywhere. This was confusing, because our direct calls to Gemini 3 weren't supposed to go through Cloudflare like a normal website. After six hours of digging, we found out it was a deeper problem inside Google's own 'Antigravity' system (that's their internal network for services) causing all the Gemini 3 issues worldwide. Cloudflare just got caught in the middle because of bigger network issues, but our real problem was deeper.
Our app just wasn't built to handle an outside service failing this badly. Our backend queues (usually for background jobs) started filling up with failed LLM requests that were trying again and again, super fast. Our server costs shot up by 300% during the 4-hour outage because of all those endless retries and the huge number of failed requests eating up all our processing power. It cost us about £5k just in server costs, not even counting the huge mess it made for our client.
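What we lacked was any cap on those retries. A minimal sketch of what we added afterwards: bounded retries with exponential backoff and jitter, so a failing upstream gets a few spaced-out attempts instead of a flood. The function name and the default values here are illustrative, not our production code.

```javascript
// Bounded retry with exponential backoff and full jitter. After maxAttempts
// failures we give up and surface the error, instead of hammering the
// upstream forever and filling our own queues.
async function retryWithBackoff(fn, { maxAttempts = 3, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Full jitter: wait a random time between 0 and base * 2^attempt ms
      const delay = Math.random() * baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  // Give up: let the caller fall back instead of retrying endlessly
  throw lastError;
}
```

The key property is the hard ceiling: during the outage, an unbounded retry loop like ours turns every failed request into dozens of follow-up failures, which is exactly how our costs tripled.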
The Frantic Fix and Our Breakthrough
That afternoon was a blur. We rushed out an emergency fix: a simple 'circuit breaker' (a neat trick with a Node.js library called opossum) around our Gemini 3 API calls. If the LLM service failed more than 5 times in 10 seconds, it would 'open' the circuit for 30 seconds. That meant it would immediately send back a cached "service unavailable" message or just use a much simpler, pre-made answer. It wasn't perfect, but it stopped the whole mess from getting worse.
// Simplified example of a circuit breaker in Node.js
const CircuitBreaker = require('opossum');

const options = {
  timeout: 3000,                 // If our API call takes longer than 3 seconds, count it as a failure
  errorThresholdPercentage: 50,  // If 50% of requests fail, open the circuit
  rollingCountTimeout: 10000,    // Measure failures over a 10-second window
  volumeThreshold: 5,            // Need at least 5 requests in the window before opening
  resetTimeout: 30000            // After 30 seconds, let a test request through
};

const llmServiceCall = async (prompt) => {
  // In a real app, this would be an actual API call to Gemini 3
  console.log(`Calling Gemini 3 with prompt: ${prompt}`);
  if (Math.random() < 0.6) { // Simulate failure 60% of the time for demo
    throw new Error('LLM API call failed or timed out');
  }
  return `Response for: ${prompt}`;
};

const breaker = new CircuitBreaker(llmServiceCall, options);

breaker.fallback(() => {
  console.warn('Gemini 3 service unavailable, returning fallback.');
  return 'I am currently unable to process complex requests. Please try again later.';
});

// Monitor circuit state
breaker.on('open', () => console.log('Circuit opened!'));
breaker.on('close', () => console.log('Circuit closed.'));
breaker.on('halfOpen', () => console.log('Circuit half-open, trying...'));

// Example usage
(async () => {
  for (let i = 0; i < 20; i++) {
    try {
      const result = await breaker.fire(`Tell me about today's logistics, request ${i}`);
      console.log(`Result: ${result}`);
    } catch (error) {
      console.error(`Request failed: ${error.message}`);
    }
    await new Promise(resolve => setTimeout(resolve, 500));
  }
})();
That night, we also set up a smarter caching system for frequently asked LLM questions, using Redis. Our tech lead, Sarah, had pushed for this weeks ago during a code review for another feature. “What if the LLM goes down for 30 minutes?” she’d asked. I’d brushed it off, thinking, “It’s Google, it won’t happen.” Boy, was I wrong. This new cache cut down our direct Gemini 3 API calls by about 40% for frequently requested data, making us way tougher against outages.
Lessons Learned the Hard Way
When we went live, I learned that even the fanciest AI models are just another API call, and they can have the same network weirdness and tech hiccups as anything else. Our post-mortem (that's when we look back at what went wrong) was tough, but we needed it. We spent 3 hours picking apart what went wrong, what we could've done better, and how to stop it from happening again.
Out of that post-mortem came two commitments:

* Graceful degradation (a defensive programming strategy, if you like) for when the LLM is just super busy, not totally broken.
* Rate limiting at our API gateway (that's like a traffic cop for our services, running Nginx with OpenResty) to stop our own services from accidentally overwhelming the LLM provider when it's already struggling.

This experience made me think about other dependencies too. When I was building a custom search component for a different project, it started with a simple keyword match. But as data grew, it got slow. I started with Lucene but realised it was way too much, then spent months moving to something I built myself, which actually led to My Zig Journey Building a Fast Search Component. The idea is similar: know what you're relying on, and build your system to be tough.
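For the gateway side, the shape of an Nginx rate limit looks roughly like this. The directives (`limit_req_zone`, `limit_req`) are standard Nginx, but the zone name, rate, burst, and upstream name here are made-up illustrative values, not our production config.

```nginx
# Cap each client at ~10 req/s on the LLM proxy route, absorbing short
# bursts of up to 20 requests instead of rejecting them outright.
limit_req_zone $binary_remote_addr zone=llm_api:10m rate=10r/s;

server {
    location /api/llm/ {
        limit_req zone=llm_api burst=20 nodelay;
        proxy_pass http://llm_backend;
        proxy_read_timeout 3s;  # fail fast rather than queueing behind a sick upstream
    }
}
```

The point isn't the exact numbers; it's that the cap lives at the edge, so a misbehaving internal service can't amplify an upstream wobble into a self-inflicted flood.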
What I'd Do Differently (and My Advice to You)
Looking back, I'd have pushed harder for those circuit breakers and backup plans right at the start of the project. It's easy to get super excited about a new tech like Gemini 3 and forget about the real-world problems. I messed this up at first: I didn't add proper error handling and retry logic from the start. That was a big mistake.
If you're integrating AI or any other critical third-party service, here's my advice:
* Test Your Failure Modes: Don't just test if it works. Test what happens when it *doesn't* work. Use tools like Chaos Monkey in your test setup.
* Build Redundancy: Can you use multiple LLM providers? Can you have a simpler local model as a fallback? This makes things a bit more complex, sure, but it stops one thing from taking everything down.
* Monitor Everything: Not just your app, but how those outside services are doing too. Set up alerts that scream if something's wrong.
* Communicate Clearly: When something breaks, tell your users and key people right away. Transparency builds trust.
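The redundancy point is the one people push back on most, so here's how small the core of it can be: walk an ordered chain of providers and fall through to a canned reply only when everything fails. The provider functions here are hypothetical stand-ins, not real SDK calls.

```javascript
// Try each provider in order; the canned reply is the last resort, so the
// user gets a degraded answer instead of a 504.
async function askWithFallback(prompt, providers, cannedReply) {
  for (const provider of providers) {
    try {
      return await provider(prompt);
    } catch (err) {
      // Log and move on to the next provider in the chain
      console.warn(`Provider failed (${err.message}), trying next...`);
    }
  }
  return cannedReply;
}
```

Yes, keeping two providers' prompts and response formats in sync is real work. But after November, "a bit more complex" beats "completely down" every time for us.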
We learned a tough lesson that November. But we came out of it with a much tougher and more reliable system. Our post-outage test coverage for LLM integration went from 60% to 95%, and we’ve had zero critical incidents related to external API failures since. Sometimes, the biggest breakthroughs come from the biggest failures. And trust me, that Gemini 3 outage was a failure we won't soon forget.
