The Dev Insights

My wild ride building an AI shopping assistant

Tech Trends · October 30, 2025 · 8 min read

I spent months wrestling with AI agents and search tools to build a smart shopping assistant that actually helps people find what they need.

You know that feeling when you're trying to build something genuinely smart, not just another basic app? That's exactly where I was a few months back. Everyone keeps talking about AI, and after reading a few posts like What Everyone's Talking About in Tech Right Now, I was buzzing with ideas. I wanted to build an AI shopping assistant, something way more than just a glorified search bar. I imagined a system that could understand what people really needed, compare products in different ways, and even suggest green alternatives for a client who only sold eco-friendly electronics.

My Naive Start and The Headaches

Initially, I thought, "How hard can it be?" I grabbed LangChain and started with one big, monolithic agent. The idea was simple: feed it a user question, give it a search tool, and let it recommend products. I was running Python 3.10 with OpenAI's gpt-3.5-turbo model. My backend was a minimal FastAPI app (version 0.95.0 at the time) wrapped in a Docker container, with a React 17 frontend.
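For a sense of the shape of that first version, here's a minimal sketch of a single-agent loop — with stub functions standing in for the gpt-3.5-turbo call and the search API, since the real thing needs API keys. The names and the toy protocol (`Action: search[...]`) are illustrative, not my actual prompts:

```python
# Minimal single-agent loop: one "brain", one search tool.
# fake_llm and fake_search are stand-in stubs for the real model/API calls.

def fake_llm(prompt: str) -> str:
    # Stub: a real call would hit gpt-3.5-turbo here.
    if "Observation:" in prompt:
        return "Final Answer: a generic list of durable laptops"
    return "Action: search[durable laptop]"

def fake_search(query: str) -> str:
    return f"Top results for '{query}'"

def run_single_agent(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}"
    for _ in range(max_steps):
        reply = fake_llm(prompt)
        if reply.startswith("Final Answer:"):
            return reply.removeprefix("Final Answer:").strip()
        # Parse the single tool call and append the observation.
        query = reply.split("search[", 1)[1].rstrip("]")
        prompt += f"\n{reply}\nObservation: {fake_search(query)}"
    return "Gave up after too many steps"  # the loop failure mode I kept hitting

print(run_single_agent("Find me a durable laptop under £800"))
```

Even in this toy form you can see the problem: everything — planning, searching, comparing, recommending — funnels through one prompt loop.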

It was a disaster. The agent was often vague, made up product names, or got stuck in repetitive loops. If a user asked, "Find me a durable laptop under £800 that's good for video editing and has a long battery life," the agent would typically search for "durable laptop" and return a generic list. It couldn't properly compare products or synthesise information. In production, I learned that a single brain just isn't enough for complex reasoning. Response times were terrible too, often hovering around 5-8 seconds per question, which is ages for a user expecting quick answers. I was spending hours trying to fix the agent's thought processes, constantly tweaking prompts to make it 'smarter'. I messed this up at first, honestly. One of my biggest mistakes was underestimating how much structure the AI needed.

The Breakthrough: CrewAI and Multi-Agent Magic

After a particularly frustrating 3-hour debugging session where the agent kept recommending non-existent "EcoFlex 3000" headphones, I realised I needed a different approach. That's when I stumbled upon CrewAI (I started with version 0.20, then quickly upgraded to 0.28). It was a game-changer. The idea of creating a crew of specialist agents, each with specific jobs and tools, just clicked. It's like organising a small team for a project; you wouldn't expect one person to do everything from research to quality checks.

I broke down the problem into different jobs:

  • Researcher Agent: In charge of searching for product details, reviews, and specs. Its main tool was Tavily (a fantastic search API, far better for targeted searches than raw requests and BeautifulSoup scraping).
  • Critic Agent: Checked the Researcher's findings for accuracy, completeness, and relevance. Did the Researcher miss anything? Is the information verifiable? This was crucial for reducing hallucinated info. My tech lead pointed out during a code review that this would be key to preventing bad recommendations.
  • Recommender Agent: Took the cleaned-up information from the Critic and produced a short, helpful product recommendation, making sure it fit what the user asked for.
  • Curator Agent: Made sure the final output was well-formatted, easy to read, and on-brand. This agent also handled follow-up questions.

This multi-agent setup, with the agents working in sequence, felt much more solid and reliable. I also upgraded my backend to FastAPI 0.109.0 and switched my Python tooling to uv (version 0.1.0) for lightning-fast dependency resolution – honestly, uv cut my dependency install times from 45s to about 12s on fresh builds, which made my life so much easier. For the frontend, I moved to React 18, using useEffect and useState to handle streaming updates much more smoothly.

Here’s a simplified peek at how a CrewAI agent and task looked:

```python
from crewai import Agent, Task, Crew, Process
from langchain_community.tools.tavily_search import TavilySearchResults
from dotenv import load_dotenv

load_dotenv()  # pulls TAVILY_API_KEY and OPENAI_API_KEY from .env

# TavilySearchResults is the actual LangChain *tool*; the bare API wrapper
# can't be handed to an agent directly.
tavily_tool = TavilySearchResults(max_results=5)

# Define the Researcher Agent
researcher = Agent(
    role='Senior Product Researcher',
    goal='Find detailed specifications and reviews for eco-friendly electronics',
    backstory='You are an expert in sustainable tech, meticulously gathering accurate product data.',
    verbose=True,
    allow_delegation=False,
    tools=[tavily_tool]
)

# Define the Critic Agent
critic = Agent(
    role='Information Quality Analyst',
    goal='Verify the accuracy and completeness of product research',
    backstory='You scrutinise every piece of information, ensuring no detail is missed or misrepresented.',
    verbose=True,
    allow_delegation=False
)

# Define the Recommender Agent
recommender = Agent(
    role='Personal Shopping Advisor',
    goal='Craft personalized, unbiased product recommendations based on verified research',
    backstory='You understand user needs deeply and recommend the best sustainable options.',
    verbose=True,
    allow_delegation=False
)

# Example Task
research_task = Task(
    description="""Research the best eco-friendly laptop for a student needing good battery life
    and suitable for light video editing, under £800.
    Focus on performance, battery, and environmental certifications.""",
    expected_output='A structured summary of candidate laptops with specs, reviews, and certifications.',
    agent=researcher
)

critic_task = Task(
    description="""Review the research findings from the Researcher agent. Identify any gaps,
    inaccuracies, or points that need further investigation.
    Ensure the information is sufficient to make a robust recommendation.""",
    expected_output='A verified, corrected version of the research with any gaps flagged.',
    agent=critic,
    context=[research_task]
)

recommend_task = Task(
    description="""Based on the verified research, provide a concise and helpful recommendation for the student.
    Highlight key features, pros, and cons, and why it fits the eco-friendly and budget criteria.""",
    expected_output='A short, reader-friendly recommendation with pros, cons, and rationale.',
    agent=recommender,
    context=[critic_task]
)

# Instantiate your crew
project_crew = Crew(
    agents=[researcher, critic, recommender],
    tasks=[research_task, critic_task, recommend_task],
    verbose=2,
    process=Process.sequential  # tasks run one after another
)

# Kick off the crew
# crew_result = project_crew.kickoff()
# print(crew_result)
```

This setup prevented at least three serious misinformation issues within the first two weeks of testing. It also made debugging far easier: when something went wrong, CrewAI's verbose logs showed exactly which agent failed and why.

    Gotchas and Production War Stories

    Even with CrewAI, it wasn't all smooth sailing. When we first put it live on a small server (costing about £15/month), we ran into a few snags:

  • Rate Limits: Tavily and OpenAI both enforce rate limits. When our API hit 100k requests/day during a test phase, we started seeing 429 Too Many Requests errors. I had to add exponential-backoff retries on the client side and cache popular product questions in our PostgreSQL 14 database. Cached queries dropped from 300ms (live lookup) to 50ms. Took me forever to figure this one out, at 2 AM when the API started timing out consistently.
  • Agent Loops: Sometimes an agent would get stuck in a loop, asking another agent for information it wasn't meant to provide, or repeatedly running the same search. After debugging for 6 hours, it turned out I had left allow_delegation=True on agents that weren't designed to take over tasks. Setting allow_delegation=False for the specialist agents fixed this, keeping each one to its own job.
  • Cost Management: Running GPT-4 for every question gets expensive fast. After profiling how the app used the model, I found that complex questions were triggering many LLM calls. We routed simpler queries to the cheaper gpt-3.5-turbo and trimmed our prompts to use fewer tokens. This cut our LLM costs by about 40% in the first month.
  • Streaming UI: Building a good user experience for streaming AI responses on the frontend was tricky. Initially, I just waited for the full response, which led to the old 5-second wait. Switching the FastAPI endpoint to stream SSE (Server-Sent Events) and handling partial updates in React made a huge difference. Users saw results appearing almost immediately, making it feel much faster.

Measurable Outcomes and Lessons Learned

The impact was a big deal. After these tweaks, the assistant's response time for complex questions dropped from an average of 5-8 seconds to 1.2-2.5 seconds. More importantly, the accuracy of product recommendations went from about 40% (often irrelevant or fabricated) to over 90% (verified and on-target). That alone prevented a pile of customer-service tickets about wrong product info. When the client launched, with 10k users interacting with the assistant, the system stayed stable and AI responses were consistently under 2 seconds, which was a huge win.

One of my biggest mistakes was not going with the multi-agent idea from the start. I tried to make one agent too smart, too general. What I'd do differently next time is design the Crew structure and agent jobs before writing any code: sketch out how the information flows, decide which agent needs which tool, and give each a clear goal. That would have saved days of refactoring and debugging.

This project really hammered home the power of specialised AI agents working together. It's not about finding the 'best' single prompt or model; it's about getting them to cooperate well. If you're playing with AI, especially with language models, don't try to build one big brain. Think about how a human team divides up work, and you'll be much closer to a solid solution. Oh, and one more thing: definitely check out uv if you haven't. It's a game-changer for the Python dev experience. Perhaps, with the upcoming Apple M5 chips, running more complex local LLMs will become even easier, opening up new possibilities for agent design without worrying about the immediate costs of cloud APIs. Imagine developing and iterating on these crews locally at incredible speed!

    This wasn't just about shipping code; it was about learning how to build intelligent systems that are actually useful. Every bug, every late-night fix, taught me something really important about how to build AI. It's a hard but super satisfying space to be in, and I'm still buzzing about what's next for AI and webdev. And hey, if you're struggling with complex data or anything, remember that sometimes a structured approach with multiple 'brains' is the only way to get it right. Also, if you're dealing with something like crypto tax, you definitely want an 'agent' that's accurate and reliable, not one that hallucinates your gains or losses!

Topics: Tech Trends

Author: The Dev Insights Team