This week's video covers the most common mistake we see on RAG projects. Many teams start the same way they would a traditional software engineering project: with a user demo. Demos make sense when user adoption is the biggest risk. But with LLMs, that’s not the problem. The real challenge is getting the model to perform reliably and not make costly mistakes.
When you build a chat interface on your data, users will ask a few questions, get some responses, and say, "This is cool." But that feedback is shallow; it gives you no real transparency into how your system actually performs.
Instead of building a demo, generate a set of representative questions your users are likely to ask, along with the desired answers. Run them through your application, then review the results with your users and compare them against those desired answers. This process will surface real insights: risks, gaps, and areas to improve.
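As a concrete starting point, here is a minimal Python sketch of such an evaluation set. The `answer_question()` entry point, the questions, desired answers, and category tags are all hypothetical placeholders for your own RAG pipeline and domain, and the lexical similarity score is just a stand-in for a proper grader (an LLM judge or embedding comparison):

```python
from difflib import SequenceMatcher

# Representative questions, the answers you would want, and a rough category tag.
EVAL_SET = [
    {
        "category": "billing",
        "question": "What is the refund window for annual plans?",
        "desired": "Annual plans can be refunded within 30 days of purchase.",
    },
    {
        "category": "availability",
        "question": "Which regions does the enterprise tier support?",
        "desired": "The enterprise tier is available in the US, EU, and APAC.",
    },
]


def answer_question(question: str) -> str:
    """Stand-in for your RAG pipeline (retrieve + generate); replace with the real call."""
    return "(model answer goes here)"


def similarity(expected: str, actual: str) -> float:
    """Crude lexical overlap; swap in an LLM judge or embedding similarity."""
    return SequenceMatcher(None, expected.lower(), actual.lower()).ratio()


def run_eval(eval_set=EVAL_SET):
    """Run every question through the app and score it against the desired answer."""
    results = []
    for case in eval_set:
        answer = answer_question(case["question"])
        results.append({**case, "answer": answer, "score": similarity(case["desired"], answer)})
    return results
```

Reviewing a table of results like this with your users, rather than clicking around a demo, is what exposes where retrieval misses or the model makes things up.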
This is a performance evaluation framework, a critical part of Performance-Driven Development (PDD). It gives you the transparency you need to understand your system's strengths and weaknesses, so you can iterate and improve based on data rather than subjective opinions. Check out our GitHub repo for more on PDD.
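To make that iteration data-driven, you can aggregate the scores by category and watch them move as you change chunking, retrieval, or prompts. A small follow-on sketch, reusing `run_eval()` from above with an arbitrary quality threshold, might look like this:

```python
from collections import defaultdict
from statistics import mean


def summarize(results, threshold=0.7):
    """Group scores by question category and flag weak areas to prioritize."""
    by_category = defaultdict(list)
    for r in results:
        by_category[r.get("category", "uncategorized")].append(r["score"])
    for category, scores in sorted(by_category.items()):
        avg = mean(scores)
        status = "needs work" if avg < threshold else "ok"
        print(f"{category:<15} avg={avg:.2f} over {len(scores)} cases -> {status}")


if __name__ == "__main__":
    summarize(run_eval())
```

Rerunning a report like this after every change turns improvements into measurable deltas instead of impressions.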