Regression Testing
When evaluating LLM applications, it is important to track how your system performs over time. In this guide, we will show you how to use LangSmith's comparison view to track regressions in your application and drill down into the specific runs that improved or regressed relative to a baseline.
Overview
In the LangSmith comparison view, runs that regressed on your specified feedback key against your baseline experiment will be highlighted in red, while runs that improved will be highlighted in green. At the top of each column, you can see how many runs in that experiment did better and how many did worse than your baseline experiment.
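The comparison described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not LangSmith's implementation: the feedback key name, the example IDs, the scores, and the `higher_is_better` flag are all assumptions made for the sketch.

```python
from collections import Counter


def classify(baseline_score, candidate_score, higher_is_better=True):
    """Label one run relative to the baseline on a single feedback key."""
    if baseline_score is None or candidate_score is None:
        return "missing"  # one of the experiments has no score for this example
    if candidate_score == baseline_score:
        return "unchanged"
    better = candidate_score > baseline_score
    if not higher_is_better:
        better = not better  # e.g. for latency or cost, lower scores are better
    return "improved" if better else "regressed"


# Hypothetical scores for a feedback key such as "correctness",
# keyed by dataset example ID.
baseline = {"ex-1": 1.0, "ex-2": 0.0, "ex-3": 1.0, "ex-4": 0.5}
candidate = {"ex-1": 1.0, "ex-2": 1.0, "ex-3": 0.0, "ex-4": 0.75}

# Per-example labels (what the red/green highlighting conveys).
labels = {ex: classify(baseline.get(ex), candidate.get(ex)) for ex in baseline}

# Per-experiment totals (what the counts at the top of a column convey).
summary = Counter(labels.values())

print(labels)
print(summary)
```

The per-example labels correspond to the highlighting of individual runs, and the summary corresponds to the better/worse counts shown at the top of each experiment column.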
Baseline Experiment
To track regressions, you need a baseline experiment against which to compare. By default, the first experiment in your comparison is used as the baseline, but you can change it from the dropdown at the top of the page.
Select Feedback Key
You will also want to select the feedback key on which you would like to focus. This can be selected via another dropdown at the top of the page. Again, one will be assigned by default, but you can adjust it as needed.
Filter to Regressions or Improvements
Click the regressions or improvements button at the top of a column to filter to the runs that regressed or improved in that specific experiment.
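If you export scores and want to reproduce this filter offline, a minimal sketch might look like the following. The row layout (`example_id`, `baseline`, `candidate`) and the `filter_runs` helper are hypothetical and are not part of the LangSmith SDK.

```python
def filter_runs(rows, kind, higher_is_better=True):
    """Keep only the rows that regressed or improved against the baseline.

    rows: list of dicts with "example_id", "baseline", and "candidate" scores
    kind: "regressions" or "improvements"
    """
    if kind not in ("regressions", "improvements"):
        raise ValueError("kind must be 'regressions' or 'improvements'")
    sign = 1 if higher_is_better else -1
    want_positive = kind == "improvements"
    selected = []
    for row in rows:
        # Positive delta means the candidate did better than the baseline.
        delta = sign * (row["candidate"] - row["baseline"])
        if delta == 0:
            continue  # unchanged runs belong to neither filter
        if (delta > 0) == want_positive:
            selected.append(row)
    return selected


# Hypothetical exported scores for one experiment column.
rows = [
    {"example_id": "ex-1", "baseline": 0.9, "candidate": 0.4},
    {"example_id": "ex-2", "baseline": 0.5, "candidate": 0.8},
    {"example_id": "ex-3", "baseline": 0.7, "candidate": 0.7},
]

regressions = filter_runs(rows, "regressions")
improvements = filter_runs(rows, "improvements")

print([r["example_id"] for r in regressions])
print([r["example_id"] for r in improvements])
```

Unchanged runs are excluded from both results, which matches the idea that each filter narrows the view to runs that moved in one direction.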
Try it out
To get started with regression testing, try running a no-code experiment in our prompt playground, or check out the Evaluation Quick Start Guide to run experiments with the SDK.