
Use Cases

The following guides show example use cases for LangSmith's production logging and automations. They are not meant to be exhaustive, nor are they optimized for your use case; they are meant as a reference to help you get started.

Add bad datapoints to an annotation queue

This flow sends datapoints that receive negative end-user feedback to an annotation queue, where you can inspect them manually. To do this, set the Sampling Rate to 1, filter to root runs with negative feedback, and set the action to send to an annotation queue.
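Automation rules are configured in the LangSmith UI, but as a rough sketch of the equivalent workflow with the Python SDK, assuming end-user feedback is logged under a hypothetical "user_score" key (0 = thumbs-down) and that your client version exposes the annotation-queue helpers, this could look like:

```python
from langsmith import Client

client = Client()

# Root runs whose "user_score" feedback is 0 (assumed thumbs-down convention).
negative_runs = client.list_runs(
    project_name="my-project",  # hypothetical project name
    is_root=True,
    filter='and(eq(feedback_key, "user_score"), eq(feedback_score, 0))',
)

# Queue them up for manual review.
queue = client.create_annotation_queue(name="needs-review")
client.add_runs_to_annotation_queue(
    queue.id,
    run_ids=[run.id for run in negative_runs],
)
```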

Send datapoints with positive feedback to a dataset

This flow sends datapoints with positive feedback to a dataset. The logic is that these datapoints are good input/output pairs, which can then be used for testing, few-shot prompting, or fine-tuning. Set the Sampling Rate to 1, filter to root runs with positive feedback, and set the action to send to a dataset.
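A minimal SDK sketch of the same idea, again assuming a "user_score" feedback key (1 = thumbs-up) and a hypothetical dataset name:

```python
from langsmith import Client

client = Client()

# Root runs whose "user_score" feedback is 1 (assumed thumbs-up convention).
positive_runs = list(
    client.list_runs(
        project_name="my-project",  # hypothetical project name
        is_root=True,
        filter='and(eq(feedback_key, "user_score"), eq(feedback_score, 1))',
    )
)

# Save the input/output pairs as examples for testing, few-shot
# prompting, or fine-tuning.
dataset = client.create_dataset("positive-examples")  # hypothetical dataset name
client.create_examples(
    inputs=[run.inputs for run in positive_runs],
    outputs=[run.outputs for run in positive_runs],
    dataset_id=dataset.id,
)
```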

Send child runs whose trace got positive feedback to a dataset

If your application contains multiple steps, you may also want to build up a dataset for each step. This is useful because you will often want to use few-shot prompting on those individual steps, which means collecting examples of them. You typically don't get direct feedback on individual steps, only on the top-level trace. If you assume that positive feedback on the overall trace means all of its sub-runs are also correct, you can set up a rule that selects the particular sub-runs whose parent trace has positive feedback.
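One way to express this selection with the SDK is the sketch below, assuming list_runs supports a trace_filter argument that matches on attributes of the root run, and using a hypothetical child-run name "ExtractEntities" plus the same "user_score" feedback convention as above:

```python
from langsmith import Client

client = Client()

# Child runs named "ExtractEntities" (hypothetical step name) whose
# parent trace received positive "user_score" feedback.
child_runs = list(
    client.list_runs(
        project_name="my-project",  # hypothetical project name
        filter='eq(name, "ExtractEntities")',
        trace_filter='and(eq(feedback_key, "user_score"), eq(feedback_score, 1))',
    )
)

# Build a per-step dataset from those sub-runs.
dataset = client.create_dataset("extract-entities-examples")  # hypothetical name
client.create_examples(
    inputs=[run.inputs for run in child_runs],
    outputs=[run.outputs for run in child_runs],
    dataset_id=dataset.id,
)
```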

Send random datapoints to an annotation queue

You don’t always get great end user feedback (or it may be biased), so you may want to randomly sample some percentage of datapoints and send them to an annotation queue.

To do this, set the Sampling Rate to a fraction and apply the rule to all root runs, with the action set to send to an annotation queue.
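The sampling itself is handled by the rule, but a rough client-side sketch of the same behavior (sampling roughly 10% of the last day's root runs from a hypothetical project and queueing them) might look like:

```python
import random
from datetime import datetime, timedelta, timezone

from langsmith import Client

client = Client()
SAMPLING_RATE = 0.1  # review roughly 10% of traffic

# Root runs from the last 24 hours.
runs = client.list_runs(
    project_name="my-project",  # hypothetical project name
    is_root=True,
    start_time=datetime.now(timezone.utc) - timedelta(days=1),
)

# Keep a random sample and queue it for manual review.
sampled_ids = [run.id for run in runs if random.random() < SAMPLING_RATE]
queue = client.create_annotation_queue(name="random-sample-review")
client.add_runs_to_annotation_queue(queue.id, run_ids=sampled_ids)
```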

Combinations of the above

Oftentimes, combinations of the above rules make sense. For example, longer stacks of rules could be:

The “Help me get positive examples” rule set

  • For a random sample of runs, send them to the annotation queue
  • For any datapoints with positive feedback, add them (or their child runs) to a dataset
  • You can then use these examples as few-shot examples or for fine-tuning

The “What datapoints should I look at?” rule set

  • Define an online evaluator to run over a random sample of datapoints
  • For any datapoints with negative LLM feedback, send them to an annotation queue for manual inspection (see the sketch after this list)
  • From the annotation queue, you can give them correct labels and add them to a dataset to test against, to see if you can improve
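For the second step, assuming the online evaluator writes its LLM feedback under a hypothetical "correctness" key (0 = judged incorrect) and the same SDK helpers as above, a sketch could be:

```python
from langsmith import Client

client = Client()

# Root runs the LLM evaluator scored as incorrect (assumed
# "correctness" key with 0 meaning incorrect).
flagged_runs = client.list_runs(
    project_name="my-project",  # hypothetical project name
    is_root=True,
    filter='and(eq(feedback_key, "correctness"), eq(feedback_score, 0))',
)

queue = client.create_annotation_queue(name="llm-flagged-for-review")
client.add_runs_to_annotation_queue(
    queue.id,
    run_ids=[run.id for run in flagged_runs],
)
```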
