Dataset transformations

LangSmith allows you to attach transformations to fields in your dataset's schema that apply to your data before it is added to your dataset, whether that be from UI, API, or run rules.

Coupled with LangSmith's prebuilt JSON schema types, these allow you to do easy preprocessing of your data before saving it into your datasets.

Transformation types

Transformation Type	Target Types	Functionality
remove_system_messages	Array[Message]	Filters a list of messages to remove any system messages.
convert_to_openai_message	Message Array[Message]	Converts any incoming data from LangChain's internal serialization format to OpenAI's standard message format using langchain's convert_to_openai_messages. If the target field is marked as required, and no matching message is found upon entry, it will attempt to extract a message (or list of messages) from several well-known LangSmith tracing formats (e.g., any traced LangChain BaseChatModel run or traced run from the LangSmith OpenAI wrapper), and remove the original key containing the message.
convert_to_openai_tool	Array[Tool] Only available on top level fields in the inputs dictionary.	Converts any incoming data into OpenAI standard tool formats here using langchain's convert_to_openai_tool Will extract tool definitions from a run's invocation parameters if present / no tools are found at the specified key. This is useful because LangChain chat models trace tool definitions to the `extra.invocation_params` field of the run rather than inputs.
remove_extra_fields	Object	Removes any field not defined in the schema for this target object.

Chat Model prebuilt schema

The main use case for transformations is to simplify collecting production traces into datasets in a format that can be standardized across model providers for usage in evaluations / few shot prompting / etc downstream.

To simplify setup of transformations for our end users, LangSmith offers a pre-defined schema that will do the following:

Extract messages from your collected runs and transform them into the openai standard format, which makes them compatible all LangChain ChatModels and most model providers' SDK for downstream evaluation and experimentation
Extract any tools used by your LLM and add them to your example's input to be used for reproducability in downstream evaluation

tip

Users who want to iterate on their system prompts often also add the Remove System Messages transformation on their input messages when using our Chat Model schema, which will prevent you from saving the system prompt to your dataset.

Compatibility

The LLM run collection schema is built to collect data from LangChain BaseChatModel runs or traced runs from the LangSmith OpenAI wrapper.

Please reach out to support@langchain.dev if you have an LLM run you are tracing that is not compatible and we can extend support.

If you want to apply transformations to other sorts of runs (for example, representing LangGraph state with message history), please define your schema directly and manually add the relevant transformations.

Enablement

When adding a run from a tracing project or annotation queue to a dataset, if it has the LLM run type, we will apply the Chat Model schema by default.

For enablement on new datasets, see our dataset management how-to guide.

Specs

For the full API specs of the prebuilt schema, see the below sections:

Input Schema

{
  "type": "object",
  "properties": {
    "messages": {
      "type": "array",
      "items": {
        "$ref": "https://api.smith.langchain.com/public/schemas/v1/message.json"
      }
    },
    "tools": {
      "type": "array",
      "items": {
        "$ref": "https://api.smith.langchain.com/public/schemas/v1/tooldef.json"
      }
    }
  },
  "required": ["messages"]
}

Output Schema

{
  "type": "object",
  "properties": {
    "message": {
      "$ref": "https://api.smith.langchain.com/public/schemas/v1/message.json"
    }
  },
  "required": ["message"]
}

Transformations

And the transformations look as follows:

[
  {
    "path": ["inputs"],
    "transformation_type": "remove_extra_fields"
  },
  {
    "path": ["inputs", "messages"],
    "transformation_type": "convert_to_openai_message"
  },
  {
    "path": ["inputs", "tools"],
    "transformation_type": "convert_to_openai_tool"
  },
  {
    "path": ["outputs"],
    "transformation_type": "remove_extra_fields"
  },
  {
    "path": ["outputs", "message"],
    "transformation_type": "convert_to_openai_message"
  }
]

Dataset transformations

Transformation types

Chat Model prebuilt schema

Compatibility

Enablement

Specs

Input Schema

Output Schema

Transformations

Was this page helpful?

You can leave detailed feedback on GitHub.

Transformation types​

Chat Model prebuilt schema​

Compatibility​

Enablement​

Specs​

Input Schema​

Output Schema​

Transformations​

Was this page helpful?

You can leave detailed feedback on GitHub.

Transformation types

Chat Model prebuilt schema

Compatibility

Enablement

Specs

Input Schema

Output Schema

Transformations