Skip to main content

Troubleshooting

This guide will walk you through common issues you may encounter when running a self-hosted instance of LangSmith.

While running LangSmith, you may encounter unexpected 500 errors, slow performance, or other issues. This guide will help you diagnose and resolve these issues.

The first step in troubleshooting is to check the logs of the various services that make up LangSmith.

If running on Kubernetes, you can check the logs of your deployment by running the following command (replace langsmith with the name of your deployment if you deployed with a different name):

kubectl logs -l app.kubernetes.io/instance=langsmith --prefix -n <namespace> >> logs.txt

If running on Docker, you can check the logs your deployment by running the following command:

docker compose logs >> logs.txt

Generally, the main services you will want to analyze are:

  • langsmith-backend: The main backend service.
  • langsmith-queue: The queue service.

Common issues

DB::Exception: Cannot reserve 1.00 MiB, not enough space: While executing WaitForAsyncInsert. (NOT_ENOUGH_SPACE)

This error occurs when ClickHouse runs out of disk space. You will need to increase the disk space available to ClickHouse.

Kubernetes

In Kubernetes, you will need to increase the size of the ClickHouse PVC. To achieve this, you can perform the following steps:

  1. Get the storage class of the PVC: kubectl get pvc data-langsmith-clickhouse-0 -n <namespace> -o jsonpath='{.spec.storageClassName}'
  2. Ensure the storage class has AllowVolumeExpansion: true: kubectl get sc <storage-class-name> -o jsonpath='{.allowVolumeExpansion}'
    • If it is false, some storage classes can be updated to allow volume expansion.
    • To update the storage class, you can run kubectl patch sc <storage-class-name> -p '{"allowVolumeExpansion": true}'
    • If this fails, you may need to create a new storage class with the correct settings.
  3. Edit your pvc to have the new size: kubectl edit pvc data-langsmith-clickhouse-0 -n <namespace> or kubectl patch pvc data-langsmith-clickhouse-0 '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}' -n <namespace>
  4. Update your helm chart langsmith_config.yaml to new size(e.g 100 Gi)
  5. Delete the clickhouse statefulset kubectl delete statefulset langsmith-clickhouse --cascade=orphan -n <namespace>
  6. Apply helm chart with updated size (You can follow the upgrade guide here)
  7. Your pvc should now have the new size. Verify by running kubectl get pvc and kubectl exec langsmith-clickhouse-0 -- bash -c "df"

Docker

In Docker, you will need to increase the size of the ClickHouse volume. To achieve this, you can perform the following steps:

  1. Stop your instance of LangSmith. docker compose down
  2. If using bind mount, you will need to increase the size of the mount point.
  3. If using a docker volume, you will need to allocate more space to the volume/docker.

error: Dirty database version 'version'. Fix and force version

This error occurs when the ClickHouse database is in an inconsistent state with our migrations. You will need to reset to an earlier database version and then rerun your upgrade/migrations.

Kubernetes

  1. Force migration to an earlier version, where version = dirty version - 1.
kubectl exec -it deployments/langsmith-backend -- bash -c 'migrate -source "file://clickhouse/migrations" -database "clickhouse://$CLICKHOUSE_HOST:$CLICKHOUSE_NATIVE_PORT?username=$CLICKHOUSE_USER&password=$CLICKHOUSE_PASSWORD&database=$CLICKHOUSE_DB&x-multi-statement=true&x-migrations-table-engine=MergeTree&secure=$CLICKHOUSE_TLS" force <version>'
  1. Rerun your upgrade/migrations.

Docker

  1. Force migration to an earlier version, where version = dirty version - 1.
docker compose exec langchain-backend migrate -source "file://clickhouse/migrations" -database "clickhouse://$CLICKHOUSE_HOST:$CLICKHOUSE_NATIVE_PORT?username=$CLICKHOUSE_USER&password=$CLICKHOUSE_PASSWORD&database=$CLICKHOUSE_DB&x-multi-statement=true&x-migrations-table-engine=MergeTree&secure=$CLICKHOUSE_TLS" force <version>
  1. Rerun your upgrade/migrations.

Was this page helpful?


You can leave detailed feedback on GitHub.