Connect to an external ClickHouse database
ClickHouse is a high-performance, column-oriented database system. It allows for fast ingestion of data and is optimized for analytical queries.
LangSmith uses ClickHouse as the primary data store for traces and feedback. By default, self-hosted LangSmith will use an internal ClickHouse database that is bundled with the LangSmith instance. This is run as a stateful set in the same Kubernetes cluster as the LangSmith application or as a Docker container on the same host as the LangSmith application.
However, you can configure LangSmith to use an external ClickHouse database for easier management and scaling. With an external ClickHouse database, you manage backups, scaling, and other operational tasks for the database yourself. While ClickHouse is not yet a native service in Azure, AWS, or Google Cloud, you can run LangSmith with an external ClickHouse database in the following ways:
- LangSmith-managed ClickHouse (beta)
- Provision a ClickHouse Cloud service, either directly or through a cloud provider marketplace
- On a VM in your cloud provider
Using the first two options (LangSmith-managed ClickHouse or ClickHouse Cloud) will provision a ClickHouse service outside of your VPC. However, both options support private endpoints, so you can direct traffic to the ClickHouse service without exposing it to the public internet (e.g., via AWS PrivateLink or GCP Private Service Connect).
Additionally, LangSmith can be configured so that sensitive information is not stored in ClickHouse. Please reach out to support@langchain.dev for more information.
Requirements
- A provisioned ClickHouse instance that your LangSmith application will have network access to (see above for options).
- A user with admin access to the ClickHouse database. This user will be used to create the necessary tables, indexes, and views.
- We only support standalone ClickHouse (not clustered or replicated) or ClickHouse Cloud.
- We only support ClickHouse versions >= 23.9. Use of ClickHouse versions >= 24.2 requires LangSmith v0.6 or later. See the LangSmith release notes for more information.
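The version rules above can be checked mechanically before you install. The following is a minimal sketch, not part of LangSmith: the function names are hypothetical, and it assumes you have already obtained the ClickHouse version string yourself (for example, by running `SELECT version()` against the database).

```python
import re
from typing import Tuple

# Thresholds taken from the requirements listed above.
MIN_CLICKHOUSE = (23, 9)            # oldest supported ClickHouse version
NEEDS_LANGSMITH_0_6 = (24, 2)       # from this version on, LangSmith v0.6+ is required


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as '24.3.5.46' or 'v0.6.2'."""
    match = re.match(r"^\s*v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_compatibility(clickhouse_version: str, langsmith_version: str) -> str:
    """Return 'ok' or a short explanation of why the pairing is unsupported."""
    ch = parse_version(clickhouse_version)
    ls = parse_version(langsmith_version)
    if ch < MIN_CLICKHOUSE:
        return "unsupported: ClickHouse must be >= 23.9"
    if ch >= NEEDS_LANGSMITH_0_6 and ls < (0, 6):
        return "unsupported: ClickHouse >= 24.2 requires LangSmith v0.6 or later"
    return "ok"
```

For example, `check_compatibility("24.3.5.46", "0.5.7")` reports that LangSmith must be upgraded, while the same ClickHouse version paired with `"0.6.2"` passes.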
Parameters
You will need to provide several parameters to your LangSmith installation to configure an external ClickHouse database. These parameters include:
- Host: The hostname or IP address of the ClickHouse database
- HTTP Port: The port that the ClickHouse database listens on for HTTP connections
- Native Port: The port that the ClickHouse database listens on for native connections
- Database: The name of the ClickHouse database that LangSmith should use
- Username: The username to use to connect to the ClickHouse database
- Password: The password to use to connect to the ClickHouse database
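Before handing these parameters to LangSmith, it can help to confirm that the host and HTTP port are reachable from where LangSmith runs. The sketch below is one way to do that, using only the Python standard library and ClickHouse's HTTP `/ping` endpoint, which answers `Ok.` when the server is up. The function names and the example hostname are hypothetical, and the check does not test the native port or your credentials.

```python
import urllib.request


def build_ping_url(host: str, http_port: int, tls: bool = False) -> str:
    """Build the URL of ClickHouse's HTTP health-check endpoint."""
    scheme = "https" if tls else "http"
    return f"{scheme}://{host}:{http_port}/ping"


def ping(host: str, http_port: int, tls: bool = False, timeout: float = 5.0) -> bool:
    """Return True if the ClickHouse HTTP interface answers the ping request."""
    url = build_ping_url(host, http_port, tls)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # A healthy server replies with status 200 and the body "Ok.\n".
            return response.status == 200 and response.read().strip() == b"Ok."
    except OSError:
        # Covers connection refused, DNS failures, timeouts, and HTTP errors.
        return False


if __name__ == "__main__":
    # Hypothetical host; replace with your own Host and HTTP Port values.
    print(ping("clickhouse.example.internal", 8123))
```

Run this from a machine or pod with the same network access as your LangSmith application, since a check from your laptop may pass or fail for unrelated reasons (for example, a private endpoint that is only reachable inside the VPC).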
Configuration
With these parameters in hand, you can configure your LangSmith instance to use the provisioned ClickHouse database. You can do this by modifying the config.yaml file for your LangSmith Helm Chart installation or the .env file for your Docker installation.
Helm (config.yaml):
clickhouse:
  external:
    enabled: true
    host: "host"
    port: "http port"
    nativePort: "native port"
    user: "default"
    password: "password"
    database: "default"
    tls: false
Docker (.env):
# In your .env file
CLICKHOUSE_HOST=langchain-clickhouse # Change to your ClickHouse host if using external ClickHouse. Otherwise, leave it as is
CLICKHOUSE_USER=default # Change to your ClickHouse user if needed
CLICKHOUSE_DB=default # Change to your ClickHouse database if needed
CLICKHOUSE_PORT=8123 # Change to your ClickHouse port if needed
CLICKHOUSE_TLS=false # Change to true if you are using TLS to connect to ClickHouse. Otherwise, leave it as is
CLICKHOUSE_PASSWORD=password # Change to your ClickHouse password if needed
CLICKHOUSE_NATIVE_PORT=9000 # Change to your ClickHouse native port if needed
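For Docker installations, a quick sanity check of the .env values can catch typos before you reinstall. The following is a rough sketch under stated assumptions, not a LangSmith tool: the helper names are hypothetical, it treats every `CLICKHOUSE_*` variable shown above as required, and its parser only handles simple `KEY=value` lines with optional trailing `#` comments.

```python
import re
from typing import Dict, List

# The variables from the .env example above; treating all as required is an assumption.
REQUIRED_KEYS = (
    "CLICKHOUSE_HOST",
    "CLICKHOUSE_USER",
    "CLICKHOUSE_DB",
    "CLICKHOUSE_PORT",
    "CLICKHOUSE_TLS",
    "CLICKHOUSE_PASSWORD",
    "CLICKHOUSE_NATIVE_PORT",
)


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Drop a trailing comment, i.e. a '#' preceded by whitespace.
        line = re.split(r"\s+#", line, maxsplit=1)[0]
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def validate(settings: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not settings.get(key)]
    for key in ("CLICKHOUSE_PORT", "CLICKHOUSE_NATIVE_PORT"):
        value = settings.get(key, "")
        is_port = value.isascii() and value.isdigit() and 1 <= int(value) <= 65535
        if value and not is_port:
            problems.append(f"{key} is not a valid port: {value!r}")
    tls = settings.get("CLICKHOUSE_TLS", "")
    if tls and tls not in ("true", "false"):
        problems.append(f"CLICKHOUSE_TLS must be 'true' or 'false', got {tls!r}")
    return problems
```

Calling `validate(parse_env(open(".env").read()))` returns an empty list when the basic shape of the settings looks right. It does not confirm that the credentials or hostnames are correct.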
Once configured, you should be able to reinstall your LangSmith instance. If everything is configured correctly, your LangSmith instance should now be using your external ClickHouse database.