Configuration¶
Lakekeeper is configured via environment variables. Settings listed on this page are shared between all projects and warehouses. Prior to Lakekeeper version 0.5.0, prefix all environment variables with ICEBERG_REST__ instead of LAKEKEEPER__.
For most deployments, we recommend setting at least the following variables: LAKEKEEPER__PG_DATABASE_URL_READ, LAKEKEEPER__PG_DATABASE_URL_WRITE, LAKEKEEPER__PG_ENCRYPTION_KEY.
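As a minimal sketch, the three recommended variables could be exported in the shell that starts Lakekeeper. The host names, credentials, database name, and key below are placeholders chosen for illustration, not defaults.

```shell
# Placeholder connection string: replace host, user, password and database.
export LAKEKEEPER__PG_DATABASE_URL_WRITE="postgres://lakekeeper:change-me@db.example.internal:5432/lakekeeper"
# Optional read URL (hypothetical replica); defaults to the write URL when unset.
export LAKEKEEPER__PG_DATABASE_URL_READ="postgres://lakekeeper:change-me@db-replica.example.internal:5432/lakekeeper"
# Placeholder key: use a long random secret in production.
export LAKEKEEPER__PG_ENCRYPTION_KEY="replace-with-a-long-random-secret"
```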
Routing and Base-URL¶
Some Lakekeeper endpoints return links pointing at Lakekeeper itself. By default, these links are generated using the x-forwarded-for, x-forwarded-proto and x-forwarded-port headers. If these are not present, the host header is used. If these heuristics do not work for you, you may set the LAKEKEEPER__BASE_URI environment variable to the base-URL where Lakekeeper is externally reachable. This may be necessary if Lakekeeper runs behind a reverse proxy or load balancer and you cannot set the headers accordingly. In general, we recommend relying on the headers.
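If the forwarded headers cannot be set, the base-URL can be pinned explicitly. The sketch below uses the variable name from the General table; the host name is a hypothetical external address.

```shell
# Hypothetical external address of Lakekeeper behind a proxy or load balancer.
# Only needed when the forwarded headers cannot be set correctly.
export LAKEKEEPER__BASE_URI="https://lakekeeper.example.com:8181"
```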
General¶
Variable | Example | Description |
---|---|---|
LAKEKEEPER__BASE_URI | https://example.com:8181 | Optional base-URL where the catalog is externally reachable. Default: None. See Routing and Base-URL. |
LAKEKEEPER__ENABLE_DEFAULT_PROJECT | true | If true, the NIL Project ID ("00000000-0000-0000-0000-000000000000") is used as a default if the user does not specify a project when connecting. This option is enabled by default, which we recommend for all single-project (single-tenant) setups. Default: true. |
LAKEKEEPER__RESERVED_NAMESPACES | system,examples,information_schema | Reserved namespaces that cannot be created via the REST interface. |
LAKEKEEPER__METRICS_PORT | 9000 | Port where the Prometheus metrics endpoint is reachable. Default: 9000 |
LAKEKEEPER__LISTEN_PORT | 8181 | Port Lakekeeper listens on. Default: 8181 |
LAKEKEEPER__BIND_IP | 0.0.0.0, ::1, :: | IP address Lakekeeper binds to. Default: 0.0.0.0 (listen to all incoming IPv4 packets) |
LAKEKEEPER__SECRET_BACKEND | postgres | The secret backend to use. If kv2 (HashiCorp KV Version 2) is chosen, you need to provide additional parameters. Default: postgres, one-of: [postgres, kv2] |
LAKEKEEPER__ALLOW_ORIGIN | * | A comma-separated list of allowed origins for CORS. |
Storage¶
Variable | Example | Description |
---|---|---|
LAKEKEEPER__ENABLE_AWS_SYSTEM_CREDENTIALS | true | Lakekeeper supports using AWS system identities (i.e. through AWS_* environment variables or EC2 instance profiles) as storage credentials for warehouses. This feature is disabled by default to prevent accidental access to restricted storage locations. To enable AWS system identities, set LAKEKEEPER__ENABLE_AWS_SYSTEM_CREDENTIALS to true. Default: false (AWS system credentials disabled) |
LAKEKEEPER__S3_ENABLE_DIRECT_SYSTEM_CREDENTIALS | true | By default, when using AWS system credentials, users must specify an assume-role-arn for Lakekeeper to assume when accessing S3. Setting this option to true allows Lakekeeper to use system credentials directly without role assumption, meaning the system identity must have direct access to warehouse locations. Default: false (direct system credential access disabled) |
LAKEKEEPER__S3_REQUIRE_EXTERNAL_ID_FOR_SYSTEM_CREDENTIALS | true | Controls whether an external-id is required when assuming a role with AWS system credentials. External IDs provide additional security when cross-account role assumption is used. Default: true (external ID required) |
LAKEKEEPER__ENABLE_AZURE_SYSTEM_CREDENTIALS | true | Lakekeeper supports using Azure system identities (i.e. through AZURE_* environment variables or VM managed identities) as storage credentials for warehouses. This feature is disabled by default to prevent accidental access to restricted storage locations. To enable Azure system identities, set LAKEKEEPER__ENABLE_AZURE_SYSTEM_CREDENTIALS to true. Default: false (Azure system credentials disabled) |
LAKEKEEPER__ENABLE_GCP_SYSTEM_CREDENTIALS | true | Lakekeeper supports using GCP system identities (i.e. through GOOGLE_APPLICATION_CREDENTIALS environment variables or the Compute Engine Metadata Server) as storage credentials for warehouses. This feature is disabled by default to prevent accidental access to restricted storage locations. To enable GCP system identities, set LAKEKEEPER__ENABLE_GCP_SYSTEM_CREDENTIALS to true. Default: false (GCP system credentials disabled) |
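As an illustration, the following sketch enables AWS system identities while keeping role assumption and external IDs mandatory. The two S3 values restate the documented defaults explicitly. Whether this combination suits a deployment is an assumption to verify against your own security requirements.

```shell
# Allow warehouses to use the AWS identity of the Lakekeeper process.
export LAKEKEEPER__ENABLE_AWS_SYSTEM_CREDENTIALS=true
# Documented default, stated explicitly: an assume-role-arn stays required.
export LAKEKEEPER__S3_ENABLE_DIRECT_SYSTEM_CREDENTIALS=false
# Documented default, stated explicitly: an external-id stays required.
export LAKEKEEPER__S3_REQUIRE_EXTERNAL_ID_FOR_SYSTEM_CREDENTIALS=true
```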
Persistence Store¶
Currently Lakekeeper supports only Postgres as a persistence store. You may either provide connection strings using PG_DATABASE_URL_* or use the PG_* environment variables. Connection strings take precedence:
Variable | Example | Description |
---|---|---|
LAKEKEEPER__PG_DATABASE_URL_READ | postgres://postgres:password@localhost:5432/iceberg | Postgres Database connection string used for reading. Defaults to LAKEKEEPER__PG_DATABASE_URL_WRITE. |
LAKEKEEPER__PG_DATABASE_URL_WRITE | postgres://postgres:password@localhost:5432/iceberg | Postgres Database connection string used for writing. |
LAKEKEEPER__PG_ENCRYPTION_KEY | This is unsafe, please set a proper key | If LAKEKEEPER__SECRET_BACKEND=postgres, this key is used to encrypt secrets. It is required to change this for production deployments. |
LAKEKEEPER__PG_READ_POOL_CONNECTIONS | 10 | Number of connections in the read pool |
LAKEKEEPER__PG_WRITE_POOL_CONNECTIONS | 5 | Number of connections in the write pool |
LAKEKEEPER__PG_HOST_R | localhost | Hostname for read operations. Defaults to LAKEKEEPER__PG_HOST_W. |
LAKEKEEPER__PG_HOST_W | localhost | Hostname for write operations |
LAKEKEEPER__PG_PORT | 5432 | Port number |
LAKEKEEPER__PG_USER | postgres | Username for authentication |
LAKEKEEPER__PG_PASSWORD | password | Password for authentication |
LAKEKEEPER__PG_DATABASE | iceberg | Database name |
LAKEKEEPER__PG_SSL_MODE | require | SSL mode (disable, allow, prefer, require) |
LAKEKEEPER__PG_SSL_ROOT_CERT | /path/to/root/cert | Path to SSL root certificate |
LAKEKEEPER__PG_ENABLE_STATEMENT_LOGGING | true | Enable SQL statement logging |
LAKEKEEPER__PG_TEST_BEFORE_ACQUIRE | true | Test connections before acquiring from the pool |
LAKEKEEPER__PG_CONNECTION_MAX_LIFETIME | 1800 | Maximum lifetime of connections in seconds |
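When connection strings are not convenient, the individual PG_* variables can be used instead. The sketch below assumes a hypothetical primary and read replica. Host names, user, password, and database name are placeholders. Because connection strings take precedence, the URL variables are unset first.

```shell
# Connection strings take precedence, so make sure they are not set.
unset LAKEKEEPER__PG_DATABASE_URL_READ LAKEKEEPER__PG_DATABASE_URL_WRITE

# Hypothetical primary (writes) and replica (reads).
export LAKEKEEPER__PG_HOST_W="db.example.internal"
export LAKEKEEPER__PG_HOST_R="db-replica.example.internal"
export LAKEKEEPER__PG_PORT=5432
# Placeholder credentials and database name.
export LAKEKEEPER__PG_USER="lakekeeper"
export LAKEKEEPER__PG_PASSWORD="change-me"
export LAKEKEEPER__PG_DATABASE="lakekeeper"
export LAKEKEEPER__PG_SSL_MODE="require"
```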
Vault KV Version 2¶
Configuration parameters if a Vault KV version 2 (i.e. HashiCorp Vault) compatible storage is used as a backend. Currently, we only support the userpass authentication method. Configuration may be passed as single values like LAKEKEEPER__KV2__URL=http://vault.local or as a compound value:
LAKEKEEPER__KV2='{url="http://localhost:1234", user="test", password="test", secret_mount="secret"}'
Variable | Example | Description |
---|---|---|
LAKEKEEPER__KV2__URL | https://vault.local | URL of the KV2 backend |
LAKEKEEPER__KV2__USER | admin | Username to authenticate against the KV2 backend |
LAKEKEEPER__KV2__PASSWORD | password | Password to authenticate against the KV2 backend |
LAKEKEEPER__KV2__SECRET_MOUNT | kv/data/iceberg | Path to the secret mount in the KV2 backend |
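The single-value form might look like the sketch below, which also selects kv2 as the secret backend. The Vault address and the userpass credentials are hypothetical. The mount name mirrors the compound example above.

```shell
# Switch the secret backend from the default (postgres) to kv2.
export LAKEKEEPER__SECRET_BACKEND=kv2
# Hypothetical Vault address and userpass credentials.
export LAKEKEEPER__KV2__URL="https://vault.example.internal:8200"
export LAKEKEEPER__KV2__USER="lakekeeper"
export LAKEKEEPER__KV2__PASSWORD="change-me"
# Mount name as used in the compound example.
export LAKEKEEPER__KV2__SECRET_MOUNT="secret"
```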
Task Queues¶
Lakekeeper uses task queues internally to remove soft-deleted tabulars and purge tabular files. The following global configuration options are available:
Variable | Example | Description |
---|---|---|
LAKEKEEPER__QUEUE_CONFIG__MAX_RETRIES | 5 | Number of retries before a task is considered failed. Default: 5 |
LAKEKEEPER__QUEUE_CONFIG__MAX_AGE | 3600 | Number of seconds before a task is considered stale and can be picked up by another worker. Default: 3600 |
LAKEKEEPER__QUEUE_CONFIG__POLL_INTERVAL | 3600ms/30s | Interval between polling for new tasks. Default: 10s. Supported units: ms (milliseconds) and s (seconds). Omitting the unit is deprecated: the value currently defaults to seconds, but this behavior is due to be removed in a future release. |
LAKEKEEPER__QUEUE_CONFIG__NUM_WORKERS | 2 | Number of workers launched for each queue. Default: 2 |
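A possible tuning of the task queues is sketched below. The numbers are illustrative assumptions rather than recommendations. The poll interval carries an explicit unit because omitting it is deprecated.

```shell
# Illustrative values only; the defaults are 5 retries, 3600 s, 10s and 2 workers.
export LAKEKEEPER__QUEUE_CONFIG__MAX_RETRIES=3
export LAKEKEEPER__QUEUE_CONFIG__MAX_AGE=1800
# Always include the unit (ms or s).
export LAKEKEEPER__QUEUE_CONFIG__POLL_INTERVAL="500ms"
export LAKEKEEPER__QUEUE_CONFIG__NUM_WORKERS=4
```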
NATS¶
Lakekeeper can publish change events to NATS (for Kafka, see the Kafka section). The following configuration options are available:
Variable | Example | Description |
---|---|---|
LAKEKEEPER__NATS_ADDRESS | nats://localhost:4222 | The URL of the NATS server to connect to |
LAKEKEEPER__NATS_TOPIC | iceberg | The subject to publish events to |
LAKEKEEPER__NATS_USER | test-user | User to authenticate against NATS, needs LAKEKEEPER__NATS_PASSWORD |
LAKEKEEPER__NATS_PASSWORD | test-password | Password to authenticate against NATS, needs LAKEKEEPER__NATS_USER |
LAKEKEEPER__NATS_CREDS_FILE | /path/to/file.creds | Path to a file containing NATS credentials |
LAKEKEEPER__NATS_TOKEN | xyz | NATS token to use for authentication |
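For example, publishing events with user and password authentication could be configured as sketched below. The server address, subject, and credentials are placeholders. The credentials file and token variables are simply left unset in this sketch.

```shell
# Hypothetical server address and subject.
export LAKEKEEPER__NATS_ADDRESS="nats://nats.example.internal:4222"
export LAKEKEEPER__NATS_TOPIC="lakekeeper.events"
# User and password must be set together.
export LAKEKEEPER__NATS_USER="lakekeeper"
export LAKEKEEPER__NATS_PASSWORD="change-me"
```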
Kafka¶
Lakekeeper uses rust-rdkafka to enable publishing events to Kafka.
The following features of rust-rdkafka are enabled:
- tokio
- zstd
- gssapi-vendored
- curl-static
- ssl-vendored
- libz-static
This means that all features of librdkafka are usable. All necessary dependencies are statically linked and cannot be disabled. If you want to use dynamic linking or disable a feature, you'll have to fork Lakekeeper and change the features accordingly. Please refer to the documentation of rust-rdkafka for details on how to enable dynamic linking or disable certain features.
To publish events to Kafka, set the following environment variables:
Variable | Example | Description |
---|---|---|
LAKEKEEPER__KAFKA_TOPIC | lakekeeper | The topic to which events are published |
LAKEKEEPER__KAFKA_CONFIG | {"bootstrap.servers"="host1:port,host2:port","security.protocol"="SSL"} | librdkafka configuration as a "Dictionary". Note that you cannot use "JSON-Style-Syntax". Also see the notes below. |
LAKEKEEPER__KAFKA_CONFIG_FILE | /path/to/config_file | librdkafka configuration to be loaded from a file. Also see the notes below. |
Notes¶
LAKEKEEPER__KAFKA_CONFIG and LAKEKEEPER__KAFKA_CONFIG_FILE are mutually exclusive; their values are not merged. If both are set, LAKEKEEPER__KAFKA_CONFIG is used.
A LAKEKEEPER__KAFKA_CONFIG_FILE could look like this:
{
"bootstrap.servers"="host1:port,host2:port",
"security.protocol"="SASL_SSL",
"sasl.mechanisms"="PLAIN",
}
Validation of the configuration parameters is deferred to rdkafka.
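Setting the inline variant from a shell needs some care with quoting, because the dictionary contains double quotes. A minimal sketch with hypothetical broker addresses:

```shell
export LAKEKEEPER__KAFKA_TOPIC="lakekeeper"
# Single quotes keep the inner double quotes intact.
# Keys and values are joined with "=", not ":" (no JSON-style syntax).
# The broker host names are placeholders.
export LAKEKEEPER__KAFKA_CONFIG='{"bootstrap.servers"="kafka-1.example.internal:9092,kafka-2.example.internal:9092","security.protocol"="SSL"}'
```

Leaving LAKEKEEPER__KAFKA_CONFIG_FILE unset in this case avoids relying on the precedence rule described above.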
Logging Cloudevents¶
Cloudevents can also be logged if you do not have NATS or Kafka up and running. This feature can be enabled by setting
LAKEKEEPER__LOG_CLOUDEVENTS=true
Authentication¶
To prohibit unwanted access to data, we recommend enabling Authentication.
Authentication is enabled if:
- LAKEKEEPER__OPENID_PROVIDER_URI is set, OR
- LAKEKEEPER__ENABLE_KUBERNETES_AUTHENTICATION is set to true
In Lakekeeper, multiple Authentication mechanisms can be enabled together, for example OpenID + Kubernetes. Lakekeeper builds an internal Authenticator chain of up to three identity providers. Incoming tokens must be JWTs; opaque tokens are not yet supported. Incoming tokens are introspected, and each Authentication provider checks whether it can handle the given token. If it can, the token is authenticated against this provider; otherwise the next Authenticator in the chain is checked.
The following Authenticators are available. Enabled Authenticators are checked in order:
- OpenID / OAuth2
  Enabled if: LAKEKEEPER__OPENID_PROVIDER_URI is set
  Validates Token with: Locally with JWKS Keys fetched from the well-known configuration.
  Accepts JWT if (both must be true):
  - Issuer matches the issuer provided in the .well-known/openid-configuration of the LAKEKEEPER__OPENID_PROVIDER_URI OR issuer matches any of the LAKEKEEPER__OPENID_ADDITIONAL_ISSUERS.
  - If LAKEKEEPER__OPENID_AUDIENCE is specified, any of the configured audiences must be present in the token.
- Kubernetes
  Enabled if: LAKEKEEPER__ENABLE_KUBERNETES_AUTHENTICATION is true
  Validates Token with: Kubernetes TokenReview API
  Accepts JWT if:
  - Token audience matches any of the audiences provided in LAKEKEEPER__KUBERNETES_AUTHENTICATION_AUDIENCE
  - If LAKEKEEPER__KUBERNETES_AUTHENTICATION_AUDIENCE is not set, all tokens proceed to validation! We highly recommend configuring audiences; for most deployments, https://kubernetes.default.svc works.
- Kubernetes Legacy Tokens
  Enabled if: LAKEKEEPER__ENABLE_KUBERNETES_AUTHENTICATION is true and LAKEKEEPER__KUBERNETES_AUTHENTICATION_ACCEPT_LEGACY_SERVICEACCOUNT is true
  Validates Token with: Kubernetes TokenReview API
  Accepts JWT if:
  - Token's issuer is kubernetes/serviceaccount
If LAKEKEEPER__OPENID_PROVIDER_URI is specified, Lakekeeper will verify access tokens against this provider. The provider must provide the .well-known/openid-configuration endpoint and the openid-configuration needs to have jwks_uri and issuer defined.
Typical values for LAKEKEEPER__OPENID_PROVIDER_URI are:
- Keycloak: https://keycloak.local/realms/{your-realm}
- Entra-ID: https://login.microsoftonline.com/{your-tenant-id-here}/v2.0/
Please check the Authentication Guide for more details.
Variable | Example | Description |
---|---|---|
LAKEKEEPER__OPENID_PROVIDER_URI | https://keycloak.local/realms/{your-realm} | OpenID Provider URL. |
LAKEKEEPER__OPENID_AUDIENCE | the-client-id-of-my-app | If set, the aud of the provided token must match the value provided. Multiple allowed audiences can be provided as a comma-separated list. |
LAKEKEEPER__OPENID_ADDITIONAL_ISSUERS | https://sts.windows.net/<Tenant>/ | A comma-separated list of additional issuers to trust. The issuer defined in the issuer field of the .well-known/openid-configuration is always trusted. LAKEKEEPER__OPENID_ADDITIONAL_ISSUERS has no effect if LAKEKEEPER__OPENID_PROVIDER_URI is not set. |
LAKEKEEPER__OPENID_SCOPE | lakekeeper | Specify a scope that must be present in provided tokens received from the OpenID provider. |
LAKEKEEPER__OPENID_SUBJECT_CLAIM | sub or oid | Specify the field in the user's claims that is used to identify a user. By default, Lakekeeper uses the oid field if present; otherwise the sub field is used. We strongly recommend setting this configuration explicitly in production deployments. Entra-ID users want to use the oid claim, users from all other IdPs most likely want to use the sub claim. |
LAKEKEEPER__ENABLE_KUBERNETES_AUTHENTICATION | true | If true, Kubernetes service accounts can authenticate to Lakekeeper. This option is compatible with LAKEKEEPER__OPENID_PROVIDER_URI - multiple IdPs (OIDC and Kubernetes) can be enabled simultaneously. |
LAKEKEEPER__KUBERNETES_AUTHENTICATION_AUDIENCE | https://kubernetes.default.svc | Audiences that are expected in Kubernetes tokens. Only has an effect if LAKEKEEPER__ENABLE_KUBERNETES_AUTHENTICATION is true. |
LAKEKEEPER__KUBERNETES_AUTHENTICATION_ACCEPT_LEGACY_SERVICEACCOUNT | false | Add an authenticator that handles tokens with no audiences and the issuer set to kubernetes/serviceaccount. Only has an effect if LAKEKEEPER__ENABLE_KUBERNETES_AUTHENTICATION is true. |
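As a sketch of a combined setup, the following enables an OpenID provider together with Kubernetes authentication. The Keycloak host, realm, and audience are hypothetical. The subject claim is set explicitly, as recommended above, and sub is chosen on the assumption that the IdP is not Entra-ID.

```shell
# Hypothetical Keycloak realm; must serve .well-known/openid-configuration.
export LAKEKEEPER__OPENID_PROVIDER_URI="https://keycloak.example.com/realms/my-realm"
# Placeholder audience that tokens are expected to carry.
export LAKEKEEPER__OPENID_AUDIENCE="lakekeeper"
# Set explicitly; Entra-ID deployments would use oid instead.
export LAKEKEEPER__OPENID_SUBJECT_CLAIM="sub"
# Additionally accept Kubernetes service account tokens with this audience.
export LAKEKEEPER__ENABLE_KUBERNETES_AUTHENTICATION=true
export LAKEKEEPER__KUBERNETES_AUTHENTICATION_AUDIENCE="https://kubernetes.default.svc"
```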
Authorization¶
Authorization is only effective if Authentication is enabled. Authorization must not be enabled after Lakekeeper has been bootstrapped! Please create a new Lakekeeper instance, bootstrap it with authorization enabled, and migrate your tables.
Variable | Example | Description |
---|---|---|
LAKEKEEPER__AUTHZ_BACKEND | allowall | The authorization backend to use. If openfga is chosen, you need to provide additional parameters. The allowall backend disables authorization - authenticated users can access all endpoints. Default: allowall, one-of: [openfga, allowall] |
LAKEKEEPER__OPENFGA__ENDPOINT | http://localhost:35081 | OpenFGA Endpoint (gRPC). |
LAKEKEEPER__OPENFGA__STORE_NAME | lakekeeper | The OpenFGA Store to use. Default: lakekeeper |
LAKEKEEPER__OPENFGA__API_KEY | my-api-key | The API Key used for Pre-shared key authentication to OpenFGA. If LAKEKEEPER__OPENFGA__CLIENT_ID is set, the API Key is ignored. If neither API Key nor Client ID is specified, no authentication is used. |
LAKEKEEPER__OPENFGA__CLIENT_ID | 12345 | The Client ID to use for authenticating if OpenFGA is secured via OIDC. |
LAKEKEEPER__OPENFGA__CLIENT_SECRET | abcd | Client Secret for the Client ID. |
LAKEKEEPER__OPENFGA__TOKEN_ENDPOINT | https://keycloak.example.com/realms/master/protocol/openid-connect/token | Token Endpoint to use when exchanging client credentials for an access token for OpenFGA. Required if Client ID is set. |
LAKEKEEPER__OPENFGA__SCOPE | openfga | Additional scopes to request in the Client Credential flow. |
LAKEKEEPER__OPENFGA__AUTHORIZATION_MODEL_PREFIX | collaboration | Explicitly set the Authorization model prefix. Defaults to collaboration if not set. We recommend using this setting only in combination with LAKEKEEPER__OPENFGA__AUTHORIZATION_MODEL_VERSION. |
LAKEKEEPER__OPENFGA__AUTHORIZATION_MODEL_VERSION | 3.1 | Version of the model to use. If specified, the specified model version must already exist. This can be used to roll back to previously applied model versions or to connect to externally managed models. Migration is disabled if the model version is set. The version should follow the format of the example (e.g. 3.1). |
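An OpenFGA setup secured via OIDC client credentials might be configured as in the sketch below. The endpoint, client ID, secret, and token endpoint are hypothetical. The API key is omitted because it is ignored once a client ID is set.

```shell
export LAKEKEEPER__AUTHZ_BACKEND=openfga
# Hypothetical gRPC endpoint of the OpenFGA server.
export LAKEKEEPER__OPENFGA__ENDPOINT="http://openfga.example.internal:8081"
export LAKEKEEPER__OPENFGA__STORE_NAME="lakekeeper"
# Placeholder client credentials; the token endpoint is required with a client ID.
export LAKEKEEPER__OPENFGA__CLIENT_ID="lakekeeper-openfga"
export LAKEKEEPER__OPENFGA__CLIENT_SECRET="change-me"
export LAKEKEEPER__OPENFGA__TOKEN_ENDPOINT="https://keycloak.example.com/realms/my-realm/protocol/openid-connect/token"
export LAKEKEEPER__OPENFGA__SCOPE="openfga"
```

As noted above, this only makes sense for an instance that is bootstrapped with authorization enabled from the start.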
UI¶
When using the built-in UI which is hosted as part of the Lakekeeper binary, most values are pre-set with the corresponding values of Lakekeeper itself. Customization is typically required if Authentication is enabled. Please check the Authentication guide for more information.
Variable | Example | Description |
---|---|---|
LAKEKEEPER__UI__OPENID_PROVIDER_URI | https://keycloak.local/realms/{your-realm} | OpenID provider URI used for login in the UI. Defaults to LAKEKEEPER__OPENID_PROVIDER_URI. Set this only if the IdP is reachable under a different URI from the user's browser than from Lakekeeper. |
LAKEKEEPER__UI__OPENID_CLIENT_ID | lakekeeper-ui | Client ID to use for the Authorization Code Flow of the UI. Required if Authentication is enabled. Defaults to lakekeeper |
LAKEKEEPER__UI__OPENID_REDIRECT_PATH | /callback | Path where the UI receives the callback including the tokens from the user's browser. Defaults to: /callback |
LAKEKEEPER__UI__OPENID_SCOPE | openid email | Scopes to request from the IdP. Defaults to openid profile email. |
LAKEKEEPER__UI__OPENID_RESOURCE | lakekeeper-api | Resources to request from the IdP. If not specified, the resource field is omitted (default). |
LAKEKEEPER__UI__OPENID_POST_LOGOUT_REDIRECT_PATH | /logout | Path the UI calls when users are logged out from the IdP. Defaults to /logout |
LAKEKEEPER__UI__LAKEKEEPER_URL | https://example.com/lakekeeper | URI where the user's browser can reach Lakekeeper. Defaults to the value of LAKEKEEPER__BASE_URI. |
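With authentication enabled, a UI configuration could look like the sketch below. The client ID and external URL are hypothetical and would have to match a client registered with your IdP.

```shell
# Placeholder client registered for the Authorization Code Flow.
export LAKEKEEPER__UI__OPENID_CLIENT_ID="lakekeeper-ui"
# Same as the documented default, stated explicitly.
export LAKEKEEPER__UI__OPENID_SCOPE="openid profile email"
# Hypothetical address under which browsers reach Lakekeeper.
export LAKEKEEPER__UI__LAKEKEEPER_URL="https://lakekeeper.example.com"
```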
Endpoint Statistics¶
Lakekeeper collects statistics about the usage of its endpoints. Every Lakekeeper instance accumulates endpoint calls for a certain duration in memory before writing them into the database. The following configuration options are available:
Variable | Example | Description |
---|---|---|
LAKEKEEPER__ENDPOINT_STAT_FLUSH_INTERVAL | 30s | Interval at which endpoint statistics are written into the database. Default: 30s. Valid units: s, ms |
SSL Dependencies¶
You may be running Lakekeeper in your own environment which uses self-signed certificates for e.g. MinIO. Lakekeeper is built with reqwest's rustls-tls-native-roots feature activated, which means the SSL_CERT_FILE and SSL_CERT_DIR environment variables are respected. If neither is set, the system's default CA store is used. If you want to use a custom CA store, set SSL_CERT_FILE to the path of the CA file or SSL_CERT_DIR to the path of the CA directory. The certificate used by the server cannot be a CA. It needs to be an end entity certificate, else you may run into CaUsedAsEndEntity errors.
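For instance, a private CA could be made known to Lakekeeper as sketched below. The file path is a hypothetical location of a PEM bundle containing the CA that signed your storage endpoint's certificate.

```shell
# Hypothetical path to a PEM file with the private CA certificate(s).
export SSL_CERT_FILE="/etc/ssl/certs/my-company-ca.pem"
```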