Production Checklist
Lakekeeper is the heart of your data platform and needs to integrate deeply with your existing infrastructure, such as IdPs. The easiest way to get Lakekeeper to production is our enterprise support. Please find more information on our commercial offerings at lakekeeper.io.
The following are some general recommendations for production setups:
- Use an external, highly available database as the catalog backend. We recommend using a managed service in your preferred cloud, or hosting a highly available cluster on Kubernetes yourself using your preferred operator. We are using the amazing CloudNativePG internally. Make sure the database is backed up regularly.
- Ensure that both `LAKEKEEPER__PG_DATABASE_URL_READ` and `LAKEKEEPER__PG_DATABASE_URL_WRITE` are set for ideal load distribution. Most Postgres deployments provide separate URLs for reading and writing, so that writes are channeled to the primary while reads are distributed across replicas.
- For highly available setups, ensure that multiple Lakekeeper instances are running on different nodes. We recommend our Helm chart for production deployments.
- Ensure that authentication is enabled, typically by setting `LAKEKEEPER__OPENID_PROVIDER_URI` and/or `LAKEKEEPER__ENABLE_KUBERNETES_AUTHENTICATION`. Check our Authentication Guide for more information.
- If `LAKEKEEPER__OPENID_PROVIDER_URI` is set, we recommend setting `LAKEKEEPER__OPENID_AUDIENCE` as well.
- If authorization is desired, follow our Authorization Guide. Ensure that OpenFGA is hosted in close proximity to Lakekeeper, ideally on the same VM or Kubernetes node. In our Helm chart, we use `PodAffinity` to achieve this.
- If the default Postgres secret backend is used, ensure that `LAKEKEEPER__PG_ENCRYPTION_KEY` is set to a long random string.
- Ensure that all Warehouses use distinct storage locations / prefixes and distinct credentials that only grant access to the prefix used for a Warehouse.
- Ensure that SSL / TLS is enabled. Lakekeeper does not terminate TLS connections natively. Please use a reverse proxy like Nginx or Envoy to secure the connection to Lakekeeper. On Kubernetes, any Ingress controller can be used. For high availability, failover should be handled by the reverse proxy. Lakekeeper exposes a `/health` endpoint that can be used to determine its current status. If you are using our Helm chart, probes are already built in.
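
Several items in the checklist above come down to environment variables being present. As a rough illustration, the following Python sketch inspects an environment mapping and reports which of those recommendations appear unmet. It is not part of Lakekeeper: the function name `check_lakekeeper_env`, the `MIN_KEY_LENGTH` threshold of 32 characters, and the rule that Kubernetes authentication counts as enabled only when the variable equals `true` are assumptions made for this example.

```python
import os

# Hypothetical threshold: the checklist only asks for a "long random string".
MIN_KEY_LENGTH = 32


def check_lakekeeper_env(env, uses_postgres_secret_backend=True):
    """Return a list of findings for the given environment mapping.

    An empty list means none of the checks below raised a concern.
    """
    findings = []

    # Both URLs should be set so that reads and writes can be routed separately.
    for name in ("LAKEKEEPER__PG_DATABASE_URL_READ",
                 "LAKEKEEPER__PG_DATABASE_URL_WRITE"):
        if not env.get(name):
            findings.append(f"{name} is not set")

    # At least one authentication mechanism should be configured.
    oidc_uri = env.get("LAKEKEEPER__OPENID_PROVIDER_URI")
    # Assumption: the flag is considered enabled only for the value "true".
    k8s_auth = env.get("LAKEKEEPER__ENABLE_KUBERNETES_AUTHENTICATION", "")
    if not oidc_uri and k8s_auth.strip().lower() != "true":
        findings.append("no authentication mechanism appears to be enabled")

    # The audience is recommended whenever an OpenID provider is configured.
    if oidc_uri and not env.get("LAKEKEEPER__OPENID_AUDIENCE"):
        findings.append("LAKEKEEPER__OPENID_AUDIENCE is not set")

    # The encryption key only matters for the default Postgres secret backend.
    if uses_postgres_secret_backend:
        key = env.get("LAKEKEEPER__PG_ENCRYPTION_KEY", "")
        if len(key) < MIN_KEY_LENGTH:
            findings.append(
                "LAKEKEEPER__PG_ENCRYPTION_KEY is missing or shorter than "
                f"{MIN_KEY_LENGTH} characters"
            )

    return findings


if __name__ == "__main__":
    # Inspect the environment of the current process and print any findings.
    for finding in check_lakekeeper_env(os.environ):
        print("WARNING:", finding)
```

A check like this could run in a deployment pipeline before Lakekeeper starts. It only inspects variable presence and length, so it cannot confirm that the database, the identity provider, or the storage credentials are actually configured correctly.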
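
For setups that do not use the Helm chart's built-in probes, the `/health` endpoint mentioned above can be polled by a reverse proxy or a small script. The sketch below shows one possible check in Python using only the standard library. It assumes that a healthy instance answers `GET /health` with a 2xx status code. The function name `is_healthy` and the two-second default timeout are choices made for this example, and the response body is deliberately ignored because its format is not described here.

```python
import http.client
import urllib.request


def is_healthy(base_url, timeout=2.0):
    """Return True if GET <base_url>/health answers with a 2xx status.

    Assumption: any 2xx response means the instance is healthy.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and non-2xx HTTP errors
        # (urllib raises HTTPError, a subclass of OSError, for 4xx/5xx).
        return False
```

Calling `is_healthy("http://localhost:8181")` would then return `True` or `False`. The address is a placeholder and should be replaced with wherever your Lakekeeper instance is reachable.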