Open Policy Agent (OPA)¶
Lakekeeper's Open Policy Agent bridge enables compute engines that support fine-grained access control via Open Policy Agent (OPA) as their authorization engine to respect privileges defined in Lakekeeper. We have also prepared a self-contained Docker Compose Example to get started quickly.
Let's imagine we have a trusted multi-user query engine such as Trino, alongside single-user query engines like PyIceberg or Daft in Jupyter notebooks. Managing permissions in Trino independently of the other tools is not an option, as we do not want to duplicate permissions across query engines. Our multi-user query engine has two options:
- Catalog enforces permissions: The engine contacts the Catalog on behalf of the user. To achieve this, the engine must be able to impersonate the user for the catalog application. In OAuth2 settings, this can be accomplished through downscoping tokens or other forms of Token Exchange.
- Compute enforces permissions: After contacting the catalog with a god-like "I can do everything!" user (e.g. `project_admin`), the query engine contacts the permission system, then retrieves and enforces those permissions. Note that this requires the engine to run in a trusted environment, as whoever has root access to the engine also has access to the god-like credential.
The Lakekeeper OPA Bridge enables the second option by exposing all permissions in Lakekeeper via OPA. The Bridge itself is a collection of OPA files in the `authz/opa-bridge` folder of the Lakekeeper GitHub repository.
The bridge also comes with a translation layer for Trino that translates Trino permissions to Lakekeeper permissions and thus serves Trino's OPA queries. Currently, Trino is the only Iceberg query engine we are aware of that is flexible enough to honor external permissions via OPA. Please let us know if you are aware of other engines so that we can add support.
Configuration¶
Lakekeeper's OPA bridge needs to access the permissions API of Lakekeeper. As such, we need a technical user for OPA (client ID, client secret) that OPA can use to authenticate to Lakekeeper. Please check the Authentication guide for more information on how to create technical users. We recommend using the same user for creating the catalog in Trino to ensure consistent access. In most scenarios, this user should have the `project_admin` role.
The plugin can be customized either by editing the `configuration.rego` file or by setting environment variables. By editing `configuration.rego` you can also easily connect multiple Lakekeeper instances to the same Trino instance. All available configuration options are explained in the file.
Lakekeeper Connection¶
If configuration is done via environment variables, the following settings are available:
| Variable | Example | Description |
|---|---|---|
| `LAKEKEEPER_URL` | `https://lakekeeper.example.com` | URL where Lakekeeper is externally reachable. Default: `https://localhost:8181` |
| `LAKEKEEPER_TOKEN_ENDPOINT` | `http://keycloak:8080/realms/iceberg/protocol/openid-connect/token` | Token endpoint of the IdP used to secure Lakekeeper. This endpoint is used to exchange OPA's client credentials for an access token. |
| `LAKEKEEPER_CLIENT_ID` | `trino` | Client ID used by OPA to access Lakekeeper's permissions API. |
| `LAKEKEEPER_CLIENT_SECRET` | `abcd` | Client secret for the client ID. |
| `LAKEKEEPER_SCOPE` | `lakekeeper` | Scopes to request from the IdP. Default: `lakekeeper`. Please check the Authentication guide for setup. |
| `LAKEKEEPER_MAX_BATCH_CHECK_SIZE` | `1000` | Maximum number of checks per batch-check HTTP request. Larger values mean fewer HTTP calls but more load on the authorization backend. Default: `1000` |
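For example, when running OPA in a shell or Docker Compose environment, the connection can be set up via environment variables. A minimal sketch with placeholder values (all endpoints and credentials below are illustrative and must be replaced with your own):

```shell
# Illustrative values only; substitute your own endpoints and credentials.
export LAKEKEEPER_URL="https://lakekeeper.example.com"
export LAKEKEEPER_TOKEN_ENDPOINT="http://keycloak:8080/realms/iceberg/protocol/openid-connect/token"
export LAKEKEEPER_CLIENT_ID="trino"
export LAKEKEEPER_CLIENT_SECRET="abcd"
export LAKEKEEPER_SCOPE="lakekeeper"
```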
Catalog Mapping¶
All configuration options mentioned above refer to a specific Lakekeeper instance. What is still missing is a mapping of Trino catalogs to Lakekeeper warehouses. By default, we support four catalogs in Trino, but more can easily be added in `configuration.rego`.
| Variable | Example | Description |
|---|---|---|
| `TRINO_DEV_CATALOG_NAME` | `dev` | Name of the development catalog in Trino. Default: `dev` |
| `LAKEKEEPER_DEV_WAREHOUSE` | `development` | Name of the development warehouse in Lakekeeper that corresponds to the `TRINO_DEV_CATALOG_NAME` catalog in Trino. Default: `development` |
| `TRINO_PROD_CATALOG_NAME` | `prod` | Name of the production catalog in Trino. Default: `prod` |
| `LAKEKEEPER_PROD_WAREHOUSE` | `production` | Name of the production warehouse in Lakekeeper that corresponds to the `TRINO_PROD_CATALOG_NAME` catalog in Trino. Default: `production` |
| `TRINO_DEMO_CATALOG_NAME` | `demo` | Name of the demo catalog in Trino. Default: `demo` |
| `LAKEKEEPER_DEMO_WAREHOUSE` | `demo` | Name of the demo warehouse in Lakekeeper that corresponds to the `TRINO_DEMO_CATALOG_NAME` catalog in Trino. Default: `demo` |
| `TRINO_LAKEKEEPER_CATALOG_NAME` | `lakekeeper` | Name of the lakekeeper catalog in Trino. Default: `lakekeeper` |
| `LAKEKEEPER_LAKEKEEPER_WAREHOUSE` | `lakekeeper` | Name of the lakekeeper warehouse in Lakekeeper that corresponds to the `TRINO_LAKEKEEPER_CATALOG_NAME` catalog in Trino. Default: `lakekeeper` |
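The same mapping can also be maintained directly in `configuration.rego`, which is how additional catalogs beyond the four defaults are added. The snippet below is only an illustration of the idea; the package and field names are assumptions, so consult the actual `configuration.rego` in the Lakekeeper repository for the real schema:

```rego
# Illustrative only: package name, entry structure, and field names are
# assumptions, not the actual schema of configuration.rego.
package configuration

# Each entry pairs a Trino catalog name with a Lakekeeper warehouse.
trino_catalog := [
    {"catalog_name": "dev", "warehouse": "development"},
    {"catalog_name": "prod", "warehouse": "production"},
    {"catalog_name": "analytics", "warehouse": "analytics-warehouse"}
]
```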
Unmanaged Catalogs¶
| Variable | Example | Description |
|---|---|---|
| `TRINO_ALLOW_UNMANAGED_CATALOGS` | `true` | Blanket-allow access to all catalogs not listed in the `trino_catalog` array. When Trino has multiple authorizers configured, all authorizers must allow an action for it to succeed. If Trino uses catalogs managed by other authorizers (e.g. a connected PostgreSQL catalog), set this to `true` so the OPA bridge does not block access to those catalogs. Default: `false`. For fine-grained control over unmanaged catalogs, use the `allow_unmanaged` extension point instead (see below). |
Admin Users¶
Admin users get full access to Trino system schemas and tables across all catalogs (including `system.metadata`, `system.runtime`, etc.) and can view queries owned by any user (`FilterViewQueryOwnedBy`, `ViewQueryOwnedBy`). Non-admin users can only view their own queries. Note that this only affects Trino-level authorization: access to data in Lakekeeper-managed catalogs is still governed by Lakekeeper's own authorization.
| Variable | Example | Description |
|---|---|---|
| `TRINO_ADMIN_USERS` | `user-id-1,user-id-2` | Comma-separated list of Trino user IDs (typically OIDC subject identifiers) that receive admin access. Default: empty (no admins). |
Admin users can also be configured directly in the `trino_admin_users` list in `configuration.rego`.
Trino Configuration¶
When OPA is running and configured, set the following configuration for Trino in `access-control.properties`:
```properties
access-control.name=opa
opa.policy.uri=http://<URL where OPA is reachable>/v1/data/trino/allow
opa.log-requests=true
opa.log-responses=true
opa.policy.batched-uri=http://<URL where OPA is reachable>/v1/data/trino/batch
```
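Before wiring OPA into Trino, you can verify that the policy endpoint is reachable by querying OPA's REST Data API directly. The sketch below is illustrative: the hostname is a placeholder, and the input only mimics the shape of a minimal request from Trino's OPA access control (check the Trino OPA documentation for the exact schema of your Trino version):

```shell
# Placeholder host; the input mimics a minimal Trino OPA request.
curl -s -X POST http://opa.example.com:8181/v1/data/trino/allow \
  -H 'Content-Type: application/json' \
  -d '{
    "input": {
      "context": {"identity": {"user": "alice", "groups": []}},
      "action": {"operation": "ExecuteQuery"}
    }
  }'
```

OPA responds with a JSON document whose `result` field holds the policy decision; an empty object `{}` means the decision was undefined.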
System Schema Handling¶
The OPA bridge distinguishes between user-created schemas (namespaces) and system schemas. User schemas are authorized via Lakekeeper's permission system, while system schemas are handled locally by the bridge.
Trino system Catalog¶
The following schemas in the trino system catalog are accessible to all authenticated users:
| Schema | Allowed Tables | Description |
|---|---|---|
| `jdbc` | all | Required by JDBC clients for metadata discovery. |
| `information_schema` | `columns`, `schemata`, `tables`, `views` | Standard SQL metadata tables. |
| `metadata` | `analyze_properties`, `catalogs`, `column_properties`, `materialized_views`, `schema_properties`, `table_comments`, `table_properties` | Catalog metadata. Tables like `*_authorization` are excluded for non-admins. |
| `runtime` | `queries` | Query monitoring. Non-admins can only see their own queries. |
Admin users have unrestricted access to all tables in all system schemas.
Lakekeeper Catalog System Schemas¶
Within Lakekeeper-managed catalogs, the following schemas are treated as system schemas and require only catalog-level (`get_config`) access instead of namespace-level permissions:
| Schema | Allowed Tables | Description |
|---|---|---|
| `information_schema` | `columns`, `schemata`, `tables`, `views` | Standard SQL metadata. |
| `schema_discovery` | `discovery`, `shallow_discovery` | Schema discovery for UI tools. |
| `system` | `iceberg_tables` | Iceberg table metadata. |
User-created schemas are authorized through Lakekeeper's permission system as usual.
Extension Points¶
The OPA bridge provides two extension points for adding custom authorization rules without modifying the built-in policies. Both default to `false` and can be extended by creating `.rego` files in the `policies/trino/` directory.
allow_managed — Managed Catalog Extensions¶
Use `allow_managed` for additional rules on catalogs listed in `trino_catalog`. These rules run alongside Lakekeeper's permission checks (OR logic), for example to grant UDF management permissions on a Lakekeeper-managed catalog.
To use it, create `policies/trino/allow_managed.rego` in the `trino` package and define `allow_managed` rules for the Trino operations you need. These rules bypass Lakekeeper's permission system, so review them carefully.
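As a sketch, an `allow_managed` rule that lets a specific user create and drop functions (UDFs) on the `dev` catalog could look like the following. The operation and resource field names here are assumptions based on Trino's OPA request schema, not verified against the bridge, so check them against the requests your Trino version actually emits (`opa.log-requests=true` helps):

```rego
package trino

# Hypothetical sketch: permit one user to manage UDFs on the "dev" catalog.
# Operation and resource field names are assumptions; verify them against
# the OPA requests logged by your Trino instance.
allow_managed if {
    input.action.operation in ["CreateFunction", "DropFunction"]
    input.action.resource.function.catalogName == "dev"
    input.context.identity.user == "data-engineer-1"
}
```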
allow_unmanaged — Unmanaged Catalog Extensions¶
Use `allow_unmanaged` for catalogs not listed in `trino_catalog` (e.g. external database catalogs like PostgreSQL or Exasol). These rules are evaluated on a fast path that never triggers Lakekeeper HTTP calls, including in batch operations.
For a simple blanket allow, set `TRINO_ALLOW_UNMANAGED_CATALOGS=true` (see above). For fine-grained control, create `policies/trino/allow_unmanaged.rego` in the `trino` package. For example, to grant read-only access to a specific external catalog:
```rego
package trino

allow_unmanaged if {
    input.action.operation in ["AccessCatalog", "FilterCatalogs", "ShowSchemas",
        "FilterSchemas", "SelectFromColumns", "FilterTables", "FilterColumns"]
    input.action.resource.catalog.name == "my_external_catalog"
}
```
Batch Optimization¶
For Trino filter operations (`FilterTables`, `FilterColumns`, `SelectFromColumns`, `FilterSchemas`), the OPA bridge optimizes authorization checks on Lakekeeper-managed catalogs by batching resource checks into Lakekeeper batch-check HTTP requests instead of making one HTTP call per resource.
For table/column/select operations, each resource generates two checks (table + view), since Trino does not distinguish between tables and views in filter requests. A resource is allowed if either check passes. Schema filter operations generate one namespace check per resource.
When the number of checks exceeds the configured batch size, the bridge automatically splits them into multiple batch-check requests (chunking) and concatenates the results. The batch size can be configured per Lakekeeper instance via the `LAKEKEEPER_MAX_BATCH_CHECK_SIZE` environment variable or the `max_batch_check_size` field in `configuration.rego` (default: 1000, matching Lakekeeper's server limit). When using OpenFGA as the authorization backend, this value should be tuned to match the OpenFGA and Lakekeeper batch-check settings to avoid overloading the authorization backend.
System schemas (`information_schema`, `schema_discovery`, `system`) and metadata tables (e.g. `foo$snapshots`) within managed catalogs are excluded from this batch optimization: they are not included in the batch-check request and are instead evaluated per resource by the OPA policies. These per-resource evaluations may still trigger Lakekeeper calls (for example, catalog-level `get_config` checks), but they do not participate in the batched table/schema authorization request.
This optimization is transparent: it produces the same results as per-resource evaluation, but with significantly fewer HTTP round-trips to Lakekeeper for non-system resources.
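Conceptually, the chunking described above behaves like the following Python sketch. The actual bridge implements this in Rego; `chunk_checks` and the table/view pairing shown here are illustrative only:

```python
def chunk_checks(checks, max_batch_size):
    """Split a flat list of authorization checks into batches of at most
    max_batch_size, mirroring how the bridge splits batch-check requests."""
    return [checks[i:i + max_batch_size]
            for i in range(0, len(checks), max_batch_size)]

# Each filtered table yields two checks (table + view), since Trino's
# filter requests do not distinguish tables from views.
tables = [f"table_{i}" for i in range(1500)]
checks = [(name, kind) for name in tables for kind in ("table", "view")]

batches = chunk_checks(checks, 1000)  # default LAKEKEEPER_MAX_BATCH_CHECK_SIZE
print(len(checks), len(batches))  # prints: 3000 3
```

So filtering 1500 tables produces 3000 checks, which the bridge sends as three batch-check HTTP requests before concatenating the results.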
Context Forwarding¶
The OPA bridge forwards resource names to Lakekeeper's batch-check API for create actions. This enables Lakekeeper's authorizer (e.g. Cedar) to make authorization decisions based on the name of the resource being created:
| Trino Operation | Lakekeeper Action | Name Forwarded |
|---|---|---|
| `CreateSchema` (top-level) | `create_namespace` (warehouse) | Schema name |
| `CreateSchema` (nested) | `create_namespace` (parent namespace) | Child schema name |
| `CreateTable` | `create_table` (namespace) | Table name |
| `CreateView` / `CreateMaterializedView` | `create_view` (namespace) | View name |
Properties specified during creation (e.g. `WITH (format='PARQUET')`) are also forwarded.
A full self-contained example is available on GitHub.