Skip to content

Generic Tables

Lakekeeper's Generic Table API catalogs non-Iceberg tables — Lance, CSV, Parquet, or any other format — alongside Iceberg tables in the same Warehouse. Each generic table sits in a Namespace, has a name, a format string, an optional base location, schema, statistics, properties, and a free-form doc field. Engines handle writes against the underlying format; Lakekeeper handles identity, governance, access control, and lifecycle.

Unlike Iceberg tables, Lakekeeper does not commit format-specific metadata for generic tables — readers and writers go directly to the storage location after obtaining catalog metadata and credentials. This makes the API format-agnostic: any future or experimental format works without changes to the catalog.

When to use generic tables

  • Raw landing zones — register CSV, JSON, or Parquet drops so they show up in the same catalog and inherit the same permissions as downstream Iceberg tables.
  • Lance for multimodal AI — store text, image embeddings, raw bytes, and scalar features in one table, then run vector + SQL queries via LanceDB, DuckDB, or Polars. See the example below.
  • Experimental formats — prototype new file/table formats behind the same /credentials, soft-delete, and authorization plumbing as production Iceberg tables.

Capabilities

Generic tables are first-class citizens. Most of Lakekeeper's table-side machinery applies:

Feature Generic tables Notes
Credentials vending (S3, GCS, Azure) ✅ GET /lakekeeper/v1/{prefix}/namespaces/{ns}/generic-tables/{t}/credentials
Soft-deletion + undrop ✅ Respects per-warehouse soft-delete settings
Protection flag ✅ protected: bool on load response; toggle via GET/POST /management/v1/warehouse/{wh}/generic-table/{id}/protection. Drops require force=true when set.
Rename ✅ POST /lakekeeper/v1/{prefix}/generic-tables/rename
Listing + pagination ✅ Same cursor scheme as Iceberg tables
Per-action permissions ✅ 16 distinct actions (drop, undrop, read_data, write_data, get_metadata, rename, change_ownership, grant-* relations, ...)
Case-insensitive identifiers ✅ Cross-engine name resolution applies
Name uniqueness across types ✅ A generic table cannot collide with an Iceberg table or view in the same Namespace
Stored schema / statistics fields ✅ Free-form JSON; informational only
Format-agnostic ✅ format is an opaque string
Commit coordination ❌ The catalog does not arbitrate writes — engines write directly
Schema enforcement ❌ Schema is stored, not validated against data files

Lance example

Once a warehouse and namespace exist, create a Lance table via the API:

curl -X POST "$LAKEKEEPER/lakekeeper/v1/$WAREHOUSE/namespaces/ai/generic-tables" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "image_embeddings",
    "format": "lance",
    "doc": "CLIP embeddings + thumbnails for product catalog",
    "properties": {"embedding-dim": "768"}
  }'

Then load credentials and open the dataset with LanceDB:

import requests, lance

meta = requests.get(
    f"{LAKEKEEPER}/lakekeeper/v1/{WAREHOUSE}/namespaces/ai/generic-tables/image_embeddings",
    headers={"Authorization": f"Bearer {TOKEN}"},
).json()

creds = requests.get(
    f"{LAKEKEEPER}/lakekeeper/v1/{WAREHOUSE}/namespaces/ai/generic-tables/image_embeddings/credentials",
    headers={"Authorization": f"Bearer {TOKEN}"},
).json()

ds = lance.dataset(meta["location"], storage_options=creds["storage-options"])
ds.to_table(columns=["caption", "embedding"]).filter("score > 0.8")

The same credentials path works for any format — only the reader library changes.

For a runnable end-to-end example (warehouse setup, STS credentials, create/load/drop, undrop, listing), see tests/integration-tests/lance/test_lance.py.

Authorization model

Generic tables have an OpenFGA object type (lakekeeper_generic_table) parallel to lakekeeper_table and lakekeeper_view. Permissions inherit from the parent Namespace and Warehouse, and can be granted to users or roles via the standard /management/v1/permissions/generic-table/{id} endpoints. See Authorization.

Because grants are per-action, you can — for example — give an ML platform service read_data and get_metadata on every generic table in a namespace without granting drop or change_ownership.

Limits

  • The catalog does not coordinate concurrent writes. If your format requires commit coordination, the engine or the format library is responsible.
  • Schema and statistics fields are informational. Engines that need an authoritative schema should read it from the underlying files.
  • Generic tables and Iceberg tables share the namespace's identifier space — a name conflict across types is rejected at create time.