[SPARK-57518][SQL] Make ThriftServer JDBC metadata operations DataSource V2 catalog-aware #56627

Open
yadavay-amzn wants to merge 1 commit into apache:master from yadavay-amzn:fix/SPARK-57518-thriftserver-dsv2-metadata

@yadavay-amzn yadavay-amzn commented Jun 20, 2026

Contributor

What changes were proposed in this pull request?

Route ThriftServer JDBC metadata operations (getCatalogs, getSchemas, getTables, getColumns) through CatalogManager so they honor DataSource V2 catalogs and the default catalog. Populate TABLE_CAT with the real catalog name. Introduce a new conf, spark.sql.thriftServer.catalogMetadata.enabled (default true), which falls back to the legacy behavior when disabled.

Why are the changes needed?

With spark.sql.catalog.* and spark.sql.defaultCatalog set, JDBC/BI clients get inconsistent metadata because the metadata operations use the V1 SessionCatalog directly and ignore any configured DSv2 catalogs.
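
As a rough illustration of the setup described above, the sketch below shows a session conf that registers one DSv2 catalog and makes it the default. The catalog name `lake`, its class name, and the `warehouse` option are hypothetical placeholders; only the `spark.sql.catalog.*` and `spark.sql.defaultCatalog` key patterns come from this description. The helper is not Spark code; it only shows which keys register a catalog and which configure one.

```python
# Hypothetical session conf: `lake`, its class name, and the
# `warehouse` option are placeholders chosen for this example.
conf = {
    "spark.sql.catalog.lake": "com.example.LakeCatalog",
    "spark.sql.catalog.lake.warehouse": "/tmp/warehouse",
    "spark.sql.defaultCatalog": "lake",
}

PREFIX = "spark.sql.catalog."


def configured_catalogs(conf):
    """Return catalog names registered by keys of the form spark.sql.catalog.<name>."""
    names = set()
    for key in conf:
        if key.startswith(PREFIX):
            rest = key[len(PREFIX):]
            if "." not in rest:  # deeper keys are per-catalog options
                names.add(rest)
    return sorted(names)


catalogs = configured_catalogs(conf)
default_catalog = conf.get("spark.sql.defaultCatalog")
```

With a conf like this, a client would expect metadata calls to report the `lake` catalog, which the V1 SessionCatalog path cannot do.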

Design notes

(a) A null catalogName resolves to the CURRENT catalog (consistent with Spark's own unspecified-to-current resolution and with Trino/Snowflake behavior), not all catalogs.

(b) getCatalogs returns CatalogManager.listCatalogs() (including spark_catalog, sorted alphabetically).

(c) TABLE_CAT was previously empty or null; it is now populated with the actual catalog name. The conf defaults to ON, with an escape hatch for clients that relied on parsing an empty TABLE_CAT.

(d) KNOWN LIMITATION: listCatalogs() returns only ALREADY-LOADED catalogs, so catalogs that are configured but never accessed will not be listed. This is documented; we do not eagerly load catalogs.

(e) V2-specific metadata authorization is deferred to a follow-up. Existing Hive auth hooks are unchanged and getCatalogs/getSchemas already pass null priv objects.
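
Notes (a), (b), and (d) can be sketched with a toy model. This is not the Spark implementation: `ToyCatalogManager`, its methods, and the catalog names `lake` and `archive` are hypothetical, and the model assumes that resolving a catalog is what loads it.

```python
SESSION_CATALOG = "spark_catalog"


class ToyCatalogManager:
    """Toy stand-in for the behavior in notes (a), (b), and (d); not Spark code."""

    def __init__(self, configured, current=SESSION_CATALOG):
        self.configured = set(configured)  # declared in conf
        self.loaded = set()                # instantiated on first access (assumption)
        self.current = current

    def load(self, name):
        if name != SESSION_CATALOG:
            if name not in self.configured:
                raise KeyError(f"catalog not configured: {name}")
            self.loaded.add(name)
        return name

    def resolve(self, catalog_name):
        # Note (a): None means the current catalog, not "all catalogs".
        return self.load(self.current if catalog_name is None else catalog_name)

    def list_catalogs(self):
        # Notes (b) and (d): only loaded catalogs plus spark_catalog, sorted.
        return sorted(self.loaded | {SESSION_CATALOG})


mgr = ToyCatalogManager(configured=["lake", "archive"], current="lake")
before = mgr.list_catalogs()   # nothing loaded yet
resolved = mgr.resolve(None)   # null catalogName resolves to the current catalog
after = mgr.list_catalogs()    # `archive` is configured but never accessed
```

In this model `archive` never appears in the listing because nothing has accessed it, which mirrors the limitation in note (d).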

Does this PR introduce any user-facing change?

Yes. TABLE_CAT now reflects the catalog name (gated by the new conf). New conf: spark.sql.thriftServer.catalogMetadata.enabled.
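
A minimal sketch of the gating, assuming the legacy value is an empty string (the description says it was empty or null) and that the conf is read as a string. The `table_cat` helper and the catalog name `lake` are hypothetical; only the conf key and its default come from this PR.

```python
CONF_KEY = "spark.sql.thriftServer.catalogMetadata.enabled"


def table_cat(conf, catalog_name):
    """Return the TABLE_CAT value for a metadata row.

    Assumption: the legacy path reports an empty string.
    """
    enabled = conf.get(CONF_KEY, "true").lower() == "true"  # default is true
    return catalog_name if enabled else ""


default_value = table_cat({}, "lake")
legacy_value = table_cat({CONF_KEY: "false"}, "lake")
```

Clients that parse an empty TABLE_CAT would set the conf to false to keep the old value.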

How was this patch tested?

SparkMetadataOperationSuite covers: default spark_catalog path, configured in-memory DSv2 catalog path, and conf-disabled legacy path (getCatalogs, getSchemas, getTables, getColumns).

Was this patch authored or co-authored using generative AI tooling?

Authored with assistance from Claude Opus 4.8

…rce V2 catalog-aware

Route getCatalogs/getSchemas/getTables/getColumns through CatalogManager to honor DSv2 catalogs and the default catalog. Populate TABLE_CAT with the real catalog name. New conf spark.sql.thriftServer.catalogMetadata.enabled (default true) with legacy fallback.
yadavay-amzn force-pushed the fix/SPARK-57518-thriftserver-dsv2-metadata branch from 6700cf8 to 0b42f09 on June 20, 2026 07:06