[SPARK-57518][SQL] Make ThriftServer JDBC metadata operations DataSource V2 catalog-aware #56627
Open
yadavay-amzn wants to merge 1 commit into
…rce V2 catalog-aware Route getCatalogs/getSchemas/getTables/getColumns through CatalogManager to honor DSv2 catalogs and the default catalog. Populate TABLE_CAT with the real catalog name. New conf spark.sql.thriftServer.catalogMetadata.enabled (default true) with legacy fallback.
6700cf8 to 0b42f09
What changes were proposed in this pull request?
Route ThriftServer JDBC metadata operations (getCatalogs, getSchemas, getTables, getColumns) through CatalogManager so they honor DataSource V2 catalogs and the default catalog. Populate TABLE_CAT with the real catalog name. Introduce a new conf spark.sql.thriftServer.catalogMetadata.enabled (default true) with legacy fallback when disabled.
Why are the changes needed?
With spark.sql.catalog.* and spark.sql.defaultCatalog set, JDBC/BI clients get inconsistent metadata because the metadata operations used the V1 SessionCatalog directly, ignoring any configured DSv2 catalogs.
Design notes
(a) A null catalogName resolves to the CURRENT catalog (consistent with Spark's own unspecified-to-current resolution and with Trino/Snowflake behavior), not all catalogs.
(b) getCatalogs returns CatalogManager.listCatalogs() (including spark_catalog, sorted alphabetically).
(c) TABLE_CAT was previously empty or null; it is now populated with the actual catalog name. The conf defaults to ON, with an escape hatch for clients that relied on parsing an empty TABLE_CAT.
(d) KNOWN LIMITATION: listCatalogs() returns only ALREADY-LOADED catalogs, so catalogs that are configured but never accessed will not be listed. This is documented; we do not eagerly load catalogs.
(e) V2-specific metadata authorization is deferred to a follow-up. Existing Hive auth hooks are unchanged and getCatalogs/getSchemas already pass null priv objects.
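A minimal sketch of how notes (a), (b) and (d) interact, written as a standalone Java model rather than Spark code. The class CatalogResolutionSketch, its methods, and the catalog name "iceberg" are hypothetical, and the assumption that switching the current catalog also loads it is ours, not something this PR states.

```java
import java.util.List;
import java.util.TreeSet;

/** Hypothetical model of design notes (a), (b) and (d); not Spark's CatalogManager. */
final class CatalogResolutionSketch {
    // Catalogs loaded so far; a TreeSet keeps them sorted alphabetically.
    private final TreeSet<String> loaded = new TreeSet<>();
    private String current = "spark_catalog";

    CatalogResolutionSketch() {
        loaded.add("spark_catalog"); // note (b): spark_catalog is included
    }

    /** Assumption for this sketch: switching the current catalog also loads it. */
    void setCurrentCatalog(String name) {
        loaded.add(name);
        current = name;
    }

    /** Note (a): a null catalog name means the current catalog, not all catalogs. */
    String resolveCatalog(String requested) {
        return requested == null ? current : requested;
    }

    /** Notes (b) and (d): only already-loaded catalogs are listed, in sorted order. */
    List<String> getCatalogs() {
        return List.copyOf(loaded);
    }

    public static void main(String[] args) {
        CatalogResolutionSketch s = new CatalogResolutionSketch();
        System.out.println(s.resolveCatalog(null)); // spark_catalog
        System.out.println(s.getCatalogs());        // [spark_catalog]

        // A configured catalog only shows up once something has loaded it.
        s.setCurrentCatalog("iceberg");
        System.out.println(s.resolveCatalog(null)); // iceberg
        System.out.println(s.getCatalogs());        // [iceberg, spark_catalog]
    }
}
```

In this model a catalog that is configured but never touched never enters the loaded set, which is the limitation described in (d).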
Does this PR introduce any user-facing change?
Yes. TABLE_CAT now reflects the catalog name (gated by the new conf). New conf: spark.sql.thriftServer.catalogMetadata.enabled.
How was this patch tested?
SparkMetadataOperationSuite covers: default spark_catalog path, configured in-memory DSv2 catalog path, and conf-disabled legacy path (getCatalogs, getSchemas, getTables, getColumns).
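For a manual spot check from the client side, something like the following could be used. It is a sketch, not part of the patch: it relies only on the standard java.sql API, the class TableCatProbe and its qualifiedName helper are hypothetical, and it assumes a ThriftServer JDBC URL is passed as the first argument with a suitable JDBC driver on the classpath. The helper tolerates both the legacy empty or null TABLE_CAT and the populated value.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical client-side probe that prints tables with their TABLE_CAT. */
final class TableCatProbe {
    /** Builds a display name, tolerating the legacy empty or null TABLE_CAT. */
    static String qualifiedName(String tableCat, String schema, String table) {
        String prefix = (tableCat == null || tableCat.isEmpty()) ? "" : tableCat + ".";
        return prefix + schema + "." + table;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.out.println("usage: TableCatProbe <jdbc-url>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0])) {
            DatabaseMetaData md = conn.getMetaData();
            // Passing null as the catalog; per design note (a) the server
            // resolves it to the current catalog when the conf is enabled.
            try (ResultSet rs = md.getTables(null, "%", "%", null)) {
                while (rs.next()) {
                    System.out.println(qualifiedName(
                        rs.getString("TABLE_CAT"),
                        rs.getString("TABLE_SCHEM"),
                        rs.getString("TABLE_NAME")));
                }
            }
        }
    }
}
```

Running it once with the conf enabled and once with it disabled should show whether the catalog prefix appears; clients that previously parsed an empty TABLE_CAT would need a guard like the one in qualifiedName.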
Was this patch authored or co-authored using generative AI tooling?
Authored with assistance from Claude Opus 4.8