Skip to content

Feature Request: Optional native SQLAlchemy Enum for dy.Enum in to_sqlalchemy_columns (Postgres) #354

@jackoberman

Description

TL;DR: Opt-in dy.Enum → sa.Enum on Postgres for Alembic drift detection; keep CHAR/String as the default.

Summary

I would like to use dy.Enum with a Python enum.Enum / StrEnum for Polars validation and Schema.to_sqlalchemy_columns() for SQLAlchemy table definitions paired with Alembic. Today dy.Enum always maps to CHAR(n) or String(n), not sa.Enum. That keeps Polars and SQL storage aligned as strings, but it breaks the common pattern of one Python enum as the source of truth when Postgres uses a native ENUM type and Alembic should detect new members.
It would be helpful to have an opt-in way for dy.Enum to emit sqlalchemy.Enum (and on Postgres, a named PG enum type), similar to how dy.Struct already maps to dialect-specific types (e.g. JSONB on Postgres).

Current behavior

For a schema like this:

from enum import StrEnum, auto
import dataframely as dy

class Status(StrEnum):
    PENDING = auto()
    APPROVED = auto()
    EXPIRED = auto()

class Staged(dy.Schema):
    status = dy.Enum(Status)

Staged.to_sqlalchemy_columns(postgresql.dialect()) produces a string column (e.g. VARCHAR(8) when category lengths differ), because Enum.sqlalchemy_dtype uses sa.CHAR / sa.String based on max category length:

def sqlalchemy_dtype(self, dialect: sa.Dialect) -> sa_TypeEngine:
    category_lengths = [len(c) for c in self.categories]
    if all(length == category_lengths[0] for length in category_lengths):
        return sa.CHAR(category_lengths[0])
    return sa.String(max(category_lengths))

This matches the docstring (“enum (string) values”) and existing tests, and is fine when the database also stores plain strings.

Issue

Many SQLAlchemy + Postgres setups use native enum types for constrained columns:

sa.Column("status", sa.Enum(MyStatus, name="my_status"), nullable=False)

Benefits:

  • DB-level constraint on allowed values
  • Alembic autogenerate / alembic check can surface new enum members as schema drift
  • Consistent with hand-written sa.Table definitions that already use sa.Enum(...)

When the table is built only from to_sqlalchemy_columns(), dy.Enum and sa.Enum diverge: adding a member to the Python enum updates Polars validation but not the reflected SQL type Alembic compares against, unless every enum column is manually overridden after to_sqlalchemy_columns().

That undermines the goal of defining tables once from the dy.Schema and keeping migrations in sync.

Idea

Add an opt-in flag on dy.Enum (names are suggestions):

  • sqlalchemy_native=True, or
  • sqlalchemy_enum=True plus optional sqlalchemy_enum_name: str | None for the Postgres type name

For example:

class Staged(dy.Schema):
    status = dy.Enum(Status, sqlalchemy_native=True)  # name TBD

cols = Staged.to_sqlalchemy_columns(postgresql.dialect())
# status column type should be sa.Enum(Status, name="...", ...)
# not sa.String(8)

Default behavior should stay CHAR / String so existing tests and deployments are unchanged.

Version

  • dataframely: 2.10.1
  • sqlalchemy: 2.1.0b2, PostgreSql, Alembic

Happy to open a PR!

Side note

This is my first time opening an issue on an open source project! Let me know if this is poorly written!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions