TL;DR: Opt-in dy.Enum → sa.Enum on Postgres for Alembic drift detection; keep CHAR/String as the default.
Summary
I would like to use dy.Enum with a Python enum.Enum / StrEnum for Polars validation and Schema.to_sqlalchemy_columns() for SQLAlchemy table definitions paired with Alembic. Today dy.Enum always maps to CHAR(n) or String(n), not sa.Enum. That keeps Polars and SQL storage aligned as strings, but it breaks the common pattern of one Python enum as the source of truth when Postgres uses a native ENUM type and Alembic should detect new members.
It would be helpful to have an opt-in way for dy.Enum to emit sqlalchemy.Enum (and on Postgres, a named PG enum type), similar to how dy.Struct already maps to dialect-specific types (e.g. JSONB on Postgres).
Current behavior
For a schema like this:
from enum import StrEnum, auto
import dataframely as dy
class Status(StrEnum):
PENDING = auto()
APPROVED = auto()
EXPIRED = auto()
class Staged(dy.Schema):
status = dy.Enum(Status)
Staged.to_sqlalchemy_columns(postgresql.dialect()) produces a string column (e.g. VARCHAR(8) when category lengths differ), because Enum.sqlalchemy_dtype uses sa.CHAR / sa.String based on max category length:
def sqlalchemy_dtype(self, dialect: sa.Dialect) -> sa_TypeEngine:
category_lengths = [len(c) for c in self.categories]
if all(length == category_lengths[0] for length in category_lengths):
return sa.CHAR(category_lengths[0])
return sa.String(max(category_lengths))
This matches the docstring (“enum (string) values”) and existing tests, and is fine when the database also stores plain strings.
Issue
Many SQLAlchemy + Postgres setups use native enum types for constrained columns:
sa.Column("status", sa.Enum(MyStatus, name="my_status"), nullable=False)
Benefits:
- DB-level constraint on allowed values
- Alembic autogenerate / alembic check can surface new enum members as schema drift
- Consistent with hand-written sa.Table definitions that already use sa.Enum(...)
When the table is built only from to_sqlalchemy_columns(), dy.Enum and sa.Enum diverge: adding a member to the Python enum updates Polars validation but not the reflected SQL type Alembic compares against, unless every enum column is manually overridden after to_sqlalchemy_columns().
That undermines the goal of defining tables once from the dy.Schema and keeping migrations in sync.
Idea
Add an opt-in flag on dy.Enum (names are suggestions):
sqlalchemy_native=True, or
sqlalchemy_enum=True plus optional sqlalchemy_enum_name: str | None for the Postgres type name
For example:
class Staged(dy.Schema):
status = dy.Enum(Status, sqlalchemy_native=True) # name TBD
cols = Staged.to_sqlalchemy_columns(postgresql.dialect())
# status column type should be sa.Enum(Status, name="...", ...)
# not sa.String(8)
Default behavior should stay CHAR / String so existing tests and deployments are unchanged.
Version
- dataframely: 2.10.1
- sqlalchemy: 2.1.0b2, PostgreSql, Alembic
Happy to open a PR!
Side note
This is my first time opening an issue on an open source project! Let me know if this is poorly written!
TL;DR: Opt-in dy.Enum → sa.Enum on Postgres for Alembic drift detection; keep CHAR/String as the default.
Summary
I would like to use
dy.Enumwith a Pythonenum.Enum/StrEnumfor Polars validation andSchema.to_sqlalchemy_columns()for SQLAlchemy table definitions paired with Alembic. Todaydy.Enumalways maps toCHAR(n)orString(n), notsa.Enum. That keeps Polars and SQL storage aligned as strings, but it breaks the common pattern of one Python enum as the source of truth when Postgres uses a native ENUM type and Alembic should detect new members.It would be helpful to have an opt-in way for
dy.Enumto emitsqlalchemy.Enum(and on Postgres, a named PG enum type), similar to howdy.Structalready maps to dialect-specific types (e.g. JSONB on Postgres).Current behavior
For a schema like this:
Staged.to_sqlalchemy_columns(postgresql.dialect())produces a string column (e.g.VARCHAR(8)when category lengths differ), becauseEnum.sqlalchemy_dtypeusessa.CHAR/sa.Stringbased on max category length:This matches the docstring (“enum (string) values”) and existing tests, and is fine when the database also stores plain strings.
Issue
Many SQLAlchemy + Postgres setups use native enum types for constrained columns:
Benefits:
When the table is built only from
to_sqlalchemy_columns(),dy.Enumandsa.Enumdiverge: adding a member to the Python enum updates Polars validation but not the reflected SQL type Alembic compares against, unless every enum column is manually overridden afterto_sqlalchemy_columns().That undermines the goal of defining tables once from the
dy.Schemaand keeping migrations in sync.Idea
Add an opt-in flag on dy.Enum (names are suggestions):
sqlalchemy_native=True, orsqlalchemy_enum=Trueplus optionalsqlalchemy_enum_name: str | Nonefor the Postgres type nameFor example:
Default behavior should stay CHAR / String so existing tests and deployments are unchanged.
Version
Happy to open a PR!
Side note
This is my first time opening an issue on an open source project! Let me know if this is poorly written!