Skip to content

feat: Support (De)Serialization for different representations of Nullable Unions#458

Open
PookieBuns wants to merge 13 commits intoapache:mainfrom
PookieBuns:main
Open

feat: Support (De)Serialization for different representations of Nullable Unions#458
PookieBuns wants to merge 13 commits intoapache:mainfrom
PookieBuns:main

Conversation

@PookieBuns
Copy link
Contributor

@PookieBuns PookieBuns commented Feb 10, 2026

Fixes The Rust Representation Conundrum as mentioned in #449, which allows (de)serializing, all avro-rs, rusty, and avro_json_encoding_compatible rust representations to avro. Tested using the provided test suite in the mentioned issue.

Note: This does not address #365 since this only applies to schema aware serialization.

I considered using enum variant name as lookup method for picking Union branch when serializing, shown here, but given the current state in which Value doesn't hold the name for named types, I realized it's not possible to do the same for deserialization until further work to support names in Value has been done.

Therefore, this implementation makes the assumption that if you are using Option<T> to represent a nullable union, that means that the null is the first variant in the union and serialization / deserialization just bump / decrement the index lookup respectively.

I believe this doesn't actually produce any breaking changes, because the status quo is that tagged Option "Rusty" representations straight up don't work.

Please let me know if you have any questions!

{
return visitor.visit_enum(EnumUnitDeserializer::new(field));
}
// Assume `null` is the first branch if deserializing some so decrement the variant index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumption is not guaranteed by the spec :-/
We should add some index/length checks to avoid panics/overflows.

Copy link
Contributor Author

@PookieBuns PookieBuns Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use checked_sub to ensure underflow doesn't happen.

Regarding the topic about not being enforced by spec, even though it isn't strictly enforced, it's highly recommended https://avro.apache.org/docs/1.11.1/specification/#unions

Note that when a default value is specified for a record field whose type is a union, the type of the default value must match the first element of the union. Thus, for unions containing “null”, the “null” is usually listed first, since the default value of such unions is typically null.

For serialization we technically could get around this by resolving union branches using variant name, but for deserialization cannot yet because it's not schema aware and/or Value doesn't hold name information, which is why I chose to use index bumping on serialization too for consistent behavior. This seems like the closest we can get without a significant refactor or feature for deserialization.

For now, the behavior will intentionally throw an error if you try to (de)serialize an Option with null not being the first branch. This doesn't break existing behavior because the status quo is non-functional so although not a perfect solution imo it's a step better than what we have now

}
}
}
let branch_index = variant_index as usize + usize::from(self.serializing_some);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this assumes that the null is the first variant in the union.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to throw error if null isn't first branch, reasoning stated in #458 (comment)

"name": "MyUnion",
"type": [
"null",
"int"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test where the null is not the first variant too.

Copy link
Contributor Author

@PookieBuns PookieBuns Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests in 6f691db to ensure null as non-first variant when using Option fails with error message communicating intended behavior

if branch_index >= union_schema.schemas.len() {
return Err(create_error(format!(
"Variant index out of bounds: {}. The union schema has '{}' schemas",
variant_index,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
variant_index,
branch_index,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated!

if branch_index >= union_schema.schemas.len() {
return Err(create_error(format!(
"Variant index out of bounds: {}. The union schema has '{}' schemas",
variant_index,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
variant_index,
branch_index,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated!

}
encode_int(i as i32, &mut self.writer)?;
return encode_int(variant_index as i32, &mut self.writer);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably support Schema::Ref too, to refer to Schema::Enum

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated in 6c58e62 with test to verify behavior

}

#[test]
#[ignore]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment why it is ignored

Copy link
Contributor Author

@PookieBuns PookieBuns Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to instead expect a failure / comment on why it will fail. I'm also open to just removing it if you think its out of scope, since that mod is mostly to demonstrate how untagged enums have edge cases that fail and check that tagged enums do not

Comment on lines +862 to +864
let variant_idx = (*idx as usize)
.checked_sub(usize::from(self.deserializing_some))
.ok_or_else(|| Details::GetNull(inner.deref().clone()))?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add an Error variant here to communicate more clearly that deserializing some when null isn't first isn't supported? Right now it just tells the user Expects null for 0th when deserializing some

@PookieBuns
Copy link
Contributor Author

PookieBuns commented Feb 11, 2026

Updated to fix clippy / license checks cb749c3

@PookieBuns PookieBuns requested a review from martin-g February 11, 2026 13:25
@Kriskras99 Kriskras99 mentioned this pull request Feb 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants