feat: Support (De)Serialization for different representations of Nullable Unions#458
feat: Support (De)Serialization for different representations of Nullable Unions#458PookieBuns wants to merge 13 commits intoapache:mainfrom
Conversation
| { | ||
| return visitor.visit_enum(EnumUnitDeserializer::new(field)); | ||
| } | ||
| // Assume `null` is the first branch if deserializing some so decrement the variant index |
There was a problem hiding this comment.
This assumption is not guaranteed by the spec :-/
We should add some index/length checks to avoid panics/overflows.
There was a problem hiding this comment.
Updated to use checked_sub to ensure underflow doesn't happen.
Regarding the topic about not being enforced by spec, even though it isn't strictly enforced, it's highly recommended https://avro.apache.org/docs/1.11.1/specification/#unions
Note that when a default value is specified for a record field whose type is a union, the type of the default value must match the first element of the union. Thus, for unions containing “null”, the “null” is usually listed first, since the default value of such unions is typically null.
For serialization we technically could get around this by resolving union branches using variant name, but for deserialization cannot yet because it's not schema aware and/or Value doesn't hold name information, which is why I chose to use index bumping on serialization too for consistent behavior. This seems like the closest we can get without a significant refactor or feature for deserialization.
For now, the behavior will intentionally throw an error if you try to (de)serialize an Option with null not being the first branch. This doesn't break existing behavior because the status quo is non-functional so although not a perfect solution imo it's a step better than what we have now
| } | ||
| } | ||
| } | ||
| let branch_index = variant_index as usize + usize::from(self.serializing_some); |
There was a problem hiding this comment.
Again, this assumes that the null is the first variant in the union.
There was a problem hiding this comment.
Updated to throw error if null isn't first branch, reasoning stated in #458 (comment)
avro/tests/nullable_union.rs
Outdated
| "name": "MyUnion", | ||
| "type": [ | ||
| "null", | ||
| "int" |
There was a problem hiding this comment.
Please add a test where the null is not the first variant too.
There was a problem hiding this comment.
Added tests in 6f691db to ensure null as non-first variant when using Option fails with error message communicating intended behavior
avro/src/serde/ser_schema.rs
Outdated
| if branch_index >= union_schema.schemas.len() { | ||
| return Err(create_error(format!( | ||
| "Variant index out of bounds: {}. The union schema has '{}' schemas", | ||
| variant_index, |
There was a problem hiding this comment.
| variant_index, | |
| branch_index, |
avro/src/serde/ser_schema.rs
Outdated
| if branch_index >= union_schema.schemas.len() { | ||
| return Err(create_error(format!( | ||
| "Variant index out of bounds: {}. The union schema has '{}' schemas", | ||
| variant_index, |
There was a problem hiding this comment.
| variant_index, | |
| branch_index, |
| } | ||
| encode_int(i as i32, &mut self.writer)?; | ||
| return encode_int(variant_index as i32, &mut self.writer); | ||
| } |
There was a problem hiding this comment.
This should probably support Schema::Ref too, to refer to Schema::Enum
There was a problem hiding this comment.
updated in 6c58e62 with test to verify behavior
avro/tests/nullable_union.rs
Outdated
| } | ||
|
|
||
| #[test] | ||
| #[ignore] |
There was a problem hiding this comment.
Please add a comment why it is ignored
There was a problem hiding this comment.
Updated to instead expect a failure / comment on why it will fail. I'm also open to just removing it if you think its out of scope, since that mod is mostly to demonstrate how untagged enums have edge cases that fail and check that tagged enums do not
Co-authored-by: Martin Grigorov <[email protected]>
Co-authored-by: Martin Grigorov <[email protected]>
Co-authored-by: Martin Grigorov <[email protected]>
| let variant_idx = (*idx as usize) | ||
| .checked_sub(usize::from(self.deserializing_some)) | ||
| .ok_or_else(|| Details::GetNull(inner.deref().clone()))?; |
There was a problem hiding this comment.
Do we want to add an Error variant here to communicate more clearly that deserializing some when null isn't first isn't supported? Right now it just tells the user Expects null for 0th when deserializing some
|
Updated to fix clippy / license checks cb749c3 |
Fixes
The Rust Representation Conundrumas mentioned in #449, which allows (de)serializing, allavro-rs,rusty, andavro_json_encoding_compatiblerust representations to avro. Tested using the provided test suite in the mentioned issue.Note: This does not address #365 since this only applies to schema aware serialization.
I considered using enum variant name as lookup method for picking Union branch when serializing, shown here, but given the current state in which
Valuedoesn't hold the name for named types, I realized it's not possible to do the same for deserialization until further work to support names inValuehas been done.Therefore, this implementation makes the assumption that if you are using
Option<T>to represent a nullable union, that means that thenullis the first variant in the union and serialization / deserialization just bump / decrement the index lookup respectively.I believe this doesn't actually produce any breaking changes, because the status quo is that tagged Option "Rusty" representations straight up don't work.
Please let me know if you have any questions!