Contributor

@nlimpid nlimpid commented Dec 17, 2025

Which issue does this PR close?

Rationale for this change

If the input stream yields no RecordBatch at all, nothing gets sent downstream, and the writer never has a chance to produce a valid file. I added a small fallback: when single_file_output is enabled and no batches were received, we send a single empty RecordBatch with the input schema.
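
The fallback described above can be sketched in isolation. This is a simplified, standard-library-only model, not DataFusion's actual demuxer: `Batch`, `forward_batches`, and `collect_forwarded` are hypothetical stand-ins for `RecordBatch`, the demux task, and its output channel.

```rust
use std::sync::mpsc::{channel, Sender};

/// Hypothetical stand-in for an Arrow `RecordBatch`: a schema plus rows.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    schema: Vec<String>,
    rows: Vec<Vec<String>>,
}

/// Forwards every input batch to `tx`. If the input yields nothing and
/// `single_file_output` is set, sends one empty batch carrying the schema
/// so the downstream writer can still produce a valid (empty) file.
fn forward_batches(
    input: impl IntoIterator<Item = Batch>,
    schema: &[String],
    single_file_output: bool,
    tx: &Sender<Batch>,
) {
    let mut is_batch_received = false;
    for batch in input {
        is_batch_received = true;
        // A send error means the receiver was dropped; stop forwarding.
        if tx.send(batch).is_err() {
            return;
        }
    }
    if single_file_output && !is_batch_received {
        let empty = Batch {
            schema: schema.to_vec(),
            rows: Vec::new(),
        };
        let _ = tx.send(empty);
    }
}

/// Illustration-only wrapper: runs `forward_batches` and collects
/// everything the downstream writer would receive.
fn collect_forwarded(
    input: Vec<Batch>,
    schema: &[String],
    single_file_output: bool,
) -> Vec<Batch> {
    let (tx, rx) = channel();
    forward_batches(input, schema, single_file_output, &tx);
    drop(tx); // close the channel so the receiver's iterator terminates
    rx.into_iter().collect()
}
```

In this model, an empty input with `single_file_output` set yields exactly one zero-row batch that still carries the schema, and an empty input without it yields nothing, which mirrors the behavior before the change.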

Are these changes tested?

Yes.

Are there any user-facing changes?

  1. I’m not fully convinced this logic belongs in the demuxer. Conceptually, it might be cleaner to handle this one layer downstream, on the consumer side. However, that layer doesn’t currently have access to the schema, so moving the logic there would require a larger refactor. For now, I chose the minimal change that fixes the issue while keeping the impact small.
  2. Arrow seems to be a special case, and there wasn’t much test coverage around it, so I have written some test cases for it.

@github-actions github-actions bot added the core (Core DataFusion crate) and datasource (Changes to the datasource crate) labels Dec 17, 2025
Contributor

@kosiew kosiew left a comment

@nlimpid

Thanks for working on this.

}

// If no batch was received but single-file output is requested, send an empty batch
if single_file_output && !is_batch_received {

Contributor

The doc comment for DataFrameWriteOptions::with_single_file_output() should also be updated to describe the behavior for an empty DataFrame.

Contributor Author

Done. I updated the doc comment for with_single_file_output().

Contributor

@kosiew kosiew left a comment

LGTM, with some minor comments.

df.write_csv(&path, crate::dataframe::DataFrameWriteOptions::new(), None)
    .await?;
// Expected the file to exist
assert!(std::path::Path::new(&path).exists());

Contributor

Curious why there is no assertion of 0 lines here, like the ones you added for the Arrow and Parquet files?
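
One way such an assertion could be supported is sketched below, using only the standard library. `count_data_lines` is a hypothetical test helper, not part of DataFusion or this PR. Whether an empty DataFrame produces a header-only CSV or a zero-byte file depends on the writer options, so the header flag is left to the caller.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Hypothetical test helper: counts the lines in a text file, optionally
/// skipping one header line. Returns 0 for an empty or header-only file.
fn count_data_lines(path: &Path, has_header: bool) -> io::Result<usize> {
    let file = File::open(path)?;
    let mut total = 0usize;
    for line in BufReader::new(file).lines() {
        line?; // propagate read errors instead of counting them as lines
        total += 1;
    }
    // saturating_sub keeps a zero-byte file at 0 even when a header is expected
    Ok(if has_header { total.saturating_sub(1) } else { total })
}
```

In a test like the one above, this could be used as `assert_eq!(count_data_lines(std::path::Path::new(&path), true)?, 0);`, assuming the test function returns a `Result` whose error type can hold an `io::Error`.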

df.write_json(&path, crate::dataframe::DataFrameWriteOptions::new(), None)
    .await?;
// Expected the file to exist
assert!(std::path::Path::new(&path).exists());

Contributor

Curious why there is no assertion of 0 lines here, like the ones you added for the Arrow and Parquet files?

Successfully merging this pull request may close these issues.

How to write csv file to disk from a empty dataframe?

2 participants