Add readParquetFiles for partitioned parquet datasets (#131)
Merged
mchav merged 4 commits into DataHaskell:main (Jan 19, 2026)
Conversation
mchav reviewed on Jan 17, 2026
Contributor (Author)
Kindly check if it's alright.
The function:

- Accepts either a single Parquet file or a directory
- Recursively discovers `.parquet` files when given a directory
- Reads each file using the existing `readParquet`
- Vertically merges the results using the existing `DataFrame` `Semigroup` / `Monoid` instance (see the sketch after this list)

The existing `readParquet` behavior is unchanged. `readParquetFiles` is re-exported from `DataFrame`, so it is available as `D.readParquetFiles`.
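A minimal sketch of that shape, assuming `readParquet :: FilePath -> IO D.DataFrame` and the `Semigroup` / `Monoid` instance described above; the names `readParquetFilesSketch` and `discoverParquet` are illustrative, not the merged code:

```haskell
import qualified DataFrame as D
import Control.Monad (filterM)
import Data.List (sort)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>), takeExtension)

-- Sketch: accept a file or a directory, discover .parquet files
-- recursively, read each with the existing readParquet, and merge
-- vertically with mconcat (the DataFrame Monoid instance).
readParquetFilesSketch :: FilePath -> IO D.DataFrame
readParquetFilesSketch path = do
  isDir <- doesDirectoryExist path
  files <- if isDir then discoverParquet path else pure [path]
  frames <- mapM D.readParquet files
  pure (mconcat frames)

-- Recursively collect every *.parquet file under a directory,
-- sorted for a deterministic merge order.
discoverParquet :: FilePath -> IO [FilePath]
discoverParquet dir = do
  entries <- map (dir </>) <$> listDirectory dir
  dirs <- filterM doesDirectoryExist entries
  let parquets = filter ((== ".parquet") . takeExtension) entries
  nested <- concat <$> mapM discoverParquet dirs
  pure (sort (parquets ++ nested))
```

Sorting the discovered paths keeps the merge order deterministic across platforms, since `listDirectory` makes no ordering guarantee.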
Performance considerations
The implementation relies on existing DataFrame merge semantics (mconcat) and performs a recursive filesystem traversal for file discovery. No changes were made to Parquet decoding or in-memory column handling.
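One property that falls out of leaning on the `Monoid` instance, assuming its `mempty` is the empty `DataFrame`: `mconcat` over an empty file list yields an empty frame rather than an error, so a directory containing no `.parquet` files degrades gracefully.

```haskell
import qualified DataFrame as D

-- Purely the merge step, as a reminder of the semantics relied on here:
-- a <> b stacks b's rows under a's (matching schemas assumed), and
-- mconcat [] = mempty, so zero discovered files yield an empty frame.
mergeFrames :: [D.DataFrame] -> D.DataFrame
mergeFrames = mconcat
```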
Testing
Manually tested by reading a partitioned dataset stored as nested directories of Parquet files.
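As a rough illustration of that kind of manual check, against a hypothetical Hive-style layout; `D.dimensions` is assumed here only as a convenient way to inspect the merged result:

```haskell
import qualified DataFrame as D

-- Hypothetical partitioned dataset:
--   data/year=2025/month=01/part-0.parquet
--   data/year=2025/month=02/part-0.parquet
main :: IO ()
main = do
  df <- D.readParquetFiles "data"
  -- Print the shape of the merged frame; D.dimensions is assumed
  -- here as a (rows, columns) accessor for illustration.
  print (D.dimensions df)
```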
If there is something I am missing, kindly mention it; all suggestions are welcome.