Skip to content

Conversation

@mspruc
Copy link
Contributor

@mspruc mspruc commented Jan 19, 2026

lets have a discussion about how we are going to handle Continuous sources e.g. in flink. And how we are going to integrate this vs the old data set implementation.

@novatechflow
Copy link
Member

yes, pls start in our dev mailing list.

@mspruc mspruc marked this pull request as draft January 26, 2026 11:25
@mspruc
Copy link
Contributor Author

mspruc commented Jan 29, 2026

Please LMK if I need to deprecate any parts of the old DataSet API.

also I'm unsure whether its a smart idea to implement Join for both continuous and bounded streams like this, thoughts?

@mspruc mspruc marked this pull request as ready for review January 29, 2026 10:05
@mspruc
Copy link
Contributor Author

mspruc commented Jan 29, 2026

@juripetersen @zkaoudi @novatechflow Review when you have time pls

private DataStream<?> dataStream;

// TODO: this.size is currently always 0
private long size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have any effects?

Copy link
Contributor Author

@mspruc mspruc Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftovers from the old DataSetChannel implementation the user could theoretically extend the DataStreamChannel and provide their own, but its such a niche situation I'm fine with removing it.

@@ -56,6 +56,12 @@ public class FlinkExecutor extends PushExecutorTemplate {
*/
public ExecutionEnvironment fee;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideally we just remove the old Env and also DataSet with it, so that the end user of wayang doesn't even notice this change (other than a few operators missing maybe)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per discussion in the dev-list I've delegated this to the configuration.

import org.apache.wayang.flink.operators.FlinkBoundedTextFileSource;
import org.apache.wayang.flink.platform.FlinkPlatform;

public class BoundedTextFileSourceMapping implements Mapping {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As everything is bounded as of now in Wayang, I would just call this TextFileSource and replace TextFileSource from Flink with it.

We can later add a StreamedTextFileSource.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevertheless since our discussion on continuous streams hasn't concluded either I still propose we keep the bounded semantic as a way to leave the door open for continuous sources later.

/**
* Mapping from {@link JoinOperator} to {@link FlinkDataStreamJoinOperator}.
*/
public class StreamedJoinMapping implements Mapping {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same goes for this operator. I think we should just remove the DataSet ones and then replace them with the DataStream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above.

Copy link
Contributor

@juripetersen juripetersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, but we should just discuss if that's the way to go.

@juripetersen
Copy link
Contributor

I guess we just need more activity in the discussion on the mailing list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants