Skip to content

Conversation

@suddendust
Copy link
Contributor

@suddendust suddendust commented Jan 12, 2026

Description

This PR implements Collection#create(Key key, Document document) for FlatPostgresCollection.

Testing

  • Added integration tests.

Checklist:

  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • Any dependent changes have been merged and published in downstream modules

@suddendust suddendust changed the title [Draft] Pg write create FlatPostgresCollection Create Doc Impl Jan 12, 2026
suresh-prakash
suresh-prakash previously approved these changes Jan 13, 2026
private final DataType canonicalType;
@Getter private final PostgresDataType postgresType;
private final boolean nullable;
private final boolean array;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: isArray since array is a slightly confusing without looking at this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@codecov
Copy link

codecov bot commented Jan 14, 2026

Codecov Report

❌ Patch coverage is 84.90566% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.60%. Comparing base (4853757) to head (c3024b0).

Files with missing lines Patch % Lines
...documentstore/postgres/FlatPostgresCollection.java 83.91% 16 Missing and 7 partials ⚠️
...rg/hypertrace/core/documentstore/CreateResult.java 85.71% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #266      +/-   ##
============================================
+ Coverage     80.51%   80.60%   +0.09%     
- Complexity     1385     1390       +5     
============================================
  Files           234      235       +1     
  Lines          6194     6348     +154     
  Branches        554      579      +25     
============================================
+ Hits           4987     5117     +130     
- Misses          831      848      +17     
- Partials        376      383       +7     
Flag Coverage Δ
integration 80.60% <84.90%> (+0.09%) ⬆️
unit 57.37% <5.66%> (-1.29%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@suddendust
Copy link
Contributor Author

suddendust commented Jan 14, 2026

@suresh-prakash Can you take a look at the following:

  1. If the code isn't able to find the columnMetadata of a particular column even after retries, this column is skipped (rather than failing). Is this behaviour correct? I went with this design to write on a best-effort basis.
  2. Similarly, if for some reason we're unable to parse the value of a column, we add it to skippedFields list rather than failing the query.
  3. If any columns are skipped this way, we return the list to the client in the CreateResult. For this, CreateResult has been enhanced.
  4. We retry once by default in case we get the following SQL state in the first attempt: UNDEFINED_COLUMN("42703") OR DATATYPE_MISMATCH("42804"). The logic is maybe the schema has changed to better to refresh it and retry.

@suresh-prakash
Copy link
Contributor

@suresh-prakash Can you take a look at the following:

  1. If the code isn't able to find the columnMetadata of a particular column even after retries, this column is skipped (rather than failing). Is this behaviour correct? I went with this design to write on a best-effort basis.
  2. Similarly, if for some reason we're unable to parse the value of a column, we add it to skippedFields list rather than failing the query.
  3. If any columns are skipped this way, we return the list to the client in the CreateResult. For this, CreateResult has been enhanced.
  4. We retry once by default in case we get the following SQL state in the first attempt: UNDEFINED_COLUMN("42703") OR DATATYPE_MISMATCH("42804"). The logic is maybe the schema has changed to better to refresh it and retry.
  1. If the code isn't able to find the columnMetadata of a particular column even after retries, this column is skipped (rather than failing). Is this behaviour correct? I went with this design to write on a best-effort basis.

Best-effort makes sense. But, most of the times such log messages are ignored though we may be setting them in the CreateResult. Rather I'd prefer we fail it so that the client can take a necessary action (they may choose to retry with the column ignored, if we throw a custom exception with meaningful info.).

  1. Similarly, if for some reason we're unable to parse the value of a column, we add it to skippedFields list rather than failing the query.

This is fine, I guess, because that's the way the Mongo impl. works today. If I give an invalid selection, it doesn't throw. Rather avoids it in the JSON, which in turn results in a null (or missing) value on the caller/client side.

  1. We retry once by default in case we get the following SQL state in the first attempt: UNDEFINED_COLUMN("42703") OR DATATYPE_MISMATCH("42804"). The logic is maybe the schema has changed to better to refresh it and retry.

100% makes sense. 🙂

@puneet-traceable
Copy link

puneet-traceable commented Jan 14, 2026

@suresh-prakash Can you take a look at the following:

  1. If the code isn't able to find the columnMetadata of a particular column even after retries, this column is skipped (rather than failing). Is this behaviour correct? I went with this design to write on a best-effort basis.
  2. Similarly, if for some reason we're unable to parse the value of a column, we add it to skippedFields list rather than failing the query.
  3. If any columns are skipped this way, we return the list to the client in the CreateResult. For this, CreateResult has been enhanced.
  4. We retry once by default in case we get the following SQL state in the first attempt: UNDEFINED_COLUMN("42703") OR DATATYPE_MISMATCH("42804"). The logic is maybe the schema has changed to better to refresh it and retry.
  1. If the code isn't able to find the columnMetadata of a particular column even after retries, this column is skipped (rather than failing). Is this behaviour correct? I went with this design to write on a best-effort basis.

Best-effort makes sense. But, most of the times such log messages are ignored though we may be setting them in the CreateResult. Rather I'd prefer we fail it so that the client can take a necessary action (they may choose to retry with the column ignored, if we throw a custom exception with meaningful info.).

  1. Similarly, if for some reason we're unable to parse the value of a column, we add it to skippedFields list rather than failing the query.

This is fine, I guess, because that's the way the Mongo impl. works today. If I give an invalid selection, it doesn't throw. Rather avoids it in the JSON, which in turn results in a null (or missing) value on the caller/client side.

  1. We retry once by default in case we get the following SQL state in the first attempt: UNDEFINED_COLUMN("42703") OR DATATYPE_MISMATCH("42804"). The logic is maybe the schema has changed to better to refresh it and retry.

100% makes sense. 🙂

Why retry on DATATYPE_MISMATCH? This can be a bad situation to be in. If client data type mismatches the db column type then retries won't help. DB Column data type change should never be the case in postgres.

@suddendust
Copy link
Contributor Author

DB Column data type change should never be the case in postgres.

Well theoretically speaking, the rationale behind the retry is: Maybe the data type changed and the schema is stale, so refresh it and try again.

@suddendust
Copy link
Contributor Author

@suresh-prakash I am not throwing an exception because it won't throw an exception in Mongo as well. So to keep the interface behaviour consistent, shall we stick to it?

@suresh-prakash
Copy link
Contributor

@suresh-prakash I am not throwing an exception because it won't throw an exception in Mongo as well. So to keep the interface behaviour consistent, shall we stick to it?

True. But silently neglecting seems dangerous. Most clients would be unware of it. While it may give the immediate compatibility, it could create far more issues later. Since this write path anyways require code changes in the clients, I still think it would be better to let the clients handle it rather than the library doing it behind the scenes.

@kotharironak Thoughts here?

@suddendust
Copy link
Contributor Author

suddendust commented Jan 16, 2026

@suresh-prakash How about a config to control this (in customParameters)? Maybe something like bestEffortWrites: true/false. If true, it would write on a best-effort basis. If not, if would do a strict match - That is, all fields passed in the doc should be present in the schema along with the right value type.

@suddendust
Copy link
Contributor Author

suddendust commented Jan 16, 2026

@suresh-prakash Have add a bestEffortWrites custom param that controls the dataflow as following:

  1. If true, then PG would skip any fields passed in the document that are not present in the schema, or whose passed values' types don't conform to what is present in the schema.
  2. If false, it does a strict match. All fields present in the doc should be present in the schema + the passed values' types should conform to the defined schema.

Wdyt?

@suresh-prakash
Copy link
Contributor

@suresh-prakash Have add a bestEffortWrites custom param that controls the dataflow as following:

  1. If true, then PG would skip any fields passed in the document that are not present in the schema, or whose passed values' types don't conform to what is present in the schema.
  2. If false, it does a strict match. All fields present in the doc should be present in the schema + the passed values' types should conform to the defined schema.

Wdyt?

This is a nice middle-ground. 🙂 Just a small suggestion though. Instead of a boolean, can we make it as an enum (say, MissingColumnStrategy) please? That way, if at all there arises a third strategy in the future, it's straight-forward to extend it. E.g. values: SKIP, THROW, IGNORE_DOCUMENT, MENTION_IN_RESPONSE, etc. (Out of these, we can just implement the necessary ones today).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants