Remove task_execution_id column from task_instances table #31

@mcruzdev

Summary

The task_execution_id column in task_instances is redundant. Since the composite primary key (instance_id, task_position) is the true identity for a task execution, task_execution_id should be removed from the database schema and all layers of the stack.

Problem

In MODE 3 (Kafka), task_execution_id is already derived from instanceId + taskPosition — confirming that the composite key carries all the identity information and the column is purely redundant.

Keeping task_execution_id around:

  • Wastes storage and index space (idx_task_instances_task_execution_id)
  • Misleads API consumers into thinking it's a stable identifier
  • Adds unnecessary complexity to triggers, mappers, and ingestion code

Scope of changes

Database (MODE 1 — PostgreSQL)

  • New Flyway migration: drop task_execution_id column and idx_task_instances_task_execution_id index
  • Update normalize_task_event() trigger function (both fast path and FK-violation fallback) to remove task_execution_id from INSERT and ON CONFLICT clauses

Domain model

  • TaskExecution.java — derive id from instanceId + taskPosition at the domain/mapper level (no database column needed)
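A minimal sketch of what that derivation could look like. The helper name `TaskExecutionIds`, the `String` type of `taskPosition`, and the `":"` separator are all assumptions made for illustration; the issue does not specify the exact format MODE 3 uses today.

```java
import java.util.Objects;

// Hypothetical helper: the class name, the String type of taskPosition,
// and the ":" separator are assumptions, not taken from the codebase.
final class TaskExecutionIds {

    private static final String SEPARATOR = ":";

    private TaskExecutionIds() {
        // static helper, not meant to be instantiated
    }

    // Builds the id from the two parts of the composite primary key.
    static String derive(String instanceId, String taskPosition) {
        Objects.requireNonNull(instanceId, "instanceId");
        Objects.requireNonNull(taskPosition, "taskPosition");
        return instanceId + SEPARATOR + taskPosition;
    }
}
```

Keeping the derivation in one place would let the JPA mapper and the Kafka mapper produce identical ids for the same row, whichever separator is finally chosen.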

JPA / PostgreSQL storage (MODE 1)

  • TaskInstanceEntity.java — remove taskExecutionId field, update equals(), hashCode(), getId()
  • TaskInstanceEntityMapper.java — remove taskExecutionId ↔ id mappings
  • TaskExecutionJPAStorage.java — remove getTaskExecutionId reference
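As a rough illustration of the equals() and hashCode() change, the sketch below bases entity equality on the composite key only. It is a simplified stand-in without JPA annotations; the class name `TaskInstanceEntitySketch`, the `String` field types, and the `status` field are assumptions, and the real entity may map its key differently.

```java
import java.util.Objects;

// Simplified stand-in for TaskInstanceEntity (no JPA annotations).
// Field types and the "status" field are assumptions for illustration.
final class TaskInstanceEntitySketch {

    private final String instanceId;   // first part of the composite key
    private final String taskPosition; // second part of the composite key
    private String status;             // example non-key field, ignored by equals()

    TaskInstanceEntitySketch(String instanceId, String taskPosition) {
        this.instanceId = Objects.requireNonNull(instanceId, "instanceId");
        this.taskPosition = Objects.requireNonNull(taskPosition, "taskPosition");
    }

    void setStatus(String status) {
        this.status = status;
    }

    String getStatus() {
        return status;
    }

    // Two entities are the same task execution when both key parts match.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof TaskInstanceEntitySketch)) {
            return false;
        }
        TaskInstanceEntitySketch other = (TaskInstanceEntitySketch) o;
        return instanceId.equals(other.instanceId)
                && taskPosition.equals(other.taskPosition);
    }

    // Hash only the key parts so hashCode() stays consistent with equals().
    @Override
    public int hashCode() {
        return Objects.hash(instanceId, taskPosition);
    }
}
```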

Kafka ingestion (MODE 3)

  • TaskExecutionProcessor.java — remove generateTaskExecutionId()
  • Mapper.java — remove generateTaskExecutionId() and its call
  • task-instance-upsert.sql — remove task_execution_id from INSERT columns

Elasticsearch (MODE 2)

  • task-events.json index template — remove taskExecutionId field mapping

GraphQL API

  • WorkflowInstanceGraphQLApi.java — update getTaskExecution(id) to work with the derived id (see Identity strategy)
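One possible way for getTaskExecution(id) to resolve a derived id is to split it back into the two key parts and look the row up by (instance_id, task_position). The sketch below assumes an id of the form `instanceId + ":" + taskPosition` and an `instanceId` that never contains `":"`; both the format and the `TaskExecutionKey` class are hypothetical.

```java
// Hypothetical value type holding the two parts of the composite key.
// The "instanceId:taskPosition" id format is an assumption for illustration.
final class TaskExecutionKey {

    final String instanceId;
    final String taskPosition;

    private TaskExecutionKey(String instanceId, String taskPosition) {
        this.instanceId = instanceId;
        this.taskPosition = taskPosition;
    }

    // Splits at the first ':' so the task position may itself contain ':' or '/'.
    static TaskExecutionKey parse(String id) {
        if (id == null) {
            throw new IllegalArgumentException("id must not be null");
        }
        int sep = id.indexOf(':');
        if (sep <= 0 || sep == id.length() - 1) {
            throw new IllegalArgumentException("Malformed task execution id: " + id);
        }
        return new TaskExecutionKey(id.substring(0, sep), id.substring(sep + 1));
    }
}
```

The parsed parts could then be passed to whatever composite-key lookup the storage layer exposes, so no separate id column or index is required.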

Tests

  • WorkflowInstanceGraphQLApiTest.java — remove setTaskExecutionId() calls, update assertions

Documentation

  • Update CLAUDE.md (remove "Don't remove task_execution_id" guidance, update field mapping table)
  • Update relevant docs in data-index-docs/, data-index-storage-postgresql/README.md, etc.

Identity strategy (resolved)

TaskExecution.id will be derived from instanceId + taskPosition at the domain/mapper level — the same derivation MODE 3 (Kafka) already uses today. No database column needed; the composite primary key (instance_id, task_position) is the source of truth, and id is computed on read.
