Pipeline parameters different behavior in comparison to v2 #5614

@d-vesely

PySDK Version

  • PySDK V2 (2.x)
  • PySDK V3 (3.x)

Describe the bug
In v2, a pipeline variable could be passed as the value of an environment variable on an Estimator. For example, the environment variable "RANDOM_STATE" could be set from a pipeline parameter. In v3 this no longer works with a ModelTrainer instance and fails with a validation error: ValidationError: 1 validation error for ModelTrainer. A similar issue exists in the HyperparameterTuner, which does not accept a pipeline variable for random_seed and fails with ValidationError: 1 validation error for HyperParameterTuningJobConfig.

To reproduce
Create a ModelTrainer instance and pass a pipeline parameter as a value in the environment argument dictionary.
Create a HyperparameterTuner instance and pass a pipeline parameter as the random_seed argument.

model = ModelTrainer(
    source_code=source_code,
    compute=compute,
    networking=self.networking,
    base_job_name=base_job_name,
    training_image=self.image_uris["train"],
    output_data_config=OutputDataConfig(
        s3_output_path=Join(
            on="/",
            values=[
                self.s3_uri_runtime,
                ExecutionVariables.PIPELINE_EXECUTION_ID,
                "02_mt_output",
            ],
        ),
        kms_key_id=self.aws_params["kms_key_hub"],
    ),
    stopping_condition=StoppingCondition(max_runtime_in_seconds=28800),
    role=self.aws_params["exec_role"],
    sagemaker_session=self.pipeline_session,
    environment={
        "RANDOM_STATE": self.pipeline_params["RandomState"].to_string(),   # <-- This line causes issues
        **self.default_env_vars,
    },
)

hyperparameter_tuner = HyperparameterTuner(
    model_trainer=model,
    base_tuning_job_name=base_job_name,
    metric_definitions=metric_definitions,
    objective_metric_name=self.hpt_params["objective_metric_name"],
    objective_type=self.hpt_params["objective_type"],
    hyperparameter_ranges=self.hpt_params["hyperparameter_ranges"],
    max_jobs=self.hpt_params["max_jobs"],
    strategy="Bayesian",
    max_parallel_jobs=4,
    random_seed=self.pipeline_params["RandomState"], # <-- This line causes issues
    tags=self.tags,
)
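
The failure mode can be illustrated without SageMaker. This is a minimal sketch in plain Python, not the SDK's actual validation code. `PipelineVariableStandIn`, `validate_strict`, and `validate_lenient` are hypothetical names, and the sketch assumes the v3 error comes from a strict type check that only accepts plain strings, while a pipeline parameter is a placeholder object that is resolved at pipeline execution time.

```python
from typing import Any, Dict


class PipelineVariableStandIn:
    """Hypothetical stand-in for a pipeline parameter (a placeholder, not a str)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def to_string(self) -> "PipelineVariableStandIn":
        # Assumption: the result is still a placeholder object that is only
        # resolved when the pipeline runs, so it is not a real Python str.
        return self


def validate_strict(environment: Dict[str, Any]) -> Dict[str, Any]:
    """Accept only plain str values (assumed to mirror the v3 behavior)."""
    for key, value in environment.items():
        if not isinstance(value, str):
            raise TypeError(f"environment[{key!r}] must be str, got {type(value).__name__}")
    return environment


def validate_lenient(environment: Dict[str, Any]) -> Dict[str, Any]:
    """Accept str values or placeholders (the behavior requested in this issue)."""
    for key, value in environment.items():
        if not isinstance(value, (str, PipelineVariableStandIn)):
            raise TypeError(f"environment[{key!r}] has unsupported type {type(value).__name__}")
    return environment


env = {"RANDOM_STATE": PipelineVariableStandIn("RandomState").to_string()}

try:
    validate_strict(env)
    strict_rejected = False
except TypeError:
    strict_rejected = True

lenient_result = validate_lenient(env)
```

Under these assumptions, the strict check rejects the placeholder while the lenient check lets it through unchanged, which is the difference between the v3 and v2 behavior described above.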

Expected behavior
It should be possible to use pipeline variables as environment variable values in a ModelTrainer and as the random_seed argument of the HyperparameterTuner.

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 3.5.0

Additional context
This is a roadblock for us regarding a migration from v2 to v3.
