WANNA Job#

class wanna.core.models.training_custom_job.BaseCustomJobModel(*, name, project_id, zone=None, region, labels=None, description=None, service_account=None, network=None, bucket, tags=None, metadata=None, enable_web_access=False, base_output_directory=None, tensorboard_ref=None, timeout_seconds=86400, encryption_spec=None, env_vars=None)
  • name - [str] Custom name for this instance
  • project_id - [str] (optional) Overrides GCP Project ID from the gcp_profile segment
  • zone - [str] (optional) Overrides zone from the gcp_profile segment
  • region - [str] (optional) Overrides region from the gcp_profile segment
  • labels - [Dict[str, str]] (optional) Custom labels to apply to this instance
  • service_account - [str] (optional) Overrides service account from the gcp_profile segment
  • network - [str] (optional) Overrides network from the gcp_profile segment
  • tags - [Dict[str, str]] (optional) Tags to apply to this instance
  • metadata - [str] (optional) Custom metadata to apply to this instance
  • enable_web_access - [bool] Whether you want Vertex AI to enable interactive shell access to training containers. Default is False
  • bucket - [str] Overrides bucket from the gcp_profile segment
  • base_output_directory - [str] (optional) Path to where outputs will be saved
  • tensorboard_ref - [str] (optional) Name of the Vertex AI Experiment
  • timeout_seconds - [int] Job timeout in seconds. Defaults to 86400 (60 * 60 * 24 s = 24 hours)
  • encryption_spec - [str] (optional) The Cloud KMS resource identifier, in the form projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created
  • env_vars - [Dict[str, str]] (optional) Environment variables to be propagated to the job
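
For illustration, a minimal job entry built from these fields might look like the sketch below. This is only a sketch under assumptions: the top-level jobs key, the placeholder values (my-custom-job, my-bucket) and the omitted worker/container specification all depend on your wanna.yaml schema version.

    jobs:
      - name: my-custom-job
        region: europe-west1
        bucket: my-bucket
        enable_web_access: false
        timeout_seconds: 86400
        env_vars:
          LOG_LEVEL: INFO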

Hyper-parameter tuning#

class wanna.core.models.training_custom_job.HyperparameterTuning(*, metrics, parameters, max_trial_count=15, parallel_trial_count=3, search_algorithm=None, encryption_spec=None)
  • metrics - [Dict[str, Literal["minimize", "maximize"]]] Mapping of metric name to optimization goal
  • parameters - [List[HyperParamater]] Search-space parameters, each defined by var_name, type and the type-specific fields (min, max, scale or values)
  • max_trial_count - [int] defaults to 15
  • parallel_trial_count - [int] defaults to 3
  • search_algorithm - [str] (optional) Can be "grid" or "random"; if unset, Vertex AI's default Bayesian optimization is used
  • encryption_spec - [str] (optional) The Cloud KMS resource identifier, in the form projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as where the compute resource is created
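
A complete hp_tuning block combining the fields above might look like the following sketch (the metric and parameter names are illustrative and mirror the examples later on this page):

    hp_tuning:
      metrics: {'accuracy': 'maximize'}
      max_trial_count: 15
      parallel_trial_count: 3
      search_algorithm: random
      parameters:
        - var_name: learning_rate
          type: double
          min: 0.001
          max: 1
          scale: log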

A custom job can be converted into a hyper-parameter tuning job simply by adding one extra parameter, hp_tuning. This starts a series of jobs (instead of a single job) that searches for the best combination of hyper-parameters with respect to a target metric that you specify.

Read the official documentation for more information.

In general, you have to set which hyper-parameters are tunable, which metric you want to optimize and how many trials you want to run. You also need to adjust your training script so that it accepts the hyper-parameters as script arguments and reports the optimized metric back to Vertex AI.

Setting hyper-parameter space#

Your code should accept script arguments whose names match the wanna.yaml config. For example, if you want to fine-tune the learning rate of your model:

In the wanna.yaml config:

    hp_tuning:
      parameters:
        - var_name: learning_rate
          type: double
          min: 0.001
          max: 1
          scale: log

And the Python script should accept the same argument with a matching type:

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--learning_rate',
        required=True,
        type=float,  # wanna.yaml type "double" maps to float in Python
        help='learning rate')
    args = parser.parse_args()

Currently, you can use parameters of type double, integer, discrete and categorical. Each of them must be specified by var_name, type and, additionally, the following type-specific fields (a combined sketch follows the list):

  • double: min, max and scale (linear / log)
  • integer: min, max and scale (linear / log)
  • discrete: values (list of possible values) and scale (linear / log)
  • categorical: values (list of possible values)
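
As a combined sketch with all four types (the variable names and value lists are illustrative assumptions, following the field layout described above):

    hp_tuning:
      parameters:
        - var_name: learning_rate
          type: double
          min: 0.001
          max: 1
          scale: log
        - var_name: epochs
          type: integer
          min: 1
          max: 50
          scale: linear
        - var_name: batch_size
          type: discrete
          values: [16, 32, 64, 128]
          scale: linear
        - var_name: optimizer
          type: categorical
          values: [adam, sgd, rmsprop]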

Setting target metric#

You can choose to either maximize or minimize your optimized metric. Example in wanna.yaml:

    hp_tuning:
      metrics: {'accuracy':'maximize'}
      parameters:
        ...

Your Python script must report the metric back during training; for this, you should use the cloudml-hypertune library.

    import hypertune

    # Report the metric under the same tag as used in the wanna.yaml metrics
    hpt = hypertune.HyperTune()
    hpt.report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag='accuracy',
        metric_value=0.987,
        global_step=1000)

Setting number of trials and search algorithm#

The number of trials is controlled by max_trial_count and parallel_trial_count.

The search through the hyper-parameter space can use the grid or random algorithm; if neither is set, Vertex AI's default Bayesian optimization is used.
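
Putting this together in wanna.yaml (the counts are illustrative):

    hp_tuning:
      metrics: {'accuracy': 'maximize'}
      max_trial_count: 32
      parallel_trial_count: 4
      search_algorithm: grid
      parameters:
        ...

Note that Vertex AI's grid search requires every parameter to be of type integer, categorical or discrete; omit search_algorithm entirely to get the default Bayesian optimization.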