## WANNA Job
wanna.core.models.training_custom_job.BaseCustomJobModel(*, name, project_id, zone=None, region, labels=None, description=None, service_account=None, network=None, bucket, tags=None, metadata=None, enable_web_access=False, base_output_directory=None, tensorboard_ref=None, timeout_seconds=86400, encryption_spec=None, env_vars=None)

- `name` - [str] Custom name for this instance
- `project_id` - [str] (optional) Overrides GCP Project ID from the `gcp_profile` segment
- `zone` - [str] (optional) Overrides zone from the `gcp_profile` segment
- `region` - [str] (optional) Overrides region from the `gcp_profile` segment
- `labels` - [dict[str, str]] (optional) Custom labels to apply to this instance
- `service_account` - [str] (optional) Overrides service account from the `gcp_profile` segment
- `network` - [str] (optional) Overrides network from the `gcp_profile` segment
- `tags` - [dict[str, str]] (optional) Tags to apply to this instance
- `metadata` - [str] (optional) Custom metadata to apply to this instance
- `enable_web_access` - [bool] Whether you want Vertex AI to enable interactive shell access to training containers. Defaults to False
- `bucket` - [str] Overrides bucket from the `gcp_profile` segment
- `base_output_directory` - [str] (optional) Path to where outputs will be saved
- `tensorboard_ref` - [str] (optional) Name of the Vertex AI Experiment
- `timeout_seconds` - [int] Job timeout. Defaults to 60 * 60 * 24 s = 24 hours
- `encryption_spec` - [str] (optional) The Cloud KMS resource identifier, in the form `projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key`. The key needs to be in the same region as where the compute resource is created
- `env_vars` - [dict[str, str]] (optional) Environment variables to be propagated to the job
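For illustration, a job using a few of the fields above could be declared in `wanna.yaml` roughly as follows. The `jobs:` placement, the job name, and all values are assumptions for the sketch, not a schema reference:

```yaml
# Hypothetical wanna.yaml fragment - field values are illustrative only
jobs:
  - name: my-training-job
    region: europe-west1
    bucket: my-staging-bucket
    timeout_seconds: 86400        # 24 hours, the default
    enable_web_access: false
    tensorboard_ref: my-experiment
    env_vars:
      LOG_LEVEL: INFO
```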
## Hyper-parameter tuning
wanna.core.models.training_custom_job.HyperparameterTuning(*, metrics, parameters, max_trial_count=15, parallel_trial_count=3, search_algorithm=None, encryption_spec=None)

- `metrics` - [dict[str, Literal["minimize", "maximize"]]] Dictionary mapping metric names to optimization goals
- `parameters` - [list[HyperParamater]] Defined per `var_name`, `type`, `min`, `max`, `scale`
- `max_trial_count` - [int] Defaults to 15
- `parallel_trial_count` - [int] Defaults to 3
- `search_algorithm` - [str] (optional) Can be "grid" or "random"
- `encryption_spec` - [str] (optional) The Cloud KMS resource identifier, in the form `projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key`. The key needs to be in the same region as where the compute resource is created
A custom job can be converted into a hyper-parameter tuning job simply by adding
one extra parameter called `hp_tuning`. This will start a series of jobs (instead of just one)
and try to find the best combination of hyper-parameters with respect to a target metric that you specify.
Read the official documentation for more information.
In general, you have to set which hyper-parameters are tunable, which metric you want to optimize, and how many trials you want to run. You also need to adjust your training script so that it accepts hyper-parameters as script arguments and reports the optimized metric back to Vertex AI.
### Setting hyper-parameter space
Your code should accept script arguments with names matching the `wanna.yaml` config.
For example, if you want to fine-tune the learning rate in your model:
In wanna.yaml config:
```yaml
hp_tuning:
  parameters:
    - var_name: learning_rate
      type: double
      min: 0.001
      max: 1
      scale: log
```
And the Python script should accept the same argument with the same type:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '--learning_rate',
    required=True,
    type=float,
    help='learning rate')
```
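As a quick local sanity check, you can pass the argument explicitly the way Vertex AI would for one trial (the value here is arbitrary):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '--learning_rate',
    required=True,
    type=float,
    help='learning rate')

# Simulate the command-line argument Vertex AI passes for a single trial
args = parser.parse_args(['--learning_rate', '0.01'])
print(args.learning_rate)  # 0.01
```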
Currently, you can use parameters of type `double`, `integer`, `discrete` and `categorical`.
Each of them must be specified by `var_name`, `type` and additionally:

- `double`: `min`, `max` and `scale` (linear/log)
- `integer`: `min`, `max` and `scale` (linear/log)
- `discrete`: `values` (a list of possible values) and `scale` (linear/log)
- `categorical`: `values` (a list of possible values)
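For illustration, one parameter of each supported type could be declared like this (the variable names and values are made up for the sketch):

```yaml
# Hypothetical hp_tuning parameters - one example per type
hp_tuning:
  parameters:
    - var_name: learning_rate
      type: double
      min: 0.001
      max: 1
      scale: log
    - var_name: batch_size
      type: integer
      min: 16
      max: 256
      scale: linear
    - var_name: hidden_units
      type: discrete
      values: [64, 128, 256]
      scale: linear
    - var_name: optimizer
      type: categorical
      values: ["adam", "sgd"]
```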
### Setting target metric
You can choose to either maximize or minimize your optimized metric. Example in wanna.yaml:
```yaml
hp_tuning:
  metrics: {'accuracy': 'maximize'}
  parameters:
    ...
```
Your Python script must report the metric back during training; you should use the `cloudml-hypertune` library.
```python
import hypertune

hpt = hypertune.HyperTune()
hpt.report_hyperparameter_tuning_metric(
    hyperparameter_metric_tag='accuracy',
    metric_value=0.987,
    global_step=1000)
```
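Putting the two pieces together, a trial script reads the hyper-parameter from its arguments, trains, and reports the resulting metric. This is a minimal sketch: `train` is a made-up stand-in for real training, and the `ImportError` guard only exists so the skeleton also runs outside a Vertex AI container where `cloudml-hypertune` is installed:

```python
import argparse

try:
    import hypertune  # available inside Vertex AI training containers
    HAS_HYPERTUNE = True
except ImportError:
    HAS_HYPERTUNE = False


def train(learning_rate):
    # Stand-in for real model training; returns a fake accuracy
    return 1.0 - learning_rate * 0.1


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--learning_rate', required=True, type=float)
    args = parser.parse_args(argv)

    accuracy = train(args.learning_rate)

    if HAS_HYPERTUNE:
        # Report the metric named in wanna.yaml back to Vertex AI
        hpt = hypertune.HyperTune()
        hpt.report_hyperparameter_tuning_metric(
            hyperparameter_metric_tag='accuracy',
            metric_value=accuracy,
            global_step=1)
    return accuracy


if __name__ == '__main__':
    main()
```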
### Setting number of trials and search algorithm
The number of trials is controlled by `max_trial_count` and `parallel_trial_count`.
The search through the hyper-parameter space can be `grid` or `random`; if neither is set,
the default Bayesian optimization is used.
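Trial counts and the search algorithm sit alongside the metrics and parameters in the `hp_tuning` block. The counts and algorithm below are arbitrary example values:

```yaml
# Illustrative hp_tuning block - values chosen for the example only
hp_tuning:
  metrics: {'accuracy': 'maximize'}
  parameters:
    - var_name: learning_rate
      type: double
      min: 0.001
      max: 1
      scale: log
  max_trial_count: 20
  parallel_trial_count: 5
  search_algorithm: random   # omit to get Bayesian optimization
```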