# WANNA Notebook
We offer a simple way of managing Jupyter Notebooks on GCP, with multiple ways to set up your environment, mount a GCS bucket, and more.
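Notebook instances are described in your WANNA YAML config and managed from the `wanna notebook` CLI. As a rough, illustrative sketch only (the exact flags may differ between versions; `-n` is assumed to select the notebook by name, as in the SSH example later on this page):

```bash
# create the notebook instance defined in the WANNA config
wanna notebook create -n wanna-notebook-trial

# delete it when it is no longer needed
wanna notebook delete -n wanna-notebook-trial
```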
`wanna.core.models.notebook.NotebookModel`

- `name` - [str] Custom name for this instance
- `project_id` - [str] (optional) Overrides the GCP project ID from the `gcp_profile` segment
- `zone` - [str] (optional) Overrides the zone from the `gcp_profile` segment
- `region` - [str] (optional) Overrides the region from the `gcp_profile` segment
- `labels` - [Dict[str, str]] (optional) Custom labels to apply to this instance
- `service_account` - [str] (optional) Overrides the service account from the `gcp_profile` segment
- `network` - [str] Overrides the network from the `gcp_profile` segment
- `tags` - [Dict[str, str]] (optional) Tags to apply to this instance
- `metadata` - [str] (optional) Custom metadata to apply to this instance
- `machine_type` - [str] (optional) GCP Compute Engine machine type (default: `n1-standard-4`)
- `environment` - [NotebookEnvironment] (optional) Notebook environment defined by a VM image or a Docker image reference (default: the `common` CPU VM image)
- `owner` - [str] (optional) Currently supports only one owner. If not specified, all service account users of your VM instance's service account can use the instance. If specified, only the owner will be able to access the notebook.
- `gpu` - [GPU] (optional) The hardware GPU accelerator used on this instance
- `boot_disk` - [Disk] (optional) Boot disk configuration to attach to this instance
- `data_disk` - [Disk] (optional) Data disk configuration to attach to this instance
- `bucket_mounts` - [List[BucketMount]] (optional) List of buckets to be accessible from the notebook
- `subnet` - [str] (optional) Subnetwork of a given network
- `tensorboard_ref` - [str] (optional) Reference to a Vertex AI Experiments TensorBoard
- `enable_monitoring` - [bool] (optional) Reports system health and notebook metrics to Cloud Monitoring (default: `true`)
- `collaborative` - [bool] (optional) Enables JupyterLab real-time collaboration, see https://jupyterlab.readthedocs.io/en/stable/user/rtc.html (default: `false`)
- `no_public_ip` - [bool] (optional) Public or private (default) IP address (default: `true`)
- `no_proxy_access` - [bool] (optional) If true, the notebook instance will not register with the proxy (default: `false`)
- `idle_shutdown_timeout` - [int] (optional) Time in minutes, between 10 and 1440. After this period of inactivity the notebook will be stopped. If the parameter is not set, no idle shutdown is applied.
- `env_vars` - [Dict[str, str]] (optional) Environment variables to be propagated to the notebook
- `backup` - [str] (optional) Name of the bucket where a data backup is copied (no `gs://` needed in the name). After creation, any changes (including deletion) made to the data disk contents will be synced to the GCS location. It is recommended that you enable object versioning for the selected location so you can restore accidentally deleted or overwritten files. To prevent sync conflicts, avoid assigning the same location to multiple instances. Works only for non-Docker notebooks!
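As an illustration of some of the fields above, here is a minimal sketch of a notebook entry in the WANNA YAML config (all names and values are placeholders):

```yaml
notebooks:
  - name: example-notebook
    machine_type: n1-standard-4      # GCP Compute Engine machine type (the default)
    idle_shutdown_timeout: 60        # stop the notebook after 60 minutes of inactivity
    env_vars:
      MY_ENV_VAR: some-value         # propagated to the notebook
    backup: my-backup-bucket         # bucket name without the gs:// prefix, non-Docker notebooks only
```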
## Notebook Environments
There are two distinct possibilities for your environment.
- Use a custom Docker image. We recommend building on top of the GCP notebook-ready images, either by using one of them as a base image or by using the `notebook_ready_image` docker build type. It is also possible to build your image from scratch, but please follow GCP's recommended principles and port settings as described here.
```yaml
docker:
  images:
    - build_type: local_build_image
      name: custom-notebook-container
      context_dir: .
      dockerfile: Dockerfile.notebook
  repository: wanna-samples
  cloud_build: true

notebooks:
  - name: wanna-notebook-custom-container
    environment:
      docker_image_ref: custom-notebook-container
```
- Use a virtual machine image preconfigured with Python libraries and frameworks such as TensorFlow, PyTorch, or R. A complete list of available images can be found here.
```yaml
notebooks:
  - name: wanna-notebook-vm
    machine_type: n1-standard-4
    environment:
      vm_image:
        framework: pytorch
        version: 1-9-xla
        os: debian-10
```
## Mounting buckets

We can automatically mount GCS buckets with `gcsfuse` during notebook startup.
Example:
```yaml
bucket_mounts:
  - bucket_name: us-burger-gcp-poc-mooncloud
    bucket_dir: data
    local_path: /home/jupyter/mounted/gcs
```
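WANNA performs the mount for you at instance startup. Purely for illustration, the configuration above corresponds roughly to the following `gcsfuse` invocation (the exact options WANNA passes may differ):

```bash
# mount only the `data` directory of the bucket at the configured local path
mkdir -p /home/jupyter/mounted/gcs
gcsfuse --only-dir data us-burger-gcp-poc-mooncloud /home/jupyter/mounted/gcs
```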
## Tensorboard integration

`tb-gcp-uploader` is needed to upload the logs to the TensorBoard instance. A detailed tutorial on this tool can be found here.

If you set the `tensorboard_ref` in the WANNA YAML config, we will export the TensorBoard resource name as `AIP_TENSORBOARD_LOG_DIR`.
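For illustration, a sketch of uploading local TensorBoard logs from inside the notebook, assuming the flag names from the `tb-gcp-uploader` tutorial and a hypothetical `./logs` directory and experiment name:

```bash
# AIP_TENSORBOARD_LOG_DIR holds the TensorBoard resource name exported by WANNA (see above)
tb-gcp-uploader \
  --tensorboard_resource_name "${AIP_TENSORBOARD_LOG_DIR}" \
  --logdir ./logs \
  --experiment_name my-experiment \
  --one_shot=True
```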
## Roles and permissions

Permissions and suggested roles (applying the principle of least privilege) required for notebook manipulation:

| WANNA action | Permissions | Suggested roles |
| --- | --- | --- |
| create | See full list | `roles/notebooks.runner`, `roles/notebooks.admin` |
| delete | See full list | `roles/notebooks.admin` |

For accessing the JupyterLab web interface, you must grant the user access to the service account used by the notebook instance. If the instance owner is set, only this user can access the web interface.

Full list of available roles and permissions.
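As an illustrative sketch of the JupyterLab access requirement above (the service account and user e-mail are placeholders):

```bash
# allow a user to act as the notebook's service account,
# which is required to open the JupyterLab web interface
gcloud iam service-accounts add-iam-policy-binding \
  my-notebook-sa@my-gcp-project.iam.gserviceaccount.com \
  --member="user:jane.doe@example.com" \
  --role="roles/iam.serviceAccountUser"
```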
## Local development and SSH
If you wish to develop code in your local IDE and run it on Vertex-AI notebooks, we have a solution for you. Assuming your notebook is already running, you can set up an SSH connection via:
```bash
wanna notebook ssh --background -n notebook_name
```
Wanna will create an SSH tunnel using GCP IAP from your local environment to your notebook.
The `--background`/`-b` flag means that the tunnel will be created in the background, and you can access the notebook running in GCP at `localhost:8080` (the port can be customized with `--port`).
The second possibility is to use `--interactive`/`-i`, which starts a bash session inside the Compute Engine instance backing your Vertex-AI notebook.
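For example (the notebook name is a placeholder):

```bash
# background tunnel, explicitly setting the local port (8080 is the default)
wanna notebook ssh --background --port 8080 -n notebook_name

# interactive bash session inside the underlying Compute Engine instance
wanna notebook ssh --interactive -n notebook_name
```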
Once you set up a `--background` connection to the notebook, you can use your favorite IDE to develop in the notebook. Here we share instructions on how to use VSCode for this.
### Connecting with VSCode
- Install the Jupyter extension
- Create a new file with the Jupyter notebook type
- Select the Jupyter Server: local button in the global status bar, or run the Jupyter: Specify local or remote Jupyter server for connections command from the Command Palette (⇧⌘P).
- Select the option Existing URL and input `http://localhost:8080`
- You should be connected. If you get an error saying something like `'_xsrf' argument missing from POST`, it is because VSCode cannot start a Python kernel in GCP. The current workaround is to manually start a kernel at `http://localhost:8080` and then connect to the existing kernel from the upper-right corner of VSCode.

A more detailed guide on connecting VSCode to Jupyter can be found at https://code.visualstudio.com/docs/datascience/jupyter-notebooks.
## Example

```yaml
notebooks:
  - name: wanna-notebook-trial
    service_account:
    owner:
    machine_type: n1-standard-4
    labels:
      notebook_usecase: wanna-notebook-sample-simple-pip
    environment:
      vm_image:
        framework: pytorch
        version: 1-9-xla
        os: debian-10
    gpu:
      count: 1
      accelerator_type: NVIDIA_TESLA_V100
      install_gpu_driver: true
    boot_disk:
      disk_type: pd_standard
      size_gb: 100
    data_disk:
      disk_type: pd_standard
      size_gb: 100
    bucket_mounts:
      - bucket_name: us-burger-gcp-poc-mooncloud
        bucket_dir: data
        local_path: /home/jupyter/mounted/gcs
    tensorboard_ref: my-super-tensorboard
```