WANNA Notebook

We offer a simple way of managing Jupyter Notebooks on GCP, with multiple ways to set your environment, mount a GCS bucket, and more.

class wanna.core.models.notebook.NotebookModel(*, name, project_id, zone, region=None, labels=None, description=None, service_account=None, network=None, bucket=None, tags=None, metadata=None, machine_type='n1-standard-4', environment=NotebookEnvironment(vm_image=VMImage(framework='common', version='cpu', os=None), docker_image_ref=None), owner=None, gpu=None, boot_disk=None, data_disk=None, bucket_mounts=None, subnet=None, tensorboard_ref=None, enable_monitoring=True, collaborative=False, no_public_ip=True, no_proxy_access=False, idle_shutdown_timeout=None, env_vars=None, backup=None)
  • name - [str] Custom name for this instance
  • project_id - [str] (optional) Overrides GCP Project ID from the gcp_profile segment
  • zone - [str] (optional) Overrides zone from the gcp_profile segment
  • region - [str] (optional) Overrides region from the gcp_profile segment
  • labels - [Dict[str, str]] (optional) Custom labels to apply to this instance
  • service_account - [str] (optional) Overrides service account from the gcp_profile segment
  • network - [str] (optional) Overrides network from the gcp_profile segment
  • tags - [Dict[str, str]] (optional) Tags to apply to this instance
  • metadata - [str] (optional) Custom metadata to apply to this instance
  • machine_type - [str] (optional) GCP Compute Engine machine type
  • environment - [NotebookEnvironment] (optional) Notebook environment defined by either a VM image or a Docker image reference
  • owner - [str] (optional) Currently supports one owner only. If not specified, all of the service account users of your VM instance’s service account can use the instance. If specified, only the owner will be able to access the notebook.
  • gpu - [GPU] (optional) The hardware GPU accelerator used on this instance.
  • boot_disk - [Disk] (optional) Boot disk configuration to attach to this instance.
  • data_disk - [Disk] (optional) Data disk configuration to attach to this instance.
  • bucket_mounts - [List[BucketMount]] (optional) List of buckets to be accessible from the notebook
  • subnet - [str] (optional) Subnetwork of a given network
  • tensorboard_ref - [str] (optional) Reference to Vertex AI Experiments
  • enable_monitoring - [bool] (optional) Reports system health and notebook metrics to Cloud Monitoring
  • collaborative - [bool] (optional) Enable JupyterLab real-time collaboration https://jupyterlab.readthedocs.io/en/stable/user/rtc.html
  • no_public_ip - [bool] (optional) If true (default), the instance gets a private IP address only; set to false for a public IP address
  • no_proxy_access - [bool] (optional) If true, the notebook instance will not register with the proxy
  • idle_shutdown_timeout - [int] (optional) Time in minutes, between 10 and 1440. After this time of inactivity, the notebook will be stopped. If the parameter is not set, no idle shutdown is configured.
  • env_vars - [Dict[str, str]] (optional) Environment variables to be propagated to the notebook
  • backup - [str] (optional) Name of the bucket where a data backup is copied (no 'gs://' prefix needed in the name). After creation, any changes (including deletion) made to the data disk contents will be synced to the GCS location. It’s recommended that you enable object versioning for the selected location so you can restore accidentally deleted or overwritten files. To prevent sync conflicts, avoid assigning the same location to multiple instances. Works only for non-Docker notebooks!
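
As a rough illustration of how several of the optional parameters above might be combined, here is a minimal sketch of a notebook entry. The notebook name, label, environment variable, and backup bucket are hypothetical placeholders; only the field names and types come from the parameter list.

```yaml
notebooks:
  - name: wanna-notebook-optional-settings   # hypothetical name
    machine_type: n1-standard-4
    labels:
      team: data-science                     # hypothetical label
    collaborative: true                      # JupyterLab real-time collaboration
    no_public_ip: true                       # private IP only (the default)
    idle_shutdown_timeout: 60                # minutes, must be between 10 and 1440
    env_vars:
      MY_SETTING: some-value                 # hypothetical variable
    backup: my-backup-bucket                 # hypothetical bucket, no gs:// prefix
```

Because backup works only for non-Docker notebooks, this sketch leaves environment unset and relies on the default VM image.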

Notebook Environments

There are two distinct possibilities for your environment.

  • Use a custom Docker image. We recommend building on top of GCP notebook-ready images, either by using one of their images as a base or by using the notebook_ready_image docker type. It is also possible to build your image from scratch, but please follow GCP's recommended principles and port settings as described here.
docker:
  images:
    - build_type: local_build_image
      name: custom-notebook-container
      context_dir: .
      dockerfile: Dockerfile.notebook
  repository: wanna-samples
  cloud_build: true

notebooks:
  - name: wanna-notebook-custom-container
    environment:
      docker_image_ref: custom-notebook-container
  • Use a virtual machine image with preconfigured Python libraries, TensorFlow, PyTorch, R, and more. A complete list of available images can be found here.
notebooks:
  - name: wanna-notebook-vm
    machine_type: n1-standard-4
    environment:
      vm_image:
        framework: pytorch
        version: 1-9-xla
        os: debian-10
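
For comparison, the defaults shown in the NotebookModel signature (framework 'common', version 'cpu', no os) would correspond to a config along these lines. This is a sketch only: the notebook name is hypothetical, and omitting the environment block entirely should have the same effect if the signature defaults apply.

```yaml
notebooks:
  - name: wanna-notebook-default-vm   # hypothetical name
    environment:
      vm_image:
        framework: common   # default from the NotebookModel signature
        version: cpu        # default from the NotebookModel signature
```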

Mounting buckets

We can automatically mount GCS buckets with gcsfuse during the notebook startup.

Example:

    bucket_mounts:
      - bucket_name: us-burger-gcp-poc-mooncloud
        bucket_dir: data
        local_path: /home/jupyter/mounted/gcs

Tensorboard integration

tb-gcp-uploader is needed to upload the logs to the tensorboard instance. A detailed tutorial on this tool can be found here.

If you set tensorboard_ref in the WANNA YAML config, we will export the tensorboard resource name as the AIP_TENSORBOARD_LOG_DIR environment variable.
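
A minimal sketch of the notebook side of this setting is shown below. It assumes a tensorboard called my-super-tensorboard is already available to the WANNA config; the notebook name is a hypothetical placeholder.

```yaml
notebooks:
  - name: wanna-notebook-with-tensorboard   # hypothetical name
    machine_type: n1-standard-4
    # Assumed to refer to an existing tensorboard; with this set,
    # AIP_TENSORBOARD_LOG_DIR is exported inside the notebook.
    tensorboard_ref: my-super-tensorboard
```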

Roles and permissions

Permissions and suggested roles (applying the principle of least privilege) required for notebook manipulation:

WANNA action | Permissions   | Suggested Roles
create       | See full list | roles/notebooks.runner, roles/notebooks.admin
delete       | See full list | roles/notebooks.admin

For accessing the JupyterLab web interface, you must grant the user access to the service account used by the notebooks instance. If the instance owner is set, only this user can access the web interface.
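
As an illustration of restricting access to a single user, a notebook entry could set both fields as sketched below. The email addresses and notebook name are hypothetical placeholders, and the sketch assumes both values are given as email addresses.

```yaml
notebooks:
  - name: wanna-notebook-single-owner   # hypothetical name
    # Hypothetical service account used by the notebook instance
    service_account: notebook-sa@my-project.iam.gserviceaccount.com
    # Hypothetical user; when set, only this user can open the web interface
    owner: jane.doe@example.com
```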

Full list of available roles and permissions.

Local development and SSH

If you wish to develop code in your local IDE and run it on Vertex-AI notebooks, we have a solution for you. Assuming your notebook is already running, you can set up an SSH connection via:

wanna notebook ssh --background -n notebook_name

WANNA will create an SSH tunnel using GCP IAP from your local environment to your notebook.

The --background/-b flag creates the tunnel in the background, so you can access the notebook running in GCP at localhost:8080 (the port can be customized with --port). Alternatively, --interactive/-i starts a bash shell inside the Compute Engine instance backing your Vertex-AI notebook.

Once you have set up a --background connection to the notebook, you can use your favorite IDE to develop in the notebook. Here we share instructions on how to use VSCode for this.

Connecting with VSCode

  1. Install the Jupyter extension.
  2. Create a new file with the type Jupyter notebook.
  3. Select the Jupyter Server: local button in the global Status bar or run the Jupyter: Specify local or remote Jupyter server for connections command from the Command Palette (⇧⌘P).
  4. Select the option Existing URL and enter http://localhost:8080.
  5. You should now be connected. If you get an error such as '_xsrf' argument missing from POST., it is because VSCode cannot start a Python kernel in GCP. The current workaround is to manually start a kernel at http://localhost:8080 and then, in VSCode, connect to the existing kernel using the selector in the upper-right corner.

A more detailed guide on setting a connection with VSCode to Jupyter can be found at https://code.visualstudio.com/docs/datascience/jupyter-notebooks.

Example

notebooks:
  - name: wanna-notebook-trial
    service_account:
    owner: 
    machine_type: n1-standard-4
    labels:
      notebook_usecase: wanna-notebook-sample-simple-pip
    environment:
      vm_image:
        framework: pytorch
        version: 1-9-xla
        os: debian-10
    gpu:
      count: 1
      accelerator_type: NVIDIA_TESLA_V100
      install_gpu_driver: true
    boot_disk:
      disk_type: pd_standard
      size_gb: 100
    data_disk:
      disk_type: pd_standard
      size_gb: 100
    bucket_mounts:
      - bucket_name: us-burger-gcp-poc-mooncloud
        bucket_dir: data
        local_path: /home/jupyter/mounted/gcs
    tensorboard_ref: my-super-tensorboard