# WANNA Notebook
We offer a simple way of managing Jupyter Notebooks on GCP, with multiple ways to set up your environment, mount a GCS bucket, and more.
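Notebook instances are described in your WANNA YAML config and managed from the `wanna notebook` CLI. As a rough, illustrative sketch only (the exact flags may differ between versions; `-n` is assumed to select the notebook by name, as in the SSH example later on this page):

```bash
# create the notebook instance defined in the WANNA config
wanna notebook create -n wanna-notebook-trial

# delete it when it is no longer needed
wanna notebook delete -n wanna-notebook-trial
```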
`wanna.core.models.notebook.NotebookModel`

- `name` - [str] Custom name for this instance
- `project_id` - [str] (optional) Overrides the GCP project ID from the `gcp_profile` segment
- `zone` - [str] (optional) Overrides the zone from the `gcp_profile` segment
- `region` - [str] (optional) Overrides the region from the `gcp_profile` segment
- `labels` - [Dict[str, str]] (optional) Custom labels to apply to this instance
- `service_account` - [str] (optional) Overrides the service account from the `gcp_profile` segment
- `network` - [str] Overrides the network from the `gcp_profile` segment
- `tags` - [Dict[str, str]] (optional) Tags to apply to this instance
- `metadata` - [str] (optional) Custom metadata to apply to this instance
- `machine_type` - [str] (optional) GCP Compute Engine machine type (default: `n1-standard-4`)
- `environment` - [NotebookEnvironment] (optional) Notebook environment defined by a VM image or a Docker image reference (default: the `common` CPU VM image)
- `owner` - [str] (optional) Currently supports only one owner. If not specified, all service account users of your VM instance's service account can use the instance. If specified, only the owner will be able to access the notebook.
- `gpu` - [GPU] (optional) The hardware GPU accelerator used on this instance
- `boot_disk` - [Disk] (optional) Boot disk configuration to attach to this instance
- `data_disk` - [Disk] (optional) Data disk configuration to attach to this instance
- `bucket_mounts` - [List[BucketMount]] (optional) List of buckets to be accessible from the notebook
- `subnet` - [str] (optional) Subnetwork of a given network
- `tensorboard_ref` - [str] (optional) Reference to a Vertex AI Experiments TensorBoard
- `enable_monitoring` - [bool] (optional) Reports system health and notebook metrics to Cloud Monitoring (default: `true`)
- `collaborative` - [bool] (optional) Enables JupyterLab real-time collaboration, see https://jupyterlab.readthedocs.io/en/stable/user/rtc.html (default: `false`)
- `no_public_ip` - [bool] (optional) Public or private (default) IP address (default: `true`)
- `no_proxy_access` - [bool] (optional) If true, the notebook instance will not register with the proxy (default: `false`)
- `idle_shutdown_timeout` - [int] (optional) Time in minutes, between 10 and 1440. After this period of inactivity the notebook will be stopped. If the parameter is not set, no idle shutdown is applied.
- `env_vars` - [Dict[str, str]] (optional) Environment variables to be propagated to the notebook
- `backup` - [str] (optional) Name of the bucket where a data backup is copied (no `gs://` needed in the name). After creation, any changes (including deletion) made to the data disk contents will be synced to the GCS location. It is recommended that you enable object versioning for the selected location so you can restore accidentally deleted or overwritten files. To prevent sync conflicts, avoid assigning the same location to multiple instances. Works only for non-Docker notebooks!
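As an illustration of some of the fields above, here is a minimal sketch of a notebook entry in the WANNA YAML config (all names and values are placeholders):

```yaml
notebooks:
  - name: example-notebook
    machine_type: n1-standard-4      # GCP Compute Engine machine type (the default)
    idle_shutdown_timeout: 60        # stop the notebook after 60 minutes of inactivity
    env_vars:
      MY_ENV_VAR: some-value         # propagated to the notebook
    backup: my-backup-bucket         # bucket name without the gs:// prefix, non-Docker notebooks only
```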
## Notebook Environments
There are two distinct possibilities for your environment.
- Use a custom Docker image. We recommend building on top of the GCP notebook-ready images, either by using one of them as a base image or by using the `notebook_ready_image` docker build type. It is also possible to build your image from scratch, but please follow GCP's recommended principles and port settings as described here.
```yaml
docker:
  images:
    - build_type: local_build_image
      name: custom-notebook-container
      context_dir: .
      dockerfile: Dockerfile.notebook
  repository: wanna-samples
  cloud_build: true

notebooks:
  - name: wanna-notebook-custom-container
    environment:
      docker_image_ref: custom-notebook-container
```
- Use a virtual machine image preconfigured with Python libraries and frameworks such as TensorFlow, PyTorch, or R. A complete list of available images can be found here.
```yaml
notebooks:
  - name: wanna-notebook-vm
    machine_type: n1-standard-4
    environment:
      vm_image:
        framework: pytorch
        version: 1-9-xla
        os: debian-10
```
## Mounting buckets

We can automatically mount GCS buckets with `gcsfuse` during notebook startup.
Example:
```yaml
bucket_mounts:
  - bucket_name: us-burger-gcp-poc-mooncloud
    bucket_dir: data
    local_path: /home/jupyter/mounted/gcs
```
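WANNA performs the mount for you at instance startup. Purely for illustration, the configuration above corresponds roughly to the following `gcsfuse` invocation (the exact options WANNA passes may differ):

```bash
# mount only the `data` directory of the bucket at the configured local path
mkdir -p /home/jupyter/mounted/gcs
gcsfuse --only-dir data us-burger-gcp-poc-mooncloud /home/jupyter/mounted/gcs
```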
## Tensorboard integration

`tb-gcp-uploader` is needed to upload the logs to the TensorBoard instance. A detailed tutorial on this tool can be found here.

If you set the `tensorboard_ref` in the WANNA YAML config, we will export the TensorBoard resource name as `AIP_TENSORBOARD_LOG_DIR`.
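For illustration, a sketch of uploading local TensorBoard logs from inside the notebook, assuming the flag names from the `tb-gcp-uploader` tutorial and a hypothetical `./logs` directory and experiment name:

```bash
# AIP_TENSORBOARD_LOG_DIR holds the TensorBoard resource name exported by WANNA (see above)
tb-gcp-uploader \
  --tensorboard_resource_name "${AIP_TENSORBOARD_LOG_DIR}" \
  --logdir ./logs \
  --experiment_name my-experiment \
  --one_shot=True
```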
## Roles and permissions

Permissions and suggested roles (applying the principle of least privilege) required for notebook manipulation:

| WANNA action | Permissions | Suggested roles |
| --- | --- | --- |
| create | See full list | `roles/notebooks.runner`, `roles/notebooks.admin` |
| delete | See full list | `roles/notebooks.admin` |

For accessing the JupyterLab web interface, you must grant the user access to the service account used by the notebook instance. If the instance owner is set, only this user can access the web interface.

Full list of available roles and permissions.
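As an illustrative sketch of the JupyterLab access requirement above (the service account and user e-mail are placeholders):

```bash
# allow a user to act as the notebook's service account,
# which is required to open the JupyterLab web interface
gcloud iam service-accounts add-iam-policy-binding \
  my-notebook-sa@my-gcp-project.iam.gserviceaccount.com \
  --member="user:jane.doe@example.com" \
  --role="roles/iam.serviceAccountUser"
```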
## Local development and SSH
If you wish to develop code in your local IDE and run it on Vertex-AI notebooks, we have a solution for you. Assuming your notebook is already running, you can set up an SSH connection via:
```bash
wanna notebook ssh --background -n notebook_name
```
Wanna will create an SSH tunnel using GCP IAP from your local environment to your notebook.
The `--background`/`-b` flag means that the tunnel will be created in the background, and you can access the notebook running in GCP at `localhost:8080` (the port can be customized with `--port`).
The second possibility is to use `--interactive`/`-i`, which starts a bash session inside the Compute Engine instance backing your Vertex-AI notebook.
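For example (the notebook name is a placeholder):

```bash
# background tunnel, explicitly setting the local port (8080 is the default)
wanna notebook ssh --background --port 8080 -n notebook_name

# interactive bash session inside the underlying Compute Engine instance
wanna notebook ssh --interactive -n notebook_name
```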
Once you set up a `--background` connection to the notebook, you can use your favorite IDE to develop in the notebook. Here we share instructions on how to use VSCode for this.
### Connecting with VSCode
- Install the Jupyter extension
- Create a new file with the Jupyter notebook type
- Select the Jupyter Server: local button in the global status bar, or run the Jupyter: Specify local or remote Jupyter server for connections command from the Command Palette (⇧⌘P).
- Select the option Existing URL and input `http://localhost:8080`
- You should be connected. If you get an error saying something like `'_xsrf' argument missing from POST`, it is because VSCode cannot start a Python kernel in GCP. The current workaround is to manually start a kernel at `http://localhost:8080` and then connect to the existing kernel from the upper-right corner of VSCode.

A more detailed guide on connecting VSCode to Jupyter can be found at https://code.visualstudio.com/docs/datascience/jupyter-notebooks.
## Example

```yaml
notebooks:
  - name: wanna-notebook-trial
    service_account:
    owner:
    machine_type: n1-standard-4
    labels:
      notebook_usecase: wanna-notebook-sample-simple-pip
    environment:
      vm_image:
        framework: pytorch
        version: 1-9-xla
        os: debian-10
    gpu:
      count: 1
      accelerator_type: NVIDIA_TESLA_V100
      install_gpu_driver: true
    boot_disk:
      disk_type: pd_standard
      size_gb: 100
    data_disk:
      disk_type: pd_standard
      size_gb: 100
    bucket_mounts:
      - bucket_name: us-burger-gcp-poc-mooncloud
        bucket_dir: data
        local_path: /home/jupyter/mounted/gcs
    tensorboard_ref: my-super-tensorboard
```