
custom_task_tutorial


DigitalGlobe (Required Remote Kernel)
Some of the data for this notebook was provided by the DigitalGlobe remote kernel. You will need access to this remote kernel to be able to make full use of this notebook.

Intro

Users can, and are encouraged to, build their own custom analytic capability - what we call a 'Task' - to run on GBDX. The process starts with packaging up your code, along with the libraries and dependencies needed to run it, into a Docker image and pushing that image to a Docker Hub repository. You can then register a definition of your Task - which includes the Docker image reference, expected inputs and outputs, etc. - with GBDX. From there, it is simply a matter of asking the GBDX Workflow system to execute your Task, in combination with any other desired Tasks, typically via Postman or gbdxtools. The steps for converting your analytic capability into a Task that runs on GBDX are as follows (a short preview of the end result follows the list):

  1. MODIFY the inputs and outputs of your code to align with expected Platform inputs and outputs
  2. DOCKERFILE - write a set of instructions that will build your Docker
  3. BUILD, RUN, and TEST your Docker
  4. PUSH DOCKER to Docker Hub
  5. GBDX COLLABORATORS - give GBDX access to run your Docker
  6. TASK DEFINITION - write a JSON Task definition that describes and defines your Task
  7. REGISTER TASK to the Platform using your JSON Task definition
  8. WORKFLOW - test your Task by executing within the Workflow system on the Platform
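
To make that end goal concrete, here's a sketch of what step 8 looks like once everything is in place - running your registered Task from gbdxtools. The S3 paths are hypothetical placeholders; the real call is built up in section 8.

from gbdxtools import Interface
gbdx = Interface()

# point the Task's input ports at data sitting on S3 (hypothetical locations)
clip_task = gbdx.Task('clip_raster_gt',
                      input_raster='s3://some-bucket/a_raster/',
                      input_shapefile='s3://some-bucket/a_shapefile/')

# run it as a one-Task Workflow
workflow = gbdx.Workflow([clip_task])
workflow.execute()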

File and naming conventions within this tutorial

Before starting, it's helpful to establish the file names, port names, etc. that you'll use throughout the tutorial.

FILE NAMES

filename | description
clip_raster_task.py | example analysis code that you'll Dockerize and put on GBDX as a Task
Dockerfile | instructions to build a Docker image that contains the Task code and its dependencies
clip-raster-definition.json | defines the Task name, inputs, outputs, etc., and will be used to register the Task to GBDX
gbdxtrainer/clip_raster_docker_gt | Docker Hub username and repository name; replace 'gbdxtrainer' with your username and 'gt' with your initials
clip_raster_gt | the name your Task will have when you register it on GBDX; replace 'gt' with your initials

PORT NAMES

Input and output ports are how GBDX passes data into and out of your Task when it's executed. The developer defines the port names. When you write the code that does the actual analysis (in whatever language you're writing it in) - this code is the Task - you'll point to these port names for input and output data. When you execute the Task within a Workflow (using gbdxtools), you use these same port names to specify the data that should be passed into and out of the Task's Docker container. Here are the port names you'll use in this tutorial.

port name within the Task code | what it is | in use with gbdxtools
/mnt/work/input/input_shapefile | input port name for a shapefile | clip_task = gbdx.Task('clip_raster_gt', input_shapefile='s3://.../a_shapefile', ...)
/mnt/work/input/input_raster | input port name for a raster | clip_task = gbdx.Task('clip_raster_gt', input_raster='s3://.../a_raster', ...)
/mnt/work/output/data_out | output port name for output data | next_task = gbdx.Task('another_task', input_data=clip_task.outputs.data_out.value)

Directory structure

Let's also establish a directory structure that supports and simplifies the steps you'll take in this tutorial. By the end, your directory structure should look something like this:

/
└── Notebooks (or wherever you keep this IPython Notebook)
    ├── custom_task_tutorial.ipynb (this notebook)
    └── task_tutorial_files/ 
        ├── clip-raster-definition.json
        └── docker_projects/
            ├── Dockerfile
            └── bin/
                └── clip_raster_task.py

1. Modify code inputs and outputs

The example Task you're going to put on GBDX during this tutorial simply takes a shapefile and a raster as input, clips the raster to the shapefile, and writes out the clipped raster. We'll walk through what the code does, step by step. The important thing to note is how the script defines its inputs and outputs so that they work on GBDX.

First, import the required libraries: 'fiona' for reading the vector data, 'rasterio' for the raster data, and 'os' and 'glob' for general operating system and filepath functions.

import fiona
import rasterio
import rasterio.mask
import os
import glob

Set the input port paths. When GBDX spins up a Task, it creates the directory '/mnt/work/input', fetches the required input from the S3 location specified in the call to the Task, and copies it into that directory. Your code needs to point to this path for its input data.

in_path = '/mnt/work/input'
shape_path = in_path + '/input_shapefile'
raster_path = in_path + '/input_raster'

Grab the shapefile and raster from the input port filepath that you just defined.

my_shape = glob.glob1(shape_path, '*.shp')[0]
my_raster = glob.glob1(raster_path, '*.tif')[0]
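
Note that glob.glob1 returns an empty list when nothing matches, so the [0] index above will raise a bare IndexError if an expected file never arrives. A minimal defensive variant, if you want a clearer failure message (the message text is just an illustration):

# fail loudly, with a readable message, if the expected shapefile is missing
shapefiles = glob.glob1(shape_path, '*.shp')
if not shapefiles:
    raise IOError('No .shp file found in ' + shape_path)
my_shape = shapefiles[0]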

Define the output data port where the cropped tif will be written. Similar to the input ports, there is a standard filepath convention you need to follow: '/mnt/work/output'. This Task has a single output port, which we are calling 'data_out'.

out_path = '/mnt/work/output/data_out'

While the input path is created by GBDX during Task execution, you'll need to create the output path and data port in the code. Create the output path/port and navigate to this directory.

if not os.path.exists(out_path):
  os.makedirs(out_path)
os.chdir(out_path)

Open the input shapefile and get the polygon features for clipping.

with fiona.open(os.path.join(shape_path, my_shape), "r") as shapefile:
  features = [feature["geometry"] for feature in shapefile]

Open the input raster, clip the raster with the shapefile and get the raster metadata.

with rasterio.open(os.path.join(raster_path, my_raster)) as src:
  out_raster, out_transform = rasterio.mask.mask(src, features, crop=True)
  out_meta = src.meta.copy()

Update the metadata to describe the cropped raster.

out_meta.update({"driver": "GTiff",
  "height": out_raster.shape[1],
  "width": out_raster.shape[2],
  "transform": out_transform})

Write out the output image.

with rasterio.open("masked.tif", "w", **out_meta) as dest:
  dest.write(out_raster)

Optionally, write out a status file at code completion to give the user more feedback. (This requires the 'json' library, so import it as well.)

import json

status = {}
status['status'] = 'Success'
status['reason'] = "===== Task successfully completed ======"

with open('/mnt/work/status.json', 'w') as statusfile:
    json.dump(status, statusfile)
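
The same status file can report failures. Here's a minimal sketch of the idea - wrapping the analysis in a try/except is our illustration, not part of the tutorial Task:

import json

try:
    # ... the clipping code above would run here ...
    status = {'status': 'Success', 'reason': 'Task successfully completed'}
except Exception as e:
    # surface the exception text to the user via the status file
    status = {'status': 'Failed', 'reason': str(e)}

with open('/mnt/work/status.json', 'w') as statusfile:
    json.dump(status, statusfile)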

Now that we've walked through what the example Task does and how to define its inputs and outputs in a way that GBDX recognizes, write this script to a working directory that we're going to call /task_tutorial_files/docker_projects/bin.

1.1 Run the code in the following cell to check what directory you're in. You should be in the directory that contains this Jupyter Notebook (custom_task_tutorial.ipynb).

ls

1.2 Run the code in the following cell to create the 'task_tutorial_files/docker_projects/bin' directory.

import os
if not os.path.exists('task_tutorial_files/docker_projects/bin'):
  os.makedirs('task_tutorial_files/docker_projects/bin')

1.3. Run the code in the following cell to navigate to the directory you just created.

cd task_tutorial_files/docker_projects/bin

1.4. Run the code in the following cell to write the code that we just reviewed to 'clip_raster_task.py'.

%%writefile clip_raster_task.py

import fiona
import rasterio
import rasterio.mask
import os
import glob

# set the input ports path
in_path = '/mnt/work/input'
shape_path = in_path + '/input_shapefile'
raster_path = in_path + '/input_raster'

# search the input shapefile port for the first shapefile that we specify in the call to this task
my_shape = glob.glob1(shape_path, '*.shp')[0]

# search the input image port for the first geotiff that we specify in the call to this task
my_raster = glob.glob1(raster_path, '*.tif')[0]

# define the name of the output data port
out_path = '/mnt/work/output/data_out'

# create the output data port
if not os.path.exists(out_path):
  os.makedirs(out_path)

# change directories to the output data port
os.chdir(out_path)

# open the input shapefile and get the polygon features for clipping
with fiona.open(os.path.join(shape_path, my_shape), "r") as shapefile:
  features = [feature["geometry"] for feature in shapefile]

# open the input image, clip the image with the shapefile and get the image metadata
with rasterio.open(os.path.join(raster_path, my_raster)) as src:
  out_raster, out_transform = rasterio.mask.mask(src, features, crop=True)
  out_meta = src.meta.copy()

# write out the metadata to the raster
out_meta.update({"driver": "GTiff",
  "height": out_raster.shape[1],
  "width": out_raster.shape[2],
  "transform": out_transform})

# write out the output raster
with rasterio.open("masked.tif", "w", **out_meta) as dest:
  dest.write(out_raster)

1.5 Run the code in the following cell to check that the file clip_raster_task.py exists.

ls

1.6 Run the code in the following cell to check the contents of 'clip_raster_task.py'.

load clip_raster_task.py

2. Dockerfile

A Dockerfile is a set of instructions for packaging up your Task code, along with the libraries and dependencies needed to run it, into a lightweight, portable Docker image.

Before writing the Dockerfile, let's walk through what it's building.

The first line of code in a Dockerfile typically pulls a base image from Docker Hub that provides an operating system, and this serves as the foundation for the rest of the build. The Docker community also provides several base images that are pre-configured for particular programming environments. The first line of our Dockerfile pulls the base image continuumio/miniconda - a lightweight image configured for Python development.

FROM continuumio/miniconda

The following two lines of code install the geoprocessing libraries we need to run the Task code.

RUN conda install rasterio
RUN conda install fiona

Create a directory inside the Docker called /my_scripts.

RUN mkdir /my_scripts

Copy the contents of the local directory ./bin into the Docker directory /my_scripts. (Remember, ./bin is where you just wrote 'clip_raster_task.py'.)

ADD ./bin /my_scripts

Finally, add the command that executes 'clip_raster_task.py' inside the container at runtime.

CMD python /my_scripts/clip_raster_task.py

You might be wondering: why are we adding the Task code to the Docker image, but not any data? Remember, we built the input ports with the path /mnt/work/input. Later on, when you use your Task within a Workflow, you will specify the S3 location of the data you want to analyze, and GBDX will fetch that data and plug it into the container via the input ports.

Now that we've covered what a Dockerfile does, let's go ahead and write the Dockerfile. As a best practice, we like to keep the Dockerfile separate from the Task code.

2.1 Run the code in the following cell to navigate back one folder to /docker_projects.

cd ..

2.2 Run the code in the following cell to write the Docker instructions we just reviewed to 'Dockerfile' (no extension).

%%writefile Dockerfile
FROM continuumio/miniconda

RUN conda install rasterio
RUN conda install fiona

RUN mkdir /my_scripts
ADD ./bin /my_scripts
CMD python /my_scripts/clip_raster_task.py

2.3 Run the code in the following cell to check that the file wrote as expected.

ls

2.4 If you would like to check that the code was written as expected, run the code in the following cell to look at the contents of 'Dockerfile'.

load Dockerfile

3. Build, run, and test your Docker

NOTE: AT THIS POINT IN THE TUTORIAL, WE'RE GOING TO LEAVE THE JUPYTER NOTEBOOK AND SWITCH TO DOCKER

You've written the Dockerfile that contains the instructions to build a Docker image; the next step is to actually build it. Docker needs to be installed on your computer to complete this section.

3.1 Bring up a terminal (Mac) or cmd (Windows) window on your computer, copy and paste the following line of code to see if Docker is running.

docker --version

You should receive an output similar to this:

Docker version 1.13.0, build 49bf474

3.2 Within the terminal/cmd, navigate to the folder containing the Dockerfile you wrote in the previous section. It should be located somewhere such as 'user/your name/Notebooks/task_tutorial_files/docker_projects'.

(*tip - you can type 'pwd' on a Mac ('cd' on Windows) to see your current directory, then use 'cd ..' to navigate one folder back and 'cd <directory>' to navigate into that directory)

cd <full/path/to>/task_tutorial_files/docker_projects

3.3 Copy and paste the following Docker command to build a Docker from your Dockerfile, but FIRST REPLACE 'gbdxtrainer' WITH YOUR DOCKER USERNAME AND 'gt' WITH YOUR INITIALS.

(*The -t option allows you to name the Docker for easy reference. The . at the end of the command is so that it looks for your Dockerfile in the current working directory.)

docker build -t gbdxtrainer/clip_raster_docker_gt .

3.4 Copy and paste the following Docker command to list the Docker image you just built.

docker images

The output should look something like this, but with your Docker username and initials:

REPOSITORY                            TAG                 IMAGE ID            CREATED             SIZE
gbdxtrainer/clip_raster_docker_gt     latest              dfc953879205        2 minutes ago       1.58 GB

Now that you've built a Docker image, you can run a Docker container (a runtime instance of the image) and poke around inside of it.

3.5 Copy and paste the following Docker command, BUT WITH YOUR USERNAME AND INITIALS.

(*The -it option allows you to run the container in interactive mode with the bash prompt running, and the --rm option removes the container once you're done poking around so that it's not taking up disk space.)

docker run -it --rm gbdxtrainer/clip_raster_docker_gt bash

It will be obvious if you're inside the container because your terminal/cmd prompt will look something like this:

root@b1b71e42372d:/#

3.6 You are now at the root directory of your Docker container. Copy and paste the following command to list the directories within your Docker container.

ls

You should see something that looks like this:

bin  boot  dev    etc  home  lib    lib64  media  mnt  my_scripts  opt  proc  root    run  sbin  srv    ...

3.7 Note, there is a 'my_scripts' directory. Copy and paste the following command to navigate into the 'my_scripts' directory.

cd my_scripts

3.8 You are now inside the directory that you specified in the Dockerfile. Copy and paste the following command to see the 'clip_raster_task.py' file that you placed there when you wrote the Dockerfile.

ls

If you ran this script now, it would fail because there is no input data. When GBDX executes your Task, it will copy input data from S3 and plug it into the appropriate input ports. In a future tutorial, we'll cover how to test the script with input data mounted locally.
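
(As a preview of that, one way to smoke-test the container locally is to mount sample data into the port paths with Docker's -v option. A minimal sketch, assuming you've staged a shapefile under a local 'sample_input/input_shapefile' folder and a tif under 'sample_input/input_raster' - the local paths here are hypothetical:)

docker run --rm -v /full/path/to/sample_input:/mnt/work/input -v /full/path/to/sample_output:/mnt/work/output gbdxtrainer/clip_raster_docker_gt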

3.9 Quit the container using the following command.

exit

4. Push your Docker to Docker Hub

At this point, the Docker image you just created only exists on your machine. For GBDX to access it, the Docker image needs to be available on Docker Hub.

4.1 While still within the terminal/cmd, log in to Docker Hub using the following Docker command USING YOUR DOCKER HUB LOGIN CREDENTIALS.

docker login --username gbdxtrainer --password a_fake_password

4.2 Once logged in, use the following Docker command to push your Docker image to Docker Hub, CHANGING TO YOUR DOCKER USERNAME AND INITIALS. Note: this might take a few minutes.

docker push gbdxtrainer/clip_raster_docker_gt

5. Add GBDX collaborators to your Docker Hub repository

Your Docker repository on Docker Hub can be public or private, but certain GBDX collaborators must be added to the repository in order for the Platform to pull and run the Docker.

5.1 Log in to Docker Hub https://hub.docker.com/

You should now see the Docker image that you just pushed to Docker Hub, in its own repository of the same name.

5.2 Open the repository and select the 'Collaborators' tab. Under 'Username', enter each of the following as Collaborators to your repository. This is what will allow GBDX to pull and execute your Task.

tdgpbuild
tdgpdeploy
tdgplatform

6. Task definition

NOTE: WE'RE BACK TO THE JUPYTER NOTEBOOK FOR THE REST OF THE TUTORIAL

Tasks must be registered with the Task Registry before they can be used in a Workflow. In this next step, you'll write a JSON document that describes and defines your Task according to a standard schema, and later use this JSON document to submit the Task to the Task Registry. Let's first walk through the components of a Task definition schema.

Define the input ports with the names you gave them in the Task code: input_raster and input_shapefile. Indicate whether each input port must be specified for the Task to run, include a human-readable description of the port, and specify the port type. (Don't worry about this too much for now, but the two input port types are 'string' and 'directory'; string ports are typically used to pass in parameters, while directory ports are for file-based data.) Specify the raster and shapefile input ports as type 'directory'.

{
    "inputPortDescriptors": [{
        "required": true,
        "description": "Directory containing a raster.",
        "name": "input_raster",
        "type": "directory"
    }, {
        "required": true,
        "description": "Directory containing a shapefile",
        "name": "input_shapefile",
        "type": "directory"
    }],

Define the output port similarly, indicating whether it's required, a description, the name you gave it ('data_out'), and the port type.

    "outputPortDescriptors": [{
        "required": true,
        "description": "A cropped tif.",
        "name": "data_out",
        "type": "directory"
    }],

Then tell GBDX where to find your Dockerized code on Docker Hub. Specify the type of container, DOCKER, and the full name of the Docker image, which YOU NEED TO CHANGE TO YOUR DOCKER HUB USERNAME AND REPOSITORY NAME. Include ':latest' to pull the latest version of your Docker image.

    "containerDescriptors": [{
        "type": "DOCKER",
        "command": "",
        "properties": {
            "image": "gbdxtrainer/clip_raster_docker_gt:latest"
        }
    }],

Finally, include a description of the Task itself, the Task name as it will appear in the Task Registry, a version number, whether it will be a public or private Task, and the amount of time (in seconds) the Task will be allowed to run before timing out. (Note: if you ever want to re-register a Task, you'll need to increment this version number.)

    "description": "Clips a raster to shapefile.",
    "name": "clip_raster_gt",
    "version": "0.0.1",
    "properties": {
        "isPublic": false,
        "timeout": 36000
    }
}

6.1 Now we can get around to actually writing and saving the Task definition. Run the code in the following cell to navigate back one directory (back out of the '/docker_projects' directory to the '/task_tutorial_files' directory).

cd ..

6.2 MODIFY THE TASK NAME WITH YOUR INITIALS, then run the code in the following cell to write the full JSON document that we just reviewed to clip-raster-definition.json.

%%writefile clip-raster-definition.json
{
    "inputPortDescriptors": [{
        "required": true,
        "description": "Directory containing a raster.",
        "name": "input_raster",
        "type": "directory"
    }, {
        "required": true,
        "description": "Directory containing a shapefile",
        "name": "input_shapefile",
        "type": "directory"
    }],
    "outputPortDescriptors": [{
        "required": true,
        "description": "A cropped tif.",
        "name": "data_out",
        "type": "directory"
    }],
    "containerDescriptors": [{
        "type": "DOCKER",
        "command": "",
        "properties": {
            "image": "gbdxtrainer/clip_raster_docker_gt:latest"
        }
    }],
    "description": "Clips a raster to shapefile.",
    "name": "clip_raster_gt",
    "version": "0.0.1",
    "properties": {
        "isPublic": false,
        "timeout": 36000
    }
}

6.3 Run the following cell to check that the file wrote as expected.

ls

6.4 If you would like to check that the code was written as expected, run the following cell to look at the contents of clip-raster-definition.json.

load clip-raster-definition.json

7. Register Task

All of the pieces are in place to register your Task to the Platform. To review, we

  • wrote a Task (some piece of analysis) using input and output ports
  • wrote a Dockerfile with instructions to build a Docker that contains the Task code and its dependencies
  • built a Docker from the Dockerfile
  • pushed the Docker to Docker Hub
  • added GBDX collaborators to the Task's Docker Hub repository
  • wrote a Task definition

The final step for putting a custom Task on GBDX is to submit the Task to the Task registry. We can use the gbdxtools task_registry API to do that for us.

7.1 Run the following cell to initiate a gbdxtools session.

from gbdxtools import Interface
gbdx = Interface()

We point the following gbdxtools call to our Task definition, clip-raster-definition.json.

7.2 Run the code in the following cell to submit your Task to the Task registry.

gbdx.task_registry.register(json_filename = 'clip-raster-definition.json')

It might take a few minutes for your Task to show up in the registry. Once you've checked that your Task is successfully registered, you can use the gbdxtools Task API to interact with your newly created Task.
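
One quick way to confirm registration is to look for your Task name in the registry listing. A minimal sketch, assuming your version of gbdxtools exposes task_registry.list() (replace 'gt' with your initials):

print('clip_raster_gt' in gbdx.task_registry.list())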

7.3 Wait a few minutes, then run the code in the following cell to create an instance of your Task. FIRST REPLACE 'gt' IN THE TASK NAME WITH YOUR INITIALS.

clip_task = gbdx.Task("clip_raster_gt")

7.4 Run the code in the following cell to interact with the Task object.

clip_task.definition

You should see the definition you entered when you registered the task.

7.5 Run the code in the following cell to see the input ports that you specified within the Task definition and the Task code itself.

clip_task.inputs

7.6 Run the code in the following cell to see the output ports you specified.

clip_task.outputs

You can drill down further into the inputs and outputs of your Task.

7.7 Run the code in the following cell to see the input_shapefile port description, which should match what you entered in the Task definition.

clip_task.inputs.input_shapefile

7.8 Run the code in the following cell to see the input_raster port description.

clip_task.inputs.input_raster

8. Workflow

Now that you've registered a Task to GBDX, the final step of this tutorial is to use your Task in a Workflow. Here's a potential Workflow that uses the Task you just created: you'll use a DigitalGlobe image as the raster input, along with a shapefile we've created that aligns with that image and placed in a publicly accessible S3 bucket.

First initiate a gbdxtools session.

from gbdxtools import Interface
gbdx = Interface()

Define the S3 path for an image by passing in its Catalog ID to the following method.

source_s3 = gbdx.catalog.get_data_location(catalog_id='10400100245B7800')

Next, define an input shapefile. Inputs and outputs to the Workflow have to come from somewhere on S3, so we've placed a shapefile that will clip the image in a publicly accessible S3 bucket.

shape_path = 's3://tutorial-files/this_shp_will_clip_10400100245B7800/'

So far in this script, you've signed into GBDX and defined the image and shapefile inputs. You can now start setting up the Tasks that you'll execute within a Workflow. Before using the 'clip_raster' Task, you'll want to pre-process the image. You can use the Advanced Image Preprocessor Task for this, which orthorectifies raw imagery and offers other image pre-processing options. Documentation is at https://gbdxdocs.digitalglobe.com/docs/advanced-image-preprocessor.

Create this Task, using its registered Task name, 'AOP_Strip_Processor', with the image location you defined earlier ('source_s3') as the value of its input port, 'data'.

aop_task = gbdx.Task('AOP_Strip_Processor', data=source_s3)

The next step would be to set up your 'clip_raster' Task, using the output from 'aop_task' as its raster input. However, 'aop_task' outputs several files in addition to the processed image, while your Task takes just one tif as input. You can address this by setting up the 'gdal-cli' Task to extract just the tif from the 'aop_task' output.

Define this Task, which we'll call 'glue_task', using its registered Task name, 'gdal-cli'. Pass the aop_task output, via its output port 'data', to the glue_task input port, also called 'data'.

glue_task = gbdx.Task('gdal-cli', data=aop_task.outputs.data.value, execution_strategy='runonce',
                         command="""mv $indir/*/*.tif $outdir/""")

You can now set up your custom 'clip_raster' Task. Specify the Task with the name you used to register it (with your initials, not 'gt'). Pass the glue_task output, via its output port 'data', to the clip_task input port we named 'input_raster'.

clip_task = gbdx.Task("clip_raster_gt", input_raster=glue_task.outputs.data.value, input_shapefile=shape_path)

Now build a Workflow using the Workflow call and a list of the Tasks you defined above.

workflow = gbdx.Workflow([ aop_task, glue_task, clip_task ])

The Workflow is ready to go, but before you execute it, you'll need to specify where GBDX should save the output data. Gbdxtools has a feature that will automatically save Workflow output to the GBDX customer S3 bucket associated with your credentials.

Specify that you want to save the output from the 'clip_raster' output port, which we named 'data_out', and the directory you want to save it to within your customer S3 bucket.

workflow.savedata(clip_task.outputs.data_out, location='task_demo/aop_clip_raster')

Execute the Workflow. This will kick off the series of Tasks that will pre-process the input image, select just the image from the pre-processing output, clip that image to a shapefile, and save the output to your customer bucket.

Also, it's a good idea to hold on to the Workflow ID. This will allow you to track the status of the Workflow, which could take several minutes to several hours depending on the kind of processing and size of the image strip. The Workflow ID will also come in handy if you need to debug a Task or Workflow later.

workflow.execute()
print(workflow.id)

8.1 You are now ready to test your custom Task in a Workflow. Run the code in the following cell, which executes the steps we outlined above.

# initiate the Interface object that handles GBDX authentification
from gbdxtools import Interface
gbdx = Interface()

# define the S3 path for an image by passing in its Catalog ID 
source_s3 = gbdx.catalog.get_data_location(catalog_id='10400100245B7800')

# define an input shapefile from S3
shape_path = 's3://tutorial-files/this_shp_will_clip_10400100245B7800/'

# define the 'AOP_Strip_Processor' Task
aop_task = gbdx.Task('AOP_Strip_Processor', data=source_s3, enable_pansharpen=True)

# define the 'gdal_cli' Task
glue_task = gbdx.Task('gdal-cli', data=aop_task.outputs.data.value, execution_strategy='runonce',
                         command="""mv $indir/*/*.tif $outdir/""")

# define the 'clip_raster' Task 
clip_task = gbdx.Task("clip_raster_gt", input_raster=glue_task.outputs.data.value, input_shapefile=shape_path)

# build a Workflow to run the 'clip_raster' Task
workflow = gbdx.Workflow([aop_task, glue_task, clip_task])

# specify where to save the output within your customer bucket
workflow.savedata(clip_task.outputs.data_out, location='task_demo/clip_raster')

# kick off the Workflow and keep track of the Workflow ID
workflow.execute()
print(workflow.id)

GBDX is now running your Workflow. While the Workflow is running, you can interact with the Workflow object and track its status.

8.2 Run the code in the following cell to get the status of the Workflow. This call will return the status of whatever event is currently underway.

workflow.status

You can also look at a list that contains each Task 'event' and its 'state'. Note there is also a 'task_id' associated with each Task. The Task ID is helpful if you want to debug a particular Task in the Workflow.

8.3 Run the code in the following cell to get a list of the Task events that have occurred so far in this Workflow.

workflow.events

You can also return just the information on whether the Workflow completed and if it succeeded.

8.4 Run the code in the following two cells to see if your Workflow completed and if it succeeded.

workflow.complete
workflow.succeeded
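
If you'd rather not re-run those cells by hand, a small polling loop can watch the Workflow for you. A minimal sketch (the 60-second interval is arbitrary):

import time

# check once a minute until the Workflow finishes, then report success/failure
while not workflow.complete:
    print(workflow.status)
    time.sleep(60)
print(workflow.succeeded)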

Once the Workflow has finished, you can retrieve the stdout and stderr that it generated.

8.5 Wait until the Workflow has completed, then run the code in the following two cells to get the stdout and stderr from your Workflow.

workflow.stdout
workflow.stderr

You can also get the stderr and stdout for a particular task, given its Task ID.

8.6 Run the code in the following cell to get a list of Task IDs from your Workflow.

task_ids = workflow.task_ids

8.7 Then run the code in the following two cells to get the stdout and stderr of a particular Task, using the Workflow ID and a Task ID.

gbdx.workflow.get_stdout(workflow.id, workflow.task_ids[0])
gbdx.workflow.get_stderr(workflow.id, workflow.task_ids[0])

You can always bring up information about a Workflow after the fact by using its Workflow ID in the following call.

8.8 Run the code in the following cell to retrieve information about your Workflow, REPLACE THE WORKFLOW ID WITH YOUR WORKFLOW ID.

gbdx.workflow.get('4574506640983582982')

NOTE: AT THIS POINT IN THE TUTORIAL, WE'RE GOING TO LEAVE THE JUPYTER NOTEBOOK AND SWITCH TO THE S3 BROWSER

Once your Workflow has completed (and succeeded!), you will be able to see the output in your customer S3 bucket.

8.9 Log into the S3 browser http://s3browser.geobigdata.io using your GBDX credentials.

8.10 Navigate to 'task_demo/clip_raster' to see the saved output of your Workflow.
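
If you'd rather fetch the results from a script, gbdxtools can also download from your customer bucket. A minimal sketch, assuming your version of gbdxtools provides gbdx.s3.download() (the local directory name is arbitrary):

gbdx.s3.download(location='task_demo/clip_raster', local_dir='clip_raster_output')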

Congratulations on completing this tutorial and successfully putting a custom Task on GBDX! We challenge you to create a custom Task with your own analysis code, using this tutorial as a guide.