Dask Tips and Tricks – HighLevelGraphs
Oct 24, 2019
Dask Syntax
Normally when using Dask you wrap dask.delayed around a function call, and once all of those calls are queued up you tell Dask to compute your results. This is great, and I really like this syntax, but what about when you are handed a list of tasks and need to somehow feed them to Dask?
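For reference, here is a minimal sketch of that delayed workflow. The functions inc and add are made-up placeholders for illustration; only dask.delayed and compute() come from Dask itself.

```python
import dask


def inc(x):
    # Hypothetical helper: stands in for any real unit of work.
    return x + 1


def add(a, b):
    # Hypothetical helper: combines two earlier results.
    return a + b


# Wrapping each call in dask.delayed queues it up instead of running it.
a = dask.delayed(inc)(1)
b = dask.delayed(inc)(2)
total = dask.delayed(add)(a, b)

# Nothing has executed yet; compute() runs the whole chain.
result = total.compute()
print(result)  # 5
```

This works nicely when you write the function calls yourself. It gets awkward when the tasks arrive as data.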
That is where a HighLevelGraph comes in!
Dask HighLevelGraphs
Dask HighLevelGraphs allow you to define a data structure that is essentially a series of jobs. Each of those jobs has one or more tasks. You can also think of a job as a bucket for your tasks. The tasks in a job can be executed in parallel, meaning tasks within a job must not depend on one another!
Then you define your job dependencies and COMPUTE!
Here, task-2 depends upon task-1.
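As a rough sketch, the two pieces of that data structure could look like the plain dictionaries below. The job names task-1 and task-2 follow the text, while the task keys (load-a, load-b, combine) and the functions are invented for illustration. Each task is written as a tuple of a function followed by its arguments, and a string argument that matches another task key stands for that task's result.

```python
def load(name):
    # Hypothetical task function.
    return f"loaded {name}"


def combine(left, right):
    # Hypothetical task function that consumes two earlier results.
    return f"{left} + {right}"


# One entry per job; each job maps task keys to (function, *args) tuples.
layers = {
    "task-1": {
        "load-a": (load, "a"),
        "load-b": (load, "b"),
    },
    "task-2": {
        # "load-a" and "load-b" refer to the tasks defined in task-1.
        "combine": (combine, "load-a", "load-b"),
    },
}

# One entry per job; each job maps to the set of jobs it must wait for.
dependencies = {
    "task-1": set(),
    "task-2": {"task-1"},
}
```

Note that the two tasks inside task-1 never refer to each other, which is what lets them run in parallel.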
We can also have Dask draw this out for us using graphviz (more on that below).
Let's see some Code!
Now that we've laid down the foundations let's execute some code!
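Here is a small, self-contained sketch of the whole thing. The functions and task keys are made up for illustration, and I am using dask.get (the local synchronous scheduler) as a stand-in so the example runs without a cluster.

```python
import dask
from dask.highlevelgraph import HighLevelGraph


def square(x):
    # Hypothetical task function.
    return x * x


def sum_all(*values):
    # Hypothetical task function that gathers the earlier results.
    return sum(values)


# Jobs (layers) and the tasks inside each one.
layers = {
    "task-1": {
        "square-1": (square, 1),
        "square-2": (square, 2),
        "square-3": (square, 3),
    },
    "task-2": {
        "total": (sum_all, "square-1", "square-2", "square-3"),
    },
}

# task-2 has to wait for task-1.
dependencies = {
    "task-1": set(),
    "task-2": {"task-1"},
}

graph = HighLevelGraph(layers, dependencies)

# graph.visualize(filename="graph.svg")  # optional; needs graphviz installed

# Ask the scheduler for the key we care about.
result = dask.get(graph, "total")
print(result)  # 14
```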
We have our layers, or our jobs plus tasks, and the dependencies. Once we have those, the world is ours!
Using graph.visualize() gets you the image I showed earlier, which maps out all of your dependencies.
From there, call client.get(graph, 'name-of-task'), and it will get you the result from that task!
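One thing worth spelling out is which name to pass: it is the key of a task inside a layer, and you can ask for one key or a list of them. The sketch below uses invented functions and keys, and again uses dask.get as a local stand-in for a distributed client, on the assumption that client.get takes the same (graph, keys) arguments.

```python
import dask
from dask.highlevelgraph import HighLevelGraph


def double(x):
    # Hypothetical task function.
    return 2 * x


def add(a, b):
    # Hypothetical task function.
    return a + b


layers = {
    "task-1": {"double-1": (double, 1), "double-2": (double, 2)},
    "task-2": {"sum": (add, "double-1", "double-2")},
}
dependencies = {"task-1": set(), "task-2": {"task-1"}}
graph = HighLevelGraph(layers, dependencies)

# A single key returns a single result.
final = dask.get(graph, "sum")

# A list of keys returns one result per key, in the same order.
partials = dask.get(graph, ["double-1", "double-2"])

print(final)
print(list(partials))
```

With a running dask.distributed Client, the call from the text, client.get(graph, 'sum'), would take the place of dask.get here.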