Flows in cnvrg.io are machine learning pipelines: you build complex DAGs and run your ML components (tasks) with simple drag-and-drop. Tasks are the components of a flow.
A task in cnvrg.io represents a machine learning component. It can be any component you want, and you have full flexibility to design and code it however you need.
A task holds the following information:
- Command: Every task starts with a command. It could be your Python script or any other executable (R/Bash/Java/etc). For example: python3 train.py
- Parameters: Hyperparameters, data params, or any other kind of argument you'd like to pass to your command. All parameters are automatically captured for reproducible data science.
  You can also use comma-separated values, and cnvrg.io will automatically test all combinations, similar to Grid Search.
- Environment: Any kind of Docker image you want.
- Compute: Your Kubernetes cluster, AWS instances, GCP Compute Engine, or anything else.
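To make the comma-separated parameter behavior concrete, here is a minimal sketch of how such values expand into a grid of runs. It is not cnvrg.io's implementation: the helper names expand_grid and to_command, the parameter names, and the --name value flag format are all assumptions chosen for illustration.

```python
from itertools import product


def expand_grid(params):
    """Expand comma-separated parameter values into every combination."""
    names = list(params)
    # Split each raw value on commas and trim whitespace around each option.
    choices = [[v.strip() for v in str(params[n]).split(",")] for n in names]
    return [dict(zip(names, combo)) for combo in product(*choices)]


def to_command(base, combo):
    """Render one combination as a command line (flag format is an assumption)."""
    flags = " ".join(f"--{name} {value}" for name, value in combo.items())
    return f"{base} {flags}"


if __name__ == "__main__":
    # Hypothetical parameters: two values each, so four combinations in total.
    grid = expand_grid({"epochs": "10,20", "lr": "0.01,0.1"})
    for combo in grid:
        print(to_command("python3 train.py", combo))
```

With two values for each of two parameters, the sketch produces four commands, one per combination, which is the same idea as a grid search over the listed values.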
Tasks in a Flow
Accessing previous task's tags
When a task starts, cnvrg checks whether a task ran before it. If so, cnvrg adds the previous task's tags to the current task's environment variables in the form:
CNVRG_TASK_NAME_TAG_KEY=TAG VALUE. For example, if your previous task was named Validator and one of its tags was accuracy=0.6, the current task can read that value from the corresponding environment variable.
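As a sketch of reading such a tag from Python: assuming the task name and tag key are upper-cased when the variable is built, the Validator example would be exposed as CNVRG_VALIDATOR_ACCURACY. That exact name is inferred from the pattern above rather than confirmed, and previous_task_tag is a hypothetical helper.

```python
import os


def previous_task_tag(task_name, tag_key, default=None):
    """Look up a tag exported by the previous task as an environment variable."""
    # Assumed naming: CNVRG_<TASK_NAME>_<TAG_KEY>, upper-cased.
    var_name = f"CNVRG_{task_name}_{tag_key}".upper()
    return os.environ.get(var_name, default)


if __name__ == "__main__":
    accuracy = previous_task_tag("Validator", "accuracy")
    if accuracy is not None:
        # Environment variables are strings, so convert before using as a number.
        print(f"Previous accuracy: {float(accuracy)}")
    else:
        print("No accuracy tag found from a previous task")
```

Returning a default instead of raising keeps the script usable when it runs as the first task in a flow, where no previous tags exist.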
Accessing previous task's artifacts
As with tags, if a task ran before the current one, the current task clones the project and then downloads the previous task's artifacts to the same location. If you use the cnvrg default Docker image, that location is /home/ds/notebooks. This way, your current task can read or load both the artifacts and your code from the same path.
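A downstream task could then pick up a file written by the previous task roughly as follows. This is a sketch under assumptions: load_previous_artifact is a hypothetical helper, output/metrics.json is an invented file name, and the artifact is assumed to be JSON. Only the /home/ds/notebooks default comes from the text above.

```python
import json
from pathlib import Path

# Working directory of the cnvrg default Docker image, per the text above.
DEFAULT_WORKDIR = "/home/ds/notebooks"


def load_previous_artifact(relative_path, workdir=None):
    """Read a JSON artifact written by the previous task, or return None if absent."""
    base = Path(workdir or DEFAULT_WORKDIR)
    artifact = base / relative_path
    if not artifact.is_file():
        return None
    with artifact.open() as handle:
        return json.load(handle)


if __name__ == "__main__":
    # "output/metrics.json" is a hypothetical file name used for illustration.
    metrics = load_previous_artifact("output/metrics.json")
    print(metrics if metrics is not None else "artifact not found")
```

The workdir argument exists so the same code can run outside the default image, where the project may be cloned to a different path.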
Running a task and Experiments
Once you run your task, cnvrg.io creates an Experiment. This lets you track your run live, see how your task is performing, compare it to other experiments, and visualize your models.