What is CNVRG?
cnvrg.io is a full-stack data science platform. cnvrg.io empowers data science teams with a collaborative place for their entire data science and machine learning workflows – from research, development and experimentation to the actual deployment of the model in production. cnvrg.io makes data science work reproducible, accessible and faster.
In this guide we will:
- Create a project
- Link the project to an example git repository
- Create a Dataset
- Link the project to cnvrg-cli
- Run an experiment
- Run Grid Search
- Publish a model
Let's get started!
Create a project & connect a git repository
To create a new project, go to your organization's home page and click on
Set a name and, optionally, a description, then click on start.
Now you have a new project in your organization!
You can connect a git repository from either the project's home page or the project settings page.
From the project's home page:
- Click on Link to Git repo
- In the git repository url insert: https://github.com/cnvrg/mnist_with_dataset.git
- In the branch field, enter: master
- Leave the SSH key field empty
Then click on Save.
Now the cnvrg mnist example repository is connected!
Connecting cnvrg projects to cnvrg-cli
If you haven't done so yet, download cnvrg-cli and log in.
- In your terminal, clone the git repository by running: git clone https://github.com/cnvrg/mnist.git
- Enter the folder that was created: cd mnist
- Inside the folder, run: cnvrg link_git "cnvrg_project_url", for example: cnvrg link_git https://app.cnvrg.io/MyOrg/projects/mnist
Create a Dataset
- Open the terminal and create a directory for the dataset:
- Enter the directory and run:
cnvrg data init
- Download the following file by running
- Sync the dataset to cnvrg by running:
cnvrg data sync
Running your first experiment
Inside the project directory, we will run mnist.py using the dataset we just created.
The following command downloads the dataset and creates an experiment that runs mnist.py on a medium machine:
cnvrg run --data=mnist_dataset python3 mnist.py --dataset_path=/data/mnist_dataset/mnist.npz
To run the experiment on a GPU machine, run:
cnvrg run --gpuxl --data=mnist_dataset python3 mnist.py --dataset_path=/data/mnist_dataset/mnist.npz
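Both commands pass a --dataset_path flag to mnist.py, so the script has to accept it. The actual script lives in the example repository; what follows is only a minimal Python sketch of how such a flag might be parsed with the standard argparse module. The function name parse_args and the default value are illustrative assumptions, not taken from the repository.

```python
import argparse


def parse_args(argv=None):
    """Parse the --dataset_path flag used in the `cnvrg run` examples.

    Hypothetical helper for illustration; the real mnist.py may differ.
    """
    parser = argparse.ArgumentParser(description="Illustrative flag parsing only")
    parser.add_argument(
        "--dataset_path",
        default="/data/mnist_dataset/mnist.npz",  # assumed default, mirrors the path in the commands
        help="Path to the mnist.npz file inside the attached dataset",
    )
    return parser.parse_args(argv)


# Simulate the arguments that follow `python3 mnist.py` in the commands above.
args = parse_args(["--dataset_path=/data/mnist_dataset/mnist.npz"])
print(f"Loading data from {args.dataset_path}")
```

Passing an explicit list to parse_args keeps the sketch testable without touching the real command line; calling it with no argument falls back to sys.argv.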
Running your first Grid Search
Grid Search enables you to run multiple experiments with different parameters in a single command.
To run a grid search, you need to provide a YAML file that defines the parameters for the run.
Below is an example YAML file:
# Float parameter is a range of possible values between a minimum (inclusive)
# and maximum (not inclusive) values.
- param_name: "learning_rate"
  type: "float" # precision is 9 after period
  scale: "log2" # Could be log10 as well

# Discrete parameter is an array of numerical values.
- param_name: "c"
  values: [0, 0.1, 0.001]

# Categorical parameter is an array of string values
- param_name: "kernel"
  values: ["linear", "rbf"]
After saving the YAML file inside the project directory (as grid.yaml in this example), run:
cnvrg run --grid=grid.yaml --data=mnist_dataset python3 mnist.py --dataset_path=/data/mnist_dataset/mnist.npz
cnvrg will then create and run an experiment for every combination of the parameter values defined in the YAML file.
You can follow the status of the experiments in the experiments tab in the project UI.
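To get a feel for how many experiments a grid produces, the combinations can be enumerated as a Cartesian product. The Python sketch below does this for the discrete and categorical parameters from the example file. It is an illustration of the idea, not cnvrg's own implementation, and it leaves out the float parameter because the number of values sampled from a range depends on settings not shown here. The helper name expand_grid is hypothetical.

```python
from itertools import product

# Values copied from the discrete ("c") and categorical ("kernel")
# parameters in the example YAML file.
grid = {
    "c": [0, 0.1, 0.001],
    "kernel": ["linear", "rbf"],
}


def expand_grid(grid):
    """Return one dict per combination of parameter values (Cartesian product)."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


combinations = expand_grid(grid)
for params in combinations:
    print(params)

# 3 values of c times 2 kernels gives 6 combinations.
print(len(combinations))
```

Each additional parameter multiplies the number of experiments, so it is worth checking the total before launching a large grid.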
Publishing a model
With cnvrg, it's easy to publish a predictive model to a secured endpoint.
- On the project's page, go to the Publish tab
- Click on Publish new model
- Fill in the following details: File: model.py, Function to execute: predict, and the machine type. Then click on Publish!
- Once the model is published, you can start sending it requests, as shown in the example in the lower part of the page.
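The form above expects a file (model.py) that exposes a function (predict). As a rough sketch of that shape only, the Python stub below defines a predict function that takes a list of class scores and returns the index of the largest one. The input format and return type are assumptions made for illustration; the format cnvrg actually passes to the function is not described in this guide, and a real model.py would load a trained model rather than use this placeholder logic.

```python
# model.py (hypothetical contents, for illustration only)


def predict(data):
    """Return the index of the highest score in `data`.

    Placeholder logic: a real implementation would load a trained model
    once at import time and call it here. The input format (a list of
    numeric scores) is an assumption, not the documented cnvrg contract.
    """
    scores = [float(value) for value in data]
    return scores.index(max(scores))


# Local sanity check before publishing.
print(predict([0.1, 0.7, 0.2]))
```

Running the function locally like this is a cheap way to catch import or type errors before publishing the endpoint.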