What is CNVRG?

cnvrg.io is a full-stack data science platform. It gives data science teams a collaborative home for their entire data science and machine learning workflow, from research, development and experimentation to deploying models in production, and it makes data science work reproducible, accessible and fast.

About this guide

This guide will help you get to know the cnvrg.io platform and show you how cnvrg.io helps you build and organise machine learning projects. Using a real use case, we briefly walk through the platform: from loading and tagging data, through research and experimentation, to deploying models as REST APIs.

Got any further questions? Feedback? Comments?
Reach out at hi@cnvrg.io

Registration and Installation

cnvrg.io is composed of two main modules. The first is a web interface where you can track, manage and build machine learning projects and datasets.

The second is the cnvrg.io command-line interface (CLI), which syncs your local development environment with the cnvrg.io cloud and lets you create datasets, upload data, create and sync projects, and run experiments from your terminal.

To install the cnvrg.io CLI, please use the following guide: http://help.cnvrg.io/cnvrg-cli/cli-download-install

Getting started with cnvrg.io with MNIST as an example

If you have been doing deep learning in the past few years, you’re probably familiar with MNIST, often called the “Hello World” of deep learning. For those who aren’t familiar with the name: MNIST is a dataset of 70,000 labelled 28×28 grayscale images of handwritten digits (0 to 9).

MNIST was one of the first use cases where deep learning outperformed classic computer vision and machine learning approaches.

In this guide we’ll show how cnvrg.io can be used to solve the MNIST digit classification problem: from uploading the dataset, through experimentation, to deploying models.

Creating your first dataset

cnvrg.io datasets let you upload any kind of data (CSV files, images and more), tag it and query it. cnvrg.io automatically version-controls every dataset, so you can easily reproduce workflows, switch between versions and more.

You can work with datasets through both the CLI and the web interface. In the following example, we’ll use the CLI to create a dataset and upload it to cnvrg.io storage.

To create a dataset, simply go to your dataset directory on your local development machine and run:

$ cd mnist_data
$ cnvrg data init

Great! You have created your first dataset! 

Your new dataset is named after its directory (in this case, mnist_data), and the CLI prints a link where you can track and view its content.

But wait: the dataset is empty in the web interface? That’s right. So far we’ve only created the dataset record in cnvrg.io, so now let’s upload the data. Run the upload command, which uploads everything in the local dataset directory to the cnvrg.io cloud:

$ cnvrg data upload

Great! Now you have files in your dataset! 

Projects

The Projects tab is the place to centralize all research and development for a specific task. Continuing with MNIST, we can create a new project (through the web interface or the CLI) and keep all the research, development, experiments and models for digit classification in one place. We can also add collaborators, gather ideas and share results.

Create a new project

To create our MNIST project, we can click the create project button in the web interface, or we can use the CLI. To create a project from scratch (i.e. no code or research written yet), type:

$ cnvrg new mnist

The CLI creates a new directory named “mnist” containing scaffolding for the project. It also creates the project in cnvrg.io, which you can open through the URL the CLI prints.

Linking an existing project

If you have already written code or done research, go to your local project directory and type:

$ cnvrg link

cnvrg.io will use the current directory, create a project in the database (named after the directory) and sync everything that’s required.

Syncing a project

For every project in cnvrg.io, we apply automatic version control to make sure everything is up to date and reproducible. Think of it as the power of git with the simplicity of Dropbox: changes in the project directory are captured automatically before and after each activity. You can also sync changes manually with the sync command:

$ cnvrg sync

The command first downloads any new changes from the remote repository, then uploads the changes made in your local environment. You can also add the “-f” flag to force your changes: no version comparison is made and nothing is downloaded.

Notebooks

Many data scientists love using interactive open-source tools such as Jupyter and RStudio to explore data and experiment with ideas. We have integrated support for these tools, including 1-click setup of Jupyter sessions on remote machines.

Your dependencies, dataset and project files will be pre-installed and pre-configured!

Additionally, in your project’s files tab, you can see the fully rendered Jupyter Notebook, check previous versions, share and collaborate.

Experiments

Experiments are the core of every machine learning project. Building a model is all about trying new ideas, testing hypotheses, tuning hyperparameters and exploring different neural-network architectures.

At cnvrg.io, we help you experiment faster and keep every experiment reproducible and trackable, so you can focus on the important stuff.

An experiment is a “run” of a script, locally or remotely. Running an experiment on a remote GPU usually means a lot of setup before you see any results: getting the data, code and dependencies onto the machine, then going back and forth over SSH to check on progress. cnvrg.io automates all of that and lets you run an experiment with a single command or click.

Running an Experiment

Let’s go back to our MNIST project. I have written a simple Python script using Keras (mnist.py). It is synced to my project directory, and now I want to run it on a remote GPU.

You can run it on either the gpu or the gpuxl preconfigured machine by prefixing your command with cnvrg run and the machine type as a flag:

$ cnvrg run --gpu python mnist.py

cnvrg.io will automatically take a snapshot of your current directory, get the remote GPU machine up and running with everything pre-installed and run your command. That simple. No need to configure machines anymore.

After you run the command, you receive a URL where you can see exactly what’s going on with your experiment, track metrics (such as loss and accuracy) and even analyze your work with cnvrg.io’s built-in TensorBoard.

The run command also accepts additional parameters, such as the dataset to use, the compute type (gpu, small, medium, or even your local machine), syncing options and more.

Running Experiments/Notebooks with Custom Libraries

Occasionally, you will need a specific TensorFlow version or a new PyPI (pip) dependency. Instead of rebuilding a Docker image, just create a requirements.txt file in your project directory and list your project’s extra dependencies. cnvrg.io installs its contents automatically before every experiment, notebook and deployment.
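requirements.txt is a standard pip requirements file: one package specifier per line. As a minimal sketch, the Python snippet below writes such a file into a project directory. The helper name write_requirements and the pinned packages and versions are illustrative assumptions, not recommendations; list whatever your own scripts actually import.

```python
from pathlib import Path

# Example pins only (an assumption for illustration): list whatever your own
# project imports. pip reads one requirement specifier per line.
EXAMPLE_REQUIREMENTS = [
    "tensorflow==1.12.0",
    "keras==2.2.4",
]


def write_requirements(project_dir, requirements=EXAMPLE_REQUIREMENTS):
    """Write requirements.txt into project_dir and return its path (hypothetical helper)."""
    path = Path(project_dir) / "requirements.txt"
    # One specifier per line, ending with a newline.
    path.write_text("\n".join(requirements) + "\n", encoding="utf-8")
    return path
```

For example, calling write_requirements("mnist") next to the mnist project directory creates mnist/requirements.txt, which is then uploaded with the rest of the project on the next cnvrg sync.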

Running Experiments via Web

Experiments can also be run from the web interface, which is especially useful when you want to launch an experiment quickly without syncing your code. Go to your project’s experiments tab, click “New Experiment”, type the command you want to run, select a compute type and hit the submit button. cnvrg.io will run the command on the latest code version (commit).

Experiments Dashboard & Tracking

Every experiment gets a dedicated report page showing information such as CPU/GPU usage, duration, start and end commits (code versions), and the hyperparameters and metrics collected by the cnvrg Research Assistant (described below).

cnvrg Research Assistant

The Research Assistant automatically parses your experiment’s standard output (stdout) and extracts useful information. Charts are generated automatically for Keras and other popular frameworks. You can also define your own tags and charts by printing lines that follow these stdout rules:

Tags

cnvrg_tag_lr: "0.1"

This creates a tag with the key “lr” and the value “0.1”.
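Because the Research Assistant works by reading stdout, tagging from a Python script is just a matter of printing lines in the tag format. A minimal sketch, assuming the quoted-value form shown in the tag rule; cnvrg_tag is a hypothetical helper written for this example, not a function shipped by cnvrg.io, and the hyperparameter values are placeholders.

```python
def cnvrg_tag(key, value):
    """Print a Research Assistant tag line such as: cnvrg_tag_lr: "0.1"

    Hypothetical convenience wrapper; it only formats and prints the line.
    """
    line = f'cnvrg_tag_{key}: "{value}"'
    # flush=True so the line reaches stdout immediately, even when output is buffered.
    print(line, flush=True)
    return line


# Hypothetical hyperparameters for mnist.py, logged once at start-up.
hyperparams = {"lr": 0.1, "batch_size": 128, "epochs": 5}
for name, value in hyperparams.items():
    cnvrg_tag(name, value)
```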

Charts

cnvrg_linechart_loss: key: "Epoch 1", value: "0.1"

This plots the value “0.1” at “Epoch 1” on a line chart named “loss”. The key is optional.
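In the same spirit, a training script can emit one chart point per epoch. This sketch follows the chart rule above; cnvrg_linechart and report_losses are hypothetical helpers, and the exact form of a line without a key is an assumption.

```python
def cnvrg_linechart(chart, value, key=None):
    """Print one point for the Research Assistant line chart named `chart`.

    With a key:  cnvrg_linechart_loss: key: "Epoch 1", value: "0.1"
    Without a key the 'key: ...,' part is simply dropped (an assumption).
    """
    if key is None:
        line = f'cnvrg_linechart_{chart}: value: "{value}"'
    else:
        line = f'cnvrg_linechart_{chart}: key: "{key}", value: "{value}"'
    print(line, flush=True)
    return line


def report_losses(losses, chart="loss"):
    """Emit one chart point per epoch, keyed "Epoch 1", "Epoch 2", ..."""
    return [
        cnvrg_linechart(chart, loss, key=f"Epoch {epoch}")
        for epoch, loss in enumerate(losses, start=1)
    ]
```

In a Keras script, you could pass history.history["loss"] (the per-epoch losses recorded by model.fit) to report_losses once training ends, or call cnvrg_linechart from a callback’s on_epoch_end method to stream points while training runs.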

Experiments’ Table and Comparing Experiments

The experiments main page (the table view) lists every experiment created in the project. You can browse, search and filter to find an experiment, and you have full control over the table columns, so you can customize it to your needs.

You can also select several experiments and compare them side by side in depth.

Deploying models

OK, so we’ve run several experiments, optimized our model, reached our benchmarks and we’re good to go. Let’s deploy the model as a REST API. There’s no need to contact IT or DevOps: go to the Publish tab in your project, select the file, type in the name of the function, and cnvrg.io will take care of the rest (pun intended ☺).
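The Publish flow needs a file containing a function that cnvrg.io can import and call. The sketch below shows what such a file (say, predict.py) might look like for MNIST. Everything here is an assumption for illustration: the file and function names, the model path mnist_model.h5, the input format (784 raw pixel values, either flat or as a 28×28 list) and the idea that the endpoint passes the request input as the function’s single argument. It also assumes a dense Keras model that takes a flat 784-value vector; a convolutional model would need an input of shape (1, 28, 28, 1) instead.

```python
import numpy as np

MODEL_PATH = "mnist_model.h5"  # hypothetical: wherever mnist.py saved the model
_model = None  # loaded lazily on the first request, then reused


def _load_model():
    # Assumes the model was saved with Keras' model.save(); imported lazily
    # so this module can be imported without Keras installed.
    from keras.models import load_model
    return load_model(MODEL_PATH)


def preprocess(pixels):
    """Turn 784 raw pixel values (0-255) into a (1, 784) float batch."""
    x = np.asarray(pixels, dtype="float32")
    if x.size != 28 * 28:
        raise ValueError(f"expected 784 pixel values, got {x.size}")
    return x.reshape(1, 28 * 28) / 255.0


def predict(pixels, model=None):
    """Hypothetical endpoint function: return the predicted digit and scores."""
    global _model
    if model is None:
        if _model is None:
            _model = _load_model()
        model = _model
    scores = np.asarray(model.predict(preprocess(pixels)))[0]
    return {"digit": int(np.argmax(scores)), "scores": scores.tolist()}
```

The optional model argument lets you test the function locally with any object that has a predict method, without loading the saved Keras model.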

What exactly happens when you click the Publish button? cnvrg.io takes your project (code and dependencies) and wraps it with a thin, scalable REST API. It imports the function you specified from the file you specified, exposes a URL and monitors its activity. Your customers can then use the secured URL, for example by embedding it in an application or dashboard.
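Once the endpoint is live, any application can call it over HTTP. A minimal client sketch using only the Python standard library follows. The URL, the X-Api-Key header name and the {"input": ...} body shape are placeholders and assumptions, not cnvrg.io’s documented request format; replace them with the actual values for your endpoint.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with your endpoint's real URL and
# credentials. The header name and JSON body shape are assumptions.
ENDPOINT_URL = "https://example.com/api/v1/endpoints/mnist"
API_KEY = "YOUR_API_KEY"


def build_request(pixels, url=ENDPOINT_URL, api_key=API_KEY):
    """Build a JSON POST request carrying the pixel values."""
    # float() also converts NumPy scalars, which json.dumps cannot serialize.
    body = json.dumps({"input": [float(p) for p in pixels]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )


def query_endpoint(pixels, timeout=10.0):
    """Send the request and decode the JSON response (makes a network call)."""
    with urllib.request.urlopen(build_request(pixels), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A dashboard could call query_endpoint(pixels) and display the returned prediction; build_request is kept separate so the request can be inspected or tested without a network call.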

Monitoring Models

Deploying models with cnvrg.io lets data scientists control the entire machine learning pipeline, from research and development to production-grade models. All cnvrg.io endpoints are monitored continuously, so data scientists can see when a model needs their attention (retraining or shutdown). All model inputs and outputs are stored to enable deeper research and help reproduce predictions.

Summary

cnvrg.io was designed to help data science teams build better AI, faster. In this guide we covered the basics of the platform using MNIST as an example: creating datasets, tagging, running and tracking experiments, and deploying models as REST endpoints.

This guide covers the basic parts of cnvrg.io’s data science platform and its capabilities, and there is a lot more for you to discover ☺. These fundamentals will help you get started with cnvrg.io and understand how to use it in your own research and development processes.

For additional documentation, guides and support – please refer to http://help.cnvrg.io or reach out directly to us at hi@cnvrg.io

Good luck! And keep building intelligent machines ☺
