At cnvrg, we believe datasets should be managed at the organization level rather than separately for each project.

Once you have uploaded a dataset to your organization, you can reuse it in any project, experiment, or notebook.


Create a new dataset

Change to the directory with the data you want to link and run:

$ cnvrg data init


Upload data

In the dataset directory (after running cnvrg data init), run:

$ cnvrg data upload

Note: if the dataset is very large, the upload may take a while.
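
The two steps above (create, then upload) can be chained in one small script. This is only a sketch: the directory path is a hypothetical placeholder, and the check for the cnvrg CLI is an added safeguard, not something the CLI requires.

```shell
# Hypothetical dataset directory (an assumption for illustration); replace with your own path.
DATASET_DIR="${DATASET_DIR:-$HOME/datasets/my-dataset}"

if command -v cnvrg >/dev/null 2>&1 && [ -d "$DATASET_DIR" ]; then
  (
    # Run in a subshell so the caller's working directory is left unchanged.
    cd "$DATASET_DIR" || exit 1
    # Link the directory as a dataset, then upload its contents.
    cnvrg data init && cnvrg data upload
  )
else
  echo "Skipping: cnvrg CLI or $DATASET_DIR not found"
fi
```

Using && means the upload only starts if the init step succeeded.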

Sync data

In the dataset directory, sync downloads any remote changes and uploads any local changes:

$ cnvrg data sync

Note: if the dataset is very large, the sync may take a while.

Clone dataset

In your terminal, run:

$ cnvrg data clone <dataset_URL>

Note: if the dataset is very large, the clone may take a while.

List datasets

To view all datasets the organization owns:

$ cnvrg data list


List dataset commits

To view the list of commits for a specific dataset:

$ cnvrg data commits


Run an experiment with dataset

To run an experiment with a dataset you uploaded, add the --data flag to the run command, for example: --data=DATA_ID.

The DATA_ID can be found with the data list command, under the data_id column.

$ cnvrg run --data=DATA_ID python train.py 

Inside the experiment, the dataset is available at: /data/DATA_ID
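
To make the link between the flag and the path explicit, here is a minimal sketch that builds both from one variable. DATA_ID is still a placeholder, and the variable names and the check for the cnvrg CLI are assumptions added for illustration.

```shell
# DATA_ID is a placeholder; substitute the value from the data_id column of `cnvrg data list`.
DATA_ID="${DATA_ID:-DATA_ID}"

# Path where the dataset is available inside the experiment.
DATA_PATH="/data/${DATA_ID}"
echo "Experiment will read the dataset from: ${DATA_PATH}"

if command -v cnvrg >/dev/null 2>&1; then
  cnvrg run --data="${DATA_ID}" python train.py
else
  echo "Skipping: cnvrg CLI not found"
fi
```

Your training script (train.py here) can then read its input files from that path.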
