In this guide you'll learn how to upload datasets and connect them to your Spark clusters.
cnvrg automatically configures the Spark cluster to access your S3 bucket, so no extra configuration is needed to read your data files.
Upload your dataset to cnvrg
You can use cnvrg to upload datasets securely to S3 from either the web or the CLI:
- From the CLI: follow the CLI guide to upload your dataset.
- Then go to your organization's home page and choose Datasets; you'll see your newly created dataset.
- From the web: go to your organization's home page.
- Click the Datasets tab.
- Create a dataset.
- Drag and drop your dataset files to upload them (size limit: 100MB).
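Because the web upload has a size limit, it can help to check a local dataset folder before dragging files in. The following is a minimal sketch, not part of cnvrg: the function name `files_over_limit` is hypothetical, and it assumes the limit applies per file and that 100MB means 100 * 1024 * 1024 bytes.

```python
import os

# Assumption: the limit is per file and "100MB" means 100 * 1024 * 1024 bytes.
WEB_UPLOAD_LIMIT_BYTES = 100 * 1024 * 1024


def files_over_limit(directory, limit=WEB_UPLOAD_LIMIT_BYTES):
    """Return the sorted paths under `directory` whose size exceeds `limit` bytes."""
    too_big = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            if os.path.getsize(path) > limit:
                too_big.append(path)
    return sorted(too_big)


if __name__ == "__main__":
    # "." is a placeholder; point this at your local dataset folder.
    for path in files_over_limit("."):
        print("Too large for web upload:", path)
```

Any file this reports is a candidate for uploading through the CLI instead.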
When your dataset files have uploaded successfully, you'll see a Copy S3 Path option on each file.
Click Copy S3 Path to copy the file's path to your clipboard.
Now you can create a new Notebook or submit a new Experiment and read the file using its S3 path:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("my_spark").getOrCreate()
# Read the CSV file, treating its first row as column names
df = spark.read.format("csv").option("header", "true").load("<PASTE YOUR S3 FILE PATH HERE>")
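A pasted path that is truncated, or that still contains the placeholder text, tends to surface only as a Spark read error. As a rough sketch, the path can be sanity-checked with the Python standard library first. The helper `split_s3_path` and the example path are hypothetical, and the accepted schemes (`s3`, `s3a`, `s3n`) are an assumption; check which one your copied path actually uses.

```python
from urllib.parse import urlparse

# Assumption: the copied path uses one of these URI schemes.
S3_SCHEMES = {"s3", "s3a", "s3n"}


def split_s3_path(path):
    """Return (bucket, key) for an S3-style URI, or raise ValueError."""
    parsed = urlparse(path.strip())
    if parsed.scheme not in S3_SCHEMES:
        raise ValueError(f"not an S3 path: {path!r}")
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    if not bucket or not key:
        raise ValueError(f"missing bucket or key: {path!r}")
    return bucket, key


# Hypothetical path, for illustration only.
bucket, key = split_s3_path("s3a://example-bucket/datasets/train.csv")
print(bucket, key)
```

If the call raises `ValueError`, copy the path again from the dataset page before passing it to `load`.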