This guide shows how to upload datasets and connect them to your Spark clusters.
cnvrg automatically configures your S3 bucket for the Spark cluster, so no configuration is needed to access your data files.

Upload your dataset to cnvrg

You can use cnvrg to upload datasets securely to S3 from either the web UI or the CLI:

CLI

WEB

  • Go to your organization's home page
  • Click the Datasets tab
  • Create a dataset
  • Drag and drop your dataset files to upload them (size limit: 100 MB)
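If you are uploading many files, a quick local check can catch anything over the upload limit before you drag it in. This is a minimal sketch, not a cnvrg tool: `files_over_limit` is a hypothetical helper, and it assumes the 100 MB limit applies per file and means 100 × 1024 × 1024 bytes. This guide does not confirm either detail.

```python
import os
from typing import List, Tuple

# Assumption: the web uploader's 100 MB limit applies per file and means MiB.
UPLOAD_LIMIT_BYTES = 100 * 1024 * 1024


def files_over_limit(folder: str, limit: int = UPLOAD_LIMIT_BYTES) -> List[Tuple[str, int]]:
    """Hypothetical helper: list (path, size_in_bytes) for files under folder larger than limit."""
    too_big = []
    # Walk the folder recursively and check each file's size on disk
    for root, _dirs, names in os.walk(folder):
        for name in names:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size > limit:
                too_big.append((path, size))
    # Sort so the report is stable regardless of directory traversal order
    return sorted(too_big)
```

Running `files_over_limit("my_dataset/")` on your local folder returns an empty list when every file fits under the assumed limit.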

Once your dataset files have uploaded successfully, each file shows a Copy S3 Path option.

Click Copy S3 Path to copy the file's S3 path to your clipboard.
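Before pasting the path into Spark, you may want to confirm that the clipboard holds a complete S3 URI. This minimal sketch assumes the copied path has the form `s3://<bucket>/<key>`; `parse_s3_path` is a hypothetical helper, not a cnvrg or Spark API.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_path(path: str) -> Tuple[str, str]:
    """Hypothetical helper: split an S3 URI such as s3://bucket/dir/file.csv into (bucket, key).

    Raises ValueError if the string is not a complete S3 file path, for example
    when the clipboard held something else or the path was truncated.
    """
    parsed = urlparse(path.strip())
    # Accept the URI schemes commonly used for S3 in Spark/Hadoop setups
    if parsed.scheme not in ("s3", "s3a", "s3n"):
        raise ValueError(f"not an S3 URI: {path!r}")
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI is missing a bucket or object key: {path!r}")
    return bucket, key
```

For example, `parse_s3_path("s3://my-bucket/datasets/iris/iris.csv")` returns `("my-bucket", "datasets/iris/iris.csv")`; the bucket name and key here are made up for illustration.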

You can now create a new Notebook or submit a new Experiment and read the file directly from its S3 path:

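from pyspark.sql import SparkSession

# Create a Spark session, or reuse the one already running
spark = SparkSession.builder.appName("my_spark").getOrCreate()

# Read the CSV file from S3, using its first row as column names
df = spark.read.format("csv").option("header", "true").load("<PASTE YOUR S3 FILE PATH HERE>")
df.show()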
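To read every file in a dataset rather than a single one, you can point Spark at the folder that contains the copied file, because Spark's file sources accept a directory path and read all the files inside it. This is a sketch under two assumptions: that the dataset's files share one S3 prefix, and that they are all CSV files with the same header. `dataset_folder` and `read_dataset_folder` are hypothetical helper names.

```python
from urllib.parse import urlparse


def dataset_folder(file_path: str) -> str:
    """Hypothetical helper: return the S3 folder (prefix) holding file_path, with a trailing slash.

    Example: s3://my-bucket/datasets/iris/iris.csv -> s3://my-bucket/datasets/iris/
    """
    parsed = urlparse(file_path.strip())
    if parsed.scheme not in ("s3", "s3a", "s3n") or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {file_path!r}")
    # Drop the file name, keeping everything up to the last "/"
    folder, _, _ = parsed.path.rpartition("/")
    return f"{parsed.scheme}://{parsed.netloc}{folder}/"


def read_dataset_folder(spark, file_path: str):
    """Hypothetical helper: read all CSV files in file_path's folder into one DataFrame.

    Assumes every file in that folder is a CSV with the same header row.
    """
    return (
        spark.read.format("csv")
        .option("header", "true")
        .load(dataset_folder(file_path))
    )
```

With the session created above, `df = read_dataset_folder(spark, "<PASTE YOUR S3 FILE PATH HERE>")` followed by `df.show()` loads the whole folder. If the folder also holds non-CSV files, load the single file path instead.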
