Upload files
You can upload files of the following types to your file store:
- CSV and other character-separated formats, such as TSV.
- JSON, with one object per line, objects spanning multiple lines, or arrays of objects.
- Text, formatted with one line per row.
- XLSX, and the older XLS format.
- XML, using a row tag selector.
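The three JSON layouts listed above carry the same rows in different shapes. The following is a minimal sketch in plain Python, not the product's own parser, showing how each layout can be read into the same list of rows. The sample data and the function names are hypothetical, and the "objects spanning many lines" case is assumed to mean pretty-printed objects written one after another.

```python
import json

# Hypothetical sample data: the same two rows in each JSON layout.
JSON_LINES = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'
JSON_ARRAY = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'
JSON_MULTILINE = '{\n  "id": 1,\n  "name": "a"\n}\n{\n  "id": 2,\n  "name": "b"\n}\n'


def rows_from_json_lines(text):
    """One complete JSON object per line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def rows_from_json_array(text):
    """A single top-level array whose elements are objects."""
    return json.loads(text)


def rows_from_multiline_objects(text):
    """Objects that each span several lines, written back to back (assumed layout)."""
    decoder = json.JSONDecoder()
    rows, pos = [], 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so step over it first.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        rows.append(obj)
    return rows


rows = rows_from_json_lines(JSON_LINES)
```

Checking a sample of your file against one of these shapes before uploading can help you pick the matching option during file configuration.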
Steps to upload
To upload a file and incorporate it into your Spark Pipeline, you can use a Source Gem. There are a few ways to get started:
- Drag and drop the file directly to the Pipeline canvas.
- Open the Source/Target Gem drawer and click Upload file.
- Create a new Source Gem, click + New Dataset, and select Upload file.
After following any of the above steps, you will see the Type & Format settings for your file.
File configuration
Follow these steps to complete the file configuration:
1. Make sure the file type is correct, and click Next.
2. Either upload the file to a known file store location, or create a new table in your file store using the Upload and create a table option. Then, click Next.
   Note: Once you define the target location and click Next, the file is uploaded to the file path, regardless of whether you complete the Gem configuration.
3. Fill in any properties depending on your requirements.
4. Click Infer Schema. This step is required.
5. Validate or update the schema and click Next.
6. Load the data if you want to preview the table.
7. Click Create Dataset. This action creates the dataset and also creates the table if using the Upload and create a table option.
Now, your data is ready for use in your Pipeline via the Source Gem!
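To give a feel for what the Infer Schema step produces and why the result is worth validating, here is a rough sketch of type inference over a character-separated file in plain Python. It is an illustration under stated assumptions, not the logic the product uses: the function names, the sample data, and the three type names (integer, double, string) are all hypothetical.

```python
import csv
import io

# Hypothetical sample file contents with a header row.
SAMPLE = "id,price,city\n1,9.5,Oslo\n2,12,Lima\n"


def infer_type(values):
    """Return the narrowest of 'integer', 'double', 'string' that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for name, cast in (("integer", int), ("double", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue  # at least one value did not fit; try the next, wider type
    return "string"


def infer_schema(text, delimiter=","):
    """Map each header name to an inferred type, reading all rows of the text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = list(reader)
    return {
        name: infer_type([row[i] if i < len(row) else "" for row in rows])
        for i, name in enumerate(header)
    }


schema = infer_schema(SAMPLE)
```

In this sketch a single non-numeric value widens a whole column to string, which is one reason to review the inferred schema before clicking Next.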