Upload files

You can upload files of the following types to your file store:

  • CSV and other character-separated formats, such as TSV.
  • JSON, with one object per line, objects spanning multiple lines, or arrays of objects.
  • Text, formatted with one line per row.
  • XLSX, and the older XLS format.
  • XML, using a row tag selector.
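To make the JSON and XML layouts above concrete, here is a minimal sketch in plain Python. It is not how the upload feature parses files; the function names `parse_json_rows` and `parse_xml_rows` and the `row_tag` parameter are hypothetical, chosen only to illustrate what "one object per line", "objects spanning multiple lines", "arrays of objects", and "row tag selector" mean for the shape of a file.

```python
import json
import xml.etree.ElementTree as ET


def parse_json_rows(text):
    """Return one dict per row for any of the three JSON layouts (illustrative only)."""
    decoder = json.JSONDecoder()
    rows = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it manually.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Decode the next complete JSON value, wherever its line breaks fall.
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            rows.extend(value)   # an array of objects: each element is a row
        else:
            rows.append(value)   # a single object: one row
    return rows


def parse_xml_rows(text, row_tag):
    """Treat every element named row_tag as one row (hypothetical helper)."""
    root = ET.fromstring(text)
    return [
        {child.tag: child.text for child in element}
        for element in root.iter(row_tag)
    ]


one_per_line = '{"id": 1}\n{"id": 2}\n'
multi_line = '{\n  "id": 1\n}\n{\n  "id": 2\n}\n'
as_array = '[{"id": 1}, {"id": 2}]'
xml_doc = "<orders><order><id>1</id></order><order><id>2</id></order></orders>"

json_results = [parse_json_rows(t) for t in (one_per_line, multi_line, as_array)]
xml_result = parse_xml_rows(xml_doc, row_tag="order")
```

All three JSON inputs describe the same two rows. In the XML case, the row tag (`order` here) decides which elements become rows.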

Steps to upload

To upload a file and incorporate it into your Spark Pipeline, you can use a Source Gem. There are a few ways to get started:

  • Drag and drop the file directly to the Pipeline canvas.

  • Open the Source/Target Gem drawer and click Upload file.

  • Create a new Source Gem, click + New Dataset, and select Upload file.

After any of these steps, the Type & Format settings for your file appear.

File configuration

Follow these steps to complete the file configuration:

  1. Make sure the file type is correct, and click Next.

  2. Either upload the file to a known file store location, or create a new table in your file store using the Upload and create a table option. Then, click Next.

    Note: Once you define the target location and click Next, the file is uploaded to that file path, regardless of whether you complete the Gem configuration.

  3. Fill in any properties that your file requires.

  4. Click Infer Schema. This step is required.

  5. Validate or update the schema and click Next.

  6. Load the data if you want to preview the table.

  7. Click Create Dataset. This creates the dataset and, if you chose the Upload and create a table option, the table as well.

Now, your data is ready for use in your Pipeline via the Source Gem!
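For intuition about the Infer Schema step, the following is a conceptual sketch in plain Python of what inferring column types from a character-separated file can look like. It is an assumption for illustration, not the product's inference logic: the functions `infer_type` and `infer_schema`, the type names, and the "narrowest type that fits every value" rule are all hypothetical.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest type that fits every non-empty value (illustrative rule)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    # Try integer first, then double; fall back to string.
    for name, cast in (("integer", int), ("double", float)):
        try:
            for value in non_empty:
                cast(value)
            return name
        except ValueError:
            continue
    return "string"


def infer_schema(text, delimiter=","):
    """Map each column name to an inferred type (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = list(reader)
    return {
        # Missing cells come back as None; treat them as empty strings.
        name: infer_type([row.get(name) or "" for row in rows])
        for name in (reader.fieldnames or [])
    }


csv_sample = "id,price,name\n1,9.5,apple\n2,3,pear\n"
tsv_sample = "id\tlabel\n1\tx\n"

csv_schema = infer_schema(csv_sample)
tsv_schema = infer_schema(tsv_sample, delimiter="\t")
```

A single non-numeric value is enough to push a column to `string`, which is why the inferred schema is worth checking before you continue.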