Skip to main content

Unpivot

Use the Unpivot gem to transform your data from a wide format to a long format.

note

If you want to pivot the data, rather than unpivot, use the Aggregate gem.

Parameters

ParameterDescription
Column(s) to use as identifiersThe column(s) that will identify to which group or entity the observation corresponds to.
Columns to unpivotThe columns (wide format) that you would like to transform into a single column (long format).
Variable column nameThe name of the column that contains the names of the unpivoted columns. This helps describe the values in the value column.
Value column nameThe name of the column that will contain the values from the unpivoted columns.

Example

Imagine you have sales data for different products, with each quarter's sales stored in its own column—this is known as wide format. Before modeling seasonal trends or doing time series analysis, it's often helpful to convert this into long format, where each row represents a single observation.

ProductQ1Q2Q3Q4
A100150130170
B90120110160

To configure a Unpivot gem for this table:

  1. Select the identifier columns. In the example above, the Product column is the identifier column.
  2. Select the columns to unpivot. In the example above, all of the quarter columns (Q1, Q2, etc.) are your columns to unpivot.
  3. Name the variable column Quarter because it identifies the sales period.
  4. Name the value column UnitsSold because it contains number of units sold per quarter.
  5. Save and run the gem.

After the transformation:

  • The quarter names (Q1, Q2, etc.) will move into a new Quarter column.
  • The corresponding sales values will be stored in a UnitsSold column.
ProductQuarterUnitsSold
AQ1100
AQ2150
AQ3130
AQ4170
BQ190
BQ2120
BQ3110
BQ4160

This makes your data easier to analyze over time, since each row now represents one product's sales for a specific quarter.

Example code

tip

To see the compiled code of your project, switch to the Code view in the project header.

def unpivot_products_by_quarter(spark: SparkSession, in0: DataFrame) -> DataFrame:
return in0.unpivot(["Product"], [col for col in in0.columns if col not in ["Product"]], "Quarter", "UnitsSold")