SampleRows
Package or platform | Supported version |
---|---|
ProphecySparkBasicsPython | 0.2.25+ |
ProphecySparkBasicsScala | 0.0.1+ |
Databricks UC Single Cluster | Not Supported |
Databricks UC Shared | 14.3+ |
Livy | 3.0.1+ |

Use the SampleRows gem to sample records by choosing a specific number or percentage of records.
Parameters
Parameter | Description |
---|---|
Sampling strategy | Whether to sample by number of records or by percentage of records |
Sampling ratio | The fraction of records to sample (for example, 0.5 for about half of the records) |
Random seed | A number that seeds the random generator so that you can reproduce the same sample |
With replacement | When enabled, a selected record is returned to the sample pool, so the same record can be selected more than once |
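
To make the parameters above more concrete, here is a minimal sketch in plain Python. It uses the standard `random` module rather than Spark, and the function names `sample_by_ratio` and `sample_by_count` are hypothetical helpers written for this illustration, not part of the gem. It only approximates the behavior described in the table: a ratio-based sample keeps each record with the given probability (so the result size is close to, but not exactly, the requested share), a seed makes the draw repeatable, and sampling with replacement can return the same record more than once.

```python
import random


def sample_by_ratio(records, ratio, seed=None):
    """Keep each record independently with probability `ratio` (no replacement)."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < ratio]


def sample_by_count(records, count, seed=None, with_replacement=False):
    """Pick `count` records; with replacement, a record may appear more than once."""
    rng = random.Random(seed)
    if with_replacement:
        return rng.choices(records, k=count)
    # Without replacement we cannot pick more records than exist.
    return rng.sample(records, k=min(count, len(records)))


records = list(range(100))

# Same seed and same input give the same sample.
first = sample_by_ratio(records, 0.5, seed=42)
second = sample_by_ratio(records, 0.5, seed=42)

# Without replacement every picked record is distinct.
distinct = sample_by_count(records, 10, seed=42)

# With replacement the sample can be larger than the input, so repeats must occur.
repeated = sample_by_count(records[:3], 10, seed=42, with_replacement=True)
```

In this sketch, `first` and `second` are identical lists, `distinct` contains ten different records, and `repeated` contains ten records drawn from only three possible values.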
Example code
Python

```python
from pyspark.sql import DataFrame, SparkSession

def SampleRows_1(spark: SparkSession, in0: DataFrame) -> DataFrame:
    return in0.sample(withReplacement=False, fraction=0.5)
```

Scala

```scala
object SampleRows_1 {
  // Sample about half of the rows, without replacement.
  def apply(context: Context, in: DataFrame): DataFrame =
    in.sample(false, "0.5".toDouble)
}
```