DataFrame class

A distributed collection of data grouped into named columns.

A DataFrame is equivalent to a relational table in Spark SQL, and can be created using various functions in SparkSession.

Important

A DataFrame should not be directly created using the constructor.

Supports Spark Connect

Properties

| Property | Description |
| --- | --- |
| sparkSession | Returns the SparkSession that created this DataFrame. |
| rdd | Returns the content as an RDD of Row objects (Classic mode only). |
| na | Returns a DataFrameNaFunctions object for handling missing values. |
| stat | Returns a DataFrameStatFunctions object for statistical functions. |
| write | Interface for saving the content of a non-streaming DataFrame to external storage. |
| writeStream | Interface for saving the content of a streaming DataFrame to external storage. |
| schema | Returns the schema of this DataFrame as a StructType. |
| dtypes | Returns all column names and their data types as a list of (name, type) tuples. |
| columns | Returns the names of all columns in the DataFrame as a list. |
| storageLevel | Returns the DataFrame's current storage level. |