メインコンテンツまでスキップ

SQL data type rules

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

Databricks uses several rules to resolve conflicts among data types:

You can also explicitly cast between many types:

Type promotion

Type promotion is the process of casting a type into another type of the same type family which contains all possible values of the original type. Therefore type promotion is a safe operation. For example TINYINT has a range from -128 to 127. All its possible values can be safely promoted to INTEGER.

Type precedence list

The type precedence list defines whether values of a given data type can be implicitly promoted to another data type.

Data type

Precedence list (from narrowest to widest)

TINYINT

TINYINT -> SMALLINT -> INT -> BIGINT -> DECIMAL -> FLOAT (1) -> DOUBLE

SMALLINT

SMALLINT -> INT -> BIGINT -> DECIMAL -> FLOAT (1) -> DOUBLE

INT

INT -> BIGINT -> DECIMAL -> FLOAT (1) -> DOUBLE

BIGINT

BIGINT -> DECIMAL -> FLOAT (1) -> DOUBLE

DECIMAL

DECIMAL -> FLOAT (1) -> DOUBLE

FLOAT

FLOAT (1) -> DOUBLE

DOUBLE

DOUBLE

DATE

DATE -> TIMESTAMP

TIME

TIME (4)

TIMESTAMP

TIMESTAMP

ARRAY

ARRAY (2)

BINARY

BINARY

BOOLEAN

BOOLEAN

INTERVAL

INTERVAL

GEOGRAPHY

GEOGRAPHY(ANY)

GEOMETRY

GEOMETRY(ANY)

MAP

MAP (2)

STRING

STRING

STRUCT

STRUCT (2)

VARIANT

VARIANT

OBJECT

OBJECT (3)

(1) For least common type resolution FLOAT is skipped to avoid loss of precision.

(2) For a complex type the precedence rule applies recursively to its component elements.

(3) OBJECT exists only within a VARIANT.

(4) The least common type of TIME(n) and TIME(m) is TIME(max(n, m)). TIME does not promote to TIMESTAMP or any other type.

Strings and NULL

Special rules apply for STRING and untyped NULL:

  • NULL can be promoted to any other type.
  • STRING can be promoted to BIGINT, BINARY, BOOLEAN, DATE, DOUBLE, INTERVAL, TIME, and TIMESTAMP. If the actual string value cannot be cast to least common type Databricks raises a runtime error. When promoting to INTERVAL the string value must match the intervals units.

Type precedence graph

This is a graphical depiction of the precedence hierarchy, combining the type precedence list and strings and NULLs rules.

Graphical representation of precedence rules

Least common type resolution

The least common type from a set of types is the narrowest type reachable from the type precedence graph by all elements of the set of types.

The least common type resolution is used to:

  • Decide whether a function that expects a parameter of a given type can be invoked using an argument of a narrower type.
  • Derive the argument type for a function that expects a shared argument type for multiple parameters, such as coalesce, in, least, or greatest.
  • Derive the operand types for operators such as arithmetic operations or comparisons.
  • Derive the result type for expressions such as the case expression.
  • Derive the element, key, or value types for array and map constructors.
  • Derive the result type of UNION, INTERSECT, or EXCEPT set operators.

Parameterized types

Some data types carry parameters that affect their precision or scale. When the least common type involves parameterized types, the result parameters are computed so that values of both input types can be represented without loss.

DECIMAL(p, s)

The least common type of DECIMAL(p1, s1) and DECIMAL(p2, s2) is a DECIMAL whose scale and precision accommodate all values from both types:

resultScale     = max(s1, s2)
maxIntegerDigits = max(p1 - s1, p2 - s2)
resultPrecision = min(38, resultScale + maxIntegerDigits)

If resultScale + maxIntegerDigits exceeds 38 (the maximum DECIMAL precision), the precision is capped at 38 and the scale is reduced to preserve integer digits.

For example, the least common type of DECIMAL(10, 2) and DECIMAL(12, 5) is DECIMAL(15, 5) because max(2, 5) = 5 scale digits and max(8, 7) = 8 integer digits require a precision of 13, widened to 15 to hold 8 + 5 = 13 significant digits with 5 decimal places.

TIME(p)

The least common type of TIME(n) and TIME(m) is TIME(max(n, m)).

For example, the least common type of TIME(0) and TIME(6) is TIME(6).

Additional rules

Special rules are applied if the least common type resolves to FLOAT. If any of the contributing types is an exact numeric type (TINYINT, SMALLINT, INTEGER, BIGINT, or DECIMAL) the least common type is pushed to DOUBLE to avoid potential loss of digits.

When the least common type is a STRING the collation is computed following the collation precedence rules.

Implicit downcasting and crosscasting

Databricks employs these forms of implicit casting only on function and operator invocation, and only where it can unambiguously determine the intent.

  • Implicit downcasting

    Implicit downcasting automatically casts a wider type to a narrower type without requiring you to specify the cast explicitly. Downcasting is convenient, but it carries the risk of unexpected runtime errors if the actual value fails to be representable in the narrow type.

    Downcasting applies the type precedence list in reverse order. The GEOGRAPHY and GEOMETRY data types are never downcast.

  • Implicit crosscasting

    Implicit crosscasting casts a value from one type family to another without requiring you to specify the cast explicitly.

    Databricks supports implicit crosscasting from:

    • Any simple type, except BINARY, GEOGRAPHY, and GEOMETRY, to STRING.
    • A STRING to any simple type, except GEOGRAPHY and GEOMETRY.

Casting on function invocation

Given a resolved function or operator, the following rules apply, in the order they are listed, for each parameter and argument pair:

  • If a supported parameter type is part of the argument's