Export models

This page shows you how to export BigQuery ML models. You can export BigQuery ML models to Cloud Storage and use them for online prediction, or edit them in Python. You can export a BigQuery ML model by using the Google Cloud console, the EXPORT MODEL statement, the bq command-line tool, or the API.
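
For example, here is a minimal sketch that runs an EXPORT MODEL statement through the BigQuery client library for Python; the project, dataset, model, and bucket names are placeholders that you would replace with your own.

  from google.cloud import bigquery

  # Placeholder project ID; replace with your own.
  client = bigquery.Client(project="my-project")

  # EXPORT MODEL writes the model artifacts to the given Cloud Storage URI.
  export_sql = """
  EXPORT MODEL `my-project.my_dataset.my_model`
  OPTIONS (URI = 'gs://my_bucket/my_model_export/')
  """

  # The statement runs as a regular query job; wait for it to complete.
  client.query(export_sql).result()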

You can export the following model types:

  • AUTOENCODER
  • AUTOML_CLASSIFIER
  • AUTOML_REGRESSOR
  • BOOSTED_TREE_CLASSIFIER
  • BOOSTED_TREE_REGRESSOR
  • DNN_CLASSIFIER
  • DNN_REGRESSOR
  • DNN_LINEAR_COMBINED_CLASSIFIER
  • DNN_LINEAR_COMBINED_REGRESSOR
  • KMEANS
  • LINEAR_REG
  • LOGISTIC_REG
  • MATRIX_FACTORIZATION
  • RANDOM_FOREST_CLASSIFIER
  • RANDOM_FOREST_REGRESSOR
  • TENSORFLOW (imported TensorFlow models)
  • PCA
  • TRANSFORM_ONLY

Export model formats and samples

The following table shows the export destination formats for each BigQuery ML model type and provides a sample of files that get written in the Cloud Storage bucket.

Model type: AUTOML_CLASSIFIER, AUTOML_REGRESSOR
Export model format: TensorFlow SavedModel (TF 2.1.0)
Exported files sample:
  gcs_bucket/
    assets/
      f1.txt
      f2.txt
    saved_model.pb
    variables/
      variables.data-00-of-01
      variables.index

Model type: AUTOENCODER, DNN_CLASSIFIER, DNN_REGRESSOR, DNN_LINEAR_COMBINED_CLASSIFIER, DNN_LINEAR_COMBINED_REGRESSOR, KMEANS, LINEAR_REG, LOGISTIC_REG, MATRIX_FACTORIZATION, PCA, TRANSFORM_ONLY
Export model format: TensorFlow SavedModel (TF 1.15 or higher)
Exported files sample: same file layout as shown above.

Model type: BOOSTED_TREE_CLASSIFIER, BOOSTED_TREE_REGRESSOR, RANDOM_FOREST_CLASSIFIER, RANDOM_FOREST_REGRESSOR
Export model format: Booster (XGBoost 0.82)
Exported files sample:
  gcs_bucket/
    assets/
      0.txt
      1.txt
      model_metadata.json
    main.py
    model.bst
    xgboost_predictor-0.1.tar.gz
      ....
      predictor.py
      ....

  main.py is for running the model locally. See Model deployment for more details.

Model type: TENSORFLOW (imported)
Export model format: TensorFlow SavedModel
Exported files sample: exactly the same files that were present when the model was imported.
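
After you copy an exported TensorFlow SavedModel from Cloud Storage to a local directory (for example, with gsutil cp -r), you can load it for local prediction. The following is a minimal sketch, not a full deployment workflow; the local path is a placeholder.

  import tensorflow as tf

  # Placeholder path to a locally downloaded SavedModel export.
  model = tf.saved_model.load("./my_model_export")

  # Exported TensorFlow SavedModels typically expose a default serving
  # signature; inspect it to see the expected input tensors.
  infer = model.signatures["serving_default"]
  print(infer.structured_input_signature)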

Export model trained with TRANSFORM

If the model is trained with the TRANSFORM clause, then an additional preprocessing model, which performs the same logic as the TRANSFORM clause, is exported in the TensorFlow SavedModel format under the transform subdirectory. You can deploy a model trained with the TRANSFORM clause to Vertex AI as well as locally. For more information, see Model deployment.

Export model format:
  Prediction model: TensorFlow SavedModel or Booster (XGBoost 0.82).
  Preprocessing model for the TRANSFORM clause: TensorFlow SavedModel (TF 2.5 or higher).
Exported files sample:
  gcs_bucket/
    ....(model files)
    transform/
      assets/
        f1.txt
        f2.txt
      saved_model.pb
      variables/
        variables.data-00-of-01
        variables.index

The exported model doesn't contain information about any feature engineering performed outside the TRANSFORM clause during training (for example, anything in the SELECT statement), so you need to manually convert the input data before feeding it into the preprocessing model.
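
As a hedged sketch, the following example loads the exported preprocessing model from the transform subdirectory and applies it to raw inputs; the local path and the feature names f1 and f2 are placeholders, and the resulting tensors can then be fed to the exported prediction model.

  import tensorflow as tf

  # Placeholder path to the transform/ subdirectory of a downloaded export.
  preprocess = tf.saved_model.load("./my_model_export/transform")
  transform_fn = preprocess.signatures["serving_default"]

  # Hypothetical raw features matching the columns referenced in the TRANSFORM
  # clause, converted to the tensor formats listed in the next section.
  raw_features = {
      "f1": tf.constant(["abc", "def"], dtype=tf.string),
      "f2": tf.constant([10.0, 11.0], dtype=tf.float64),
  }
  transformed = transform_fn(**raw_features)
  print(transformed)  # These outputs are the inputs to the prediction model.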

Supported data types

When exporting models trained with the TRANSFORM clause, the following data types are supported for feeding into the TRANSFORM clause.

TRANSFORM input type: INT64
TRANSFORM input samples: 10, 11
Exported preprocessing model input samples:
  tf.constant([10, 11], dtype=tf.int64)

TRANSFORM input type: NUMERIC
TRANSFORM input samples: NUMERIC 10, NUMERIC 11
Exported preprocessing model input samples:
  tf.constant([10, 11], dtype=tf.float64)

TRANSFORM input type: BIGNUMERIC
TRANSFORM input samples: BIGNUMERIC 10, BIGNUMERIC 11
Exported preprocessing model input samples:
  tf.constant([10, 11], dtype=tf.float64)

TRANSFORM input type: FLOAT64
TRANSFORM input samples: 10.0, 11.0
Exported preprocessing model input samples:
  tf.constant([10, 11], dtype=tf.float64)

TRANSFORM input type: BOOL
TRANSFORM input samples: TRUE, FALSE
Exported preprocessing model input samples:
  tf.constant([True, False], dtype=tf.bool)

TRANSFORM input type: STRING
TRANSFORM input samples: 'abc', 'def'
Exported preprocessing model input samples:
  tf.constant(['abc', 'def'], dtype=tf.string)

TRANSFORM input type: BYTES
TRANSFORM input samples: b'abc', b'def'
Exported preprocessing model input samples:
  tf.constant(['abc', 'def'], dtype=tf.string)

TRANSFORM input type: DATE
TRANSFORM input samples: DATE '2020-09-27', DATE '2020-09-28'
Exported preprocessing model input samples ("%F" format):
  tf.constant(['2020-09-27', '2020-09-28'], dtype=tf.string)

TRANSFORM input type: DATETIME
TRANSFORM input samples: DATETIME '2023-02-02 02:02:01.152903', DATETIME '2023-02-03 02:02:01.152903'
Exported preprocessing model input samples ("%F %H:%M:%E6S" format):
  tf.constant(['2023-02-02 02:02:01.152903', '2023-02-03 02:02:01.152903'], dtype=tf.string)

TRANSFORM input type: TIME
TRANSFORM input samples: TIME '16:32:36.152903', TIME '17:32:36.152903'
Exported preprocessing model input samples ("%H:%M:%E6S" format):
  tf.constant(['16:32:36.152903', '17:32:36.152903'], dtype=tf.string)

TRANSFORM input type: TIMESTAMP
TRANSFORM input samples: TIMESTAMP '2017-02-28 12:30:30.45-08', TIMESTAMP '2018-02-28 12:30:30.45-08'
Exported preprocessing model input samples ("%F %H:%M:%E1S %z" format):
  tf.constant(['2017-02-28 20:30:30.4 +0000', '2018-02-28 20:30:30.4 +0000'], dtype=tf.string)

TRANSFORM input type: ARRAY
TRANSFORM input samples: ['a', 'b'], ['c', 'd']
Exported preprocessing model input samples:
  tf.constant([['a', 'b'], ['c', 'd']], dtype=tf.string)

TRANSFORM input type: ARRAY<STRUCT<INT64, FLOAT64>>
TRANSFORM input samples: [(1, 1.0), (2, 1.0)], [(2, 1.0), (3, 1.0)]
Exported preprocessing model input samples:
  tf.sparse.from_dense(tf.constant([[0, 1.0, 1.0, 0], [0, 0, 1.0, 1.0]], dtype=tf.float64))

TRANSFORM input type: NULL
TRANSFORM input samples: NULL, NULL
Exported preprocessing model input samples:
  tf.constant([123456789.0e10, 123456789.0e10], dtype=tf.float64)

  tf.constant([1234567890000000000, 1234567890000000000], dtype=tf.int64)

  tf.constant([' __MISSING__ ', ' __MISSING__ '], dtype=tf.string)
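
For example, here is a minimal sketch of converting Python date and datetime values into the string formats listed above before feeding them to the exported preprocessing model; the values are hypothetical.

  from datetime import date, datetime

  import tensorflow as tf

  # Hypothetical raw values.
  d = date(2020, 9, 27)
  dt = datetime(2023, 2, 2, 2, 2, 1, 152903)

  # DATE columns expect "%F" strings.
  date_input = tf.constant([d.isoformat()], dtype=tf.string)

  # DATETIME columns expect "%F %H:%M:%E6S" strings (microsecond precision).
  datetime_input = tf.constant([dt.strftime("%Y-%m-%d %H:%M:%S.%f")], dtype=tf.string)

  print(date_input)      # contains b'2020-09-27'
  print(datetime_input)  # contains b'2023-02-02 02:02:01.152903'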

Supported SQL functions

When exporting models trained with the TRANSFORM clause, you can use the following SQL functions inside the TRANSFORM clause. An example follows the list.

  • Operators
    • +, -, *, /, =, <, >, <=, >=, !=, <>, [NOT] BETWEEN, [NOT] IN, IS [NOT] NULL, IS [NOT] TRUE, IS [NOT] FALSE, NOT, AND, OR.
  • Conditional expressions
    • CASE expr, CASE, COALESCE, IF, IFNULL, NULLIF.
  • Mathematical functions
    • ABS, ACOS, ACOSH, ASINH, ATAN, ATAN2, ATANH, CBRT, CEIL, CEILING, COS, COSH, COT, COTH, CSC, CSCH, EXP, FLOOR, IS_INF, IS_NAN, LN, LOG, LOG10, MOD, POW, POWER, SEC, SECH, SIGN, SIN, SINH, SQRT, TAN, TANH.
  • Conversion functions
    • CAST AS INT64, CAST AS FLOAT64, CAST AS NUMERIC, CAST AS BIGNUMERIC, CAST AS STRING, SAFE_CAST AS INT64, SAFE_CAST AS FLOAT64.
  • String functions
    • CONCAT, LEFT, LENGTH, LOWER, REGEXP_REPLACE, RIGHT, SPLIT, SUBSTR, SUBSTRING, TRIM, UPPER.
  • Date functions
    • DATE, DATE_ADD, DATE_SUB, DATE_DIFF, DATE_TRUNC, EXTRACT, FORMAT_DATE, PARSE_DATE, SAFE.PARSE_DATE.
  • Datetime functions
    • DATETIME, DATETIME_ADD, DATETIME_SUB, DATETIME_DIFF, DATETIME_TRUNC, EXTRACT, PARSE_DATETIME, SAFE.PARSE_DATETIME.
  • Time functions
    • TIME, TIME_ADD, TIME_SUB, TIME_DIFF, TIME_TRUNC, EXTRACT, FORMAT_TIME, PARSE_TIME, SAFE.PARSE_TIME.
  • Timestamp functions
    • TIMESTAMP, TIMESTAMP_ADD, TIMESTAMP_SUB, TIMESTAMP_DIFF, TIMESTAMP_TRUNC, FORMAT_TIMESTAMP, PARSE_TIMESTAMP, SAFE.PARSE_TIMESTAMP, TIMESTAMP_MICROS, TIMESTAMP_MILLIS, TIMESTAMP_SECONDS, EXTRACT, STRING, UNIX_MICROS, UNIX_MILLIS, UNIX_SECONDS.
  • Manual preprocessing functions
    • ML.IMPUTER, ML.HASH_BUCKETIZE, ML.LABEL_ENCODER, ML.MULTI_HOT_ENCODER, ML.NGRAMS, ML.ONE_HOT_ENCODER, ML.BUCKETIZE, ML.MAX_ABS_SCALER, ML.MIN_MAX_SCALER, ML.NORMALIZER, ML.QUANTILE_BUCKETIZE, ML.ROBUST_SCALER, ML.STANDARD_SCALER.
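
As an illustrative sketch rather than an exact recipe, the following Python snippet creates an exportable model whose TRANSFORM clause uses only functions from the lists above (ML.STANDARD_SCALER, EXTRACT, and CONCAT); the project, dataset, table, and column names are placeholders.

  from google.cloud import bigquery

  # Placeholder project ID; replace with your own.
  client = bigquery.Client(project="my-project")

  # The TRANSFORM clause below uses only functions listed above, so the
  # resulting model can be exported together with its preprocessing model.
  create_sql = """
  CREATE OR REPLACE MODEL `my-project.my_dataset.my_exportable_model`
  TRANSFORM (
    ML.STANDARD_SCALER(fare) OVER () AS scaled_fare,
    EXTRACT(DAYOFWEEK FROM pickup_date) AS pickup_dow,
    CONCAT(vendor, '-', payment_type) AS vendor_payment,
    label
  )
  OPTIONS (model_type = 'LINEAR_REG', input_label_cols = ['label']) AS
  SELECT fare, pickup_date, vendor, payment_type, tip AS label
  FROM `my-project.my_dataset.taxi_trips`
  """
  client.query(create_sql).result()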

Limitations

The following limitations apply when exporting models:

  • Model export is not supported if ARRAY, TIMESTAMP, or GEOGRAPHY feature types were present in the input data used during training.

  • Exported models for model types AUTOML_REGRESSOR and AUTOML_CLASSIFIER do not support Vertex AI deployment for online prediction.

  • The model size limit is 1 GB for matrix factorization model export. The model size is roughly proportional to num_factors, so you can reduce num_factors during training to shrink the model size if you reach the limit.

  • For models trained with the BigQuery ML TRANSFORM clause for manual feature preprocessing, see the data types and functions supported for exporting.

  • Models trained with the BigQuery ML TRANSFORM clause before 18 September 2023 must be re-trained before they can be deployed to Vertex AI for online prediction.