Export models

This page shows you how to export BigQuery ML models. You can export BigQuery ML models to Cloud Storage and use them for online prediction, or edit them in Python. You can export a BigQuery ML model by using the Google Cloud console, the EXPORT MODEL statement, the bq command-line tool, or the API.
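
For example, here is a minimal sketch that runs an EXPORT MODEL statement through the BigQuery client library for Python; the project, dataset, model, and bucket names are placeholders that you would replace with your own.

  from google.cloud import bigquery

  # Placeholder project ID; replace with your own.
  client = bigquery.Client(project="my-project")

  # EXPORT MODEL writes the model artifacts to the given Cloud Storage URI.
  export_sql = """
  EXPORT MODEL `my-project.my_dataset.my_model`
  OPTIONS (URI = 'gs://my_bucket/my_model_export/')
  """

  # The statement runs as a regular query job; wait for it to complete.
  client.query(export_sql).result()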

You can export the following model types:

  • AUTOENCODER
  • AUTOML_CLASSIFIER
  • AUTOML_REGRESSOR
  • BOOSTED_TREE_CLASSIFIER
  • BOOSTED_TREE_REGRESSOR
  • DNN_CLASSIFIER
  • DNN_REGRESSOR
  • DNN_LINEAR_COMBINED_CLASSIFIER
  • DNN_LINEAR_COMBINED_REGRESSOR
  • KMEANS
  • LINEAR_REG
  • LOGISTIC_REG
  • MATRIX_FACTORIZATION
  • RANDOM_FOREST_CLASSIFIER
  • RANDOM_FOREST_REGRESSOR
  • TENSORFLOW (imported TensorFlow models)
  • PCA
  • TRANSFORM_ONLY

Export model formats and samples

The following table shows the export destination formats for each BigQuery ML model type and provides a sample of files that get written in the Cloud Storage bucket.

Model type: AUTOML_CLASSIFIER, AUTOML_REGRESSOR
Export model format: TensorFlow SavedModel (TF 2.1.0)
Exported files sample:
  gcs_bucket/
    assets/
      f1.txt
      f2.txt
    saved_model.pb
    variables/
      variables.data-00-of-01
      variables.index

Model type: AUTOENCODER, DNN_CLASSIFIER, DNN_REGRESSOR, DNN_LINEAR_COMBINED_CLASSIFIER, DNN_LINEAR_COMBINED_REGRESSOR, KMEANS, LINEAR_REG, LOGISTIC_REG, MATRIX_FACTORIZATION, PCA, TRANSFORM_ONLY
Export model format: TensorFlow SavedModel (TF 1.15 or higher)
Exported files sample: same file layout as shown above.

Model type: BOOSTED_TREE_CLASSIFIER, BOOSTED_TREE_REGRESSOR, RANDOM_FOREST_CLASSIFIER, RANDOM_FOREST_REGRESSOR
Export model format: Booster (XGBoost 0.82)
Exported files sample:
  gcs_bucket/
    assets/
      0.txt
      1.txt
      model_metadata.json
    main.py
    model.bst
    xgboost_predictor-0.1.tar.gz
      ....
      predictor.py
      ....

  main.py is for running the model locally. See Model deployment for more details.

Model type: TENSORFLOW (imported)
Export model format: TensorFlow SavedModel
Exported files sample: exactly the same files that were present when the model was imported.
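
After you copy an exported TensorFlow SavedModel from Cloud Storage to a local directory (for example, with gsutil cp -r), you can load it for local prediction. The following is a minimal sketch, not a full deployment workflow; the local path is a placeholder.

  import tensorflow as tf

  # Placeholder path to a locally downloaded SavedModel export.
  model = tf.saved_model.load("./my_model_export")

  # Exported TensorFlow SavedModels typically expose a default serving
  # signature; inspect it to see the expected input tensors.
  infer = model.signatures["serving_default"]
  print(infer.structured_input_signature)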

Export model trained with TRANSFORM

If the model is trained with the TRANSFORM clause, then an additional preprocessing model, which performs the same logic as the TRANSFORM clause, is exported in the TensorFlow SavedModel format under the transform subdirectory. You can deploy a model trained with the TRANSFORM clause to Vertex AI as well as locally. For more information, see Model deployment.

Export model format:
  Prediction model: TensorFlow SavedModel or Booster (XGBoost 0.82).
  Preprocessing model for the TRANSFORM clause: TensorFlow SavedModel (TF 2.5 or higher).
Exported files sample:
  gcs_bucket/
    ....(model files)
    transform/
      assets/
        f1.txt
        f2.txt
      saved_model.pb
      variables/
        variables.data-00-of-01
        variables.index

The exported model doesn't contain information about any feature engineering performed outside the TRANSFORM clause during training (for example, anything in the SELECT statement), so you need to manually convert the input data before feeding it into the preprocessing model.
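
As a hedged sketch, the following example loads the exported preprocessing model from the transform subdirectory and applies it to raw inputs; the local path and the feature names f1 and f2 are placeholders, and the resulting tensors can then be fed to the exported prediction model.

  import tensorflow as tf

  # Placeholder path to the transform/ subdirectory of a downloaded export.
  preprocess = tf.saved_model.load("./my_model_export/transform")
  transform_fn = preprocess.signatures["serving_default"]

  # Hypothetical raw features matching the columns referenced in the TRANSFORM
  # clause, converted to the tensor formats listed in the next section.
  raw_features = {
      "f1": tf.constant(["abc", "def"], dtype=tf.string),
      "f2": tf.constant([10.0, 11.0], dtype=tf.float64),
  }
  transformed = transform_fn(**raw_features)
  print(transformed)  # These outputs are the inputs to the prediction model.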

Supported data types

When exporting models trained with the TRANSFORM clause, the following data types are supported for feeding into the TRANSFORM clause.

TRANSFORM input type: INT64
TRANSFORM input samples: 10, 11
Exported preprocessing model input samples:
  tf.constant([10, 11], dtype=tf.int64)

TRANSFORM input type: NUMERIC
TRANSFORM input samples: NUMERIC 10, NUMERIC 11
Exported preprocessing model input samples:
  tf.constant([10, 11], dtype=tf.float64)

TRANSFORM input type: BIGNUMERIC
TRANSFORM input samples: BIGNUMERIC 10, BIGNUMERIC 11
Exported preprocessing model input samples:
  tf.constant([10, 11], dtype=tf.float64)

TRANSFORM input type: FLOAT64
TRANSFORM input samples: 10.0, 11.0
Exported preprocessing model input samples:
  tf.constant([10, 11], dtype=tf.float64)

TRANSFORM input type: BOOL
TRANSFORM input samples: TRUE, FALSE
Exported preprocessing model input samples:
  tf.constant([True, False], dtype=tf.bool)

TRANSFORM input type: STRING
TRANSFORM input samples: 'abc', 'def'
Exported preprocessing model input samples:
  tf.constant(['abc', 'def'], dtype=tf.string)

TRANSFORM input type: BYTES
TRANSFORM input samples: b'abc', b'def'
Exported preprocessing model input samples:
  tf.constant(['abc', 'def'], dtype=tf.string)

TRANSFORM input type: DATE
TRANSFORM input samples: DATE '2020-09-27', DATE '2020-09-28'
Exported preprocessing model input samples ("%F" format):
  tf.constant(['2020-09-27', '2020-09-28'], dtype=tf.string)

TRANSFORM input type: DATETIME
TRANSFORM input samples: DATETIME '2023-02-02 02:02:01.152903', DATETIME '2023-02-03 02:02:01.152903'
Exported preprocessing model input samples ("%F %H:%M:%E6S" format):
  tf.constant(['2023-02-02 02:02:01.152903', '2023-02-03 02:02:01.152903'], dtype=tf.string)

TRANSFORM input type: TIME
TRANSFORM input samples: TIME '16:32:36.152903', TIME '17:32:36.152903'
Exported preprocessing model input samples ("%H:%M:%E6S" format):
  tf.constant(['16:32:36.152903', '17:32:36.152903'], dtype=tf.string)

TRANSFORM input type: TIMESTAMP
TRANSFORM input samples: TIMESTAMP '2017-02-28 12:30:30.45-08', TIMESTAMP '2018-02-28 12:30:30.45-08'
Exported preprocessing model input samples ("%F %H:%M:%E1S %z" format):
  tf.constant(['2017-02-28 20:30:30.4 +0000', '2018-02-28 20:30:30.4 +0000'], dtype=tf.string)

TRANSFORM input type: ARRAY
TRANSFORM input samples: ['a', 'b'], ['c', 'd']
Exported preprocessing model input samples:
  tf.constant([['a', 'b'], ['c', 'd']], dtype=tf.string)

TRANSFORM input type: ARRAY<STRUCT<INT64, FLOAT64>>
TRANSFORM input samples: [(1, 1.0), (2, 1.0)], [(2, 1.0), (3, 1.0)]
Exported preprocessing model input samples:
  tf.sparse.from_dense(tf.constant([[0, 1.0, 1.0, 0], [0, 0, 1.0, 1.0]], dtype=tf.float64))

TRANSFORM input type: NULL
TRANSFORM input samples: NULL, NULL
Exported preprocessing model input samples:
  tf.constant([123456789.0e10, 123456789.0e10], dtype=tf.float64)

  tf.constant([1234567890000000000, 1234567890000000000], dtype=tf.int64)

  tf.constant([' __MISSING__ ', ' __MISSING__ '], dtype=tf.string)
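
For example, here is a minimal sketch of converting Python date and datetime values into the string formats listed above before feeding them to the exported preprocessing model; the values are hypothetical.

  from datetime import date, datetime

  import tensorflow as tf

  # Hypothetical raw values.
  d = date(2020, 9, 27)
  dt = datetime(2023, 2, 2, 2, 2, 1, 152903)

  # DATE columns expect "%F" strings.
  date_input = tf.constant([d.isoformat()], dtype=tf.string)

  # DATETIME columns expect "%F %H:%M:%E6S" strings (microsecond precision).
  datetime_input = tf.constant([dt.strftime("%Y-%m-%d %H:%M:%S.%f")], dtype=tf.string)

  print(date_input)      # contains b'2020-09-27'
  print(datetime_input)  # contains b'2023-02-02 02:02:01.152903'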

Supported SQL functions

When exporting models trained with the TRANSFORM clause, you can use the following SQL functions inside the TRANSFORM clause. An example follows the list.

  • Operators
    • +, -, *, /, =, <, >, <=, >=, !=, <>, [NOT] BETWEEN, [NOT] IN, IS [NOT] NULL, IS [NOT] TRUE, IS [NOT] FALSE, NOT, AND, OR.
  • Conditional expressions
    • CASE expr, CASE, COALESCE, IF, IFNULL, NULLIF.
  • Mathematical functions
    • ABS, ACOS, ACOSH, ASINH, ATAN, ATAN2, ATANH, CBRT, CEIL, CEILING, COS, COSH, COT, COTH, CSC, CSCH, EXP, FLOOR, IS_INF, IS_NAN, LN, LOG, LOG10, MOD, POW, POWER, SEC, SECH, SIGN, SIN, SINH, SQRT, TAN, TANH.
  • Conversion functions
    • CAST AS INT64, CAST AS FLOAT64, CAST AS NUMERIC, CAST AS BIGNUMERIC, CAST AS STRING, SAFE_CAST AS INT64, SAFE_CAST AS FLOAT64.
  • String functions
    • CONCAT, LEFT, LENGTH, LOWER, REGEXP_REPLACE, RIGHT, SPLIT, SUBSTR, SUBSTRING, TRIM, UPPER.
  • Date functions
    • DATE, DATE_ADD, DATE_SUB, DATE_DIFF, DATE_TRUNC, EXTRACT, FORMAT_DATE, PARSE_DATE, SAFE.PARSE_DATE.
  • Datetime functions
    • DATETIME, DATETIME_ADD, DATETIME_SUB, DATETIME_DIFF, DATETIME_TRUNC, EXTRACT, PARSE_DATETIME, SAFE.PARSE_DATETIME.
  • Time functions
    • TIME, TIME_ADD, TIME_SUB, TIME_DIFF, TIME_TRUNC, EXTRACT, FORMAT_TIME, PARSE_TIME, SAFE.PARSE_TIME.
  • Timestamp functions
    • TIMESTAMP, TIMESTAMP_ADD, TIMESTAMP_SUB, TIMESTAMP_DIFF, TIMESTAMP_TRUNC, FORMAT_TIMESTAMP, PARSE_TIMESTAMP, SAFE.PARSE_TIMESTAMP, TIMESTAMP_MICROS, TIMESTAMP_MILLIS, TIMESTAMP_SECONDS, EXTRACT, STRING, UNIX_MICROS, UNIX_MILLIS, UNIX_SECONDS.
  • Manual preprocessing functions
    • ML.IMPUTER, ML.HASH_BUCKETIZE, ML.LABEL_ENCODER, ML.MULTI_HOT_ENCODER, ML.NGRAMS, ML.ONE_HOT_ENCODER, ML.BUCKETIZE, ML.MAX_ABS_SCALER, ML.MIN_MAX_SCALER, ML.NORMALIZER, ML.QUANTILE_BUCKETIZE, ML.ROBUST_SCALER, ML.STANDARD_SCALER.
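
As an illustrative sketch rather than an exact recipe, the following Python snippet creates an exportable model whose TRANSFORM clause uses only functions from the lists above (ML.STANDARD_SCALER, EXTRACT, and CONCAT); the project, dataset, table, and column names are placeholders.

  from google.cloud import bigquery

  # Placeholder project ID; replace with your own.
  client = bigquery.Client(project="my-project")

  # The TRANSFORM clause below uses only functions listed above, so the
  # resulting model can be exported together with its preprocessing model.
  create_sql = """
  CREATE OR REPLACE MODEL `my-project.my_dataset.my_exportable_model`
  TRANSFORM (
    ML.STANDARD_SCALER(fare) OVER () AS scaled_fare,
    EXTRACT(DAYOFWEEK FROM pickup_date) AS pickup_dow,
    CONCAT(vendor, '-', payment_type) AS vendor_payment,
    label
  )
  OPTIONS (model_type = 'LINEAR_REG', input_label_cols = ['label']) AS
  SELECT fare, pickup_date, vendor, payment_type, tip AS label
  FROM `my-project.my_dataset.taxi_trips`
  """
  client.query(create_sql).result()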

Limitations

The following limitations apply when exporting models:

  • Model export is not supported if ARRAY, TIMESTAMP, or GEOGRAPHY feature types were present in the input data used during training.

  • Exported models for model types AUTOML_REGRESSOR and AUTOML_CLASSIFIER do not support Vertex AI deployment for online prediction.

  • The model size limit is 1 GB for matrix factorization model export. The model size is roughly proportional to num_factors, so you can reduce num_factors during training to shrink the model size if you reach the limit.

  • For models trained with the BigQuery ML TRANSFORM clause for manual feature preprocessing, see the data types and functions supported for exporting.

  • Models trained with the BigQuery ML TRANSFORM clause before 18 September 2023 must be re-trained before they can be deployed to Vertex AI for online prediction.