API Reference

This is the class and function reference of scikit-learn. Please refer to the full user guide for further details, as the raw specifications of classes and functions may not be enough to give full guidelines on their use. For reference on concepts repeated across the API, see Glossary of Common Terms and API Elements.

Object

Description

config_context

Context manager to temporarily change the global scikit-learn configuration.

get_config

Retrieve the current scikit-learn configuration.

set_config

Set global scikit-learn configuration.

show_versions

Print useful debugging information.
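As a minimal sketch of how the configuration helpers above fit together, the snippet below reads one setting, overrides it inside a `config_context` block, and checks that it is restored afterwards. The choice of the `assume_finite` option is only for illustration.

```python
from sklearn import config_context, get_config

# Read one setting from the global configuration dictionary.
before = get_config()["assume_finite"]

# Temporarily override the setting; the previous value is restored on exit.
with config_context(assume_finite=True):
    inside = get_config()["assume_finite"]

after = get_config()["assume_finite"]
print(before, inside, after)
```

Unlike `set_config`, which changes the global state until it is set again, the context manager limits the change to the enclosed block.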

BaseEstimator

Base class for all estimators in scikit-learn.

BiclusterMixin

Mixin class for all bicluster estimators in scikit-learn.

ClassNamePrefixFeaturesOutMixin

Mixin class for transformers that generate their own names by prefixing.

ClassifierMixin

Mixin class for all classifiers in scikit-learn.

ClusterMixin

Mixin class for all cluster estimators in scikit-learn.

DensityMixin

Mixin class for all density estimators in scikit-learn.

MetaEstimatorMixin

Mixin class for all meta estimators in scikit-learn.

OneToOneFeatureMixin

Provides get_feature_names_out for simple transformers.

OutlierMixin

Mixin class for all outlier detection estimators in scikit-learn.

RegressorMixin

Mixin class for all regression estimators in scikit-learn.

TransformerMixin

Mixin class for all transformers in scikit-learn.

clone

Construct a new unfitted estimator with the same parameters.

is_classifier

Return True if the given estimator is (probably) a classifier.

is_clusterer

Return True if the given estimator is (probably) a clusterer.

is_regressor

Return True if the given estimator is (probably) a regressor.

is_outlier_detector

Return True if the given estimator is (probably) an outlier detector.
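The base utilities above can be illustrated with a short sketch. It fits a `DummyClassifier` (chosen here only because it is cheap to fit), clones it, and inspects the copy with the type-checking helpers. The toy data is made up for the example.

```python
from sklearn.base import clone, is_classifier, is_regressor
from sklearn.dummy import DummyClassifier

# Fit a trivial classifier on made-up data.
original = DummyClassifier(strategy="most_frequent")
original.fit([[0], [1], [2]], [0, 1, 1])

# clone() copies the constructor parameters but not the fitted state.
copy = clone(original)

print(copy.get_params() == original.get_params())
print(hasattr(original, "classes_"), hasattr(copy, "classes_"))
print(is_classifier(copy), is_regressor(copy))
```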

CalibratedClassifierCV

Calibrate probabilities using isotonic, sigmoid, or temperature scaling.

calibration_curve

Compute true and predicted probabilities for a calibration curve.

CalibrationDisplay

Calibration curve (also known as reliability diagram) visualization.
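To make the calibration entries above concrete, here is a small sketch of `calibration_curve` on hand-written labels and probabilities. The data is invented, and two bins are used so the result is easy to check by hand.

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Made-up labels and predicted probabilities for the positive class.
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_prob = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9, 0.4])

# Two uniform bins: [0, 0.5) and [0.5, 1].
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=2)

# prob_true is the fraction of positives per bin,
# prob_pred is the mean predicted probability per bin.
print(prob_true, prob_pred)
```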

AffinityPropagation

Perform Affinity Propagation Clustering of data.

AgglomerativeClustering

Agglomerative Clustering.

Birch

Implements the BIRCH clustering algorithm.

BisectingKMeans

Bisecting K-Means clustering.

DBSCAN

Perform DBSCAN clustering from vector array or distance matrix.

FeatureAgglomeration

Agglomerate features.

HDBSCAN

Cluster data using hierarchical density-based clustering.

KMeans

K-Means clustering.

MeanShift

Mean shift clustering using a flat kernel.

MiniBatchKMeans

Mini-Batch K-Means clustering.

OPTICS

Estimate clustering structure from vector array.

SpectralBiclustering

Spectral biclustering (Kluger, 2003).

SpectralClustering

Apply clustering to a projection of the normalized Laplacian.

SpectralCoclustering

Spectral Co-Clustering algorithm (Dhillon, 2001).

affinity_propagation

Perform Affinity Propagation Clustering of data.

cluster_optics_dbscan

Perform DBSCAN extraction for an arbitrary epsilon.

cluster_optics_xi

Automatically extract clusters according to the Xi-steep method.

compute_optics_graph

Compute the OPTICS reachability graph.

dbscan

Perform DBSCAN clustering from vector array or distance matrix.

estimate_bandwidth

Estimate the bandwidth to use with the mean-shift algorithm.

k_means

Perform K-means clustering algorithm.

kmeans_plusplus

Init n_clusters seeds according to k-means++.

mean_shift

Perform mean shift clustering of data using a flat kernel.

spectral_clustering

Apply clustering to a projection of the normalized Laplacian.

ward_tree

Ward clustering based on a feature matrix.
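The clustering estimators listed above share the usual `fit` interface. As a sketch, the snippet below runs `KMeans` on six hand-placed points forming two obvious groups. The data and the `n_init` and `random_state` values are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of three points each (made-up data).
X = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# labels_ holds one cluster index per sample; which group gets
# index 0 or 1 is arbitrary.
labels = km.labels_
print(labels, km.cluster_centers_)
```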

ColumnTransformer

Applies transformers to columns of an array or pandas DataFrame.

TransformedTargetRegressor

Meta-estimator to regress on a transformed target.

make_column_selector

Create a callable to select columns to be used with

make_column_transformer

Construct a ColumnTransformer from the given transformers.
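A short sketch of the column-wise composition described above: `make_column_transformer` scales the first column of a NumPy array and passes the second through unchanged. `StandardScaler` comes from `sklearn.preprocessing` and is used here only as an example transformer.

```python
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Scale column 0; keep every other column as-is.
ct = make_column_transformer(
    (StandardScaler(), [0]),
    remainder="passthrough",
)
Xt = ct.fit_transform(X)

# Transformed columns come first, passthrough columns follow.
print(Xt)
```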

EllipticEnvelope

An object for detecting outliers in a Gaussian distributed dataset.

EmpiricalCovariance

Maximum likelihood covariance estimator.

GraphicalLasso

Sparse inverse covariance estimation with an l1-penalized estimator.

GraphicalLassoCV

Sparse inverse covariance with cross-validated choice of the l1 penalty.

LedoitWolf

LedoitWolf Estimator.

MinCovDet

Minimum Covariance Determinant (MCD): robust estimator of covariance.

OAS

Oracle Approximating Shrinkage Estimator.

ShrunkCovariance

Covariance estimator with shrinkage.

empirical_covariance

Compute the Maximum likelihood covariance estimator.

graphical_lasso

L1-penalized covariance estimator.

ledoit_wolf

Estimate the shrunk Ledoit-Wolf covariance matrix.

ledoit_wolf_shrinkage

Estimate the shrinkage coefficient of the Ledoit-Wolf covariance estimate.

oas

Estimate covariance with the Oracle Approximating Shrinkage.

shrunk_covariance

Calculate covariance matrices shrunk on the diagonal.
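As an illustration of the covariance estimators above, the sketch below compares the plain maximum likelihood estimate with a `LedoitWolf` fit on random Gaussian data. The sample size and seed are arbitrary.

```python
import numpy as np
from sklearn.covariance import LedoitWolf, empirical_covariance

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))

# Maximum likelihood estimate of the covariance matrix.
emp = empirical_covariance(X)

# Shrunk estimate; shrinkage_ is the coefficient chosen by the estimator.
lw = LedoitWolf().fit(X)

print(emp.shape, lw.covariance_.shape, lw.shrinkage_)
```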

CCA

Canonical Correlation Analysis, also known as “Mode B” PLS.

PLSCanonical

Partial Least Squares transformer and regressor.

PLSRegression

PLS regression.

PLSSVD

Partial Least Squares SVD.

clear_data_home

Delete all the content of the data home cache.

dump_svmlight_file

Dump the dataset in svmlight / libsvm file format.

fetch_20newsgroups

Load the filenames and data from the 20 newsgroups dataset (classification).

fetch_20newsgroups_vectorized

Load and vectorize the 20 newsgroups dataset (classification).

fetch_california_housing

Load the California housing dataset (regression).

fetch_covtype

Load the covertype dataset (classification).

fetch_file

Fetch a file from the web if not already present in the local folder.

fetch_kddcup99

Load the kddcup99 dataset (classification).

fetch_lfw_pairs

Load the Labeled Faces in the Wild (LFW) pairs dataset (classification).

fetch_lfw_people

Load the Labeled Faces in the Wild (LFW) people dataset (classification).

fetch_olivetti_faces

Load the Olivetti faces dataset from AT&T (classification).

fetch_openml

Fetch dataset from openml by name or dataset id.

fetch_rcv1

Load the RCV1 multilabel dataset (classification).

fetch_species_distributions

Loader for species distribution dataset from Phillips et al. (2006).

get_data_home

Return the path of the scikit-learn data directory.

load_breast_cancer

Load and return the breast cancer Wisconsin dataset (classification).

load_diabetes

Load and return the diabetes dataset (regression).

load_digits

Load and return the digits dataset (classification).

load_files

Load text files with categories as subfolder names.

load_iris

Load and return the iris dataset (classification).

load_linnerud

Load and return the physical exercise Linnerud dataset.

load_sample_image

Load the numpy array of a single sample image.

load_sample_images

Load sample images for image manipulation.

load_svmlight_file

Load datasets in the svmlight / libsvm format into sparse CSR matrix.

load_svmlight_files

Load dataset from multiple files in SVMlight format.

load_wine

Load and return the wine dataset (classification).

make_biclusters

Generate a constant block diagonal structure array for biclustering.

make_blobs

Generate isotropic Gaussian blobs for clustering.

make_checkerboard

Generate an array with block checkerboard structure for biclustering.

make_circles

Make a large circle containing a smaller circle in 2d.

make_classification

Generate a random n-class classification problem.

make_friedman1

Generate the “Friedman #1” regression problem.

make_friedman2

Generate the “Friedman #2” regression problem.

make_friedman3

Generate the “Friedman #3” regression problem.

make_gaussian_quantiles

Generate isotropic Gaussian and label samples by quantile.

make_hastie_10_2

Generate data for binary classification used in Hastie et al. 2009, Example 10.2.

make_low_rank_matrix

Generate a mostly low rank matrix with bell-shaped singular values.

make_moons

Make two interleaving half circles.

make_multilabel_classification

Generate a random multilabel classification problem.

make_regression

Generate a random regression problem.

make_s_curve

Generate an S curve dataset.

make_sparse_coded_signal

Generate a signal as a sparse combination of dictionary elements.

make_sparse_spd_matrix

Generate a sparse symmetric positive-definite matrix.

make_sparse_uncorrelated

Generate a random regression problem with sparse uncorrelated design.

make_spd_matrix

Generate a random symmetric, positive-definite matrix.

make_swiss_roll

Generate a swiss roll dataset.
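The dataset helpers above fall into loaders for bundled data (`load_*`), downloaders (`fetch_*`), and synthetic generators (`make_*`). A minimal sketch using one loader and one generator, neither of which needs network access, might look like this. The generator arguments are arbitrary.

```python
from sklearn.datasets import load_iris, make_blobs

# Bundled dataset, returned as a (data, target) pair.
X, y = load_iris(return_X_y=True)

# Synthetic dataset: 30 points drawn around 3 centers in 2 dimensions.
Xb, yb = make_blobs(n_samples=30, centers=3, n_features=2, random_state=0)

print(X.shape, y.shape, Xb.shape, yb.shape)
```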

DictionaryLearning

Dictionary learning.

FactorAnalysis

Factor Analysis (FA).

FastICA

FastICA: a fast algorithm for Independent Component Analysis.

IncrementalPCA

Incremental principal components analysis (IPCA).

KernelPCA

Kernel Principal component analysis (KPCA).

LatentDirichletAllocation

Latent Dirichlet Allocation with online variational Bayes algorithm.

MiniBatchDictionaryLearning

Mini-batch dictionary learning.

MiniBatchNMF

Mini-Batch Non-Negative Matrix Factorization (NMF).

MiniBatchSparsePCA

Mini-batch Sparse Principal Components Analysis.

NMF

Non-Negative Matrix Factorization (NMF).

PCA

Principal component analysis (PCA).

SparseCoder

Sparse coding.

SparsePCA

Sparse Principal Components Analysis (SparsePCA).

TruncatedSVD

Dimensionality reduction using truncated SVD (aka LSA).

dict_learning

Solve a dictionary learning matrix factorization problem.

dict_learning_online

Solve a dictionary learning matrix factorization problem online.

fastica

Perform Fast Independent Component Analysis.

non_negative_factorization

Compute Non-negative Matrix Factorization (NMF).

sparse_encode

Sparse coding.
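The decomposition estimators above follow the transformer interface. As a sketch, `PCA` is fitted on random data and used to project it onto two components. The data shape and seed are arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))

# Keep the two directions of largest variance.
pca = PCA(n_components=2).fit(X)
X_reduced = pca.transform(X)

# Fraction of the total variance explained by each kept component.
ratios = pca.explained_variance_ratio_
print(X_reduced.shape, ratios)
```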

LinearDiscriminantAnalysis

Linear Discriminant Analysis.

QuadraticDiscriminantAnalysis

Quadratic Discriminant Analysis.

DummyClassifier

DummyClassifier makes predictions that ignore the input features.

DummyRegressor

Regressor that makes predictions using simple rules.
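The dummy estimators above are typically used as baselines. A small sketch with made-up data: with `strategy="most_frequent"`, `DummyClassifier` predicts the majority training label whatever the input features are.

```python
from sklearn.dummy import DummyClassifier

# Made-up training data in which label 1 is the majority class.
X_train = [[0], [1], [2], [3]]
y_train = [1, 1, 1, 0]

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# The features are ignored, so every prediction is the majority label.
predictions = baseline.predict([[10], [20]])
print(predictions)
```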

AdaBoostClassifier

An AdaBoost classifier.

AdaBoostRegressor

An AdaBoost regressor.

BaggingClassifier

A Bagging classifier.

BaggingRegressor

A Bagging regressor.

ExtraTreesClassifier

An extra-trees classifier.

ExtraTreesRegressor

An extra-trees regressor.

GradientBoostingClassifier

Gradient Boosting for classification.

GradientBoostingRegressor

Gradient Boosting for regression.

HistGradientBoostingClassifier

Histogram-based Gradient Boosting Classification Tree.

HistGradientBoostingRegressor

Histogram-based Gradient Boosting Regression Tree.

IsolationForest

Isolation Forest Algorithm.

RandomForestClassifier

A random forest classifier.

RandomForestRegressor

A random forest regressor.

RandomTreesEmbedding

An ensemble of totally random trees.

StackingClassifier

Stack of estimators with a final classifier.

StackingRegressor

Stack of estimators with a final regressor.

VotingClassifier

Soft Voting/Majority Rule classifier for unfitted estimators.

VotingRegressor

Prediction voting regressor for unfitted estimators.
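As a sketch of how the ensemble estimators above are used, the snippet below fits a small `RandomForestClassifier` on synthetic data. The dataset parameters and the number of trees are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification problem.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# estimators_ holds the individual fitted trees.
n_trees = len(forest.estimators_)
train_accuracy = forest.score(X, y)
print(n_trees, train_accuracy)
```

Accuracy on the training data is optimistic; a held-out split or cross-validation gives a more realistic estimate.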

ConvergenceWarning

Custom warning to capture convergence problems.

DataConversionWarning

Warning used to notify implicit data conversions happening in the code.

DataDimensionalityWarning

Custom warning to notify potential issues with data dimensionality.

EfficiencyWarning

Warning used to notify the user of inefficient computation.

FitFailedWarning

Warning class used if there is an error while fitting the estimator.

InconsistentVersionWarning

Warning raised when an estimator is unpickled with an inconsistent version.

NotFittedError

Exception class to raise if estimator is used before fitting.

UndefinedMetricWarning

Warning used when the metric is invalid.

EstimatorCheckFailedWarning

Warning raised when an estimator check from the common tests fails.
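The exception classes above can be caught like any other Python exception. As a sketch, calling `predict` on an unfitted `DummyRegressor` (picked only as a lightweight estimator) raises `NotFittedError`.

```python
from sklearn.dummy import DummyRegressor
from sklearn.exceptions import NotFittedError

model = DummyRegressor()

# predict() before fit() is expected to fail with NotFittedError.
try:
    model.predict([[0.0]])
    raised = False
except NotFittedError as exc:
    raised = True
    print(f"caught: {exc}")
```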

enable_halving_search_cv

Enables Successive Halving search-estimators.

enable_iterative_imputer

Enables IterativeImputer.
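The experimental modules above work by import side effect: the `enable_*` module has to be imported before the estimator it unlocks. A minimal sketch for `IterativeImputer`, with made-up data containing one missing value:

```python
import numpy as np

# The enabling import must come first; it is used only for its side effect.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, np.nan], [4.0, 8.0]])

imputer = IterativeImputer(random_state=0)
X_filled = imputer.fit_transform(X)
print(X_filled)
```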

DictVectorizer

Transforms lists of feature-value mappings to vectors.

FeatureHasher

Implements feature hashing, aka the hashing trick.