Daft Documentation
video_frames
Daft
Guide
Examples
Python API
SQL Reference
Contributing
Daft Skills
Guide
Quickstart
Installation
AI Functions
Overview
Prompt
Embed
Classify
Providers
Modalities
Overview
Text
Images
Audio
Videos
Documents
JSON and Nested Data
Files and URLs
Embeddings
Custom Modalities
User Defined Functions
Functions
Classes & Methods
Aggregate UDFs
Working with GPUs
Legacy UDF Migration Guide
Legacy UDFs
Common Use Cases
Batch Inference
Datasets
Common Crawl
DROID
Data Connectors
Object Storage
AWS S3
Azure Blob Store
Google Cloud Storage
COS (Tencent Cloud)
Table Formats
Apache Hudi
Apache Iceberg
Delta Lake
Apache Paimon
Lance
Catalogs
AWS Glue
AWS S3 Tables
Apache Gravitino
Unity Catalog (Databricks)
Databases
Bigtable
ClickHouse
Postgres
SQL Databases
Turbopuffer
Files
Files
Text Files
Generic File Source Options
Other Sources
Apache Kafka
MCAP
Hugging Face Datasets
Custom
Custom Connectors
Custom Catalogs
Extensions
Overview
Community Extensions
Built on Daft
Authoring Guide
Architecture
Scaling & Performance
Scaling Out and Deployment
Running on Kubernetes
Running on Ray
Optimization
Managing Memory Usage
Partitioning and Batching
Shuffle Algorithms
Join Strategies
Observability
Dashboard
Progress Indicators
Logging
Telemetry
Sessions, Catalogs, and Tables
Resources
Roadmap
Community
Release Notes
Usage Telemetry
Examples
Multimodal Structured Outputs: Evaluating Image Understanding
Voice AI Analytics with Faster-Whisper and embed_text
Web Text Deduplication
Audio Transcription
Generate Text Embeddings for Turbopuffer
Running LLMs on the Red Pajamas Dataset
Generate Images from Text with Stable Diffusion
Querying Image Data
MNIST Digit Classification
UDF Patterns
Window Functions
Working with Common Crawl Data
Document Processing
Python API
AI
I/O
DataFrame
Datasets
Expressions
Functions
abs
add_months
any_value
approx_count_distinct
approx_percentiles
arccos
arccosh
arcsin
arcsinh
arctan
arctan2
arctanh
ascii_func
audio_file
audio_metadata
avg
between
bin
bitwise_and
bitwise_or
bitwise_xor
bool_and
bool_or
capitalize
cast
cbrt
ceil
chr_func
chunk
classify_image
classify_text
clip
coalesce
columns_avg
columns_max
columns_mean
columns_min
columns_sum
compress
concat
concat_ws
contains
conv
convert_image
convert_time_zone
convert_timezone
cos
cosh
cosine_distance
cosine_similarity
cot
count
count_distinct
count_matches
crop
csc
current_date
current_timestamp
current_timezone
damerau_levenshtein_distance
date
date_add
date_diff
date_format
date_from_unix_date
date_sub
date_trunc
dateadd
datediff
datepart
day
day_of_month
day_of_week
day_of_year
dayofmonth
dayofyear
decode
decode_image
decode_image_file
decompress
degrees
dense_rank
deserialize
dot_product
download
e
embed_image
embed_text
encode
encode_image
endswith
eq_null_safe
euclidean_distance
exp
explode
expm1
extract_day_uuid7
extract_hour_uuid7
extract_minute_uuid7
extract_month_uuid7
factorial
file
file_exists
file_path
file_size
fill_nan
fill_null
find
first_value
floor
format
from_unixtime
from_utc_timestamp
get
great_circle_distance
guess_mime_type
hamming_distance
hamming_distance_str
hash
hour
hypot
ilike
image_attribute
image_channel
image_file
image_file_metadata
image_hash
image_height
image_mode
image_to_tensor
image_width
is_in
is_inf
is_nan
is_null
jaccard_similarity
jaro_similarity
jaro_winkler_similarity
jq
json_array_length
json_object_keys
json_tuple
lag
last_day
last_value
lead
left
length
length_bytes
levenshtein_distance
like
list_agg
list_agg_distinct
list_append
list_bool_and
list_bool_or
list_contains
list_count
list_distinct
list_filter
list_flatten
list_join
list_map
list_max
list_mean
list_min
list_sort
list_sum
llm_generate
ln
log
log1p
log2
log10
lower
lpad
lstrip
make_date
make_timestamp
make_timestamp_ltz
map_get
map_keys
max
mean
median
microsecond
millisecond
min
minhash
minute
monotonically_increasing_id
month
months_between
nanosecond
negate
next_day
normalize
not_nan
not_null
over
parse_url