#sql #data-fusion #mysql #postgresql

datafusion-remote-table

A DataFusion table provider for executing SQL on remote databases

43 releases (25 breaking)

Uses new Rust 2024

0.26.0 Jan 23, 2026
0.25.0 Dec 2, 2025
0.24.0 Nov 3, 2025
0.16.3 Jul 16, 2025
0.9.0 Mar 31, 2025

#2630 in Database interfaces

MIT license

350KB
9K SLoC

datafusion-remote-table

License Crates.io Docs

Features

  1. Execute SQL queries on remote databases and stream results as datafusion table provider
  2. Insert data into remote databases
  3. Support inferring schema or user specified schema
  4. Support pushing down filters and limit to remote databases
  5. Execution plan can be serialized for distributed execution
  6. Record batches can be transformed before outputting to next plan node

Usage

  1. Execute SQL queries on remote database
#[tokio::main]
pub async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let options = PostgresConnectionOptions::new("localhost", 5432, "user", "password");
    let remote_table = RemoteTable::try_new(options, "select * from supported_data_types").await?;

    let ctx = SessionContext::new();
    ctx.register_table("remote_table", Arc::new(remote_table))?;

    ctx.sql("select * from remote_table").await?.show().await?;

    Ok(())
}
  1. Insert data into remote database
#[tokio::main]
pub async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let options = PostgresConnectionOptions::new("localhost", 5432, "user", "password");
    let remote_table = RemoteTable::try_new(options, vec!["public", "test_table"]).await?;

    let ctx = SessionContext::new();
    ctx.register_table("remote_table", Arc::new(remote_table))?;

    ctx.sql("insert into remote_table values (1, 'Tom')").await?.show().await?;

    Ok(())
}

Supported databases

  • Postgres
    • Int2 / Int4 / Int8
    • Float4 / Float8 / Numeric
    • Char / Varchar / Text / Bpchar / Bytea
    • Date / Time / Timestamp / Timestamptz / Interval
    • Bool / Oid / Name / Json / Jsonb / Geometry(PostGIS) / Xml / Uuid
    • Int2[] / Int4[] / Int8[]
    • Float4[] / Float8[]
    • Char[] / Varchar[] / Bpchar[] / Text[] / Bytea[]
  • MySQL
    • TinyInt (Unsigned) / Smallint (Unsigned) / MediumInt (Unsigned) / Int (Unsigned) / Bigint (Unsigned)
    • Float / Double / Decimal
    • Date / DateTime / Time / Timestamp / Year
    • Char / Varchar / Binary / Varbinary
    • TinyText / Text / MediumText / LongText
    • TinyBlob / Blob / MediumBlob / LongBlob
    • Json / Geometry
  • Oracle
    • Number / BinaryFloat / BinaryDouble / Float
    • Varchar2 / NVarchar2 / Char / NChar / Long / Clob / NClob
    • Raw / Long Raw / Blob
    • Date / Timestamp
    • Boolean / SDE.ST_GEOMETRY
  • SQLite
    • Null / Integer / Real / Text / Blob
  • DM (达梦数据库)
    • TinyInt / Smallint / Int / Bigint
    • Real / Float / Double / Numeric / Decimal
    • Char / Varchar / Text
    • Binary / Varbinary / Image
    • Bit / Timestamp / Time / Date

Thanks

Dependencies

~64–91MB
~1.5M SLoC