One of the features in DuckDB is its integration with other data libraries such as pandas. DuckDB makes it seamless when we convert to and from other dataframes and table formats. This flexibility gives the users the ability to implement DuckDB in their data pipelines with ease.
In this post, I’ll walk you through how to work with pandas, polars, and pyarrow in DuckDB.
You can find the full code in my GitHub repo.
DuckDB with Polars
Execute SQL on Polars in DuckDB – Polars to DuckDB
You can simply run a sql query specifying the dataframe name.
import polars as pl import duckdb data = {'ID': [1,2,3,4,5], 'Name': ['Microsoft', 'Apple', 'Netflix', 'Spotify', 'Intel']} # duckdb on polars dataframe pl_df = pl.DataFrame(data) rel = duckdb.sql('select * from pl_df') print('\nDuckDB relation from Polars df: \n', rel, type(rel)) """ DuckDB relation from Polars df: ┌───────┬───────────┐ │ ID │ Name │ │ int64 │ varchar │ ├───────┼───────────┤ │ 1 │ Microsoft │ │ 2 │ Apple │ │ 3 │ Netflix │ │ 4 │ Spotify │ │ 5 │ Intel │ └───────┴───────────┘ class 'duckdb.DuckDBPyRelation' """
DuckDB to Polars
To convert from DuckDB relation object to Polars dataframe, you’d use .pl().
# duckdb to polars pl_df_from_duckdb = rel.pl() print('\nPolars df from DuckDB: \n', type(pl_df_from_duckdb)) """ Polars df from DuckDB: class 'polars.internals.dataframe.frame.DataFrame' """
DuckDB with Pandas
Execute SQL on Pandas in DuckDB – Pandas to DuckDB
import pandas as pd import duckdb data = {'ID': [1,2,3,4,5], 'Name': ['Microsoft', 'Apple', 'Netflix', 'Spotify', 'Intel']} # duckdb on pandas dataframe - pandas to duckdb df = pd.DataFrame(data) rel = duckdb.sql('select * from df') print('\nDuckDB relation from Pandas df: \n', rel, type(rel)) """ DuckDB relation from Pandas df: ┌───────┬───────────┐ │ ID │ Name │ │ int64 │ varchar │ ├───────┼───────────┤ │ 1 │ Microsoft │ │ 2 │ Apple │ │ 3 │ Netflix │ │ 4 │ Spotify │ │ 5 │ Intel │ └───────┴───────────┘ class 'duckdb.DuckDBPyRelation' """
DuckDB to Pandas
For pandas, you use .df().
# duckdb to polars df_from_duckdb = rel.df() print('\nPandas df from DuckDB: \n', type(df_from_duckdb)) """ Pandas df from DuckDB: class 'pandas.core.frame.DataFrame' """
DuckDB with Arrow
Execute SQL on Arrow in DuckDB – Arrow to DuckDB
import pyarrow as pa import duckdb data = {'ID': [1,2,3,4,5], 'Name': ['Microsoft', 'Apple', 'Netflix', 'Spotify', 'Intel']} # duckdb on arrow table - arrow to duckdb arrow = pa.Table.from_pydict(data) rel = duckdb.sql('select * from arrow') print('\nDuckDB relation from Arrow table: \n', rel, type(rel)) """ DuckDB relation from Arrow table: ┌───────┬───────────┐ │ ID │ Name │ │ int64 │ varchar │ ├───────┼───────────┤ │ 1 │ Microsoft │ │ 2 │ Apple │ │ 3 │ Netflix │ │ 4 │ Spotify │ │ 5 │ Intel │ └───────┴───────────┘ class 'duckdb.DuckDBPyRelation' """
DuckDB to Arrow
For pyarrow, you use .arrow().
# duckdb to arrow arrow_from_duckdb = rel.arrow() print('\nArrow table from DuckDB: \n', type(arrow_from_duckdb)) """ Arrow table from DuckDB: class 'pyarrow.lib.Table' """
Summary
As you just saw, it is super easy to use DuckDB in conjunction with pandas, polars, and pyarrow. Hope this post helps you get started in using DuckDB with other data libraries!