Stuff by Yuki

DuckDB with Polars, Pandas, and Arrow

Posted on June 26, 2023
Image by Artem Bryzgalov on Unsplash

One of DuckDB's most useful features is its integration with other data libraries such as pandas. DuckDB converts to and from other dataframe and table formats seamlessly, which makes it easy to add DuckDB to an existing data pipeline.

In this post, I’ll walk you through how to work with pandas, polars, and pyarrow in DuckDB.

You can find the full code in my GitHub repo.

DuckDB with Polars

Execute SQL on Polars in DuckDB – Polars to DuckDB

You can run a SQL query directly against a Polars dataframe by referring to its variable name in the query.


import polars as pl
import duckdb

data = {'ID': [1,2,3,4,5], 'Name': ['Microsoft', 'Apple', 'Netflix', 'Spotify', 'Intel']}

# duckdb on polars dataframe
pl_df = pl.DataFrame(data)
rel = duckdb.sql('select * from pl_df')
print('\nDuckDB relation from Polars df: \n', rel, type(rel))
"""
DuckDB relation from Polars df: 
 ┌───────┬───────────┐
│  ID   │   Name    │
│ int64 │  varchar  │
├───────┼───────────┤
│     1 │ Microsoft │
│     2 │ Apple     │
│     3 │ Netflix   │
│     4 │ Spotify   │
│     5 │ Intel     │
└───────┴───────────┘
<class 'duckdb.DuckDBPyRelation'>
"""

DuckDB to Polars

To convert a DuckDB relation object to a Polars dataframe, use .pl().


# duckdb to polars
pl_df_from_duckdb = rel.pl()
print('\nPolars df from DuckDB: \n', type(pl_df_from_duckdb))
"""
Polars df from DuckDB: 
<class 'polars.internals.dataframe.frame.DataFrame'>
"""

DuckDB with Pandas

Execute SQL on Pandas in DuckDB – Pandas to DuckDB


import pandas as pd
import duckdb

data = {'ID': [1,2,3,4,5], 'Name': ['Microsoft', 'Apple', 'Netflix', 'Spotify', 'Intel']}

# duckdb on pandas dataframe - pandas to duckdb
df = pd.DataFrame(data)
rel = duckdb.sql('select * from df')
print('\nDuckDB relation from Pandas df: \n', rel, type(rel))
"""
DuckDB relation from Pandas df: 
 ┌───────┬───────────┐
│  ID   │   Name    │
│ int64 │  varchar  │
├───────┼───────────┤
│     1 │ Microsoft │
│     2 │ Apple     │
│     3 │ Netflix   │
│     4 │ Spotify   │
│     5 │ Intel     │
└───────┴───────────┘
<class 'duckdb.DuckDBPyRelation'>
"""

DuckDB to Pandas

For pandas, you use .df().


# duckdb to pandas
df_from_duckdb = rel.df()
print('\nPandas df from DuckDB: \n', type(df_from_duckdb))

"""
Pandas df from DuckDB: 
<class 'pandas.core.frame.DataFrame'>
"""

DuckDB with Arrow

Execute SQL on Arrow in DuckDB – Arrow to DuckDB


import pyarrow as pa
import duckdb

data = {'ID': [1,2,3,4,5], 'Name': ['Microsoft', 'Apple', 'Netflix', 'Spotify', 'Intel']}

# duckdb on arrow table - arrow to duckdb
arrow = pa.Table.from_pydict(data)
rel = duckdb.sql('select * from arrow')
print('\nDuckDB relation from Arrow table: \n', rel, type(rel))


"""
DuckDB relation from Arrow table: 
 ┌───────┬───────────┐
│  ID   │   Name    │
│ int64 │  varchar  │
├───────┼───────────┤
│     1 │ Microsoft │
│     2 │ Apple     │
│     3 │ Netflix   │
│     4 │ Spotify   │
│     5 │ Intel     │
└───────┴───────────┘
<class 'duckdb.DuckDBPyRelation'>
"""

DuckDB to Arrow

For pyarrow, you use .arrow().


# duckdb to arrow
arrow_from_duckdb = rel.arrow()
print('\nArrow table from DuckDB: \n', type(arrow_from_duckdb))
"""
Arrow table from DuckDB: 
<class 'pyarrow.lib.Table'>
"""

Summary

As you just saw, it is super easy to use DuckDB in conjunction with pandas, polars, and pyarrow. Hope this post helps you get started in using DuckDB with other data libraries!

References

  • https://duckdb.org/docs/guides/python/polars.html
  • https://duckdb.org/docs/guides/python/sql_on_pandas
  • https://duckdb.org/docs/guides/python/sql_on_arrow
