Introduction
There are a few ways to use SQL with Polars. One option is to go through other libraries such as DuckDB and pandas. Another is to run SQL directly in Polars, without any other library.
I’ll be demonstrating the latter in this blog post. For the former, please refer to this post on how to use DuckDB with Polars.
I didn’t know you could run SQL directly in Polars, without relying on other libraries, until one of my LinkedIn connections, Luca, introduced it in his post.
If you’d like to follow along, here’s the link to my GitHub repo.
Use SQL in Polars
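The code is as simple as it looks. It takes three steps:
- Create a context that lets SQL work on a Polars dataframe.
- Register the dataframe under the name you want to use in your SQL query.
- Execute the query. By default this returns a Polars LazyFrame, and calling .collect() on it gives you a Polars dataframe.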
import polars as pl
# read in data lazily (scan_csv returns a LazyFrame)
df_pl = pl.scan_csv('../sample_data.csv')
# prep for sql execution
sql = pl.SQLContext()
sql.register('df_pl', df_pl)
result_df = sql.execute(
    """
      select 
        *
      from df_pl
      where Name = 'Mike'
    """
).collect()
print(result_df)
"""
output:
shape: (1, 4)
┌───────────┬──────┬─────┬───────────────────┐
│ studentId ┆ Name ┆ Age ┆ FirstEnrolledDate │
│ ---       ┆ ---  ┆ --- ┆ ---               │
│ i64       ┆ str  ┆ i64 ┆ str               │
╞═══════════╪══════╪═════╪═══════════════════╡
│ 1         ┆ Mike ┆ 24  ┆ 2020-01-17        │
└───────────┴──────┴─────┴───────────────────┘
"""
Summary
Using SQL directly in Polars is pretty simple: you only need a few lines of Polars-specific code. Many people working in data are familiar with SQL, so the ability to use it in Polars opens up the performance of Polars to those who might not otherwise be able to take advantage of it.
