How to Use SQL in Polars

Introduction

There are a few ways to use SQL in Polars. One option is to go through other libraries such as DuckDB and pandas. The other is to run SQL directly in Polars, without any other library.

I’ll be demonstrating the latter in this blog post. For the former, please refer to this post on how to use DuckDB in Polars.

I myself didn’t know you could run SQL directly in Polars without relying on other libraries until one of my LinkedIn connections, Luca, introduced it in his post.

If you’d like to follow along, here’s the link to my GitHub repo.

Use SQL in Polars

The code is as simple as it looks. It takes three steps:

Create a SQL context, which lets SQL work on Polars dataframes.
Register the dataframe under the name you will use in your SQL query.
Execute the query. This returns a lazy result, and calling .collect() on it gives you a Polars dataframe.

import polars as pl

# lazily scan the CSV; nothing is read until .collect() is called
df_pl = pl.scan_csv('../sample_data.csv')

# prep for sql execution
sql = pl.SQLContext()
sql.register('df_pl', df_pl)

result_df = sql.execute(
    """
      select 
        *
      from df_pl
      where Name = 'Mike'
    """
).collect()

print(result_df)

"""
output:
shape: (1, 4)
┌───────────┬──────┬─────┬───────────────────┐
│ studentId ┆ Name ┆ Age ┆ FirstEnrolledDate │
│ ---       ┆ ---  ┆ --- ┆ ---               │
│ i64       ┆ str  ┆ i64 ┆ str               │
╞═══════════╪══════╪═════╪═══════════════════╡
│ 1         ┆ Mike ┆ 24  ┆ 2020-01-17        │
└───────────┴──────┴─────┴───────────────────┘
"""

Summary

Using SQL directly in Polars is simple: you only need a few lines of Polars-specific code. Many people working in data are familiar with SQL, so being able to use it in Polars opens up the performance of Polars to those who would otherwise not be able to take advantage of it.

Github repo
