Stuff by Yuki

Group Rows into List in Polars

Posted on June 2, 2023

I recently ran into a situation where I wanted to consolidate the rows for each group value into a Python list. There seem to be various solutions in pandas (a few resources are listed at the bottom), but how can you do this in Polars?

There are probably multiple ways to do this in Polars as well. One way I found is to use group_by() (named groupby() in Polars versions before 0.19) followed by agg(). It's simple to implement.

The code used in this post is available here: GitHub repo

How to Do It

Let's say you have a dataset like this, where each letter has multiple values, and we want to collect these values into a list per letter.


import polars as pl

df = pl.LazyFrame(
    {
        'Letter': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'D'],
        'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
)

All you have to do is:

  • Use group_by() to group the rows by letter
  • Use agg() with pl.col('Value') to collect each letter's values into a list column

df = (
    df
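    .group_by('Letter')    # named groupby() before Polars 0.19
    .agg(pl.col('Value'))  # collect each group's values into a list
    .sort('Letter')        # group order is not guaranteed, so sort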
)

The complete code looks like this:


import polars as pl

df = pl.LazyFrame(
    {
        'Letter': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'D'],
        'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
)

df = (
    df
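    .group_by('Letter')    # named groupby() before Polars 0.19
    .agg(pl.col('Value'))  # collect each group's values into a list
    .sort('Letter')        # group order is not guaranteed, so sort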
)

print(df.collect())  # collect() runs the lazy query on all rows; fetch() only samples rows for debugging
'''
output:
shape: (4, 2)
┌────────┬───────────┐
│ Letter ┆ Value     │
│ ---    ┆ ---       │
│ str    ┆ list[i64] │
╞════════╪═══════════╡
│ A      ┆ [1, 2, 3] │
│ B      ┆ [4, 5, 6] │
│ C      ┆ [7, 8]    │
│ D      ┆ [9]       │
└────────┴───────────┘
'''

References

  • https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby
  • https://sparkbyexamples.com/pandas/pandas-group-dataframe-rows-list-groupby/

©2023 Stuff by Yuki