I recently ran into a situation where I wanted to collect the rows belonging to each group value into a single list. There seem to be various solutions in pandas (a few resources at the bottom), but how can you do this in Polars?
There are probably several ways to do it in Polars as well. One way I found is to use groupby() (renamed to group_by() as of Polars 0.19) together with agg(). It's simple to implement.
The code used in this post can be found here: GitHub repo
How to Do It
Let's say you have a dataset like this. Each letter has multiple values, and we want to group these values per letter.
import polars as pl
lf = pl.LazyFrame(
    {
        'Letter': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'D'],
        'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
)
All you have to do is:
- Use groupby() to group the rows by letter
- Use agg() to collect the values of each letter's rows into a single list (a Polars list[i64] column, as the output below shows)
lf = (
    lf
    .groupby('Letter')
    .agg(pl.col('Value'))
    .sort('Letter')
)
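To make it clearer what this aggregation produces, here is a rough plain-Python equivalent of the group-and-collect step. It is only an illustration of the result, not how Polars does it internally, and the variable names are my own.

```python
from collections import defaultdict

letters = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'D']
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Append each value to the list that belongs to its letter.
grouped = defaultdict(list)
for letter, value in zip(letters, values):
    grouped[letter].append(value)

# Sort by letter, mirroring the .sort('Letter') call.
result = dict(sorted(grouped.items()))
print(result)
# {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8], 'D': [9]}
```

The Polars version expresses the same idea declaratively: the grouping key comes from groupby(), and passing a bare pl.col('Value') to agg() keeps all values of the group as a list.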
The whole code looks like this:
import polars as pl
lf = pl.LazyFrame(
    {
        'Letter': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'D'],
        'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
)
lf = (
    lf
    .groupby('Letter')
    .agg(pl.col('Value'))
    .sort('Letter')
)
print(lf.collect())
'''
output:
shape: (4, 2)
┌────────┬───────────┐
│ Letter ┆ Value     │
│ ---    ┆ ---       │
│ str    ┆ list[i64] │
╞════════╪═══════════╡
│ A      ┆ [1, 2, 3] │
│ B      ┆ [4, 5, 6] │
│ C      ┆ [7, 8]    │
│ D      ┆ [9]       │
└────────┴───────────┘
└────────┴───────────┘
'''