Menu
Stuff by Yuki
  • Home
  • Data Engineering
    • Python
  • Business Intelligence
    • Power BI
    • Tableau
  • Perspectives
  • About
  • Contact
Stuff by Yuki

How to Calculate Percent of Total in Polars

Posted on May 25, 2023

Have you ever need to calculate the percent of total for your data? I bet you’ve encountered situations like that many times. It’s a common analysis, that if you know how to do, would benefit you as a data professional.

In this blog post, I’ll demonstrate how to do that in Polars. Here’s the link to the GitHub repo.

Calculate Percent of Total in Polars

In this example, my code calculates the percent of total count per particular grain. What you need is the count of rows for each value of a particular column as well as the total row count. And you divide the former by the latter. polars.count() comes in useful here as it gives you the row count in a specific context. There may be some other ways I have not found yet, but what I’ll demonstrate does the job.

Let’s say you have a dataframe like the following:

Copy Copied Use a different Browser

import polars as pl

df = pl.LazyFrame(
    {
        'Letter': ['A','A','A','B','B','C','D','D','D','D','D','E','F','G'],
        'Value': [1,2,3,4,5,6,7,8,9,10,11,12,13,14]
     }
)

One way to get the percent of total count, is to calculate the row count for each value of a column first, and then add a column calculating the total count. And finally do the division using those count columns.

Copy Copied Use a different Browser

df1 = (
    df
    .groupby('Letter')
    .agg(
        pl.count().alias('Cnt Per Letter')
    )
    .with_columns(
        (
            (pl.col('Cnt Per Letter') / pl.col('Cnt Per Letter').sum()).round(2) * 100
        )
        .cast(pl.UInt64).alias('Percent of Total')
    )
    .sort('Letter')
)

print(df1.collect())
'''
output:
shape: (7, 3)
┌────────┬────────────────┬──────────────────┐
│ Letter ┆ Cnt Per Letter ┆ Percent of Total │
│ ---    ┆ ---            ┆ ---              │
│ str    ┆ u32            ┆ u64              │
╞════════╪════════════════╪══════════════════╡
│ A      ┆ 3              ┆ 21               │
│ B      ┆ 2              ┆ 14               │
│ C      ┆ 1              ┆ 7                │
│ D      ┆ 5              ┆ 36               │
│ E      ┆ 1              ┆ 7                │
│ F      ┆ 1              ┆ 7                │
│ G      ┆ 1              ┆ 7                │
└────────┴────────────────┴──────────────────┘
'''

Another way I came up with is to use a window function to do the same.

Copy Copied Use a different Browser

df2 = (
    df
    .with_columns(
        pl.col('Value').count().over('Letter').alias('Cnt Per Letter'),
        pl.count().alias('Total Cnt')
    )
    .with_columns(
        (
            (pl.col('Cnt Per Letter') / pl.col('Total Cnt')).round(2) * 100
        )
        .cast(pl.UInt64).alias('Percent of Total')
    )
    .unique('Letter')
    .select(['Letter', 'Cnt Per Letter', 'Percent of Total'])
    .sort('Letter')
)

print(df2.collect())
'''
output:
shape: (7, 3)
┌────────┬────────────────┬──────────────────┐
│ Letter ┆ Cnt Per Letter ┆ Percent of Total │
│ ---    ┆ ---            ┆ ---              │
│ str    ┆ u32            ┆ u64              │
╞════════╪════════════════╪══════════════════╡
│ A      ┆ 3              ┆ 21               │
│ B      ┆ 2              ┆ 14               │
│ C      ┆ 1              ┆ 7                │
│ D      ┆ 5              ┆ 36               │
│ E      ┆ 1              ┆ 7                │
│ F      ┆ 1              ┆ 7                │
│ G      ┆ 1              ┆ 7                │
└────────┴────────────────┴──────────────────┘
'''

Summary

There are ways to calculate percent of total in Polars as well as in other tools. I hope this post gives an idea of how you can do that and apply it in other analytics use cases using Polars.

GitHub repo

References

  • https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.count.html
  • https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/groupby.html
  • https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/api/polars.DataFrame.with_columns.html
  • https://stackoverflow.com/questions/75080918/possible-to-calculate-counts-and-percentage-in-one-chain-using-polars
  • https://stackoverflow.com/questions/72747398/best-way-to-get-percentage-counts-in-polars

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recent Posts

  • Where I’m Headed in the Next 5 Years
  • Open-Source vs Vendor Data Tools
  • Developing the Habit of Writing
  • How to Inspect and Optimize Query Plans in Python Polars
  • Learn Python Polars with Polars Cookbook

Popular Posts

  • A Running Total Calculation with Quick Measure in Power BI
  • A Complete Guide to Git Integration in Power BI
  • How To Copy And Paste Report Page in Power BI
  • Handling Missing Values in Polars
  • How to Convert String to Date or Datetime in Polars

connect with me

  • LinkedIn
  • Twitter
  • Github
  • Website

Search Articles

©2025 Stuff by Yuki | Powered by SuperbThemes