Menu
Stuff by Yuki
  • Home
  • Data Engineering
    • Python
  • Business Intelligence
    • Power BI
    • Tableau
  • Perspectives
  • About
  • Contact
Stuff by Yuki

Handling Missing Values in Polars

Posted on June 16, 2023June 16, 2023
Image by Pierre Bamin on Unsplash

Checking and filling missing values is an important piece in data science and analytics projects. A popular dataframe library, pandas, provides a method like fillna(), for example.

Polars has built-in methods and expressions to work with missing values as well. This post covers ways to check missing values as well as ways how you can fill missing values.

All of my code is available in my GitHub repo.

Null vs NaN

It’s good to go over the difference between “Null” values and “NaN” values. First off, “NaN” stands for “Not a Number”. It’s used to represent missing values in a column of float data type. It’s technically not considered as missing values in Polars. If you’re familiar with pandas or numpy, np.NaN will be treated as NaN in Polars.

On the other hand, “Null” values are just the general null values that you think of. It’s simply a pure missing value, regardless of column’s data type. And null values and NaN values can exist in the same column. A side note, Python’s “None” values are treated as nulls in Polars.

One important thing to remember is that “Null” and “NaN” values are treated differently in Polars. For instance, if you fill null values using fill_null(), that doesn’t fill NaN values, and vice-versa.

Check Missing Values in Polars

Let’s say we have a dataframe that looks like this:

Copy Copied Use a different Browser

import polars as pl

df = pl.DataFrame({
    'Letter': ['A', 'B', 'C', 'D', 'F', None, 'H', 'I', 'J'],
    'Value': [1.0, 2, 3, None, 5, 6, 7, None, float("nan")]
})
"""
shape: (9, 2)
┌────────┬───────┐
│ Letter ┆ Value │
│ ---    ┆ ---   │
│ str    ┆ f64   │
╞════════╪═══════╡
│ A      ┆ 1.0   │
│ B      ┆ 2.0   │
│ C      ┆ 3.0   │
│ D      ┆ null  │
│ F      ┆ 5.0   │
│ null   ┆ 6.0   │
│ H      ┆ 7.0   │
│ I      ┆ null  │
│ J      ┆ NaN   │
└────────┴───────┘
"""

It contains both None values as well as a NaN value.

One of the things you could use to check missing values is .null_count(). This is available both for Polars dataframe and series.

Copy Copied Use a different Browser

print(df.null_count())  # null_count on the whole dataframe
print(df.select(pl.col('Value').null_count()))   # null_count on a column
"""
shape: (1, 2)
┌────────┬───────┐
│ Letter ┆ Value │
│ ---    ┆ ---   │
│ u32    ┆ u32   │
╞════════╪═══════╡
│ 1      ┆ 2     │
└────────┴───────┘
shape: (1, 1)
┌───────┐
│ Value │
│ ---   │
│ u32   │
╞═══════╡
│ 2     │
└───────┘
"""

As you can see, .null_count() only counts null values, ignoring NaN values.

There doesn’t exist a method like .null_count() for NaN values yet as of today, but there is one similar to .is_null(), which is .is_nan(). You can use it to get the count of NaN values with the code like the followings:

Copy Copied Use a different Browser

print(
    df
    .filter(pl.col('Value').is_nan())
    .select(pl.count()) 
)
"""
shape: (1, 1)
┌───────┐
│ count │
│ ---   │
│ u32   │
╞═══════╡
│ 2     │
└───────┘
"""

Copy Copied Use a different Browser

print(
    df
    .filter(pl.col('Value').is_nan())
    .shape[0]
)
"""
1
"""

On a side note, you can also use .is_not_null() and .is_not_nan() to get the exact opposite result of .is_null() and .is_nan()

Fill Missing Values in Polars

For filling missing values, there are methods available for both Null values and NaN values using .fill_null() and .fill_nan(). You can use them on a dataframe level as well as column level. I will demonstrate how to use .fill_null() only as the functionality is the same for .fill_na().

There are 3 ways you can fill missing values:

  1. Using literal values
  2. Using a strategy
  3. Using Expression(s)

Using Literal Value

It pretty much explains itself. You specify a fixed value and Null values are replace with that value.

Copy Copied Use a different Browser

print(
    df.select(pl.col('Value').fill_null(100))
)
"""
shape: (9, 1)
┌───────┐
│ Value │
│ ---   │
│ f64   │
╞═══════╡
│ 1.0   │
│ 2.0   │
│ 3.0   │
│ 100.0 │
│ 5.0   │
│ 6.0   │
│ 7.0   │
│ 100.0 │
│ NaN   │
└───────┘
"""

Using a Strategy

If you’ve used pandas to fill missing values, you might be familiar with strategies such as backfill and forward fill. Similar options are available in Polars as well. The current list is this: ‘forward’, ‘backward’, ‘min’, ‘max’, ‘mean’, ‘zero’, ‘one’.

Copy Copied Use a different Browser

# with forward strategy
print(
    df.select(pl.col('Value').fill_null(strategy='forward'))
)
"""
shape: (9, 1)
┌───────┐
│ Value │
│ ---   │
│ f64   │
╞═══════╡
│ 1.0   │
│ 2.0   │
│ 3.0   │
│ 3.0   │
│ 5.0   │
│ 6.0   │
│ 7.0   │
│ 7.0   │
│ NaN   │
└───────┘
"""

Copy Copied Use a different Browser

# with backward strategy
print(
    df.select(pl.col('Value').fill_null(strategy='backward'))
)
"""
shape: (9, 1)
┌───────┐
│ Value │
│ ---   │
│ f64   │
╞═══════╡
│ 1.0   │
│ 2.0   │
│ 3.0   │
│ 5.0   │
│ 5.0   │
│ 6.0   │
│ 7.0   │
│ NaN   │
│ NaN   │
└───────┘
"""

Using Expressions

This is where Polars shines. You can use your own expression to fill missing values. Your options won’t be limited to the list of fill strategies available. In this example, I’m calculating max and min of ‘Value’ column and multiple that by 10. You probably won’t do this in real world but this is to demonstrate a way to fill missing values using expressions

Copy Copied Use a different Browser

# with expressions
print(
    df.select(pl.col('Value').fill_null(
        (pl.col('Value').max() - pl.col('Value').min()) * 10
    ))
)  
"""
shape: (9, 1)
┌───────┐
│ Value │
│ ---   │
│ f64   │
╞═══════╡
│ 1.0   │
│ 2.0   │
│ 3.0   │
│ 60.0  │
│ 5.0   │
│ 6.0   │
│ 7.0   │
│ 60.0  │
│ NaN   │
└───────┘
"""

Drop Missing Values

Besides filling missing values, one other thing you can do is to drop rows that contain missing values.

Polars has this method only for both Null values as of today. The method .drop_nulls() is available both for dataframe and lazyframe.

If you want to drop rows that contain NaN values then you’d need to fill NaN values with Null values first, and then apply .drop_nulls().

Copy Copied Use a different Browser

# drop nulls values
print(df.drop_nulls())
"""
shape: (6, 2)
┌────────┬───────┐
│ Letter ┆ Value │
│ ---    ┆ ---   │
│ str    ┆ f64   │
╞════════╪═══════╡
│ A      ┆ 1.0   │
│ B      ┆ 2.0   │
│ C      ┆ 3.0   │
│ F      ┆ 5.0   │
│ H      ┆ 7.0   │
│ J      ┆ NaN   │
└────────┴───────┘
"""

Copy Copied Use a different Browser

# drop nan values, after converting them to nulls
print(df.fill_nan(None).drop_nulls())
"""
shape: (5, 2)
┌────────┬───────┐
│ Letter ┆ Value │
│ ---    ┆ ---   │
│ str    ┆ f64   │
╞════════╪═══════╡
│ A      ┆ 1.0   │
│ B      ┆ 2.0   │
│ C      ┆ 3.0   │
│ F      ┆ 5.0   │
│ H      ┆ 7.0   │
└────────┴───────┘
"""

Summary

I hope this blog post helps you learn how to work with and fill missing values in Polars. Knowing how to handle missing values is beneficial as a data analytics/science professional.

GitHub repo

References

  • https://pola-rs.github.io/polars-book/user-guide/expressions/null/#missing-data-metadata
  • https://pola-rs.github.io/polars-book/

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Recent Posts

  • Where I’m Headed in the Next 5 Years
  • Open-Source vs Vendor Data Tools
  • Developing the Habit of Writing
  • How to Inspect and Optimize Query Plans in Python Polars
  • Learn Python Polars with Polars Cookbook

Popular Posts

  • A Running Total Calculation with Quick Measure in Power BI
  • A Complete Guide to Git Integration in Power BI
  • How To Copy And Paste Report Page in Power BI
  • Handling Missing Values in Polars
  • How to Convert String to Date or Datetime in Polars

connect with me

  • LinkedIn
  • Twitter
  • Github
  • Website

Search Articles

©2025 Stuff by Yuki | Powered by SuperbThemes