Polars is one of the best Python libraries for working with data, but it’s still new and lacks some functionality you may find in pandas, for example.
But did you know you can add your own custom functionality or methods to Polars without going through the process and complexity of contributing to the codebase?
You can “register” your custom functionality under a namespace on Polars’ expression, DataFrame, LazyFrame, and Series classes.
If you want to follow along, here’s the link to the GitHub repo.
2 Ways to Extend the Polars API
Technically there is only one official way to register your functionality under a Polars namespace, but in practice you can get the same result in two different ways.
One way is to use the decorator explained on Polars’ website. For my use case, I’m adding a custom expression that converts string values to title case.
import polars as pl

# Define and register your custom functionality
@pl.api.register_expr_namespace('custom')
class CustomStringMethodsCollection:
    def __init__(self, expr: pl.Expr):
        self._expr = expr

    def to_title_case(self) -> pl.Expr:
        # Upper-case the first character of each word, lower-case the rest
        convert_to_title = (
            pl.element().str.slice(0, 1).str.to_uppercase()
            +
            pl.element().str.slice(1).str.to_lowercase()
        )
        # Split into words, convert each word, then join the words back.
        # Note: older Polars releases exposed these list methods as `.arr`
        converted_elements = (
            self._expr
            .str.split(' ')
            .list.eval(convert_to_title)
            .list.join(separator=' ')
        )
        return converted_elements

# See if this works
df = pl.LazyFrame(
    {'Name': ['mike mikEMiKe', 'SaRaH SarAhSarah MIKe', 'your name']}
)
print(
    df.with_columns(
        pl.col('Name').custom.to_title_case()
    )
    .collect()
)
'''
output:
shape: (3, 1)
┌───────────────────────┐
│ Name                  │
│ ---                   │
│ str                   │
╞═══════════════════════╡
│ Mike Mikemike         │
│ Sarah Sarahsarah Mike │
│ Your Name             │
└───────────────────────┘
'''
What you’d probably do in practice is create a dedicated file to store your custom expressions and import it into your code. You can refer to this post on how I’m doing that.
Another way is to add your custom method directly to the base class, as explained in this thread on Stack Overflow.
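import polars as pl

def to_title_case(self) -> pl.Expr:
    # Upper-case the first character of each word, lower-case the rest
    convert_to_title = (
        pl.element().str.slice(0, 1).str.to_uppercase()
        +
        pl.element().str.slice(1).str.to_lowercase()
    )
    # Split into words, convert each word, then join the words back.
    # Note: older Polars releases exposed these list methods as `.arr`
    converted_elements = (
        self
        .str.split(' ')
        .list.eval(convert_to_title)
        .list.join(separator=' ')
    )
    return converted_elements

# Attach the function to the Expr base class as a new method
pl.Expr.to_title_case = to_title_case

df = pl.LazyFrame(
    {'Name': ['mike mikEMiKe', 'SaRaH SarAhSarah MIKe', 'your name']}
)
print(
    df.with_columns(
        pl.col('Name').to_title_case()
    )
    .collect()
)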
'''
output:
┌───────────────────────┐
│ Name                  │
│ ---                   │
│ str                   │
╞═══════════════════════╡
│ Mike Mikemike         │
│ Sarah Sarahsarah Mike │
│ Your Name             │
└───────────────────────┘
'''
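The second approach adds a custom method directly to the base class (often called monkey patching). The first approach is probably better, especially when you have multiple custom functions to implement, because the namespace keeps them grouped in one place and separate from Polars’ built-in methods.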
Summary
Being able to add your own custom functionality makes Polars even more useful. Polars is still catching up to a framework like pandas in terms of the number of operations it supports. Until then, we can just create what we need if it doesn’t yet exist in Polars 🙂