We’ve all been there. You get your data and start analyzing it then you realize a date column is actually set to string data type. Whether that’s because the issue in the data source or your program didn’t read it right, it’s beneficial to know how to convert string to date or datetime. I’ll be covering how to do that in Python Polars in this blog post!
There are a few ways to convert string to date or datetime in Polars (precisely it’s data type of Utf8).
- str.to_date()
- str.strptime()
- try_parse_dates or use_pyarrow when reading data (read_csv)
I’m not going to cover “use_pyarrow” parameter option as it does the same thing as “try_parse_dates”.
All my code is available in my GitHub repo.
Convert string to date
You can use all the options mentioned above when converting string to date. Note that “try_parse_dates” parses the string value to date or datetime automatically depending on the original format. e.g. if the format is something like 2023-01-01 then it parses as date, but if it’s something like 2023-01-01 00:00:00 then it parses as datetime.
With these methods, you’d need to specify the format the string date value represents. Here’s a resource you can refer to when getting the correct date/datetime/time format: https://docs.rs/chrono/latest/chrono/format/strftime/index.html
Importing polars library and check data types before converting to date/datetime:
import polars as pl
DATES_FILE_NAME = 'dates.csv'
DATETIMES_FILE_NAME = 'datetimes.csv'
dates_df = pl.read_csv(DATES_FILE_NAME)
datetimes_df = pl.read_csv(DATETIMES_FILE_NAME)
print(dates_df, '\n', datetimes_df) # check original data type, which is str
"""
shape: (100, 1)
┌────────────┐
│ date │
│ --- │
│ str │
╞════════════╡
│ 2023-06-11 │
│ 2023-03-24 │
│ 2023-04-22 │
│ 2023-07-07 │
│ … │
│ 2023-06-30 │
│ 2023-03-04 │
│ 2023-02-02 │
│ 2023-03-30 │
└────────────┘
shape: (100, 1)
┌─────────────────────┐
│ datetime │
│ --- │
│ str │
╞═════════════════════╡
│ 2023-06-11 20:55:57 │
│ 2023-03-24 23:48:28 │
│ 2023-04-22 16:16:38 │
│ 2023-07-07 00:08:36 │
│ … │
│ 2023-06-30 02:12:02 │
│ 2023-03-04 22:12:50 │
│ 2023-02-02 21:12:45 │
│ 2023-03-30 02:57:21 │
└─────────────────────┘
"""
Converting string to date using “to_date” and “strptime”:
# convert string to date
# 1. to_date and strptime
a = (
pl.read_csv(DATES_FILE_NAME)
.with_columns(
pl.col('date').str.to_date(format='%Y-%m-%d').alias('to_date'),
pl.col('date').str.strptime(pl.Date, format='%Y-%m-%d').alias('strptime'),
)
)
print(a) # check data type
"""
shape: (100, 3)
┌────────────┬────────────┬────────────┐
│ date ┆ to_date ┆ strptime │
│ --- ┆ --- ┆ --- │
│ str ┆ date ┆ date │
╞════════════╪════════════╪════════════╡
│ 2023-06-11 ┆ 2023-06-11 ┆ 2023-06-11 │
│ 2023-03-24 ┆ 2023-03-24 ┆ 2023-03-24 │
│ 2023-04-22 ┆ 2023-04-22 ┆ 2023-04-22 │
│ 2023-07-07 ┆ 2023-07-07 ┆ 2023-07-07 │
│ … ┆ … ┆ … │
│ 2023-06-30 ┆ 2023-06-30 ┆ 2023-06-30 │
│ 2023-03-04 ┆ 2023-03-04 ┆ 2023-03-04 │
│ 2023-02-02 ┆ 2023-02-02 ┆ 2023-02-02 │
│ 2023-03-30 ┆ 2023-03-30 ┆ 2023-03-30 │
└────────────┴────────────┴────────────┘
"""
Notice the output data type is “date” instead of “str”.
Let’s see how we can do the same using “try_parse_dates”:
# 2. try_parse_dates when reading
b = (
pl.read_csv(DATES_FILE_NAME, try_parse_dates=True)
)
print(b) # check data type
"""
shape: (100, 1)
┌────────────┐
│ date │
│ --- │
│ date │
╞════════════╡
│ 2023-06-11 │
│ 2023-03-24 │
│ 2023-04-22 │
│ 2023-07-07 │
│ … │
│ 2023-06-30 │
│ 2023-03-04 │
│ 2023-02-02 │
│ 2023-03-30 │
└────────────┘
"""
Convert string to datetime or time
Using str.strptime(), you can also convert string to datetime or time:
# convert string to datetime / time
# 1. strptime - datetime and time
c = (
pl.read_csv(DATETIMES_FILE_NAME)
.with_columns(
pl.col('datetime').str.strptime(pl.Datetime, format='%Y-%m-%d %H:%M:%S').alias('strptime_datetime'),
pl.col('datetime').str.strptime(pl.Datetime, format='%Y-%m-%d %H:%M:%S').alias('strptime_time'),
)
)
print(c)
"""
shape: (100, 3)
┌─────────────────────┬─────────────────────┬─────────────────────┐
│ datetime ┆ strptime_datetime ┆ strptime_time │
│ --- ┆ --- ┆ --- │
│ str ┆ datetime[μs] ┆ datetime[μs] │
╞═════════════════════╪═════════════════════╪═════════════════════╡
│ 2023-06-11 20:55:57 ┆ 2023-06-11 20:55:57 ┆ 2023-06-11 20:55:57 │
│ 2023-03-24 23:48:28 ┆ 2023-03-24 23:48:28 ┆ 2023-03-24 23:48:28 │
│ 2023-04-22 16:16:38 ┆ 2023-04-22 16:16:38 ┆ 2023-04-22 16:16:38 │
│ 2023-07-07 00:08:36 ┆ 2023-07-07 00:08:36 ┆ 2023-07-07 00:08:36 │
│ … ┆ … ┆ … │
│ 2023-06-30 02:12:02 ┆ 2023-06-30 02:12:02 ┆ 2023-06-30 02:12:02 │
│ 2023-03-04 22:12:50 ┆ 2023-03-04 22:12:50 ┆ 2023-03-04 22:12:50 │
│ 2023-02-02 21:12:45 ┆ 2023-02-02 21:12:45 ┆ 2023-02-02 21:12:45 │
│ 2023-03-30 02:57:21 ┆ 2023-03-30 02:57:21 ┆ 2023-03-30 02:57:21 │
└─────────────────────┴─────────────────────┴─────────────────────┘
"""
You can convert string to datetime with “try_parse_dates” parameter option only if the string value is in datetime format. You can’t convert date in string date type to datetime with “try_parse_dates”:
# 2. try_parse_dates
d = (
pl.read_csv(DATETIMES_FILE_NAME, try_parse_dates=True)
)
print(d)
"""
shape: (100, 1)
┌─────────────────────┐
│ datetime │
│ --- │
│ datetime[μs] │
╞═════════════════════╡
│ 2023-06-11 20:55:57 │
│ 2023-03-24 23:48:28 │
│ 2023-04-22 16:16:38 │
│ 2023-07-07 00:08:36 │
│ … │
│ 2023-06-30 02:12:02 │
│ 2023-03-04 22:12:50 │
│ 2023-02-02 21:12:45 │
│ 2023-03-30 02:57:21 │
└─────────────────────┘
"""
Summary
Dealing with date or datetime values stored as string is a common problem we encounter. The good thing is that there are multiple ways to convert string to date or datetime in Polars. And that gives you enough flexibility to choose what to use depending on your specific case.
References
- https://pola-rs.github.io/polars-book/user-guide/transformations/time-series/parsing/#parsing-dates-from-a-file
- https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.str.to_date.html
- https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.str.strptime.html
- https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.read_csv.html