
Pandas to_datetime: Convert Strings, Timestamps, and Mixed Formats


Date and time columns are among the messiest fields in any dataset. CSV exports arrive with dates formatted as "01/02/2025" (is that January 2nd or February 1st?). API responses return Unix timestamps in seconds, milliseconds, or nanoseconds. Spreadsheet exports mix "March 5, 2024" with "2024-03-05" in the same column. Logs from different servers carry different timezone offsets. Until you convert these strings and numbers into proper datetime objects, you cannot sort chronologically, resample to weekly totals, or compute time deltas.

pd.to_datetime() is the single function in pandas designed to handle all of these cases. It parses strings, integers, floats, and even separate year/month/day columns into pandas Timestamp or DatetimeIndex objects. This guide walks through every parameter, shows copy-ready code for the most common conversion patterns, explains error handling, covers the Pandas 2.0+ format changes, and finishes with a real-world data cleaning example.


Quick Reference

The most common usage requires just one line:

import pandas as pd
 
df['date'] = pd.to_datetime(df['date_string'])

For a column with a known format, specifying the format is both faster and safer:

df['date'] = pd.to_datetime(df['date_string'], format='%Y-%m-%d')

For messy data where some values cannot be parsed:

df['date'] = pd.to_datetime(df['date_string'], errors='coerce')

Full Syntax

pd.to_datetime(
    arg,
    errors='raise',
    dayfirst=False,
    yearfirst=False,
    utc=False,
    format=None,
    exact=True,
    unit=None,
    infer_datetime_format=lib.no_default,  # deprecated in 2.0
    origin='unix',
    cache=True
)

Parameters Table

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `arg` | scalar, list, Series, DataFrame | required | The data to convert. Accepts a string, int, float, list, tuple, Series, DataFrame, or dict. |
| `format` | str or None | `None` | The strftime-compatible format string to parse the date. When `None`, pandas infers the format. In Pandas 2.0+, also accepts `'mixed'` and `'ISO8601'`. |
| `errors` | `'raise'`, `'coerce'`, `'ignore'` | `'raise'` | `'raise'` throws an exception on unparseable values. `'coerce'` sets them to NaT. `'ignore'` returns the original input unchanged (deprecated since Pandas 2.2). |
| `dayfirst` | bool | `False` | If True, parses "01/02/2025" as February 1st (day first). Useful for European date formats. |
| `yearfirst` | bool | `False` | If True, parses "25/01/02" as 2025-01-02 (year first). Takes precedence over `dayfirst`. |
| `utc` | bool | `False` | If True, returns a UTC-localized result. Converts timezone-aware inputs to UTC. |
| `unit` | str or None | `None` | The unit of a numeric argument: `'D'`, `'s'`, `'ms'`, `'us'`, `'ns'`. Used when `arg` contains integers or floats representing elapsed time since `origin`. |
| `origin` | scalar | `'unix'` | The reference date for numeric conversions. `'unix'` means 1970-01-01. You can pass a custom timestamp string or Timestamp. |
| `exact` | bool | `True` | If True, requires the format to match exactly. If False, allows the format to match anywhere in the string. |
| `cache` | bool | `True` | If True, uses a cache of unique converted dates to speed up parsing of duplicate values. |
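The less-common `exact` parameter deserves a quick sketch. Assuming the date is embedded in a longer string (the `report_...` values below are made-up examples), `exact=False` lets the format match anywhere inside each value:

```python
import pandas as pd

# exact=False lets the format match anywhere inside the string,
# useful when a date is embedded in a longer identifier.
messy = pd.Series(['report_2025-01-15_final', 'report_2025-02-20_draft'])
parsed = pd.to_datetime(messy, format='%Y-%m-%d', exact=False)
print(parsed)
```

This avoids a separate regex-extraction step when the surrounding text is consistent.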

Format Codes

When you specify format, use Python's strftime / strptime directives:

| Code | Meaning | Example |
| --- | --- | --- |
| `%Y` | 4-digit year | 2025 |
| `%y` | 2-digit year | 25 |
| `%m` | Month as zero-padded number | 01 to 12 |
| `%d` | Day as zero-padded number | 01 to 31 |
| `%H` | Hour (24-hour clock) | 00 to 23 |
| `%I` | Hour (12-hour clock) | 01 to 12 |
| `%M` | Minute, zero-padded | 00 to 59 |
| `%S` | Second, zero-padded | 00 to 59 |
| `%f` | Microsecond, zero-padded to 6 digits | 000000 to 999999 |
| `%p` | AM or PM | AM, PM |
| `%z` | UTC offset | +0000, -0530 |
| `%Z` | Timezone name | UTC, EST |
| `%A` | Full weekday name | Monday |
| `%a` | Abbreviated weekday | Mon |
| `%B` | Full month name | January |
| `%b` | Abbreviated month name | Jan |
| `%j` | Day of year | 001 to 366 |
| `%%` | Literal % character | % |
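Directives combine freely into one format string. For instance, a US-style date with a 12-hour clock needs `%I` and `%p` together:

```python
import pandas as pd

# Combining directives: month/day/year plus a 12-hour clock with AM/PM
times = pd.Series(['01/15/2025 02:30 PM', '02/20/2025 09:05 AM'])
parsed = pd.to_datetime(times, format='%m/%d/%Y %I:%M %p')
print(parsed)
```

Note that `%I` without `%p` is ambiguous: pandas has no way to tell 2:30 AM from 2:30 PM.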

Examples

String to Datetime

The simplest case: a column of ISO 8601 date strings.

import pandas as pd
 
dates = pd.Series(['2025-01-15', '2025-02-20', '2025-03-25'])
result = pd.to_datetime(dates)
print(result)

Output:

0   2025-01-15
1   2025-02-20
2   2025-03-25
dtype: datetime64[ns]

Custom Format Strings

When dates do not follow ISO 8601, provide an explicit format.

European date format (day/month/year):

import pandas as pd
 
dates = pd.Series(['15/01/2025', '20/02/2025', '25/03/2025'])
result = pd.to_datetime(dates, format='%d/%m/%Y')
print(result)

Output:

0   2025-01-15
1   2025-02-20
2   2025-03-25
dtype: datetime64[ns]

Timestamp with time component:

import pandas as pd
 
timestamps = pd.Series(['2025-01-15 14:30:00', '2025-02-20 09:15:45'])
result = pd.to_datetime(timestamps, format='%Y-%m-%d %H:%M:%S')
print(result)

Output:

0   2025-01-15 14:30:00
1   2025-02-20 09:15:45
dtype: datetime64[ns]

Month name format:

import pandas as pd
 
dates = pd.Series(['January 15, 2025', 'February 20, 2025'])
result = pd.to_datetime(dates, format='%B %d, %Y')
print(result)

Output:

0   2025-01-15
1   2025-02-20
dtype: datetime64[ns]

Handling Unparseable Values with errors='coerce'

Real datasets contain bad dates. Setting errors='coerce' converts unparseable values to NaT (Not a Time) instead of raising an exception. This keeps your pipeline running while marking bad rows for review.

import pandas as pd
 
dates = pd.Series(['2025-01-15', 'not_a_date', '2025-03-25', ''])
result = pd.to_datetime(dates, errors='coerce')
print(result)

Output:

0   2025-01-15
1          NaT
2   2025-03-25
3          NaT
dtype: datetime64[ns]

You can then find the bad rows:

bad_rows = result.isna()
print(f"Found {bad_rows.sum()} unparseable dates at indices: {bad_rows[bad_rows].index.tolist()}")

Output:

Found 2 unparseable dates at indices: [1, 3]

Unix Timestamps to Datetime

APIs and databases frequently store dates as Unix timestamps. Use the unit parameter to specify the time unit.

Seconds since epoch:

import pandas as pd
 
timestamps = pd.Series([1609459200, 1612137600, 1614556800])
result = pd.to_datetime(timestamps, unit='s')
print(result)

Output:

0   2021-01-01
1   2021-02-01
2   2021-03-01
dtype: datetime64[ns]

Milliseconds since epoch (common in JavaScript):

import pandas as pd
 
timestamps_ms = pd.Series([1609459200000, 1612137600000, 1614556800000])
result = pd.to_datetime(timestamps_ms, unit='ms')
print(result)

Output:

0   2021-01-01
1   2021-02-01
2   2021-03-01
dtype: datetime64[ns]

Custom origin:

import pandas as pd
 
# Days since 2020-01-01
days = pd.Series([0, 31, 60])
result = pd.to_datetime(days, unit='D', origin='2020-01-01')
print(result)

Output:

0   2020-01-01
1   2020-02-01
2   2020-03-01
dtype: datetime64[ns]
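A practical use of `origin` is decoding Excel serial dates, which (on Windows) count days from 1899-12-30. A minimal sketch, assuming the serial numbers came from an Excel export:

```python
import pandas as pd

# Excel (Windows) serial dates count days from 1899-12-30
excel_serial = pd.Series([45292, 45658])
result = pd.to_datetime(excel_serial, unit='D', origin='1899-12-30')
print(result)
```

Here 45292 decodes to 2024-01-01 and 45658 to 2025-01-01.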

Timezone-Aware Conversion

Use utc=True to get UTC-localized output. This is especially important when your data mixes timezone offsets.

import pandas as pd
 
dates = pd.Series([
    '2025-01-15 10:00:00+05:30',
    '2025-01-15 10:00:00-05:00',
    '2025-01-15 10:00:00+00:00'
])
result = pd.to_datetime(dates, utc=True)
print(result)

Output:

0   2025-01-15 04:30:00+00:00
1   2025-01-15 15:00:00+00:00
2   2025-01-15 10:00:00+00:00
dtype: datetime64[ns, UTC]

You can then convert to any other timezone using .dt.tz_convert():

eastern = result.dt.tz_convert('US/Eastern')
print(eastern)
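The reverse situation also comes up: timestamps that are naive but known to come from a single source timezone. In that case, attach the timezone with `.dt.tz_localize()` first, then convert:

```python
import pandas as pd

# Naive timestamps known to be recorded in India Standard Time:
# attach the timezone with tz_localize, then convert with tz_convert.
naive = pd.to_datetime(pd.Series(['2025-01-15 10:00:00', '2025-01-15 18:00:00']))
localized = naive.dt.tz_localize('Asia/Kolkata')
as_utc = localized.dt.tz_convert('UTC')
print(as_utc)
```

`tz_localize` declares what timezone the wall-clock values already represent; `tz_convert` changes the representation to another timezone.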

Mixed Formats in the Same Column

Sometimes a single column contains dates in multiple formats. On pandas versions before 2.0, you typically use errors='coerce' and parse in stages. On Pandas 2.0+, format='mixed' handles this directly.

Pre-2.0 approach (multi-pass):

import pandas as pd
 
dates = pd.Series(['2025-01-15', '15/02/2025', 'March 25, 2025', '2025.04.10'])
 
# First pass: ISO format
result = pd.to_datetime(dates, errors='coerce')
 
# Second pass: fill NaT with day/month/year attempt
mask = result.isna()
result[mask] = pd.to_datetime(dates[mask], format='%d/%m/%Y', errors='coerce')
 
# Third pass: full month name
mask = result.isna()
result[mask] = pd.to_datetime(dates[mask], format='%B %d, %Y', errors='coerce')
 
# Fourth pass: dot-separated
mask = result.isna()
result[mask] = pd.to_datetime(dates[mask], format='%Y.%m.%d', errors='coerce')
 
print(result)

Output:

0   2025-01-15
1   2025-02-15
2   2025-03-25
3   2025-04-10
dtype: datetime64[ns]
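The multi-pass pattern above generalizes into a small helper. This is a sketch, not a pandas built-in; `parse_multi_format` is a name introduced here for illustration:

```python
import pandas as pd

def parse_multi_format(series, formats):
    """Try each format in order, filling only the values still missing."""
    result = pd.to_datetime(pd.Series([pd.NaT] * len(series), index=series.index))
    for fmt in formats:
        mask = result.isna()
        if mask.any():
            result[mask] = pd.to_datetime(series[mask], format=fmt, errors='coerce')
    return result

dates = pd.Series(['2025-01-15', '15/02/2025', 'March 25, 2025', '2025.04.10'])
parsed = parse_multi_format(dates, ['%Y-%m-%d', '%d/%m/%Y', '%B %d, %Y', '%Y.%m.%d'])
print(parsed)
```

Order the formats from most to least common so the bulk of the column is parsed in the first pass.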

Combining Year, Month, and Day Columns

When your DataFrame has separate columns for year, month, and day, pass the DataFrame directly to pd.to_datetime(). The function looks for columns named year, month, day (and optionally hour, minute, second).

import pandas as pd
 
df = pd.DataFrame({
    'year': [2025, 2025, 2025],
    'month': [1, 6, 12],
    'day': [15, 20, 31]
})
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
print(df)

Output:

   year  month  day       date
0  2025      1   15 2025-01-15
1  2025      6   20 2025-06-20
2  2025     12   31 2025-12-31

With time components:

import pandas as pd
 
df = pd.DataFrame({
    'year': [2025, 2025],
    'month': [3, 7],
    'day': [10, 22],
    'hour': [14, 9],
    'minute': [30, 0],
    'second': [0, 45]
})
df['datetime'] = pd.to_datetime(df)
print(df['datetime'])

Output:

0   2025-03-10 14:30:00
1   2025-07-22 09:00:45
Name: datetime, dtype: datetime64[ns]
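Real exports rarely use the exact column names pandas expects. A quick sketch (the `yr`/`mo`/`dy` headers are made-up stand-ins for whatever your source uses) showing a rename before conversion:

```python
import pandas as pd

# pd.to_datetime(DataFrame) requires columns named year/month/day;
# rename other headers first.
df = pd.DataFrame({'yr': [2025, 2025], 'mo': [5, 11], 'dy': [3, 28]})
df['date'] = pd.to_datetime(df.rename(columns={'yr': 'year', 'mo': 'month', 'dy': 'day'}))
print(df)
```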

Performance: Explicit Format vs. Inferred Format

Specifying the format parameter is significantly faster than letting pandas infer the format, because inference requires pandas to inspect every value and try multiple parsing strategies.

import pandas as pd
import numpy as np
 
# Generate 1 million date strings
n = 1_000_000
dates = pd.Series([f'2025-{np.random.randint(1,13):02d}-{np.random.randint(1,29):02d}' for _ in range(n)])
 
# Slow: let pandas infer the format
%timeit pd.to_datetime(dates)
# ~1.2s per loop
 
# Fast: specify the format explicitly
%timeit pd.to_datetime(dates, format='%Y-%m-%d')
# ~180ms per loop (6-7x faster)

Rule of thumb: If you know the format and every value follows it, always pass format. It is both faster and eliminates ambiguity (e.g., 01/02/2025 being parsed as January 2nd instead of February 1st).

Using dayfirst and yearfirst

These parameters resolve ambiguity in date strings.

import pandas as pd
 
# Without dayfirst: "01/02/2025" is parsed as January 2nd
print(pd.to_datetime('01/02/2025'))
# 2025-01-02 00:00:00
 
# With dayfirst=True: "01/02/2025" is parsed as February 1st
print(pd.to_datetime('01/02/2025', dayfirst=True))
# 2025-02-01 00:00:00
 
# With yearfirst=True: "25/01/02" is parsed as 2025-01-02
print(pd.to_datetime('25/01/02', yearfirst=True))
# 2025-01-02 00:00:00

Warning: dayfirst and yearfirst are not strict. If pandas can parse the string unambiguously in a different way, it may ignore these hints. For reliable parsing, use format instead.
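You can see the non-strictness directly. Since 13 cannot be a month, pandas parses it as the day even though `dayfirst=False` asks for month-first order (pandas may also emit a UserWarning about the mismatch):

```python
import pandas as pd

# 13 cannot be a month, so the dayfirst=False hint is overridden:
# the string still parses as January 13th.
result = pd.to_datetime('13/01/2025', dayfirst=False)
print(result)
```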

Common Errors and How to Fix Them

OutOfBoundsDatetime

Pandas timestamps have a valid range from approximately 1677-09-21 to 2262-04-11. Values outside this range raise OutOfBoundsDatetime.

import pandas as pd
 
# This will raise an error
try:
    pd.to_datetime('1066-10-14')
except Exception as e:
    print(type(e).__name__, e)

Fix: Use errors='coerce' to convert out-of-range values to NaT (then handle them with fillna()), or work with Python's datetime.datetime objects directly.
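The coercion fix in action: the out-of-range value becomes NaT while in-range values parse normally.

```python
import pandas as pd

# errors='coerce' turns out-of-range dates into NaT instead of raising
dates = pd.Series(['1066-10-14', '2025-01-15'])
result = pd.to_datetime(dates, errors='coerce')
print(result)
```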

Format Mismatch

If you specify format='%Y-%m-%d' but the data contains "15/01/2025", pandas raises a ValueError.

import pandas as pd
 
try:
    pd.to_datetime('15/01/2025', format='%Y-%m-%d')
except ValueError as e:
    print(e)
# time data "15/01/2025" doesn't match format "%Y-%m-%d"

Fix: Ensure the format string matches the actual data, or use errors='coerce' to skip mismatched values.

NaT Handling

NaT (Not a Time) is the datetime equivalent of NaN. Like NaN, it never compares equal to anything, including itself: ==, <, and > comparisons with NaT return False (and NaT != NaT returns True), while arithmetic with NaT returns NaT.

import pandas as pd
 
nat = pd.NaT
 
print(nat == nat)    # False
print(nat > pd.Timestamp('2025-01-01'))  # False
print(pd.isna(nat))  # True

Tip: Always use pd.isna() or Series.isna() to check for NaT, not ==.

ValueError with Mixed Timezones

If a column contains dates with different timezone offsets and you try to parse without utc=True, pandas raises an error.

import pandas as pd
 
mixed_tz = pd.Series(['2025-01-15T10:00:00+05:30', '2025-01-15T10:00:00-05:00'])
 
# Without utc=True, mixed offsets either raise a ValueError (pandas 2.0+
# tells you to pass utc=True) or fall back to an object-dtype result.
# Fix: use utc=True
result = pd.to_datetime(mixed_tz, utc=True)
print(result)

Pandas 2.0+ Changes

Pandas 2.0 introduced important changes to pd.to_datetime():

format='mixed'

Before 2.0, columns with mixed date formats required multi-pass parsing; infer_datetime_format=True did not help, since it inferred a single format from the first value. Pandas 2.0+ provides format='mixed', which attempts to parse each element individually.

import pandas as pd
 
dates = pd.Series(['2025-01-15', '15/02/2025', 'March 25, 2025'])
result = pd.to_datetime(dates, format='mixed', dayfirst=True)
print(result)

Output:

0   2025-01-15
1   2025-02-15
2   2025-03-25
dtype: datetime64[ns]

format='ISO8601'

For columns that always contain ISO 8601 strings but with varying precision (some with time, some without), format='ISO8601' is faster and more explicit than format='mixed'.

import pandas as pd
 
dates = pd.Series(['2025-01-15', '2025-02-20T14:30:00', '2025-03-25T09:15:00.123456'])
result = pd.to_datetime(dates, format='ISO8601')
print(result)

Output:

0   2025-01-15 00:00:00.000000
1   2025-02-20 14:30:00.000000
2   2025-03-25 09:15:00.123456
dtype: datetime64[ns]

Deprecation of infer_datetime_format

The infer_datetime_format parameter is deprecated since Pandas 2.0. Use format='mixed' or format='ISO8601' instead. Passing infer_datetime_format=True now produces a FutureWarning.

| Pandas < 2.0 | Pandas 2.0+ |
| --- | --- |
| `infer_datetime_format=True` | `format='mixed'` or `format='ISO8601'` |
| Multi-pass parsing with `errors='coerce'` | `format='mixed'` with `dayfirst` |
| `datetime64[ns]` only | Non-nanosecond resolutions (`datetime64[s/ms/us]`) and optional Arrow-backed dtypes |
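The migration itself is usually a one-line change. A sketch of the before and after, assuming a column of ISO strings with varying precision (this requires pandas 2.0+):

```python
import pandas as pd

dates = pd.Series(['2025-01-15', '2025-02-20T14:30:00'])

# Pre-2.0 style (deprecated, emits FutureWarning):
#   pd.to_datetime(dates, infer_datetime_format=True)

# Pandas 2.0+ replacement for ISO strings of varying precision:
result = pd.to_datetime(dates, format='ISO8601')
print(result)
```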

Real-World Example: Cleaning a Sales Dataset

Here is a typical scenario. You receive a CSV export from a sales system where the date column is inconsistent.

import pandas as pd
 
# Simulate the raw data
data = {
    'order_id': [1001, 1002, 1003, 1004, 1005, 1006, 1007],
    'date_raw': [
        '2025-01-15',
        '02/20/2025',
        'March 5, 2025',
        '2025.04.10',
        'PENDING',
        '2025-06-01 14:30:00',
        '1719792000'  # Unix timestamp as string
    ],
    'amount': [250.00, 125.50, 340.00, 89.99, 450.00, 200.00, 175.00]
}
df = pd.DataFrame(data)
print("Raw data:")
print(df)

Step 1: Parse each value individually with coercion. On Pandas 2.0+, where the format is otherwise inferred from the first value and applied strictly, format='mixed' tries every element on its own and turns unparseable values into NaT. (On older versions, plain pd.to_datetime(df['date_raw'], errors='coerce') behaves similarly.)

df['date_clean'] = pd.to_datetime(df['date_raw'], format='mixed', errors='coerce')
print("\nAfter first pass:")
print(df[['order_id', 'date_raw', 'date_clean']])

Step 2: Handle the dot-separated format.

mask = df['date_clean'].isna()
df.loc[mask, 'date_clean'] = pd.to_datetime(
    df.loc[mask, 'date_raw'], format='%Y.%m.%d', errors='coerce'
)

Step 3: Handle the Unix timestamp stored as a string.

mask = df['date_clean'].isna()
# Try converting remaining NaT values as Unix timestamps
numeric_mask = df.loc[mask, 'date_raw'].str.isnumeric().fillna(False)
unix_indices = mask[mask].index[numeric_mask.values]
df.loc[unix_indices, 'date_clean'] = pd.to_datetime(
    df.loc[unix_indices, 'date_raw'].astype(int), unit='s'
)

Step 4: Review remaining failures.

still_missing = df['date_clean'].isna()
print(f"\nStill unparseable ({still_missing.sum()} rows):")
print(df.loc[still_missing, ['order_id', 'date_raw']])

Output:

Still unparseable (1 rows):
   order_id date_raw
4      1005  PENDING

Step 5: Use the cleaned dates for analysis.

# Monthly revenue summary (resample for time series aggregation)
df_valid = df.dropna(subset=['date_clean'])
df_valid = df_valid.set_index('date_clean')
monthly = df_valid['amount'].resample('ME').sum()  # 'ME' = month end; use 'M' on pandas < 2.2
print("\nMonthly revenue:")
print(monthly)

Visualizing Time Series Data After Conversion

Once your dates are properly converted, you can sort by date, resample to different frequencies, or group by time periods. The next step is usually visualization -- plotting trends, spotting seasonality, or finding outliers. PyGWalker lets you turn a pandas DataFrame into an interactive visual interface directly inside Jupyter Notebook. Instead of writing matplotlib or seaborn code, you drag and drop columns to build time series charts, heatmaps, and bar charts.

pip install pygwalker

import pygwalker as pyg
 
# After cleaning your date column
gwalker = pyg.walk(df_valid.reset_index())

This opens an interactive UI where you can place date_clean on the x-axis and amount on the y-axis to instantly see a time series chart of sales -- no plotting code required.

Frequently Asked Questions

What is the difference between pd.to_datetime() and pd.Timestamp()?

pd.to_datetime() is designed for converting sequences (lists, Series, DataFrame columns) of date-like values into datetime objects. It handles multiple formats, error coercion, and vectorized parsing. pd.Timestamp() creates a single timestamp from a scalar value. Use pd.to_datetime() when working with columns and pd.Timestamp() when you need a single reference point.

How do I convert a datetime column back to a string?

Use the .dt.strftime() accessor. For example, df['date'].dt.strftime('%Y-%m-%d') converts each datetime to a string in YYYY-MM-DD format. You can use any format codes from the format table above.
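A quick sketch of the round trip back to strings:

```python
import pandas as pd

# Convert datetimes back to formatted strings with .dt.strftime()
dates = pd.to_datetime(pd.Series(['2025-01-15', '2025-02-20']))
as_text = dates.dt.strftime('%d %b %Y')
print(as_text)
```

Keep in mind the result is an object-dtype Series of strings, so datetime operations no longer apply to it.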

Why does pd.to_datetime return datetime64[ns] and not a Python datetime?

Pandas uses NumPy's datetime64[ns] type for performance. Operations on datetime64[ns] arrays are vectorized and run at C speed, while Python datetime objects require slow Python-level loops. If you need Python datetime objects, use .dt.to_pydatetime() on the Series.

How do I parse dates during pd.read_csv() instead of after loading?

Pass the column name to the parse_dates parameter: pd.read_csv('file.csv', parse_dates=['date_column']). For custom formats, combine it with date_format (Pandas 2.0+): pd.read_csv('file.csv', parse_dates=['date_column'], date_format='%d/%m/%Y'). In older versions, use date_parser with a lambda.
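A self-contained sketch using an in-memory CSV in place of a file on disk (requires pandas 2.0+ for date_format; the column names are made-up examples):

```python
import io
import pandas as pd

# A small in-memory CSV standing in for a file on disk
csv_data = io.StringIO("order_id,order_date\n1,15/01/2025\n2,20/02/2025\n")
df = pd.read_csv(csv_data, parse_dates=['order_date'], date_format='%d/%m/%Y')
print(df.dtypes)
```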

How do I handle dates before 1677 or after 2262?

Pandas datetime64[ns] covers approximately 1677-09-21 to 2262-04-11. Pandas 2.0+ supports coarser resolutions (datetime64[s], datetime64[ms], datetime64[us]) that extend the range by orders of magnitude; for example, pd.Timestamp('1066-10-14') works on 2.0+ because the constructor picks a resolution that fits. On older versions, store such dates as Python datetime.date objects in an object-dtype column.

Conclusion

pd.to_datetime() is the workhorse function for date conversion in pandas. The key takeaways:

  • Always specify format when you know it. It is faster and eliminates ambiguity.
  • Use errors='coerce' in production pipelines to convert bad dates to NaT instead of crashing.
  • Use unit and origin for numeric timestamps from APIs and databases.
  • Use utc=True when mixing timezone-aware data to get consistent UTC output.
  • Use format='mixed' on Pandas 2.0+ for columns with inconsistent formats instead of multi-pass parsing.
  • Combine year/month/day columns by passing a DataFrame with correctly named columns directly to pd.to_datetime().
  • After converting dates, explore your time series data interactively with PyGWalker -- drag and drop columns to build charts without writing plot code.
