.pipe() in pandas changed how I write data pipelines

Posted by Economy-Concert-641@reddit | Python

Been using .pipe() in pandas lately and it's been a game changer — anyone else?

I was writing some data transformation code the other day and stumbled across .pipe(). Honestly didn't expect much, but it completely changed how I structure my pipelines.

Instead of this mess:

df_final = sort_by_total(calculate_total(filter_by_price(df)))

You just write it top to bottom like a recipe:

df_final = (
    df
    .pipe(filter_by_price)
    .pipe(calculate_total)
    .pipe(sort_by_total)
)

Same result, way more readable. Each function takes a DataFrame as its first argument and returns a DataFrame. That's the only rule for keeping the chain going.
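Side note on that rule: .pipe() also forwards any extra positional and keyword arguments to the function, so a step can take parameters. Here's a minimal sketch of that. The min_price parameter and the sample data are just my own illustration, not anything pandas requires:

```python
import pandas as pd

def filter_by_price(df, min_price=100):
    # Keep rows priced strictly above the threshold.
    # min_price is a hypothetical parameter added for this sketch.
    return df[df["price"] > min_price]

def calculate_total(df):
    # Add a total_value column without mutating the input frame.
    return df.assign(total_value=df["price"] * df["quantity"])

sample = pd.DataFrame({
    "price": [20, 150, 230, 100],
    "quantity": [10, 5, 3, 8],
})

# Keyword arguments given to .pipe() are passed through to the function.
result = (
    sample
    .pipe(filter_by_price, min_price=120)
    .pipe(calculate_total)
)
```

That way the threshold lives at the call site instead of being hard-coded inside the function.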

Full example if you want to try it:

import pandas as pd

df = pd.DataFrame({
    "product": ["Product A", "Product B", "Product C", "Product D"],
    "price": [20, 150, 230, 100],
    "quantity": [10, 5, 3, 8],
})

def filter_by_price(df):
    # Keep only rows with a price above 100
    return df[df["price"] > 100]

def calculate_total(df):
    # Add a total_value column (price * quantity) on a new DataFrame
    return df.assign(total_value=df["price"] * df["quantity"])

def sort_by_total(df):
    # Largest total_value first
    return df.sort_values("total_value", ascending=False)

df_final = (
    df
    .pipe(filter_by_price)
    .pipe(calculate_total)
    .pipe(sort_by_total)
)

Been using it a lot for ETL and data cleaning workflows. Makes debugging way easier too: comment out .pipe() steps starting from the end (later steps can depend on columns added by earlier ones) and you can see exactly where things go wrong.
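One debugging pattern that fits this style is a pass-through step that prints something about the frame and returns it unchanged. This is a rough sketch. log_shape and its label parameter are names I made up for the example, not part of pandas:

```python
import pandas as pd

def log_shape(df, label=""):
    # Hypothetical helper: report the size of the frame at this point,
    # then return the same frame so the chain continues untouched.
    print(f"{label}: {df.shape[0]} rows, {df.shape[1]} columns")
    return df

df = pd.DataFrame({
    "price": [20, 150, 230, 100],
    "quantity": [10, 5, 3, 8],
})

checked = (
    df
    .pipe(log_shape, label="start")
    .pipe(lambda d: d[d["price"] > 100])
    .pipe(log_shape, label="after price filter")
)
```

You can drop a step like this between any two .pipe() calls and remove it later without touching the rest of the chain.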

Anyone else using this regularly? Any patterns you've found useful with it?