HumanMint - Normalizing & Cleaning Government Contact Data

Posted by AmbitiousTie@reddit | Python | View on Reddit | 1 comments

Hey r/Python!

I just released a small library I've been building at work for cleaning messy human-centric data: HumanMint.

Think government contact records with chaotic names, weird phone formats, noisy department strings, inconsistent titles, etc.

It was coded in a single day, so expect some rough edges, but the core works surprisingly well.

Note: This is my first public library, so feedback and bug reports are very welcome.

What it does (all in one mint() call)

Example

from humanmint import mint

result = mint(
    name="Dr. John Smith, PhD",
    email="JOHN.SMITH@CITY.GOV",
    phone="(202) 555-0173",
    address="123 Main St, Springfield, IL 62701",
    department="000171 - Public Works 850-123-1234 ext 200",
    title="Chief of Police",
)

print(result.model_dump())

Result (simplified):

Why I built it

I work with thousands of US local-government contacts, and the raw data is wildly inconsistent.

I needed a single function that takes whatever garbage comes in and returns something normalized, structured, and predictable.

Features beyond mint()

Install

pip install humanmint

Repo

https://github.com/RicardoNunes2000/HumanMint

If anyone wants to try it, break it, suggest improvements, or point out design flaws, I'd love the feedback.

The whole goal was to make dealing with messy human data as painless as possible.