Blight is a pipeline, not a list of complaints.
Kansas City publishes property code violations and its dangerous-buildings list as open data — but spread across separate datasets, in two different schemas (a frozen 2009–2021 archive and a current EnerGov feed), with dirty, mixed-type columns and no shared clean key beyond the parcel PIN. Looked at row-by-row, a violation is just an isolated complaint.
The interesting questions are longitudinal: how many cited properties become repeat offenders, how few escalate all the way to demolition, where blight concentrates, and how fast the city actually responds — by council district. Answering those requires stitching the sources into one dimensional model that follows a single property across fifteen years.
Extract-load, then model the funnel.
A Python script pulls all three Socrata datasets into a BigQuery raw layer (landed as all-STRING to survive dirty source columns), then a dbt project transforms them through staging → intermediate → marts. The centerpiece is an accumulating-snapshot fact — one row per property — that encodes the whole lifecycle in a single current_stage column:
-- fact_property_lifecycle.sql — one row per property (PIN) CASE WHEN COALESCE(d.is_demolition_status, FALSE) THEN 'demolition' WHEN d.pin IS NOT NULL THEN 'dangerous_building' WHEN p.total_violations >= 2 THEN 'repeat_violations' ELSE 'single_violation' END AS current_stage
- Sources: ~800K historical violations + ~175K current + the dangerous-buildings list, joined on parcel PIN (~972K fact rows after deduping)
- Two facts: a transaction-grain
fact_violationand the accumulating-snapshotfact_property_lifecycle; dims for property, council district, violation type, and date - Real-world cleaning: unified the two violation schemas, normalized ordinances across eras, KC-bounds-filtered junk geocodes, nulled resolution dates that precede the found date
- 34 tests (surrogate-key uniqueness, referential integrity, accepted values, singular data-quality checks) plus CI and a weekly scheduled refresh on GitHub Actions
The funnel, quantified.
Across 2005–2026, 79,892 properties drew ~972,000 violations. The funnel narrows hard: 85.6% of cited properties are repeat offenders, but only 0.4% reach the dangerous-buildings list and 0.03% active demolition. Citywide, 87.9% of violations are eventually resolved — at a median of 230 days.
Blight is sharply concentrated and response time is uneven: Council District 3 alone accounts for ~380K violations, the highest escalation rate, and the slowest response (260-day median) — versus District 2 at 134 days, a near-2× accountability gap. The single most-cited property racked up 282 violations; the most common citation, at 21% of all violations, is animal feces. The full Looker Studio board is embedded on the homepage.