// projects / case-study · 01

Kansas City
blight lifecycle.

A dbt-bigquery project tracing two decades of Kansas City property blight — from a property's first code violation, through repeat violations, to a "dangerous building" designation and demolition. Three KCMO open-data sources, a Python extract-load layer, and an accumulating-snapshot model. Dashboard embedded on the homepage.

// source
data.kcmo.org · 3 datasets
// type
bigquery · live
// 01 / problem

Blight is a pipeline, not a list of complaints.

Kansas City publishes property code violations and its dangerous-buildings list as open data — but spread across separate datasets, in two different schemas (a frozen 2009–2021 archive and a current EnerGov feed), with dirty, mixed-type columns and no shared clean key beyond the parcel PIN. Looked at row-by-row, a violation is just an isolated complaint.

The interesting questions are longitudinal: how many cited properties become repeat offenders, how few escalate all the way to demolition, where blight concentrates, and how fast the city actually responds — by council district. Answering those requires stitching the sources into one dimensional model that follows a single property across fifteen years.

// 02 / approach

Extract-load, then model the funnel.

A Python script pulls all three Socrata datasets into a BigQuery raw layer (landed as all-STRING to survive dirty source columns), then a dbt project transforms them through staging → intermediate → marts. The centerpiece is an accumulating-snapshot fact — one row per property — that encodes the whole lifecycle in a single current_stage column:

-- fact_property_lifecycle.sql — one row per property (PIN)
CASE
  WHEN COALESCE(d.is_demolition_status, FALSE) THEN 'demolition'
  WHEN d.pin IS NOT NULL                       THEN 'dangerous_building'
  WHEN p.total_violations >= 2                 THEN 'repeat_violations'
  ELSE 'single_violation'
END AS current_stage
  • Sources: ~800K historical violations + ~175K current + the dangerous-buildings list, joined on parcel PIN (~972K fact rows after deduping)
  • Two facts: a transaction-grain fact_violation and the accumulating-snapshot fact_property_lifecycle; dims for property, council district, violation type, and date
  • Real-world cleaning: unified the two violation schemas, normalized ordinances across eras, KC-bounds-filtered junk geocodes, nulled resolution dates that precede the found date
  • 34 tests (surrogate-key uniqueness, referential integrity, accepted values, singular data-quality checks) plus CI and a weekly scheduled refresh on GitHub Actions
// 03 / outcome

The funnel, quantified.

properties cited
79,892
repeat offenders
85.6%
median time to fix
230 days

Across 2005–2026, 79,892 properties drew ~972,000 violations. The funnel narrows hard: 85.6% of cited properties are repeat offenders, but only 0.4% reach the dangerous-buildings list and 0.03% active demolition. Citywide, 87.9% of violations are eventually resolved — at a median of 230 days.

Blight is sharply concentrated and response time is uneven: Council District 3 alone accounts for ~380K violations, the highest escalation rate, and the slowest response (260-day median) — versus District 2 at 134 days, a near-2× accountability gap. The single most-cited property racked up 282 violations; the most common citation, at 21% of all violations, is animal feces. The full Looker Studio board is embedded on the homepage.

// 04 / stack

Stack.

Python Socrata API dbt-core dbt-bigquery BigQuery Looker Studio GitHub Actions SQL