Korean Housing Risk — Public Dataset
The public storefront for the Korean District Credit Risk Index: one sample district report, a national 16-year trend chart, and methodology documentation, published under CC BY-NC 4.0. It is the first publicly available 시군구-level (city, county, and district) court registry time series, covering 228 Korean districts.
Overview
A public GitHub repository that mirrors approved outputs from the Korean District Credit Risk Index — the private, monthly-updating district credit risk pipeline built on 16 years of IROS court registry data. The mirror publishes what an institutional buyer needs to assess methodology and reproduce key findings: one sample district PDF report, one national trend chart across all five registration types, and full methodology documentation. The full dataset, composite district rankings, and scoring methodology remain proprietary.
Problem
An institutional data product, here a B2B credit risk index sold to 저축은행 (savings bank) and 상호금융 (mutual finance) lenders, needs a credibility signal that outreach emails alone cannot provide. A lender receiving a cold email about a district risk index cannot verify the claim without seeing the data. A public repository containing sample outputs and methodology documentation converts an unverifiable claim into an independently assessable one.
Constraints
- Must publish enough to demonstrate methodology without revealing enough to reverse-engineer the composite scoring model — the full 50+ district ranking CSVs contain sufficient information to reconstruct the weighting scheme and undermine the commercial proposition
- CC BY-NC 4.0 covers the published output charts and reports; the underlying collection methodology and raw dataset are proprietary
- Updates flow in one direction only (private → public) and only on demand; the mirror is never the source of truth and must never diverge from the private pipeline's validated outputs
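The no-divergence constraint lends itself to a mechanical check. The following is a minimal sketch, not the project's actual tooling: it assumes the mirror keeps the same relative paths as the private output tree, and the function names `sha256_of` and `find_divergence` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_divergence(private_root: Path, public_root: Path) -> list[str]:
    """List mirror files that are absent from, or differ from, the private outputs.

    Only files present in the mirror are checked, because the mirror is meant
    to be a subset of the private tree. Hidden paths such as .git are skipped.
    Assumption: both trees use the same relative layout.
    """
    diverged = []
    for public_file in sorted(public_root.rglob("*")):
        if not public_file.is_file():
            continue
        relative = public_file.relative_to(public_root)
        if any(part.startswith(".") for part in relative.parts):
            continue
        private_file = private_root / relative
        if not private_file.is_file() or sha256_of(private_file) != sha256_of(public_file):
            diverged.append(relative.as_posix())
    return diverged
```

An empty list means every mirrored file is byte-identical to its private counterpart. Files that exist only in the mirror, such as a README written for the public repository, would be reported and would need an explicit exemption.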
Approach
Selective publication: one sample district PDF report (수원시 March 2026 — a terminal-stage district, the most commercially compelling example), one 6-panel national trend PNG covering all five registration types from 2010 to 2026, and three methodology documents (PRODUCT_SPEC.md, METHODOLOGY.md, CROSS_VALIDATION.md). These outputs are generated by the private pipeline and pushed manually when approved for release. The repository README contextualizes the findings and provides an institutional inquiry contact path.
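Selective publication of this kind can be expressed as an explicit allowlist. The sketch below is illustrative only: the file paths are the ones named in this write-up, but holding them in a Python tuple, the `publish_approved` function, and the assumption that the private tree uses the same relative paths are all hypothetical. The actual push is described as manual.

```python
import shutil
from pathlib import Path

# Hypothetical allowlist. The paths come from this write-up; storing them
# in code like this is an assumption, not the project's documented method.
APPROVED_OUTPUTS = (
    "reports/pdf/snapshot_수원시_2026_03.pdf",
    "reports/charts/national_trend.png",
    "METHODOLOGY.md",
    "PRODUCT_SPEC.md",
    "CROSS_VALIDATION.md",
)


def publish_approved(private_root: Path, public_root: Path,
                     approved: tuple[str, ...] = APPROVED_OUTPUTS) -> list[str]:
    """Copy only allowlisted files from the private tree into the mirror.

    Raises FileNotFoundError before copying anything if an approved file is
    missing, so a partial release never lands in the mirror.
    """
    missing = [rel for rel in approved if not (private_root / rel).is_file()]
    if missing:
        raise FileNotFoundError(f"approved outputs not found: {missing}")

    copied = []
    for rel in approved:
        destination = public_root / rel
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(private_root / rel, destination)
        copied.append(rel)
    return copied
```

Because nothing outside the allowlist is ever read, files such as the district ranking CSVs cannot reach the mirror by accident; releasing a new artifact requires a deliberate edit to the list.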
Key Decisions
Sample report for 수원시, not a low-distress district
수원시 is currently in the terminal distress stage — the most commercially relevant scenario for a 저축은행 considering non-metropolitan loan expansion. A sample report for a recovering district would demonstrate the data format but not the commercial use case. The sample report must show the tool doing something a lender actually cares about.
- Anonymised sample district — less credible; lenders can verify 수원시 against their own portfolio exposure
- Publish all 228 district snapshots — reveals the composite ranking, which is the proprietary commercial asset
Manual publication cadence, not automated sync
The public mirror should contain only outputs that have been explicitly approved for release. An automated sync would publish new district data immediately after each pipeline update, before review. Given that the composite rankings are commercially sensitive, manual review of each publication is the correct default.
- Automated weekly sync — faster to update but risks publishing data before the commercial relationship is established
Tech Stack
- Python (pipeline, report generation — in parent project district-credit-risk)
- PDF (district snapshot reports)
- PNG (national trend chart)
- Markdown (methodology documentation)
- CC BY-NC 4.0
Result & Impact
- 228 districts in the underlying dataset
- 16 years of public time series
- 1 sample report published (수원시)
- 3 methodology documents
This is the first publicly available 시군구-level Korean housing registry time series. National aggregate findings: 39.5% of 2023 임차권 (leasehold) registrations were concentrated in 10 districts (up from 30.1% in 2010); 5 districts are currently in the terminal distress stage; and the median cascade from an 임차권 spike to a 강제경매 (compulsory auction) spike is 17 months. These findings are independently reproducible from IROS public data by any analyst with API access.
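A concentration figure such as the top-10 share could be computed along the following lines. This is a minimal sketch of one natural reading of the metric, assuming per-district annual registration counts are available as a mapping; the function name and the toy counts are hypothetical and are not drawn from the dataset.

```python
def top_n_share(counts_by_district: dict[str, int], n: int = 10) -> float:
    """Return the fraction of all registrations held by the n largest districts."""
    total = sum(counts_by_district.values())
    if total == 0:
        raise ValueError("no registrations to aggregate")
    largest = sorted(counts_by_district.values(), reverse=True)[:n]
    return sum(largest) / total


# Toy counts for illustration only (not real IROS data).
toy_counts = {"A": 50, "B": 30, "C": 15, "D": 5}
print(top_n_share(toy_counts, n=2))  # 0.8
```

Running the same function over each year's counts would give a concentration series comparable in form to the 2010 and 2023 figures quoted above.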
Learnings
- A public mirror with selective outputs is more credible than a full-access dataset because the act of curation signals that the producer understands which parts of the analysis are defensible versus which require further validation.
- The sample report choice (수원시 at terminal stage) is a product decision, not a data decision. The right sample is the one that demonstrates the tool's commercial value, not the one that is easiest to generate.
- CC BY-NC 4.0 for outputs with proprietary methodology is a sustainable combination: it encourages citation and builds credibility without giving away the collection infrastructure.
Public outputs
Contents:
- reports/pdf/snapshot_수원시_2026_03.pdf: 2-page institutional-grade district snapshot. Registration trend, distress stage classification (terminal), year-on-year comparison.
- reports/charts/national_trend.png: 6-panel national aggregate chart, 2010–2026. Top panels: credit activity, namely 근저당 (mortgage) and 전세권 (jeonse right). Bottom panels: distress signals, namely 가압류 (provisional attachment), 임차권 (leasehold registration), and 강제경매 (compulsory auction). The 전세사기 (jeonse fraud) crisis window (2022–2024) is visible as the 임차권 spike, the largest concentrated signal in the 16-year series.
- METHODOLOGY.md: data sourcing, registration type definitions, composite score construction (high-level).
- PRODUCT_SPEC.md: full product specification.
- CROSS_VALIDATION.md: framework for validating against external public datasets.
Full dataset and composite district rankings available upon institutional inquiry.