Korean District Credit Risk Index
A 16-year, 228-district time-series of Korean housing credit risk built from public court registry data — tracking the five registration types that lead and lag residential foreclosure, with a statistically consistent 17-month cascade from lease-lien onset to forced auction.
Overview
A monthly-updating district-level credit risk index for Korean residential real estate, built entirely from the 등기정보광장 (IROS) public court registry OpenAPI. The dataset covers 228 시군구 (city, county, and ward) districts from January 2010 to the present across five registration types: 임차권 (court-ordered lease lien), 강제경매 (forced auction), 가압류 (provisional seizure), 전세권 (jeonse deposit right), and 근저당 (collateral mortgage). Each type serves a distinct role: 임차권 and 가압류 are tenant-initiated leading indicators, 강제경매 is the lagging foreclosure signal, and 근저당 captures aggregate debt load. The headline empirical finding is a statistically consistent 17-month median lag between an 임차권 spike in a district and a subsequent 강제경매 spike in the same district (district-level R ≈ 0.93). Today's 임차권 trend is therefore a forward-looking predictor of foreclosure pressure 17 months out, which is the core commercial proposition for 저축은행 (savings bank) and 상호금융 (mutual finance cooperative) lenders evaluating district exposure.
Problem
The FSC's November 2025 supervisory revision assigns 110% weighting to 비수도권 (non-capital-region) 저축은행 loan portfolios, creating a regulatory incentive for savings banks to expand into non-metropolitan districts. What the revision leaves open is which districts to expand into, and which carry deteriorating credit risk. Existing data does not answer that question. Korean residential credit risk data is fragmented: KCB and NICE cover borrower scores, not district-level property registry trends. Transaction price data (MOLIT 실거래가) reflects completed deals, not the distress signals that precede defaults. No structured time-series product tracks the court registry leading indicators at the district level. That gap is the commercial opportunity.
Constraints
- The IROS OpenAPI allows 1,000 calls per service key per day. A full backfill across 228 districts × 5 types × 197 months cannot complete on a single key without repeatedly hitting that limit, so the pipeline uses 5 separate keys, one per registration type.
- DuckDB on Windows acquires an exclusive file lock, so any process that holds a connection open blocks other readers. The pipeline therefore closes every connection explicitly with try/finally.
- The IROS API returns APIERROR-0003 (daily limit reached) silently mixed in with valid responses. Naively retrying on 0003 consumes the next day's quota, so a dedicated DailyLimitError class skips the remaining batches as soon as the limit is hit.
- The 228 districts include 3 abolished administrative units (codes 405, 1511, 1610) that appear in the IROS codebook but return no data. They need explicit handling so that phantom zero-rows do not poison the time series.
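The daily-limit handling described in the constraints can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the DailyLimitError name and the APIERROR-0003 code come from the text, while parse_response, run_batches, the fetch callable, and the payload shape ({"code": ..., "rows": [...]}) are hypothetical.

```python
from typing import Callable, Iterable

DAILY_LIMIT_CODE = "APIERROR-0003"  # error code named in the constraints


class DailyLimitError(Exception):
    """Raised when the API reports that the key's daily quota is exhausted."""


def parse_response(payload: dict) -> list:
    """Return data rows, or raise if the payload is a daily-limit error.

    The payload shape used here is an assumption made for this sketch.
    """
    if payload.get("code") == DAILY_LIMIT_CODE:
        raise DailyLimitError(payload["code"])
    return payload.get("rows", [])


def run_batches(batches: Iterable[str], fetch: Callable[[str], dict]) -> dict:
    """Fetch batches in order and stop at the first daily-limit hit.

    The failing batch is never retried, so no further quota is spent.
    """
    pending = list(batches)
    done: dict = {}
    skipped: list = []
    for i, batch in enumerate(pending):
        try:
            done[batch] = parse_response(fetch(batch))
        except DailyLimitError:
            skipped = pending[i:]  # left for a later run
            break
    return {"done": done, "skipped": skipped}
```

Raising a dedicated exception type, rather than treating 0003 as a generic failure, keeps the "stop now" path separate from any retry logic used for transient errors.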
Approach
The pipeline collects court registry data from the public IROS OpenAPI rather than transaction prices. Registration events are the earliest publicly available signal of housing distress: a lease lien is filed months before an auction is requested, and the auction request is itself filed months before it appears in the registry. By tracking all five registration types independently per district per month, the index captures the full distress lifecycle: debt accumulation (근저당), tenant exposure (임차권, 전세권), active dispute (가압류), and resolution (강제경매). A DuckDB fact table stores the time series; a SQLite state database logs every fetched API call for idempotent backfill. The monthly update runs via a single command (python -m pipeline update) and adds the prior two months. District-level composite scores weight the five types by signal reliability and apply rolling z-scores to normalise for baseline district activity.
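To make the scoring step concrete, here is a minimal sketch of a rolling z-score and a weighted composite. The project's actual weights, window length, and scoring formula are not published, so rolling_zscores, composite_score, the trailing-window definition, and every number in the usage note are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def rolling_zscores(series, window):
    """z-score of each value against the `window` values that precede it.

    Returns None where there is not enough history or the history has
    zero variance.
    """
    out = []
    for i, x in enumerate(series):
        hist = series[max(0, i - window):i]
        if len(hist) < window:
            out.append(None)
            continue
        sd = pstdev(hist)
        out.append((x - mean(hist)) / sd if sd > 0 else None)
    return out


def composite_score(latest_z, weights):
    """Weighted average of the z-scores that are available.

    `weights` maps registration type to a reliability weight; the values
    passed in are placeholders, not the project's real weights.
    """
    usable = {k: z for k, z in latest_z.items() if z is not None and k in weights}
    total = sum(weights[k] for k in usable)
    if total == 0:
        return None
    return sum(weights[k] * z for k, z in usable.items()) / total
```

For example, composite_score({"a": 2.0, "b": 0.0, "c": None}, {"a": 3, "b": 1, "c": 1}) ignores the missing type and returns (3 × 2.0 + 1 × 0.0) / 4 = 1.5. Scoring each month against its own district's trailing history is what normalises away differences in baseline activity between large and small districts.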
Key Decisions
Court registry data over transaction price data
Transaction prices (MOLIT 실거래가) reflect completed deals — they are lagging indicators that confirm distress after it has materialised. Registry data (임차권, 가압류) reflects active legal disputes in progress. The 17-month cascade finding only exists because registry data captures the distress pipeline; price data would show nothing until the district's auctions are already clearing at discount.
- MOLIT 실거래가 + cancellation flags — detects market anomalies but misses the credit risk dimension entirely
- KB시세 + LTV-based scoring — requires commercial data license, not reproducible with public APIs only
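The source does not say how the 17-month lag was estimated. One common way to measure a lead-lag relationship between two monthly series is lagged cross-correlation, sketched below; pearson and best_lag are hypothetical helpers, and this is not presented as the project's method. Under this approach, a "median lag" would be the median of the per-district best lags.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def best_lag(leading, lagging, max_lag):
    """Return (lag, r) maximising the correlation of leading[t] with lagging[t + lag]."""
    best = (0, float("-inf"))
    for lag in range(max_lag + 1):
        ys = lagging[lag:]
        n = min(len(leading), len(ys))
        if n < 3:  # too few overlapping months to correlate
            break
        r = pearson(leading[:n], ys[:n])
        if r > best[1]:
            best = (lag, r)
    return best
```

With a toy series where the lagging signal is the leading signal delayed by three months, best_lag recovers a lag of 3. Real registry series are noisier, so the per-district estimate would need enough history on both sides of the lag to be meaningful.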
Five separate IROS API service keys (one per registration type)
The IROS daily limit is per key, not per account. Assigning one key per registration type means a daily limit hit on 임차권 does not block 강제경매 collection. The five types have different backfill depths — 근저당 data extends furthest back; 임차권 has the most recent volatility — so independent rate limits also allow prioritising the most commercially important types.
- Single key with daily scheduling — backfill takes 5× longer and a single limit hit stalls the entire pipeline
- Multiple accounts — terms of service risk
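The 5× figure follows from simple arithmetic on the numbers given in the constraints. The sketch below assumes one API call per (district, type, month) triple, which the text implies but does not state outright; backfill_days is a hypothetical helper.

```python
import math

DISTRICTS = 228
TYPES = 5
MONTHS = 197
DAILY_LIMIT = 1_000  # calls per service key per day


def backfill_days(total_calls, keys, daily_limit=DAILY_LIMIT):
    """Days needed to issue `total_calls` when split evenly across `keys`."""
    per_key = math.ceil(total_calls / keys)
    return math.ceil(per_key / daily_limit)


# Assumes one call per (district, type, month) triple.
total_calls = DISTRICTS * TYPES * MONTHS  # 224,580

single_key_days = backfill_days(total_calls, keys=1)  # 225
five_key_days = backfill_days(total_calls, keys=5)    # 45
```

Under that assumption a single key needs about 225 days of quota, while five keys running in parallel finish in about 45.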
SQLite state.db as idempotency log separate from DuckDB
DuckDB's UPSERT could handle deduplication, but provides no way to know whether a specific API call was already attempted (as opposed to returning zero rows). The SQLite log records every (district, type, year_month) triple that has been successfully fetched, making the backfill safely resumable after interruption without re-hitting the API.
- Checkpoint files per batch — fragile under partial failures and hard to query across
- DuckDB UPSERT only — cannot distinguish 'not yet fetched' from 'legitimately zero registrations'
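A state log of this kind can be sketched with the standard-library sqlite3 module. The table name fetch_log, the column names, and the helper functions are assumptions for illustration; the source specifies only that each (district, type, year_month) triple is recorded once it has been fetched successfully.

```python
import sqlite3


def open_state(path=":memory:"):
    """Open (or create) the state database with one row per fetched triple."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS fetch_log (
            district   TEXT NOT NULL,
            reg_type   TEXT NOT NULL,
            year_month TEXT NOT NULL,
            row_count  INTEGER NOT NULL,
            PRIMARY KEY (district, reg_type, year_month)
        )
        """
    )
    return conn


def already_fetched(conn, district, reg_type, year_month):
    """True if this triple was fetched before, even if it returned zero rows."""
    row = conn.execute(
        "SELECT 1 FROM fetch_log WHERE district = ? AND reg_type = ? AND year_month = ?",
        (district, reg_type, year_month),
    ).fetchone()
    return row is not None


def mark_fetched(conn, district, reg_type, year_month, row_count):
    """Record a successful fetch; the transaction commits on leaving the block."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO fetch_log VALUES (?, ?, ?, ?)",
            (district, reg_type, year_month, row_count),
        )
```

Storing row_count alongside the triple is what separates "fetched, legitimately zero registrations" from "not yet fetched": the first has a row with row_count = 0, the second has no row at all.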
Tech Stack
- Python 3.11
- DuckDB (fact table + analytics)
- SQLite (idempotency state log)
- IROS 등기정보광장 OpenAPI (5 service keys)
- pandas, requests, python-dotenv
Result & Impact
- 228 districts covered
- 222,903 rows collected
- 16 years of history
- 5 registration types
- 74 tests passing
- ~0.93 cascade R²
The first publicly-derived, district-level Korean residential credit risk time series covering the full 임차권 → 강제경매 cascade. The 17-month median lag finding — reproducible from IROS public data by any analyst — gives 저축은행 and 상호금융 lenders a forward-looking signal that no commercially available product provides: today's lease-lien spike in a given 시군구 predicts, with strong statistical confidence, the foreclosure pressure that district will face 17 months from now.
Learnings
- Registry data is a structurally better signal for credit risk than transaction prices — it captures the distress pipeline, not just its outcome. This is the insight that makes the product differentiated rather than redundant with existing MOLIT-based tools.
- Idempotency at the API call level, not the batch level, is what makes a multi-month backfill survivable. A state log that records individual (district, type, month) triples costs almost nothing and eliminates the risk of corrupted partial batches.
- The 17-month lag is a district-level finding, not a property-level finding — it emerges from the aggregate series, not from tracking individual properties. Trying to apply it at the property level would be methodologically incorrect.
- Separating API keys per data dimension (one key per registration type) is a simple design choice that multiplies throughput fivefold under daily rate limits without any infrastructure complexity.
Public outputs
Sample district report and national trend chart are published at kr-housing-risk — the public storefront for this project. The repository contains:
- snapshot_수원시_2026_03.pdf: a two-page institutional-grade district report showing the 수원시 registration time series with composite risk scoring
- national_trend.png: a 6-panel national trend chart covering all five registration types from 2010 to 2026
The full dataset (222,903 rows, 228 districts) and composite ranking outputs remain private — they contain enough information to reverse-engineer the scoring methodology.
Related
The district risk signals from this project are consumed read-only by lease-risk-engine, which resolves a tenant’s address to a district risk band and positions it on the 17-month cascade timeline.