Ongoing

Korean Real Estate Tax Screen

Builder · 2026 · 5 min read

Screens Korean apartment transaction records for cancellation-cluster anomalies associated with price manipulation and tax evasion, using the MOLIT 실거래가 public API. Module A is live with 37 passing tests.

Overview

This project is a systematic screen for Korean real estate transactions that exhibit statistical patterns associated with 양도세 (capital gains tax) evasion and 부동산 가격 띄우기 (price manipulation via coordinated cancellations). The tool consumes the national MOLIT 실거래가 (actual transaction price) API, scores buildings on anomaly indicators derived from cancellation clusters, 신고가 (record-high price) flags, and 법인-매수 (corporate purchase) patterns, and outputs ranked building reports for review by tax and legal professionals. Module A (transaction cancellation anomaly detection) is live. Module B (below-market transfer screening via cross-reference against 공동주택공시가격, the officially published prices for multi-unit housing) is built and tested on a development branch, pending regulatory calibration before merge.

Problem

Korean real estate tax evasion and price manipulation leave traces in the public transaction record that no existing tool systematically screens. The MOLIT 실거래가 database publishes every registered apartment transaction — including the 해제여부 (cancellation) field that records whether a reported transaction was later cancelled. Coordinated cancellations across a building in a short window are a known manipulation pattern: record-high prices are reported to inflate valuations, then cancelled before transfer tax falls due. Law firms and accounting firms providing real estate tax advisory have no structured tool to flag these patterns for their clients before filing. The screen targets that gap using only public data.

Constraints

The MOLIT API returns raw XML with underdocumented field semantics. The meaning of the 해제여부 cancellation flag had to be reverse-engineered from transaction records cross-referenced against KAR public investigation reports.
Building-level aggregation requires joining on a composite of address fields (시군구 + 번지 + 동명 + 건물명, i.e. district + lot number + neighbourhood name + building name) that are inconsistently formatted across API responses, so normalisation logic was needed before any reliable building-level grouping.
Shared column constants between this project and fraud-screen diverged during early development, requiring extraction to a canonical shared layer (_shared/code/kr_re_data/columns/molit.py) to prevent silent schema drift.
Module B threshold constants (THRESHOLD_LOW_*) control the precision-recall tradeoff for below-market transfers. Setting them without reference to actual 국세청 (National Tax Service) enforcement thresholds risks either too many false positives (overloading reviewers) or too many misses.
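The first two constraints (raw XML and inconsistent address formatting) can be illustrated with a minimal sketch. It assumes the four address fields and 해제여부 appear as XML tags with exactly those names and that a cancelled record carries the value "O"; both are assumptions made for illustration, as are the sample records, the function names, and the specific normalisation rules (Unicode folding, whitespace removal, stripping leading zeros from lot numbers). The project's actual rules may differ.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical sample response. Tag names and the "O" cancellation value
# are assumptions for illustration, not the documented MOLIT schema.
SAMPLE_XML = """
<response><body><items>
  <item><시군구>11680</시군구><번지> 12-3</번지><동명>역삼동 </동명>
        <건물명>래미안(1단지)</건물명><해제여부>O</해제여부></item>
  <item><시군구>11680</시군구><번지>0012-3</번지><동명>역삼동</동명>
        <건물명>래미안 (1단지)</건물명><해제여부></해제여부></item>
</items></body></response>
"""


def normalise_part(text):
    """Fold Unicode variants (e.g. full-width forms) and drop all whitespace."""
    folded = unicodedata.normalize("NFKC", text or "")
    return re.sub(r"\s+", "", folded)


def normalise_lot(lot):
    """Strip leading zeros from each part of a lot number: '0012-3' -> '12-3'."""
    parts = normalise_part(lot).split("-")
    return "-".join(part.lstrip("0") or "0" for part in parts)


def building_key(item):
    """Build a grouping key from the four address fields of one <item>."""
    return (
        normalise_part(item.findtext("시군구", default="")),
        normalise_lot(item.findtext("번지", default="")),
        normalise_part(item.findtext("동명", default="")),
        normalise_part(item.findtext("건물명", default="")),
    )


def parse_items(xml_text):
    """Return one dict per <item> with its building key and cancellation flag."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "key": building_key(item),
            "cancelled": normalise_part(item.findtext("해제여부", default="")) == "O",
        }
        for item in root.iter("item")
    ]


rows = parse_items(SAMPLE_XML)
```

In this sample the two records differ only in spacing and zero-padding, so they collapse to the same key and can be grouped as one building.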

Approach

Phase 1 (Module A) ports the cancellation-cluster detection logic from the fraud-screen project, which first identified the pattern during valuation research. A building's anomaly score aggregates three signals: cancellation rate above district baseline, number of 신고가 transactions within a trailing window, and 법인-매수 concentration. The per-building score is normalised against the same 시군구 × period cohort so rankings are relative to local market behaviour rather than national averages. Output is a ranked CSV of buildings by composite anomaly score, suitable for attorney or accountant review. The pipeline runs with a single command against any 시군구 code and date range.
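The cohort-relative scoring described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: it assumes each signal is z-scored within its 시군구 × period cohort and that the three z-scores are summed with equal weights. The field names, the sample values, and the use of plain Python instead of pandas (to keep the example dependency-free) are all choices made for this sketch.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical per-building signal names.
SIGNALS = ("cancel_rate", "record_high_count", "corp_share")


def zscores(values):
    """Standardise values within one cohort; a flat cohort scores all zeros."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def score_buildings(rows):
    """Rank buildings by the sum of cohort-relative z-scores (equal weights assumed)."""
    cohorts = defaultdict(list)
    for row in rows:
        cohorts[(row["sigungu"], row["period"])].append(row)

    scored = []
    for members in cohorts.values():
        per_signal = [zscores([m[s] for m in members]) for s in SIGNALS]
        for i, member in enumerate(members):
            scored.append({**member, "score": sum(z[i] for z in per_signal)})
    return sorted(scored, key=lambda r: r["score"], reverse=True)


# Made-up example data: two cohorts, one with a clear outlier.
buildings = [
    {"building": "A", "sigungu": "11680", "period": "2024Q1",
     "cancel_rate": 0.30, "record_high_count": 4, "corp_share": 0.5},
    {"building": "B", "sigungu": "11680", "period": "2024Q1",
     "cancel_rate": 0.02, "record_high_count": 0, "corp_share": 0.0},
    {"building": "C", "sigungu": "11680", "period": "2024Q1",
     "cancel_rate": 0.04, "record_high_count": 1, "corp_share": 0.1},
    {"building": "D", "sigungu": "41135", "period": "2024Q1",
     "cancel_rate": 0.10, "record_high_count": 1, "corp_share": 0.2},
    {"building": "E", "sigungu": "41135", "period": "2024Q1",
     "cancel_rate": 0.10, "record_high_count": 1, "corp_share": 0.2},
]
ranked = score_buildings(buildings)
```

Because scores are computed within each cohort, buildings D and E (identical to each other) score zero regardless of how their district compares nationally, which is the behaviour the normalisation is meant to produce.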

Key Decisions

Cancellation cluster as primary anomaly signal over raw price deviation

Reasoning:

Raw price outliers generate too many false positives because legitimate record-high transactions occur in every district. The cancellation flag is more specific: a transaction reported at a record high and then cancelled before the transfer tax assessment date follows a pattern that legitimate high-price transactions do not. Requiring both the 신고가 flag and a cancellation within a trailing window filters out uncancelled record highs and routine cancellations while preserving sensitivity to the manipulation pattern.

Alternatives considered:

Price-deviation-only screening — produces many false positives in rapidly appreciating districts where genuine 신고가 transactions are common
법인 flag only — under-captures individual-seller manipulation and misses the coordinated-cancellation pattern entirely
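To make this decision concrete, the combined rule (a 신고가 transaction that is also cancelled within a trailing window) might look like the sketch below for a single building. The record layout, the 90-day default window, and the definition of "record high" as exceeding every previously reported price in the same building are assumptions for illustration. A real screen would likely compare like-for-like units (for example by floor area), which this sketch omits.

```python
from datetime import date, timedelta


def flag_suspect(transactions, window_days=90):
    """Return transactions that set a record high and were cancelled in the window.

    Each transaction is a dict with 'deal_date' (date), 'price' (number) and
    'cancel_date' (date or None). The window length is a hypothetical default.
    """
    window = timedelta(days=window_days)
    flagged = []
    running_max = None
    for tx in sorted(transactions, key=lambda t: t["deal_date"]):
        # The first transaction has no prior price to beat, so it is not a record high.
        is_record_high = running_max is not None and tx["price"] > running_max
        # Cancelled deals still count towards the running maximum, because the
        # reported price was visible to the market until it was cancelled.
        if running_max is None or tx["price"] > running_max:
            running_max = tx["price"]

        cancel_date = tx.get("cancel_date")
        cancelled_in_window = (
            cancel_date is not None
            and timedelta(0) <= cancel_date - tx["deal_date"] <= window
        )
        if is_record_high and cancelled_in_window:
            flagged.append(tx)
    return flagged


# Made-up example transactions for one building.
sample = [
    {"id": 1, "deal_date": date(2024, 1, 5), "price": 100, "cancel_date": None},
    {"id": 2, "deal_date": date(2024, 2, 1), "price": 130, "cancel_date": date(2024, 3, 1)},
    {"id": 3, "deal_date": date(2024, 2, 20), "price": 120, "cancel_date": date(2024, 3, 5)},
    {"id": 4, "deal_date": date(2024, 4, 1), "price": 140, "cancel_date": None},
    {"id": 5, "deal_date": date(2024, 5, 1), "price": 150, "cancel_date": date(2024, 12, 1)},
]
suspects = flag_suspect(sample)
```

Only transaction 2 satisfies both conditions. Transaction 3 is cancelled but not a record high, transaction 4 is a record high that was never cancelled, and transaction 5 was cancelled well outside the window, which shows how the conjunction removes the cases that a price-only or cancellation-only rule would flag.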

Shared column constants extracted to _shared/code rather than duplicated

Reasoning:

This project and fraud-screen both consume the MOLIT 실거래가 API with the same 19 column names. With independent copies, a column rename in one project goes unnoticed in the other, which keeps reading the old name and returns wrong data without raising an error. The shared layer ensures both projects fail loudly at import time if a constant changes, rather than silently returning wrong data.

Alternatives considered:

Maintain separate constants in each project — divergence risk accumulates over time
Merge fraud-screen and tax-screen into one repo — conflates two products with different buyers and regulatory contexts
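The difference between a loud import-time failure and silent drift can be sketched in a few lines. The example builds a stand-in for the shared constants module in memory so that it is self-contained. The module name, constant names, and the renamed column are all hypothetical and do not reflect the contents of _shared/code/kr_re_data/columns/molit.py.

```python
import sys
import types


def install_shared_module(constants):
    """Register an in-memory stand-in for the shared column-constants module."""
    module = types.ModuleType("molit_columns_demo")  # hypothetical module name
    for name, value in constants.items():
        setattr(module, name, value)
    sys.modules[module.__name__] = module
    return module


def consumer_import():
    """What a consuming project does: import the constant by name."""
    from molit_columns_demo import CANCELLED  # raises ImportError if renamed
    return CANCELLED


# A duplicated constant kept inside one consumer (the rejected alternative).
STALE_LOCAL_COPY = "해제여부"


def read_with_stale_copy(row):
    """Reading with an outdated local constant returns None instead of failing."""
    return row.get(STALE_LOCAL_COPY)


# Shared layer as consumers expect it: the import succeeds.
install_shared_module({"CANCELLED": "해제여부"})
value = consumer_import()

# Shared layer after a hypothetical rename: the same import now fails loudly.
install_shared_module({"CANCEL_FLAG": "해제여부"})
try:
    consumer_import()
    failed_loudly = False
except ImportError:
    failed_loudly = True
```

With the shared module, the rename breaks every consumer at import time, where a test run catches it immediately. With a duplicated constant, the stale name keeps "working" and simply yields missing values once the upstream column changes.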

Tech Stack

Python 3.11
MOLIT 실거래가 OpenAPI (국토부 실거래가 공개시스템)
pandas, requests
Shared column constants: _shared/code/kr_re_data/columns/molit.py
pytest (37 tests, Module A)

Result & Impact

Tests passing (Phase 1): 37 / 37
Primary signal: 해제여부 cancellation cluster
Target buyer: 법무법인 · 회계법인 (law firms · accounting firms)

Module A screens any 시군구 × date range for building-level cancellation anomalies using only the public MOLIT 실거래가 dataset. The output is a ranked building list that a tax attorney or accountant can use to prioritise which transactions warrant closer inspection, before a client files or responds to a 국세청 inquiry.

Learnings

The 해제여부 flag in the MOLIT API is more analytically valuable than any price field — it records something a seller cannot easily conceal, whereas a price can be set to anything. Building a screen around the cancellation signal rather than the price signal was the key methodological insight.
Forking from fraud-screen rather than building from scratch saved the first two weeks of API exploration and column-normalisation work, but created a shared-state dependency that had to be made explicit (shared constants layer) before both projects could evolve independently.
Module B exists and is fully tested, but its merge is blocked on a single regulatory calibration call (국세청 126). Holding the merge until calibration is done is the right discipline: incorrect THRESHOLD_LOW_* values would produce a tool that either overloads reviewers with false positives or systematically misses the cases it is designed to catch.
