Four Signals, One Score: How kr-anomaly-scoring Flags KOSDAQ CB/BW Manipulation

Source code: kr-anomaly-scoring · Part of the Korean Forensic Accounting Toolkit

The architecture post on splitting the platform into four repos introduced kr-anomaly-scoring in a paragraph. This post goes one level deeper: each signal threshold, why it was chosen, and how the four flags compose into a score that a human analyst can actually act on.

The CB/BW screen is the most developed part of the scoring layer. The disclosure timing module and the officer-network centrality module are covered at the end.

The Four CB/BW Flags

A Korean convertible bond or bond-with-warrant event gets scored against four binary flags. Each flag either fires or doesn’t. The anomaly score is the count of flags that fired — an integer from 0 to 4.

Flag 1 — Repricing below 95% of market (리픽싱)

if float(rp_price) < market_price_at_rp * REPRICING_DISCOUNT_RATIO:  # 0.95
    repricing_flag = True

What it detects: A CB’s repricing clause (refixing) that adjusted the conversion price downward to below 95% of the market price on the repricing date.

Why 0.95: The 5% band accommodates execution timing — a repricing negotiated over several days may close slightly below the current quote without being manipulative. A repricing to 85% or 70% of market is economically indefensible as compensation for credit risk and is the pattern associated with intentional dilution. The 5% threshold catches the tail while excluding minor timing slippage.

Data source: Repricing history is parsed from the repricing_history JSON field in cb_bw_events.parquet, drawn from DART’s CB prospectus sub-documents.
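
As a rough illustration of how Flag 1 could be evaluated across a whole repricing history, here is a minimal sketch. The JSON keys (date, price), the market_close lookup, and the function name are assumptions made for this example; the post does not show the actual schema of the repricing_history field.

```python
import json

REPRICING_DISCOUNT_RATIO = 0.95  # threshold from the post


def repricing_flag(repricing_history_json, market_close):
    """Return True if any repricing set the conversion price below 95%
    of the market close on the repricing date.

    repricing_history_json: JSON string, assumed to hold a list of
        {"date": "YYYY-MM-DD", "price": number} objects (hypothetical schema).
    market_close: dict mapping "YYYY-MM-DD" to closing price (hypothetical).
    """
    for entry in json.loads(repricing_history_json or "[]"):
        market_price = market_close.get(entry["date"])
        if market_price is None or market_price <= 0:
            continue  # no usable market price for that date, so skip it
        if float(entry["price"]) < market_price * REPRICING_DISCOUNT_RATIO:
            return True
    return False


history = json.dumps([
    {"date": "2023-03-02", "price": 9700},  # 97% of market: inside the 5% band
    {"date": "2023-06-01", "price": 6800},  # 85% of market: below the threshold
])
closes = {"2023-03-02": 10000, "2023-06-01": 8000}
print(repricing_flag(history, closes))
```

In this sketch a single deep repricing is enough to fire the flag, even if earlier repricings stayed inside the band.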

Flag 2 — Exercise within 5 calendar days of price peak

if abs((ex_date - peak_date).days) <= EXERCISE_PEAK_WINDOW_CALENDAR_DAYS:  # 5
    exercise_cluster_flag = True

What it detects: A conversion or warrant exercise event that occurred within 5 calendar days of the highest closing price in a ±60 trading day window around the issuance date.

Why 5 calendar days, not trading days: Calendar days are used intentionally, for simplicity: the forensic question is temporal proximity, not market-session precision. A bondholder who exercises on Monday when the peak was the prior Friday is 3 calendar days from the peak. Counted in trading days, that gap shrinks to 1, which understates the time the bondholder actually had to decide. The calendar-day window better reflects that decision lag.

Why peak within ±60 trading days: The 60-day window captures the issuance period during which informed parties would know the conversion terms and could time exercise to maximize the spread between exercise price and current price. Beyond 60 trading days (roughly 3 months), the CB has aged into a normal secondary-market instrument.

What the flag captures: A bondholder who exercises at exactly the price peak, or within days of it, is either extraordinarily lucky or timing on information. The flag alone is not evidence — many exercises happen near local peaks by chance. As one of four flags, it contributes weight to a composite signal.
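
The Friday-to-Monday example can be checked directly. The sketch below assumes the closing prices have already been restricted to the ±60 trading day window and are passed in as a plain dict; the helper names and the sample prices are hypothetical.

```python
from datetime import date

EXERCISE_PEAK_WINDOW_CALENDAR_DAYS = 5  # threshold from the post


def peak_date(closes):
    """Date of the highest close (earliest date on ties).

    closes: dict of date -> closing price, assumed to be pre-filtered
    to the ±60 trading day window around issuance.
    """
    return max(sorted(closes), key=closes.get)


def exercise_cluster_flag(ex_date, closes):
    """True if the exercise falls within 5 calendar days of the peak."""
    gap_days = abs((ex_date - peak_date(closes)).days)
    return gap_days <= EXERCISE_PEAK_WINDOW_CALENDAR_DAYS


closes = {
    date(2023, 5, 10): 4100,
    date(2023, 5, 11): 4350,
    date(2023, 5, 12): 5200,  # Friday, the peak close
    date(2023, 5, 15): 4900,  # Monday
}
print(exercise_cluster_flag(date(2023, 5, 15), closes))  # 3 calendar days after peak
print(exercise_cluster_flag(date(2023, 5, 22), closes))  # 10 calendar days after peak
```

Because the comparison uses the absolute gap, an exercise shortly before the peak fires the flag as well as one shortly after it.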

Flag 3 — Volume surge above 3× pre-event baseline

baseline_vol = df_pre[vol_col].mean()  # 30 days before the ±60 window
volume_ratio = event_vol / baseline_vol
if volume_ratio > VOLUME_SURGE_RATIO:  # 3.0
    volume_flag = True

What it detects: Average trading volume across the ±60-trading-day event window that exceeds 3× the 30-day pre-event baseline.

Why 3×: A 2× volume increase can occur from routine news or sector rotation. A 3× increase during the issuance window — when insiders know the CB terms and potential arbitrageurs are positioning — is anomalous. The 3× threshold is conservative enough to filter noise but sensitive enough to catch the coordinated buying patterns associated with CB manipulation.

Why mean volume over the window: The forensic question is whether the overall issuance period attracted unusual activity, not whether one specific day was anomalous. A single news spike could produce a daily peak of 10× baseline; spreading the measurement over the ±60-day window requires sustained elevated activity, which is harder to produce accidentally.
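
To make the single-spike argument concrete, the sketch below compares a window with one 10× day against a window with sustained 3.5× volume. It uses plain lists instead of a DataFrame, and the guard against an empty or zero baseline is an addition of this example, not something the post describes.

```python
from statistics import mean

VOLUME_SURGE_RATIO = 3.0  # threshold from the post


def volume_flag(pre_event_volumes, event_window_volumes):
    """Return (flag, ratio) comparing mean event-window volume
    to the mean pre-event baseline.

    Returns (False, None) when either series is empty or the baseline
    is zero, so the ratio is never computed by dividing by zero.
    """
    if not pre_event_volumes or not event_window_volumes:
        return False, None
    baseline_vol = mean(pre_event_volumes)
    if baseline_vol <= 0:
        return False, None
    volume_ratio = mean(event_window_volumes) / baseline_vol
    return volume_ratio > VOLUME_SURGE_RATIO, volume_ratio


baseline = [100_000] * 30  # 30 pre-event days

# One 10x news spike inside an otherwise normal 121-day window.
single_spike = [100_000] * 120 + [1_000_000]
# Sustained activity at 3.5x baseline across the whole window.
sustained = [350_000] * 121

print(volume_flag(baseline, single_spike))
print(volume_flag(baseline, sustained))
```

The single spike moves the window mean by only a few percent, so the flag stays off; only the sustained series crosses 3×.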

Flag 4 — Officer holdings decrease post-exercise

if pre_shares > 0 and post_shares < pre_shares * HOLDINGS_DECREASE_RATIO:  # 0.95
    holdings_flag = True

What it detects: Total officer holdings (from DART 임원 대량보유, that is, officer large-shareholding, disclosures) decreasing by more than 5% between the pre-issuance and post-exercise periods for the same corporation.

Why 0.95 (5% decrease threshold): Officer holdings fluctuate for routine reasons: vesting schedules, tax sales, charitable transfers. The 5% threshold is the smallest decrease the screen treats as deliberate reduction rather than routine variation. In the context of a CB/BW event, a decrease in officer holdings after exercise suggests insiders were selling into the liquidity created by the bondholder’s conversion.

Data source: DART 임원 대량보유 (5%+ ownership changes) from officer_holdings.parquet. Coverage is limited to officers who cross the 5% threshold disclosure requirement — lower-percentage holdings are invisible to this screen.
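
A small sketch of the aggregation step follows, assuming holdings are summed across all disclosed officers of one corporation before the comparison. The dict-of-officers shape and the helper names are hypothetical; the layout of officer_holdings.parquet is not shown in this post.

```python
HOLDINGS_DECREASE_RATIO = 0.95  # threshold from the post


def total_shares(holdings):
    """Sum shares across officers.

    holdings: dict of officer name -> shares held (hypothetical shape).
    """
    return sum(holdings.values())


def holdings_flag(pre_holdings, post_holdings):
    """True if total officer holdings fell by more than 5%."""
    pre_shares = total_shares(pre_holdings)
    post_shares = total_shares(post_holdings)
    return pre_shares > 0 and post_shares < pre_shares * HOLDINGS_DECREASE_RATIO


pre = {"officer_a": 600_000, "officer_b": 400_000}
post_small = {"officer_a": 600_000, "officer_b": 370_000}  # total down 3%
post_large = {"officer_a": 450_000, "officer_b": 400_000}  # total down 15%

print(holdings_flag(pre, post_small))
print(holdings_flag(pre, post_large))
```

The pre_shares > 0 check means a corporation with no disclosed officer holdings before issuance never fires the flag, which matches the coverage limit described above.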

Score Composition: Additive, Not Multiplicative

anomaly_score = len(flags)  # 0, 1, 2, 3, or 4

The score is the count of flags. Not a probability. Not a weighted sum. Four flags = score 4. Three flags = score 3.

Why additive: Each flag has a false positive rate individually. A company might have repricing below 95% for legitimate credit-risk compensation. A company might have officers selling for tax reasons. The conjunction of multiple flags is what makes a case forensically interesting. Multiplicative scoring would produce extreme values when all flags fire, making the scale hard to interpret operationally.

Why integer, not probability: A probability estimate requires calibrated training data — labeled fraud cases, clean controls, and a model that can assign defensible likelihoods. This screen does not have that. The enforcement dataset (240 cases, 86 DART-matched) is too small for a reliable classifier. An integer score is honest about what the screen can and cannot do: it can rank cases by how many signals fired, not by how likely fraud is.

Operating thresholds:

Score 0: No flags. No review warranted from this screen.
Score 1–2: Marginal. Queue for secondary screening alongside other signals (Beneish M-Score, disclosure timing).
Score ≥ 3: High risk. Priority queue for human review. In the CB/BW context, three or four simultaneous flags — repricing below market, exercise at peak, volume surge, officer sell-down — represent a coherent manipulation pattern, not independent coincidences.
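
The count-then-triage logic can be sketched in a few lines. The tier labels and function names here are hypothetical; only the cut points (0, 1–2, ≥ 3) come from the thresholds listed above.

```python
def anomaly_score(flag_results):
    """Count the flags that fired.

    flag_results: dict of flag name -> bool.
    Returns (score, names of fired flags).
    """
    flags = [name for name, fired in flag_results.items() if fired]
    return len(flags), flags


def review_tier(score):
    """Map an integer score to a review queue (labels are illustrative)."""
    if score >= 3:
        return "high_risk"
    if score >= 1:
        return "marginal"
    return "no_review"


event = {
    "repricing_flag": True,
    "exercise_cluster_flag": True,
    "volume_flag": False,
    "holdings_flag": True,
}
score, fired = anomaly_score(event)
print(score, review_tier(score), fired)
```

Keeping the list of fired flag names alongside the count lets the analyst see which signals drove a given score, since two score-3 cases can involve different combinations.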

Disclosure Timing Score

The disclosure timing module applies a different scoring approach — continuous rather than discrete — because it measures a different quantity.

anomaly_score = abs(price_change_pct) * volume_ratio * gap_hours
flag = abs(price_change_pct) >= 5.0 and volume_ratio >= 2.0

What it detects: Material DART disclosures filed after market hours where the same-day or prior-day price and volume movement is inconsistent with the filing time. For example, when a disclosure is filed at 18:00 KST (DART’s typical batch upload window), 2.5 hours after the 15:30 market close, and the stock moved 8% that day on 4× baseline volume, the pattern suggests the information was known before the public filing.

The gap factor: 2.5 hours for same-day (filing at 18:00, close at 15:30). 15 hours for prior-day (filing at 18:00, open at 09:00 the next trading day). Multiplying by gap hours weights prior-day price moves higher than same-day moves: a large price change occurring 15 hours before the disclosure was filed carries more forensic weight than one occurring 2.5 hours before.

Flag condition: Price change ≥ 5% AND volume ≥ 2× baseline. Both conditions must hold — a large price move on low volume is typically news-driven; the conjunction of price and volume movement is what suggests front-running.

Borderline capture: Cases with price change ≥ 3% (TIMING_BORDERLINE_PRICE_PCT) are retained in output even when below the full flag threshold. They appear in the results table flagged as borderline, available for context in human review without being counted in the formal flag summary.
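
Plugging the worked numbers into the formula shows how the gap factor changes the score. In the sketch below, TIMING_BORDERLINE_PRICE_PCT is the constant named above; the other constant names, the status labels, and the function name are assumptions of this example. It also assumes a case counts as borderline whenever the price move is at least 3% and the full flag did not fire.

```python
TIMING_FLAG_PRICE_PCT = 5.0        # hypothetical name; value from the post
TIMING_FLAG_VOLUME_RATIO = 2.0     # hypothetical name; value from the post
TIMING_BORDERLINE_PRICE_PCT = 3.0  # named in the post
SAME_DAY_GAP_HOURS = 2.5           # 15:30 close to 18:00 filing
PRIOR_DAY_GAP_HOURS = 15.0         # gap used for prior-day moves


def timing_score(price_change_pct, volume_ratio, gap_hours):
    """Return (score, status) with status in 'flag', 'borderline', 'none'."""
    move = abs(price_change_pct)
    score = move * volume_ratio * gap_hours
    if move >= TIMING_FLAG_PRICE_PCT and volume_ratio >= TIMING_FLAG_VOLUME_RATIO:
        status = "flag"
    elif move >= TIMING_BORDERLINE_PRICE_PCT:
        status = "borderline"  # kept in output, excluded from the flag summary
    else:
        status = "none"
    return score, status


# 8% move on 4x volume, same-day gap: 8 * 4 * 2.5
print(timing_score(8.0, 4.0, SAME_DAY_GAP_HOURS))
# Same move with the prior-day gap: 8 * 4 * 15
print(timing_score(8.0, 4.0, PRIOR_DAY_GAP_HOURS))
# 3.5% drop on ordinary volume: retained as borderline only
print(timing_score(-3.5, 1.2, SAME_DAY_GAP_HOURS))
```

The same 8% move on 4× volume scores 80 with the same-day gap and 480 with the prior-day gap, a sixfold difference that comes entirely from the gap factor.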

Officer Network Centrality

The officer_network.py module builds a cross-company directorship graph and computes centrality metrics. Nodes are people; edges connect two people who simultaneously held board positions at the same company. Centrality measures how well connected a person is across those boards, that is, how many board networks they sit at the hub of.

Why centrality is a forensic signal: The JFIA literature on Korean chaebol governance documents that enforcement cases disproportionately involve officers who hold positions across multiple companies simultaneously. A high-centrality officer can coordinate transactions — CB issuances, related-party transfers, asset movements — across portfolio companies in ways that are invisible to any single company’s disclosure record. Officer-network centrality is the network-level complement to the per-company signals above.

What the module produces: A per-officer centrality score and a per-company flag for companies whose officers are in the top centile of the directorship network. This flag does not have the same threshold logic as the CB/BW flags — it is a qualitative marker that the company’s leadership is unusually embedded in the broader KOSDAQ governance network.
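
The graph construction described above can be sketched with the standard library alone. This is not the code in officer_network.py: the record shape (person, company, start year, end year), the function names, and the choice of degree centrality are all assumptions, since the post does not say which centrality metrics the module computes.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(terms):
    """Build an undirected person-to-person graph.

    terms: list of (person, company, start_year, end_year) board terms.
    Two people are linked if they sat on the same company's board
    during overlapping years.
    """
    neighbors = {person: set() for person, *_ in terms}
    by_company = defaultdict(list)
    for person, company, start, end in terms:
        by_company[company].append((person, start, end))
    for seats in by_company.values():
        for (p1, s1, e1), (p2, s2, e2) in combinations(seats, 2):
            if p1 != p2 and s1 <= e2 and s2 <= e1:  # terms overlap in time
                neighbors[p1].add(p2)
                neighbors[p2].add(p1)
    return neighbors


def degree_centrality(neighbors):
    """Degree divided by (n - 1), the usual normalised degree centrality."""
    n = len(neighbors)
    if n < 2:
        return {person: 0.0 for person in neighbors}
    return {person: len(adj) / (n - 1) for person, adj in neighbors.items()}


terms = [
    ("Kim", "A", 2019, 2022), ("Lee", "A", 2020, 2023),
    ("Kim", "B", 2021, 2023), ("Park", "B", 2021, 2022),
    ("Choi", "B", 2015, 2018),  # left board B before Kim and Park joined
    ("Kim", "C", 2020, 2021), ("Jung", "C", 2020, 2021),
]
graph = build_graph(terms)
centrality = degree_centrality(graph)
print(sorted(centrality.items(), key=lambda kv: -kv[1]))
```

In this toy data Kim sits on three boards and is linked to three of the four other people, while Choi has no links at all because that term never overlapped with anyone else’s.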

What the Score Measures and What It Doesn’t

The CB/BW anomaly score measures: how many of four statistically anomalous patterns appeared in the data for a given issuance event.

It does not measure: the probability that manipulation occurred. The false positive rate of each signal individually is expected to be high: 40% of flagged CB events are likely explainable by legitimate business circumstances. The score’s value is as a ranked priority queue: a score-4 case gets reviewed before a score-1 case, and both rank ahead of score-0 cases, which this screen does not send for review.

What makes a score-3 or score-4 case forensically interesting is the conjunction. Repricing below market alone is common. Exercise at peak alone is plausible. Volume surge alone is explainable. All three together, for the same issuance, from the same company, with officers selling post-exercise: that pattern is what the human analyst is looking for. The screen surfaces it. The analyst determines what it means.

The data pipeline that feeds this screen is described in Splitting a Forensic-Finance Monolith into Four Repos. The delivery layer — the Marimo apps that visualize the scores and the MCP server that makes them queryable by Claude — is described in An MCP Server for Korean Forensic Finance.