Ongoing

Korean Beneish M-Score (kr-beneish)

Builder · 2026 · 8 min read

Korean IFRS adaptation of the 8-variable Beneish M-Score calibrated against a 50-case labeled dataset (17 FSS/SFC fraud, 13 clean, 20 auto-controls); Korean threshold -2.45 vs. US -1.78; flags ~1,250 of 7,447 KOSDAQ company-years (2018–2023).

Overview

A Python implementation of the Beneish M-Score adapted for Korean IFRS reporting. The 8-variable formula is preserved, but two ratios (GMI, SGAI) are set to a neutral 1.0 for nature-of-expense-method issuers, who do not report COGS separately. This is a structural feature of Korean IFRS disclosures that invalidates both ratios for the majority of KOSDAQ companies. The Korean-specific threshold (-2.45) was derived by bootstrap calibration against a 50-case labeled dataset drawn from FSS and SFC enforcement records; it replaces the US academic threshold (-1.78), which was calibrated on US GAAP filers that report COGS separately. On KOSDAQ, TATA (total accruals to total assets) shows a documented sign inversion relative to the direction assumed by the original Beneish formula, and this implementation preserves that inversion.

Problem

The Beneish M-Score is the dominant academic tool for earnings manipulation screening, but its US threshold (-1.78) was calibrated on US GAAP companies that separately report COGS. Korean IFRS allows the nature-of-expense method, which omits COGS from the income statement — making GMI and SGAI undefined or misleading for the majority of KOSDAQ issuers. Applying the US threshold to Korean data produces an uncalibrated signal on a mismatched input set. No prior published work had recalibrated the threshold using Korean enforcement case data, and no open Python implementation addressed the IFRS disclosure format differences.

Constraints

  • Nature-of-expense method issuers cannot compute GMI (gross margin index) or SGAI (SG&A expense index) — both ratios require separately disclosed COGS, which these issuers do not report
  • TATA exhibits a sign inversion on KOSDAQ: negative accruals (overstated liabilities reverting) are more prevalent in Korean fraud cases than the positive-accrual pattern Beneish observed in US data
  • The labeled training set is 50 cases (17 fraud, 13 clean, 20 auto-controls) — small enough that bootstrap confidence intervals for the Korean threshold are wide: CI [-3.50, -1.60]
  • Per-year 1%/99% winsorization is required to prevent extreme ratio values from KOSDAQ micro-caps from distorting cross-sectional distributions
  • Beneish coverage requires 2 years of DART financial history — companies with fewer than 2 years of filings cannot be scored
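The two-year history constraint can be illustrated with a small pandas filter. This is a minimal sketch, not the project's actual code: the column names `corp_code` and `fiscal_year` and the function name `scorable_rows` are hypothetical, and the sketch assumes one row per company per fiscal year.

```python
import pandas as pd

def scorable_rows(fin: pd.DataFrame) -> pd.DataFrame:
    """Keep company-years that also have a filing for the immediately preceding year."""
    fin = fin.sort_values(["corp_code", "fiscal_year"])
    # Previous filing year within each company (NaN for a company's first filing).
    prev_year = fin.groupby("corp_code")["fiscal_year"].shift(1)
    # A row is scorable only if the prior row is exactly one fiscal year earlier.
    return fin[prev_year == fin["fiscal_year"] - 1]

# Hypothetical filings: A has three consecutive years, B has one, C has a gap.
fin = pd.DataFrame({
    "corp_code": ["A", "A", "A", "B", "C", "C"],
    "fiscal_year": [2018, 2019, 2020, 2020, 2018, 2020],
})
scored = scorable_rows(fin)
```

Only A's 2019 and 2020 rows survive: B has a single filing, and C's 2020 row has no 2019 counterpart to index against.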

Approach

The implementation applies the 8-variable formula with Korean IFRS structural adjustments: GMI and SGAI are set to 1.0 (neutral) for nature-of-expense-method issuers rather than excluded or imputed. The TATA direction was confirmed via labeled-case analysis, and the sign flip is preserved. The threshold was calibrated by bootstrap over the 50 labeled cases, using 10,000 draws to find the value that maximizes F1 at the Korean KOSDAQ base rate. The final threshold is -2.45 (95% CI: [-3.50, -1.60]), validated against an additional held-out fold. The model was then applied to 7,447 KOSDAQ company-years (2018–2023) to generate the scored universe used by the forensic-accounting-toolkit pipeline. DSRI > 2.0 emerged as the single strongest individual predictor among the eight variables: a receivables-to-sales ratio that doubles year-over-year warrants examination regardless of the composite score.

Key Decisions

Set GMI and SGAI to a neutral 1.0 for nature-of-expense issuers rather than excluding them

Reasoning:

Excluding nature-of-expense issuers would drop the majority of KOSDAQ from the scored universe — exactly the segment most likely to contain manipulation targets. Imputing GMI and SGAI from industry averages introduces model-dependent assumptions that cannot be verified against the labeled set. Setting both to 1.0 (neutral: no gross margin change, no SG&A change) understates the M-Score for issuers where these ratios would be informative, but preserves the remaining six ratios' signal and allows full-universe coverage. The tradeoff favors coverage: the six remaining ratios, especially DSRI and TATA, are the empirically stronger signals on the Korean labeled set.

Alternatives considered:
  • Exclude nature-of-expense issuers entirely — drops coverage for the majority of KOSDAQ; inconsistent with the goal of a full-universe screen
  • Impute GMI from industry-median gross margins — model-dependent, unvalidated against the labeled set, adds maintenance burden when industry classifications change

Bootstrap calibration against Korean enforcement cases rather than applying US threshold -1.78 directly

Reasoning:

The US threshold was estimated on a US GAAP dataset with different accounting standards, different capital market structure, and a different base rate of enforcement actions. Korean KOSDAQ has a structurally different distribution of accrual ratios due to the IFRS differences described above. Applying -1.78 to Korean data produces a flag rate inconsistent with the empirical enforcement incidence. Bootstrap calibration against the Korean labeled set produces a threshold that is grounded in actual Korean enforcement outcomes, even if the CI is wide due to sample size.

Alternatives considered:
  • Apply US threshold -1.78 directly — fast, no labeled data required; but produces uncalibrated signal on a population the threshold was not designed for
  • Train a fully Korean-specific model from scratch — more statistically sound but requires a larger labeled set than 50 cases to produce stable parameters

Per-year 1%/99% winsorization of ratio inputs

Reasoning:

KOSDAQ micro-caps produce extreme ratio values from small denominators — a company with near-zero beginning-of-year assets produces an AQI that swamps the cross-sectional distribution. Winsorizing within each fiscal year prevents a single extreme observation from distorting the year's threshold calculation. Global (across-year) winsorization would be distorted by temporal shifts in the KOSDAQ composition; per-year winsorization adapts to each year's distribution.

Alternatives considered:
  • No winsorization — extreme micro-cap ratios dominate the distribution in all years
  • Global winsorization across all years — year composition changes distort the global percentile thresholds
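Per-year winsorization can be sketched with a pandas group-wise quantile and `clip`. The function name `winsorize_per_year`, the `fiscal_year` column, and the demo values are assumptions made for illustration; the source does not show how kr-beneish implements this step.

```python
import pandas as pd

def winsorize_per_year(df, ratio_cols, year_col="fiscal_year", lower=0.01, upper=0.99):
    """Clip each ratio to its own fiscal year's [lower, upper] quantile range."""
    out = df.copy()
    for col in ratio_cols:
        grouped = out.groupby(year_col)[col]
        # Broadcast each year's quantile bounds back onto that year's rows.
        lo = grouped.transform(lambda s: s.quantile(lower))
        hi = grouped.transform(lambda s: s.quantile(upper))
        out[col] = out[col].clip(lower=lo, upper=hi)
    return out

# Hypothetical data: the 2019 distribution is ten times wider than 2018's.
demo = pd.DataFrame({
    "fiscal_year": [2018] * 101 + [2019] * 101,
    "AQI": list(range(101)) + [v * 10 for v in range(101)],
})
clipped = winsorize_per_year(demo, ["AQI"])
```

Because the bounds are computed inside each year, the wider 2019 distribution gets wider clip limits than 2018, which is the behavior a single global percentile would not provide.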

Tech Stack

  • Python ≥3.11, uv
  • pandas, numpy
  • scipy.stats (bootstrap sampling, confidence intervals)
  • scikit-learn (RandomForestClassifier, cross-validation)
  • DART OpenAPI (IFRS financial statements)
  • pytest

Result & Impact

  • KOSDAQ company-years scored: 7,447 (2018–2023)
  • Flag rate at Korean threshold (-2.45): ~16.8% (~1,250 company-years)
  • Labeled training set: 50 cases (17 fraud, 13 clean, 20 auto-controls)
  • RF AUC (10-fold CV): 0.756 ± 0.192

The first open-source Beneish M-Score implementation calibrated for Korean IFRS disclosures. The Korean threshold (-2.45) was derived from actual FSS and SFC enforcement records, not from the US academic literature. The TATA sign flip and the GMI/SGAI neutralization (both set to 1.0) for nature-of-expense issuers are documented, reproducible, and testable against the labeled set. These adjustments are necessary for any credible application of the M-Score to KOSDAQ data.

Learnings

  • Korean IFRS nature-of-expense disclosure makes GMI and SGAI undefined for the majority of KOSDAQ issuers — this is a structural feature, not an edge case. Any Korean M-Score implementation that does not address it is silently producing incorrect ratios for roughly 60% of the scored population.
  • TATA sign flip is real and reproducible on the labeled dataset. Korean fraud cases disproportionately involve overstated liabilities that revert (negative accruals at fraud discovery), which is the opposite of the US inflated-receivables / capitalized-expense pattern Beneish observed.
  • Bootstrap CI width (1.9, from -3.50 to -1.60) reflects genuine labeled-set uncertainty, not a calibration failure. Expanding the labeled dataset, rather than refining the calibration method, is the path to a tighter threshold.
  • DSRI > 2.0 is the single most actionable signal: a receivables-to-sales ratio that doubles year-over-year warrants immediate examination regardless of the composite M-Score. In the labeled set, DSRI was the dominant separator between fraud and clean cases.
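The DSRI check in the last learning can be sketched in a few lines. DSRI is the standard Beneish ratio of this year's receivables-to-sales over last year's; the function name `dsri`, the guard conditions, and the input figures below are hypothetical and are not taken from the project.

```python
def dsri(receivables, sales, receivables_prev, sales_prev):
    """Days Sales in Receivables Index: (Rec_t / Sales_t) / (Rec_t-1 / Sales_t-1)."""
    if sales <= 0 or sales_prev <= 0 or receivables_prev <= 0:
        return None  # ratio undefined; handling is left to the caller (assumption)
    return (receivables / sales) / (receivables_prev / sales_prev)

# Hypothetical issuer: flat sales, receivables growing from 120 to 300.
value = dsri(receivables=300.0, sales=1000.0, receivables_prev=120.0, sales_prev=1000.0)

# Flag for examination when the ratio more than doubles, regardless of the M-Score.
needs_review = value is not None and value > 2.0
```

In this made-up case receivables rise from 12% to 30% of sales, so the index is about 2.5 and the row is flagged.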

The Korean IFRS Problem

The Beneish M-Score formula has eight input ratios. Two of them — GMI (gross margin index) and SGAI (SG&A expense index) — require a separately disclosed cost of goods sold line on the income statement.

Korean IFRS allows the nature-of-expense method: expenses are reported by category (materials, personnel, depreciation) rather than function (cost of goods sold, SG&A). For issuers using this method, there is no COGS line. GMI and SGAI are not computable.

This is not a minority case. The majority of KOSDAQ issuers use the nature-of-expense method. Any Korean M-Score implementation that treats missing COGS as an error or as zero will produce incorrect ratios for most of the population it is trying to score.

The kr-beneish implementation sets GMI and SGAI to 1.0 for these issuers — the neutral value (no change in gross margin, no change in SG&A intensity). This understates the M-Score for companies where these ratios would have been informative, but it preserves the remaining six ratios and allows full-universe coverage.
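The neutral-value treatment can be sketched as follows. The coefficients are those of the published 8-variable Beneish model; whether kr-beneish uses exactly these coefficients is an assumption, and the sketch deliberately leaves TATA on its original coefficient because the source does not say how the sign inversion is encoded. The names `m_score`, `COEF`, and `NEUTRAL` are hypothetical.

```python
# Coefficients of the original 8-variable Beneish model (assumed, not confirmed
# by the source, to be the ones kr-beneish starts from).
INTERCEPT = -4.84
COEF = {
    "DSRI": 0.920, "GMI": 0.528, "AQI": 0.404, "SGI": 0.892,
    "DEPI": 0.115, "SGAI": -0.172, "TATA": 4.679, "LVGI": -0.327,
}
NEUTRAL = 1.0  # index value meaning "no year-over-year change"

def m_score(ratios: dict, nature_of_expense: bool) -> float:
    """Composite score; GMI and SGAI are forced to 1.0 when COGS is not reported."""
    r = dict(ratios)  # copy so the caller's dict is not mutated
    if nature_of_expense:
        r["GMI"] = NEUTRAL
        r["SGAI"] = NEUTRAL
    return INTERCEPT + sum(COEF[name] * r[name] for name in COEF)

# Hypothetical nature-of-expense issuer: GMI and SGAI are absent from the input.
ratios = {"DSRI": 1.0, "AQI": 1.0, "SGI": 1.0, "DEPI": 1.0, "TATA": 0.0, "LVGI": 1.0}
score = m_score(ratios, nature_of_expense=True)
```

Setting the two ratios to 1.0 turns their terms into constants, so differences between nature-of-expense issuers come only from the other six ratios.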

Korean Threshold Calibration

The US academic threshold (-1.78) was estimated on US GAAP companies in the 1980s–1990s. Applying it to 2018–2023 Korean data produces a flag rate that does not correspond to the empirical frequency of enforcement actions in the KOSDAQ universe.

The Korean threshold (-2.45) was derived by bootstrap calibration on a 50-case labeled set: 17 FSS/SFC enforcement fraud cases, 13 clean-audit-opinion controls matched by year and sector, and 20 auto-controls drawn from non-enforcement years of enforcement-case companies. 10,000 bootstrap draws were used to find the threshold maximizing F1 at the Korean base rate of enforcement.

The 95% CI is [-3.50, -1.60] — wide, because 50 cases is a small sample. A larger labeled dataset would narrow it. The threshold is directionally sound but should be treated as an estimate, not a precise cutoff.
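A bootstrap calibration of this kind might look like the sketch below. It is illustrative only: the scores are synthetic, the base-rate adjustment mentioned above is omitted, and taking the median of the per-draw F1-optimal cutoffs as the point estimate is an assumption rather than the documented kr-beneish procedure. The function names are hypothetical.

```python
import numpy as np

def best_f1_threshold(scores, labels):
    """Return the cutoff (flag when score >= cutoff) that maximizes F1."""
    best_t, best_f1 = None, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

def bootstrap_threshold(scores, labels, n_draws=10_000, seed=0):
    """Median and 95% percentile interval of F1-optimal cutoffs over resamples."""
    rng = np.random.default_rng(seed)
    n = len(scores)
    picks = []
    for _ in range(n_draws):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        if labels[idx].sum() == 0:
            continue  # no fraud cases in this draw, so F1 is undefined
        picks.append(best_f1_threshold(scores[idx], labels[idx]))
    picks = np.asarray(picks)
    lo, hi = np.percentile(picks, [2.5, 97.5])
    return float(np.median(picks)), (float(lo), float(hi))

# Synthetic stand-in for the labeled set: 17 fraud cases, 33 non-fraud cases.
rng = np.random.default_rng(42)
labels = np.array([1] * 17 + [0] * 33)
scores = np.where(labels == 1, rng.normal(-1.8, 0.8, 50), rng.normal(-2.9, 0.8, 50))
point, (ci_lo, ci_hi) = bootstrap_threshold(scores, labels, n_draws=500)
```

With only 50 cases, the optimal cutoff moves noticeably from one resample to the next, which is why the resulting interval is wide.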

TATA Sign Inversion

The original Beneish formula assigns a positive coefficient to TATA (total accruals to total assets): higher positive accruals → higher M-Score → more manipulation risk. This reflects the US fraud pattern of inflated receivables and capitalized expenses producing positive accruals.

On the Korean labeled set, the fraud cases are disproportionately characterized by negative TATA — overstated liabilities that reverse suddenly at the enforcement trigger. The sign inversion means applying the original coefficient direction would reduce the M-Score for a pattern that the Korean data treats as a fraud signal.

The kr-beneish implementation preserves the sign inversion as documented: for Korean IFRS data, negative TATA increases the fraud signal rather than decreasing it. This is validated against the labeled set and documented as a breaking difference from the original formula.
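The effect of the inversion on a single company-year can be shown with a small worked example. Two assumptions are made here and neither is confirmed by the source: TATA is computed with the common cash-flow definition (net income minus operating cash flow, over total assets), and the inversion is modeled as a simple sign flip on the original 4.679 coefficient. The input figures are hypothetical.

```python
TATA_COEF_ORIGINAL = 4.679  # positive coefficient in the original Beneish model

def tata(net_income, cash_from_operations, total_assets):
    """Total accruals to total assets (cash-flow definition; an assumption)."""
    return (net_income - cash_from_operations) / total_assets

def tata_contribution(tata_value, invert_sign):
    """TATA term of the score. ASSUMPTION: inversion is a coefficient sign flip."""
    sign = -1.0 if invert_sign else 1.0
    return sign * TATA_COEF_ORIGINAL * tata_value

# Hypothetical issuer with negative accruals: TATA = (-50 - 30) / 1000 = -0.08.
t = tata(net_income=-50.0, cash_from_operations=30.0, total_assets=1000.0)

original = tata_contribution(t, invert_sign=False)  # negative: lowers the score
inverted = tata_contribution(t, invert_sign=True)   # positive: raises the score
```

Under the original direction the negative accrual subtracts about 0.37 from the score; under the inverted direction the same value adds about 0.37, moving the company toward the flag threshold.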