Ongoing

krff-shell — Natural Language Finance Shell

Builder · 2026 · 8 min read

11-tool MCP server exposing the Korean forensic finance toolkit as natural-language queries via Claude Desktop — DuckDB query layer, parameterized SQL (injection-safe), per-company self-contained HTML reports (Plotly), optional Claude narrative synthesis. FastAPI REST endpoints mirror all 11 MCP tools. 317 tests.

Overview

An MCP (Model Context Protocol) server and CLI that wraps the forensic-accounting-toolkit's analytics as 11 tools callable from Claude Desktop or any MCP-compatible client. Tools span the full analytical stack: company registry lookups, trading calendar computations, Beneish M-Score screening, CB/BW dilution assessment, anomaly score retrieval, and JFIA literature search. All tools execute parameterized SQL through a shared DuckDB connection layer (`krff/db.py`) — no string interpolation in query construction. Per-company HTML reports embed interactive Plotly charts as self-contained files with no external CDN dependency. FastAPI REST endpoints mirror all 11 MCP tools for non-MCP integrations.

Problem

The forensic-accounting-toolkit's 12 component libraries each have their own Python API. Using them together requires knowing which library to call, which parquet files to load, how to join results across data sources, and how to format output for human review. A practitioner answering 'which KOSDAQ companies had both a high M-Score and a CB repricing event in the same fiscal year?' must orchestrate at least three libraries, two parquet schemas, and undocumented join logic. krff-shell collapses this into a single natural-language query: the MCP layer handles routing, data loading, and result formatting, while the practitioner describes what they want.

Constraints

Local-only deployment — krff-shell runs against local parquet files produced by the toolkit's data pipeline; no hosted service, no shared database; each user runs their own instance
MCP protocol requires all tool inputs and outputs to be JSON-serializable — Plotly figure objects must be serialized and returned as HTML strings, not as native objects
SQL injection risk in DuckDB: parameterized queries using `?` placeholders work in WHERE clauses but not for table names or column identifiers — dynamic table selection required a different injection mitigation strategy
FastAPI REST endpoints must maintain schema parity with MCP tool signatures — drift between the two interfaces is not caught by the type checker and required dedicated parity tests
Claude narrative synthesis is optional and gated behind an API key check — the shell must be fully functional without an active Anthropic API key

Approach

Built `krff/db.py` as a singleton DuckDB connection manager with typed parameterized query helpers. 11 tools implemented as standard Python functions; MCP registration and FastAPI route definition both reference the same function — no code duplication between interfaces. Per-company HTML reports use Plotly with `include_plotlyjs=True` (not CDN) so reports are viewable offline and do not depend on external asset availability. JFIA search (Tool 11) loads `jfia_catalog.json` at startup and runs keyword + abstract full-text matching, returning structured citation records. The test suite (317 tests) covers all 11 tool response schemas, all REST endpoint input/output contracts, SQL injection attempts on every parameterized field, and report HTML structure validation.

Key Decisions

Parameterized SQL for all DuckDB queries rather than f-string interpolation

Reasoning:

The initial implementation used f-strings to construct DuckDB queries — fast to write, readable, and common in data analysis scripts. During the test suite build, a SQL injection test case against the company code parameter produced a query that executed successfully against an unintended table. DuckDB's in-memory model means injection doesn't reach a networked database server, but it can expose parquet file contents or corrupt an in-memory state that the session reuses. Parameterized queries using `?` placeholders in WHERE clauses, combined with an allowlist approach for table name selection, eliminated the injection surface.

Alternatives considered:

f-string interpolation throughout — readable but injectable; acceptable for single-user local scripts, not for a tool callable by an LLM agent
ORM (SQLAlchemy) over DuckDB — parameterization handled automatically; adds a dependency and a translation layer between DuckDB's columnar semantics and the ORM's row-oriented model

Self-contained HTML reports (Plotly `include_plotlyjs=True`) rather than CDN-dependent or hosted dashboard

Reasoning:

Per-company reports are shared with practitioners and saved to local directories. A CDN-dependent report that embeds `<script src='https://cdn.plot.ly/...'>` becomes non-functional when viewed offline or in an air-gapped environment. A hosted dashboard requires a running server and a session. Self-contained HTML files (single file, no external dependencies) can be emailed, archived, or viewed in a browser with no infrastructure. The file size increase (~3 MB per report vs. ~50 KB with CDN) is acceptable for artifacts intended for human review rather than bulk storage.

Alternatives considered:

CDN-dependent Plotly (`include_plotlyjs='cdn'`) — smaller file, same visual output; breaks offline and in environments where CDN is blocked
Hosted FastAPI dashboard with shared state — better for real-time updates; requires a running server and introduces session management complexity

FastAPI REST layer mirroring all 11 MCP tools

Reasoning:

MCP is the correct interface for Claude Desktop. But not all downstream integrations are MCP-compatible — programmatic scripts, CI pipelines, and non-Claude LLM clients use HTTP APIs. Building the FastAPI layer using the same function signatures as the MCP tools (no separate implementation) costs minimal additional development but preserves integration flexibility. The parity test suite ensures both interfaces return identical outputs for identical inputs.

Alternatives considered:

MCP-only, no REST layer — simpler codebase; removes the programmatic integration option before there is confirmed demand for it
REST-only with no MCP — loses the Claude Desktop integration that makes natural-language queries possible

Tech Stack

Python ≥3.11, uv
MCP SDK (Anthropic)
FastAPI + uvicorn
DuckDB
Plotly
pandas, PyArrow
Anthropic SDK (claude-sonnet-4-6) — optional narrative synthesis
jfia_catalog.json (Tool 11 data source)
pytest (317 tests)

Result & Impact

11

MCP tools
317

Tests
11 (mirrors MCP tools)

REST endpoints
Self-contained HTML (Plotly)

Report format

The conversational interface for the forensic-accounting-toolkit. Questions that previously required orchestrating multiple libraries and parquet schemas are answered as a single Claude Desktop query. The parameterized SQL layer and 317-test coverage make it the most hardened component in the toolkit from a code-quality standpoint. The self-contained HTML report format has been used in practitioner outreach — a single file attachment that opens in any browser with no dependencies.

Learnings

SQL injection testing should be part of the initial test suite, not a later hardening pass. The DuckDB injection surface was only discovered when building tests for the demo — if tests had been written first, the f-string interpolation pattern would have been caught before it appeared in the main implementation.
MCP tool design is an API design problem. The first iteration of several tools returned raw DataFrames serialized to JSON — useful for machine consumption, not for the Claude narrative layer or for human review. The second iteration returns structured objects with a summary string, a findings list, and a data block — a format that the LLM can narrate and a human can skim.
Parity between MCP tools and REST endpoints is a maintenance constraint, not a one-time setup. The dedicated parity test suite (checking identical inputs produce identical outputs across both interfaces) is what catches drift when one interface's response schema is updated without updating the other.
Optional Claude narrative synthesis is the right default. Practitioners using the tool for rapid screening do not need narrative for every result — they need the data fast. Narrative is valuable for the final output (the per-company HTML report) and for explaining anomalies to a non-technical stakeholder. Gating it behind an API key check makes the dependency explicit.

Architecture

krff-shell is organized in three layers:

Query layer — krff/db.py

A singleton DuckDB connection manager. All 11 tools route their data access through this module. Parameterized helpers ensure that no tool constructs a query by string interpolation — all user-supplied values pass through ? placeholders in WHERE clauses. Table name selection uses an allowlist pattern (not parameterization, which DuckDB does not support for identifiers).

Tool layer — 11 MCP tools + 11 FastAPI routes

Each tool is a plain Python function. MCP registration and FastAPI route definition both reference the same function — there is no separate implementation for the two interfaces. Tool categories:

Company data — registry lookup, financial statement retrieval, trading calendar
Forensic screening — Beneish M-Score, CB/BW dilution assessment, anomaly scores
Report generation — per-company self-contained HTML report
Literature — JFIA article search (Tool 11, loads jfia_catalog.json at startup)

Report layer — krff/reports.py

Generates per-company HTML files with interactive Plotly charts. include_plotlyjs=True embeds the Plotly JavaScript bundle in each file — reports are self-contained, offline-capable, and email-safe.

SQL Injection Mitigation

DuckDB’s parameterized query support covers WHERE clause values but not table names or column identifiers. The initial implementation used f-strings for table name selection — a pattern discovered during test construction to be exploitable within the in-memory session.

The fix: an allowlist for all table name inputs. If a caller passes a table name not in the allowlist, the query returns an empty result rather than executing against an unintended target. SQL injection tests for all 11 parameterized fields are in the test suite and run on every commit.

Dual Interface

	MCP	REST
Client	Claude Desktop, MCP-compatible	HTTP, programmatic scripts
Input format	JSON tool call	HTTP request body
Output format	MCP tool response	JSON response
Schema source	Same Python function	Same Python function
Parity test	Yes — 317 tests include interface parity checks

Portfolio

Overview

Problem

Constraints

Approach

Key Decisions

Parameterized SQL for all DuckDB queries rather than f-string interpolation

Self-contained HTML reports (Plotly `include_plotlyjs=True`) rather than CDN-dependent or hosted dashboard

FastAPI REST layer mirroring all 11 MCP tools

Tech Stack

Result & Impact

Learnings

Architecture

SQL Injection Mitigation

Dual Interface

Writing