Modern Fraud Systems

January 5, 2026·Bryan Lai

Fraud is no longer one person trying one stolen card.

It is scripts, device farms, deepfakes, synthetic identities, mule networks, and repeated probes across many companies.

The goal is not accuracy.

The goal is to stop expensive fraud without drowning good users and operations.

Use metrics that match that goal: recall at a fixed false positive rate, calibration, analyst agreement, and financial reward.

System Map

Move from identity checks to behavior and relationship checks.

Do not only ask:

Who are you?

Also ask:

How do you type? Which device are you using? Who else used that device? Is the camera feed real?

Layer 1: Gateway

KYC and KYB are necessary, but not enough.

Generative AI makes faces, voices, and documents easier to fake.

Collect harder-to-fake signals:

Camera metadata.
Device fingerprint.
IP and network history.
Typing and gesture behavior.
Document metadata.
Prior fraud consortium hits.

Examples:

If ExifTool says the image came from a virtual camera, flag it.
If a new applicant has a clean credit file but an impossible birth-to-credit timeline, review it.
If the same device was caught card testing at another company minutes ago, block or challenge it.
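The three examples above can be sketched as plain rules. This is a minimal sketch, not a production rule set: every key in `signals`, the flag strings, and the 60-minute window are hypothetical names and values chosen for illustration. It assumes an upstream step has already parsed camera metadata into a boolean.

```python
from datetime import date

def gateway_flags(signals, card_testing_window_minutes=60):
    """Return a list of action strings for one application.

    All keys in `signals` are hypothetical names for this sketch.
    """
    flags = []
    # Virtual camera reported by an upstream metadata check.
    if signals.get("camera_is_virtual"):
        flags.append("flag:virtual_camera")
    # A credit line opened before the applicant was born is impossible.
    dob = signals.get("date_of_birth")
    first_credit = signals.get("first_credit_line_date")
    if dob is not None and first_credit is not None and first_credit < dob:
        flags.append("review:impossible_credit_timeline")
    # A consortium partner saw this device card testing very recently.
    minutes = signals.get("minutes_since_card_testing_hit")
    if minutes is not None and minutes <= card_testing_window_minutes:
        flags.append("challenge:recent_card_testing")
    return flags
```

A real gateway would likely feed these flags into the risk score rather than act on each one alone.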

Layer 2: Real-Time Plumbing

If the math in training differs from the math in production, the model will lie.

This is training-serving skew.

Example: training uses avg_spend rounded to 2 decimals, but production sends 4 decimals. That tiny difference can create a false-positive storm.

The fix:

Define features once.
Use the same logic online and offline.
Use stateful streams for recent behavior.
Monitor feature freshness.
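"Define features once" can be as simple as one function that both the training job and the online service import. The sketch below is an illustration under assumptions: `avg_spend` follows the example above, while `parity_ok` and its tolerance are hypothetical helpers for spotting skew between an offline value and an online value.

```python
import math

def avg_spend(amounts, ndigits=2):
    """Single definition of the feature, shared by training and serving."""
    if not amounts:
        return 0.0
    return round(sum(amounts) / len(amounts), ndigits)

def parity_ok(offline_value, online_value, tol=1e-9):
    """True when the two pipelines agree on the feature value."""
    return math.isclose(offline_value, online_value, rel_tol=0.0, abs_tol=tol)

# Same function, same rounding: parity holds.
history = [10.0, 20.0, 25.0]
assert parity_ok(avg_spend(history), avg_spend(history))

# A serving path that rounds to 4 decimals instead of 2 breaks parity.
assert not parity_ok(avg_spend(history, 2), avg_spend(history, 4))
```

Running a check like `parity_ok` on sampled production events is one way to catch skew before it reaches the model.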

For account takeover, seconds matter. If failed_login_count arrives 10 seconds late, the attacker gets 10 seconds for free.
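A stateful counter for `failed_login_count` might look like the sketch below. It is an in-memory illustration only, assuming events arrive in timestamp order; the class name, the 5-minute default window, and `freshness_seconds` are hypothetical. A real system would keep this state in a stream processor.

```python
from collections import defaultdict, deque

class FailedLoginCounter:
    """Sliding-window count of failed logins per user."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # user_id -> event timestamps

    def record(self, user_id, event_ts):
        self._events[user_id].append(event_ts)

    def count(self, user_id, now_ts):
        q = self._events[user_id]
        cutoff = now_ts - self.window_seconds
        # Drop events that have fallen out of the window.
        while q and q[0] <= cutoff:
            q.popleft()
        return len(q)

def freshness_seconds(event_ts, available_ts):
    """Delay between the action and the feature being usable."""
    return available_ts - event_ts
```

Tracking `freshness_seconds` per feature makes the "10 seconds for free" problem visible as a number.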

Layer 3: Decision Engine

Stop looking at users alone.

Fraud is relational.

A new user can look clean while sharing a device, IP, address, browser fingerprint, recipient, or behavior pattern with known fraud.

Use a heterogeneous graph.

Examples:

New User A connects to Device X. Device X was used by a money mule last week. Risk should flow through the graph.
time_between_keystrokes_variance near zero suggests automation.
count_distinct_users_on_device_1h exploding suggests a device farm.
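The "risk should flow through the graph" idea can be shown with a toy propagation loop. This is a hand-rolled simplification, not a graph neural network and not the author's method: `propagate_risk`, the `decay` factor, and the node naming scheme are assumptions made for illustration.

```python
from collections import defaultdict

def propagate_risk(edges, seed_risk, decay=0.5, rounds=2):
    """Spread known risk to neighbors, shrinking by `decay` per hop."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = set(neighbors) | set(seed_risk)
    risk = {node: seed_risk.get(node, 0.0) for node in nodes}
    for _ in range(rounds):
        updated = {}
        for node in nodes:
            # Highest risk among direct neighbors, or 0.0 if isolated.
            incoming = max((risk[n] for n in neighbors.get(node, ())), default=0.0)
            # A node keeps its own risk unless a neighbor pushes it higher.
            updated[node] = max(risk[node], decay * incoming)
        risk = updated
    return risk

edges = [("user:A", "device:X"), ("user:mule", "device:X")]
risk = propagate_risk(edges, {"user:mule": 1.0})
# Device X inherits half the mule's risk; User A inherits half of that.
assert risk["device:X"] == 0.5
assert risk["user:A"] == 0.25
```

User A never touched the mule directly, yet ends up with a nonzero score after two rounds.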

Analysts need to see the graph and the score.

If two users have different names but share the same browser fingerprint and physical address, the analyst should see that immediately.

Use a circuit breaker. If the graph service lags, fall back to simple rules so the system keeps running.
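A minimal circuit breaker sketch follows. It assumes a lagging graph service surfaces as an exception (for example a timeout raised by the client); the class, its parameters, and the failure threshold are hypothetical choices for illustration.

```python
import time

class CircuitBreaker:
    """Route calls to a fallback after repeated failures of the primary."""

    def __init__(self, max_failures=3, reset_after_seconds=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_seconds = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._open_until = None  # None means the breaker is closed

    def call(self, primary, fallback, *args):
        # While open, skip the primary entirely.
        if self._open_until is not None and self._clock() < self._open_until:
            return fallback(*args)
        try:
            result = primary(*args)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                # Too many consecutive failures: open the breaker for a cooldown.
                self._open_until = self._clock() + self.reset_after_seconds
                self._failures = 0
            return fallback(*args)
        # Success closes the breaker and clears the failure count.
        self._failures = 0
        self._open_until = None
        return result
```

Usage would look like `breaker.call(graph_score, rule_score, event)`, where `graph_score` and `rule_score` stand in for the graph model and the simple rules.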

Layer 4: Action, Ops, Compliance

Every decision has a legal tail.

Use three tiers:

Tier	Risk Score	Action
Green	0-20	Approve
Yellow	21-79	Add friction: liveness, SMS, manual review
Red	80+	Block or file required report
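The tier table can be expressed as one small function. A minimal sketch, assuming integer risk scores where higher means riskier; the function name and return strings are hypothetical.

```python
def tier_for(score):
    """Map a risk score to the Green / Yellow / Red tier from the table."""
    if score <= 20:
        return "green"   # approve
    if score <= 79:
        return "yellow"  # add friction: liveness, SMS, manual review
    return "red"         # block or file required report

assert tier_for(20) == "green"
assert tier_for(21) == "yellow"
assert tier_for(80) == "red"
```

Keeping the cutoffs in one place makes it easier to audit them later.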

The manual review queue is training data.

Analyst decisions should feed the model. Measure analyst agreement, or the labels become noise.

For rejections, log a reason. "Denied due to risk score" is not enough. "Denied due to high application volume on this phone number" is better.

If laundering is detected, follow SAR and BSA requirements.

Layer 5: Feedback Loop

Fraud models die when feedback is slow.

Real labels can take 90 days. Chargebacks arrive late. Investigations take time.

Use proxy labels while waiting:

Manual review block.
Failed liveness.
Confirmed device farm.
Repeated failed login pattern.

Monitor:

PSI for drift.
AUPRC for rare-event quality.
Brier score for calibration.
Cohen's Kappa for analyst agreement.

Run shadow deployments before new models go live.

The new model should score production events silently before it gets power.
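A shadow deployment can be sketched as a wrapper that returns only the live score and logs the candidate's score on the side. The function and field names below are hypothetical, and the models are assumed to be plain callables.

```python
def score_with_shadow(event, live_model, shadow_model, shadow_log):
    """Return the live score; record the shadow score without acting on it."""
    live_score = live_model(event)
    try:
        shadow_score = shadow_model(event)
    except Exception:
        # A shadow failure must never change the live decision.
        shadow_score = None
    shadow_log.append(
        {"event_id": event.get("id"), "live": live_score, "shadow": shadow_score}
    )
    return live_score

log = []
decision_score = score_with_shadow({"id": "e1"}, lambda e: 12, lambda e: 85, log)
assert decision_score == 12
assert log == [{"event_id": "e1", "live": 12, "shadow": 85}]
```

Comparing the `live` and `shadow` columns over real traffic shows where the new model would have decided differently.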

Optimize Money, Not Kaggle Scores

F1-Score does not pay the bills. Optimize for a custom Financial Reward Function:

Profit = (Value of Stopped Fraud) - (Cost of False Positives * Customer LTV) - (Compute Cost)

A model with lower accuracy but higher profit wins.
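A worked comparison makes the point concrete. The sketch below assumes one reading of the reward function: the false-positive term is the number of wrongly blocked good users times the lifetime value lost per user. All figures are made up for illustration.

```python
def profit(stopped_fraud_value, false_positives, ltv_lost_per_false_positive, compute_cost):
    """Financial reward under the assumptions stated above."""
    return (
        stopped_fraud_value
        - false_positives * ltv_lost_per_false_positive
        - compute_cost
    )

# Hypothetical Model A: catches more fraud but blocks many good users.
model_a = profit(100_000, 400, 150, 2_000)
# Hypothetical Model B: catches less fraud but blocks far fewer good users.
model_b = profit(90_000, 150, 150, 2_000)

assert model_a == 38_000
assert model_b == 65_500
assert model_b > model_a
```

Under these numbers the model that stops less fraud is the more profitable one.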

Every part of the system must protect the chargeback ratio without blocking too many good users.

Example: Instant Cash

For instant cash, watch first-party fraud.

The user may be real. The intent may be fake.

Use:

Behavior features for repayment intent.
Cost-weighted loss so missing a $1,000 fraud hurts more than missing a $100 default.
Idempotency keys so payouts cannot happen twice.
"Break the spell" prompts for APP scams or business email compromise.
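The idempotency key item can be sketched as follows. This is an in-memory illustration only: `PayoutService` and its fields are hypothetical, and a real service would need a durable store with an atomic insert so that concurrent retries cannot both pass the check.

```python
class PayoutService:
    """Execute each payout at most once per idempotency key."""

    def __init__(self):
        self._results = {}  # idempotency_key -> stored result
        self.total_sent_cents = 0

    def payout(self, idempotency_key, user_id, amount_cents):
        # A repeated key returns the stored result and moves no money.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.total_sent_cents += amount_cents
        result = {"user_id": user_id, "amount_cents": amount_cents, "status": "sent"}
        self._results[idempotency_key] = result
        return result

svc = PayoutService()
first = svc.payout("key-123", "user-1", 5_000)
retry = svc.payout("key-123", "user-1", 5_000)  # double click or client retry
assert retry is first
assert svc.total_sent_cents == 5_000
```

The client generates the key once per intended payout and reuses it on every retry.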

The Glossary: Formulas & Definitions

1. Population Stability Index (PSI)

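Measures how much the distribution of your risk scores has shifted between a baseline (expected) window and a recent (actual) window, summed over score bins $i$.

$PSI = \sum_i (Actual\%_i - Expected\%_i) \times \ln\left(\frac{Actual\%_i}{Expected\%_i}\right)$

Common rules of thumb:

PSI < 0.1: No significant shift.
PSI 0.1 to 0.25: Moderate shift; investigate.
PSI > 0.25: Major shift; retrain immediately.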
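The PSI formula above can be computed directly from per-bin counts. A minimal sketch: the function name and the `eps` floor (used to avoid dividing by zero or taking the log of zero for empty bins) are assumptions, and both inputs are assumed to use the same bins in the same order.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index from per-bin counts."""
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Convert counts to shares, flooring empty bins at eps.
        e_pct = max(e / expected_total, eps)
        a_pct = max(a / actual_total, eps)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total

# Identical distributions give zero.
assert psi([100, 100], [100, 100]) == 0.0
# A 50/50 split moving to 80/20 lands well above the 0.25 line.
assert psi([50, 50], [80, 20]) > 0.25
```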

2. Recall @ 1% FPR (False Positive Rate)

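The share of fraud caught when the score threshold is set so that at most 1% of good users are flagged.

$Recall = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$

$FPR = \frac{\text{False Positives}}{\text{False Positives} + \text{True Negatives}}$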
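Recall at a fixed FPR can be computed by picking the threshold from the good users' scores and then counting how much fraud sits above it. A minimal sketch, assuming higher scores mean higher risk and labels are 1 for fraud and 0 for good; the function name is hypothetical.

```python
def recall_at_fpr(scores, labels, max_fpr=0.01):
    """Recall at the loosest threshold whose FPR stays at or below max_fpr."""
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one fraud and one good example")
    # Number of good users we may flag; the tiny offset guards float rounding.
    allowed_fp = int(max_fpr * len(negatives) + 1e-9)
    # Flag everything strictly above this score.
    if allowed_fp < len(negatives):
        threshold = negatives[allowed_fp]
    else:
        threshold = float("-inf")
    caught = sum(1 for s in positives if s > threshold)
    return caught / len(positives)

scores = [0.1, 0.2, 0.3, 0.4, 0.9, 0.95, 0.8, 0.35]
labels = [0, 0, 0, 0, 0, 1, 1, 1]
# With 5 good users and max_fpr=0.2, one false positive is allowed.
assert abs(recall_at_fpr(scores, labels, max_fpr=0.2) - 2 / 3) < 1e-9
```

With only a handful of examples the result is coarse; on real traffic the threshold moves in much finer steps.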

3. Brier Score (Calibration / MSE)

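$Brier = \frac{1}{N} \sum_{t=1}^{N} (f_t - o_t)^2$

$f_t$ : Predicted fraud probability for event $t$.
$o_t$ : Actual outcome (1 = fraud, 0 = not fraud).
0 = Perfect accuracy; 1 = Perfect wrongness.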
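The Brier score is short enough to compute by hand. A minimal sketch with made-up forecasts; the function name is an assumption.

```python
def brier_score(forecasts, outcomes):
    """Mean squared gap between predicted probability and actual outcome."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Two confident, correct calls and one confident, wrong call.
score = brier_score([0.9, 0.1, 0.8], [1, 0, 0])
assert abs(score - 0.22) < 1e-9
```

The single confident miss contributes 0.64 of the 0.66 total squared error, which is why overconfident models score badly.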

4. Cohen's Kappa ( $\kappa$ )

Measures agreement between human analysts.

$\kappa = \frac{p_o - p_e}{1 - p_e}$

$p_o$ : Observed agreement.
$p_e$ : Expected agreement by chance.
Note: $\kappa = 0$ is the coin toss. If $\kappa < 0.6$ , your manual review labels are too noisy to train on.
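Kappa for two analysts can be computed from their labels on the same cases. A minimal sketch with made-up labels; the function name is an assumption.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(rater_a)
    # Observed agreement: share of cases with matching labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's label frequencies, summed.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

analyst_1 = ["fraud"] * 4 + ["ok"] * 6
analyst_2 = ["fraud"] * 3 + ["ok"] * 6 + ["fraud"]
# The analysts agree on 8 of 10 cases, yet kappa is only about 0.58.
assert abs(cohens_kappa(analyst_1, analyst_2) - 0.5833) < 1e-3
```

Raw agreement of 80% looks healthy, but once chance agreement (0.52 here) is removed the result falls below the 0.6 line.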

Definitions

Point-in-Time Correctness: Traveling back in time to see exactly what a feature looked like when a transaction happened.
Training-Serving Skew: When features are computed differently in training than in production, so the model sees inputs it was not trained on.
Feature Freshness: The time it takes for an action (event) to become a usable feature.
Inductive Learning: Predicting risk for entities the model has never seen before by looking at neighborhood relationships.
Transductive Learning: Predicting risk only for specific entities seen during training (useless for new attackers).
Heterogeneous Graph: A map of different data types (User connected to IP connected to Device).
Message Passing: How risk leaks from one node to its neighbors in a graph.
Exit Scam / Bust-out Fraud: Building trust and high credit over years just to steal the max limit once and disappear.
Synthetic Identity Fraud: Frankenstein IDs made of real and fake data.
Benford's Law: In many naturally occurring datasets, leading digits follow a logarithmic curve (1 leads about 30% of the time). Near-uniform leading digits are a sign the numbers may be fabricated.
Idempotency Key: A unique ID attached to a request so the transaction executes only once, even if the button is clicked twice or the request is retried.
APP (Authorized Push Payment) Scam: Tricking a user into sending their own money to a criminal.
Adverse Action Reason: The legal reason why you said "No" to a customer.
SAR (Suspicious Activity Report): A legal report filed when you detect potential money laundering (AML).
Disparate Impact: When an algorithm unintentionally discriminates against a protected group.
Reasoning Trace: A detailed log of the logic that led to a specific decision.
Account Takeover (ATO): When an attacker hijacks a legitimate user's session or credentials.
Card Testing: Automated attempts to check if stolen credit cards are active.
First-Party Fraud: When a real customer lies to get a refund or free loan.
Money Mule: Someone used to transfer and wash stolen funds.
Orchestration Layer: The brain that coordinates between vendors, models, and actions.
Circuit Breaker: A safety switch that stops a failing ML service to save the whole system.
Shadow Deployment: Running a model in the background to test it without risk.
AUPRC: The area under the precision-recall curve; well suited to rare events like fraud, where accuracy is dominated by the good users.
Concept Drift: When fraudsters change behavior to defeat your current model.
Calibration: Adjusting model scores so they match real-world frequencies; of all cases scored 0.2, about 20% should turn out to be fraud.
Platt Scaling: A method to calibrate model outputs into meaningful probabilities.
Liveness Detection: Proving a face is a real person, not a mask or recording.
Velocity Features: Counts over time, such as 'Logins in 5 mins'.
Consortium Data: Risk intel shared across many companies.
KYC (Know Your Customer) / KYB (Know Your Business): Legal identity checks.
AML (Anti-Money Laundering): Systems to stop the movement of illegal funds.
Manual Review Queue: The desk where humans check gray-area cases.
False Positive Ratio: How many good users you block for every bad actor caught.
Chargeback Ratio: Your percentage of disputed sales.
Lambda/Kappa Architecture: Data pipeline designs. Lambda runs a batch layer alongside a streaming layer; Kappa uses a single streaming pipeline for everything.
Stateful Stream Processing: Computations that remember previous events.
Online/Offline Parity: Ensuring data logic is identical in lab and production.
Cost-Weighted Loss: Penalizing expensive errors, such as missing $1k, more than small ones.
Inter-rater Reliability (IRR): The degree of agreement among independent analysts.
Behavioral Entropy: Randomness in human interaction. Bots have low entropy; humans have high entropy.
Uncertainty Band: The Yellow Zone (Risk Score 21-79) where high-friction tools and humans live.
FinCEN (Financial Crimes Enforcement Network): The US Treasury bureau that collects and analyzes SARs.
BEC (Business Email Compromise): Fraud where an attacker hijacks business email accounts to redirect payments.
BSA (Bank Secrecy Act): The US law requiring financial institutions to assist government agencies in detecting money laundering.
Consent Order: A legal agreement where a company agrees to stop illegal practices and accept government monitoring to avoid a lawsuit.
