Fintech

Detect transaction anomalies before they become fraud cases

Rule-based fraud systems miss novel attack patterns and drown your ops team in false positives. Tuning them is a full-time job and the fraudsters always move first.

⚠️ Charts, numbers, and "client" examples on this page use illustrative mock data. Real client deployments are confidential.

The Problem

Rule-based fraud systems work brilliantly for the attack patterns you've already seen and miss every novel one. They drown the ops team in false positives, and tuning them eats a fraud analyst's full week. Meanwhile, the fraudsters iterate weekly: new device fingerprints, new mule networks, new card-testing flows.

Three concrete failure modes show up in nearly every fintech we talk to:

The new-pattern blind spot. A novel attack — say, account-takeover via a leaked OAuth refresh token — slips through every existing rule for the first 11–14 days while the team is still writing a rule for it.
False-positive fatigue. The ops team reviews 4,000 holds a day. Genuine fraud sits in the same queue as a tourist trying to buy chai at the airport. Reviewer judgement degrades after 200 cases.
The threshold tradeoff. Tighten thresholds to catch more fraud and false positives explode, so customers complain. Loosen them and fraud losses jump. There's no good operating point.

⚠ All transaction examples below use synthetic data.

The DataCraves Approach

We don't replace your rules engine. We layer two unsupervised anomaly models on top of it, plus a graph-feature scorer that catches mule networks the rules can't see. The combiner blends rule signals and ML signals into a single ranked queue, sorted by expected loss, not raw score.

Three signals, one decision

Isolation Forest on transaction-level features — amount, merchant category, geo-velocity, time-of-day, device entropy. Flags transactions that don't look like the user's history.
Autoencoder on the user's last-30-day behavioural embedding. Reconstruction error spikes when behaviour suddenly changes (compromised account).
Graph features — shared device fingerprints, IP clusters, beneficiary networks. Catches mule rings that look fine transaction-by-transaction.
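To make the reconstruction-error idea behind the second signal concrete, here is a minimal sketch. It uses PCA as a linear stand-in for the autoencoder, which is a deliberate simplification. The helper names, the 5-dimensional synthetic embeddings, and the choice of two components are illustrative assumptions, not the production model.

```python
import numpy as np

def fit_linear_encoder(history, n_components=2):
    """Fit a PCA projection on baseline embeddings (linear stand-in for an autoencoder)."""
    mean = history.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(history - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(x, mean, components):
    """Squared distance between x and its reconstruction from the learned subspace."""
    centered = x - mean
    reconstructed = (centered @ components.T) @ components
    return float(np.sum((centered - reconstructed) ** 2))

# Synthetic data only: 30 daily embeddings lying close to a 2-D subspace of R^5.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 5))
history = rng.normal(size=(30, 2)) @ basis + 0.01 * rng.normal(size=(30, 5))
mean, components = fit_linear_encoder(history)

usual_day = history[0]
# A sudden behaviour change: push the embedding in an arbitrary new direction.
changed_day = usual_day + 5.0 * rng.normal(size=5)

err_usual = reconstruction_error(usual_day, mean, components)
err_changed = reconstruction_error(changed_day, mean, components)
print(f"usual: {err_usual:.4f}  changed: {err_changed:.4f}")
```

The point of the sketch is only the shape of the signal: behaviour that matches the learned baseline reconstructs almost perfectly, while a sudden shift leaves a large residual.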

The combiner uses a Bayesian update: rule signals form the prior, ML signals form the likelihood, and the posterior is the fraud score that feeds the expected-loss ranking. Each signal carries its own confidence, so a high-confidence rule still wins over a low-confidence ML flag.
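A Bayesian update of this kind can be sketched in odds form. The function name, the `(likelihood_ratio, confidence)` interface, and the idea of tempering each ratio by raising it to its confidence are assumptions made for illustration, and the sketch treats the signals as conditionally independent.

```python
def posterior_fraud_probability(prior, signals):
    """Update a prior fraud probability with (likelihood_ratio, confidence) pairs.

    Hypothetical interface: confidence in [0, 1] tempers each likelihood ratio
    (0 ignores the signal, 1 applies it in full).
    """
    odds = prior / (1.0 - prior)
    for likelihood_ratio, confidence in signals:
        odds *= likelihood_ratio ** confidence
    return odds / (1.0 + odds)

# Illustrative numbers only: rules put the prior at 2%; two ML signals fire.
p = posterior_fraud_probability(0.02, [(10.0, 1.0), (5.0, 1.0)])
print(round(p, 3))  # 0.505
```

Working in odds keeps the update multiplicative, so a weak signal with low confidence nudges the score while a strong one moves it decisively.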

The Architecture

Fraud detection architecture: rules + ML ensemble

Diagram 1 — Architecture. Rules engine stays in place; the ML stack runs alongside and the combiner produces the final decision.

Sequence diagram of a flagged transaction

Diagram 2 — End-to-end flow for one transaction, from stream → enrichment → scoring → ops review → feedback.

Feedback loop with TP/FP/FN labels feeding nightly retrain

Diagram 3 — Feedback loop. Every analyst decision feeds back; nightly retrain compensates for class imbalance with SMOTE + class weights.
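The class-weight half of that retrain step can be illustrated with the common inverse-frequency heuristic (the same formula scikit-learn applies for `class_weight="balanced"`). Whether the production retrain uses exactly this weighting is an assumption, and SMOTE is not shown here.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Synthetic labels: 1% confirmed fraud (1) among legitimate transactions (0).
labels = [0] * 990 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With a 1% fraud rate, each fraud example ends up weighted roughly 100 times more heavily than a legitimate one.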

The Dashboard

Catch rate: 88.4% (▲ 21 pp)
False positives: 1.7% (▼ 1.2 pp)
Loss prevented (30d): ₹47 L (▲ 38%)
Median review time: 38s (▼ 22s)

Catch-rate at fixed FP budget

Same FP budget; ML signals lift catch-rate from ~43% to ~88%.

Anomaly score distribution

Clear separation above ~0.6.

Loss prevented vs threshold

Sweet spot near 0.65 — beyond, you lose more catch than you save in friction.

Top fraud patterns this month

Card-testing dominant; ATO via OAuth a new entrant this quarter.

Latency profile (p50 / p95 / p99)

Sub-150ms p99 budget for the full pipeline.

Model precision

Precision @ recall=0.85, mock pilot data.

The Math / The Logic

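Each of the three ML scorers produces a value in [0, 1], as does the rules engine. The combiner blends the four scores with a calibrated logistic blend, a weighted sum of log-odds passed through a sigmoid: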

# combiner.py
import numpy as np

def combine(rule_score, iso_score, ae_score, graph_score,
            weights=(0.35, 0.25, 0.20, 0.20), eps=1e-6):
    # Logistic blend, calibrated on a held-out validation week.
    # Clip before taking log-odds so a score of exactly 0 or 1 stays finite.
    scores = np.clip([rule_score, iso_score, ae_score, graph_score], eps, 1 - eps)
    z = np.dot(weights, np.log(scores / (1 - scores)))
    return 1 / (1 + np.exp(-z))

def expected_loss(score, amount, recovery_rate=0.18):
    # Rank by E[loss], not by score, so high-value low-prob fraud wins.
    p_fraud = score
    return p_fraud * amount * (1 - recovery_rate)
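A short usage sketch shows why ranking by expected loss differs from ranking by score. The queue entries and transaction ids are hypothetical, and `expected_loss` is repeated from combiner.py only so the example runs on its own.

```python
def expected_loss(score, amount, recovery_rate=0.18):
    # Same formula as in combiner.py above, repeated so the sketch is self-contained.
    return score * amount * (1 - recovery_rate)

# Hypothetical queue entries: (txn_id, fraud score, amount in ₹).
queue = [
    ("txn_a", 0.90, 500),
    ("txn_b", 0.30, 10_000),
    ("txn_c", 0.60, 2_000),
]

ranked = sorted(queue, key=lambda t: expected_loss(t[1], t[2]), reverse=True)
for txn_id, score, amount in ranked:
    print(txn_id, round(expected_loss(score, amount), 2))
```

The highest-scoring transaction (txn_a) lands last because its amount is small, while the low-probability ₹10,000 transfer goes to the top of the queue.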

Geo-velocity feature. A classic but still useful signal:

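def geo_velocity_kmh(t1_loc, t1_time, t2_loc, t2_time):
    # haversine_km: great-circle distance helper, assumed to be defined elsewhere.
    dist = haversine_km(t1_loc, t2_loc)
    # abs() keeps the speed positive when events arrive out of order.
    hours = abs((t2_time - t1_time).total_seconds()) / 3600
    # Floor the gap at 0.0001 h (0.36 s) to avoid dividing by zero.
    return dist / max(hours, 0.0001)
# > 800 km/h → impossible without a flight, contributes +0.4 to anomaly score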
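The `haversine_km` helper used above is not shown in the source. One possible version, assuming each location is a `(latitude, longitude)` pair in degrees and using the mean Earth radius, might look like this:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(loc_a, loc_b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, loc_a)
    lat2, lon2 = map(math.radians, loc_b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# A quarter of the equator: (0°, 0°) to (0°, 90°E).
print(round(haversine_km((0.0, 0.0), (0.0, 90.0)), 1))  # 10007.5
```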

Graph-feature signal. Build a bipartite graph of (user, device) edges over a 90-day window. For each new transaction, compute the size of the connected component that contains the user. Components larger than 14 with more than 3 chargebacks in their history get a +0.3 graph_score boost.
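The component lookup can be sketched with a small union-find. Counting only users (not devices) toward component size is an assumption, as are the function names; the thresholds and the +0.3 boost are the ones quoted above.

```python
from collections import defaultdict

def user_component_sizes(edges):
    """Map each user to the number of users in its connected component.

    edges: iterable of (user_id, device_id) pairs from the lookback window.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for user, device in edges:
        # Prefix node keys so a user id can never collide with a device id.
        root_u, root_d = find(("user", user)), find(("device", device))
        if root_u != root_d:
            parent[root_u] = root_d

    members = defaultdict(set)
    for kind, node_id in list(parent):
        if kind == "user":
            members[find((kind, node_id))].add(node_id)
    return {u: len(group) for group in members.values() for u in group}

def graph_boost(component_size, chargebacks, size_limit=14, chargeback_limit=3):
    # Boost only when both the size and the chargeback thresholds are exceeded.
    return 0.3 if component_size > size_limit and chargebacks > chargeback_limit else 0.0

# Synthetic edges: u1, u2, u3 are linked through shared devices; u4 is isolated.
edges = [("u1", "d1"), ("u2", "d1"), ("u2", "d2"), ("u3", "d2"), ("u4", "d3")]
sizes = user_component_sizes(edges)
print(sizes["u1"], sizes["u4"])  # 3 1
```

Note that u1 and u3 never share a device directly; they are linked only through u2, which is exactly the kind of ring that looks fine transaction-by-transaction.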

Sample Output / Insight

📨 Ops console — Insights Agent · streaming

HOLD: txn_id 0x3F2A1C · score 0.91 · expected loss ₹3,134 · rank #3 today

User u_881271 initiated ₹4,200 transfer to a beneficiary added 11 minutes ago. Three signals fired simultaneously:

• Anomaly: amount is 4.2× user's 90-day median (z = +3.1)
• Behavioural: autoencoder reconstruction error 8.7 (baseline 1.2)
• Graph: beneficiary shares device fingerprint with 3 chargeback accounts

Suggested action: hold for step-up auth (OTP + selfie). 78% of similar holds last quarter were confirmed fraud.

ROI Math

Baseline. Mid-size payments business, ₹2,000 Cr / yr GMV, fraud loss rate 14 bps → ₹2.8 Cr / yr in fraud losses, FP rate 2.9% (~58K customer complaints / yr).
Catch-rate lift. 1.8× catch at same FP budget → loss rate drops from 14 bps to 7.7 bps → ₹1.26 Cr / yr saved.
FP reduction. 2.9% → 1.7% → 24,000 fewer customer-complaint cases / yr → ~₹35 L saved in support cost + customer churn avoidance.
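Analyst time. 38s vs 60s median review → ~37% less time per review (~58% more reviews per analyst-hour) → 2.2 FTEs equivalent freed (~₹40 L / yr).
Total annualised. ~₹2.0 Cr / yr saved (₹1.26 Cr + ₹35 L + ₹40 L), which against the cost of the DataCraves Enterprise tier works out to a net ROI of roughly 25–30×.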
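The headline figures can be re-derived in a few lines, which also makes it easy to swap in your own inputs. The only added assumption is that complaint volume scales in proportion to the false-positive rate; every number below is the mock figure from the list above.

```python
CR = 1.0          # work in ₹ crore
LAKH = 0.01 * CR  # 1 Cr = 100 L

gmv_cr = 2_000 * CR
baseline_bps, improved_bps = 14.0, 7.7

baseline_loss = gmv_cr * baseline_bps / 10_000                  # fraud loss before, in Cr
fraud_saved = gmv_cr * (baseline_bps - improved_bps) / 10_000   # loss avoided, in Cr

complaints_before = 58_000
# Assumes complaints fall in proportion to the FP rate (2.9% -> 1.7%).
complaints_avoided = complaints_before * (1 - 1.7 / 2.9)

total_saved = fraud_saved + 35 * LAKH + 40 * LAKH
print(round(baseline_loss, 2), round(fraud_saved, 2),
      round(complaints_avoided), round(total_saved, 2))
```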

Assumptions to challenge: 1.8× catch is an apples-to-apples ML uplift over a mature rules engine. If your rules engine is <6 months old, expect 2.5–3× lift. If it's been hand-tuned for years, expect closer to 1.4×.

Common Pitfalls

⚠ Training only on confirmed fraud.

Confirmed-fraud labels are biased toward what your old rules already catch. Add weak labels (chargebacks, friendly-fraud disputes) and PU-learning to capture the missed class.

⚠ Ignoring concept drift.

Fraud patterns shift weekly. A model that hasn't been retrained in 30 days is already 15-20% degraded.
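One way to notice drift between retrains is to compare the live score distribution with the one seen at training time. The sketch below uses the Population Stability Index, a common drift statistic chosen here purely for illustration; the source does not say which monitor is used, and the bin edges and score samples are synthetic.

```python
import bisect
import math

def population_stability_index(expected, actual, edges):
    """PSI between two score samples in [0, 1], bucketed by the given bin edges."""
    n_buckets = len(edges) - 1

    def bucket_shares(values):
        counts = [0] * n_buckets
        for v in values:
            # Clamp so a score equal to the last edge falls in the final bucket.
            idx = min(max(bisect.bisect_right(edges, v) - 1, 0), n_buckets - 1)
            counts[idx] += 1
        # Small floor keeps empty buckets from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

edges = [0.0, 0.25, 0.5, 0.75, 1.0]
training_scores = [0.1] * 70 + [0.3] * 20 + [0.6] * 7 + [0.9] * 3
live_scores = [0.1] * 50 + [0.3] * 25 + [0.6] * 15 + [0.9] * 10

psi = population_stability_index(training_scores, live_scores, edges)
print(round(psi, 3))
```

A commonly quoted rule of thumb treats PSI values above roughly 0.2 as a material shift worth a retrain, though the right alert level depends on your own traffic.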

⚠ Throwing the rules engine away.

Rules give you regulatory defensibility. Keep them; let the ML layer sit on top of them, not replace them.

⚠ One global threshold.

Set thresholds per merchant category and per amount band, or you'll over-block low-value transactions.
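A segmented threshold table can be as simple as a dictionary lookup with a global fallback. Every category, band cutoff, and threshold value below is hypothetical and would need to be tuned on your own data.

```python
# Hypothetical threshold table: (merchant category, amount band) -> hold threshold.
DEFAULT_THRESHOLD = 0.65  # illustrative fallback for segments not listed

THRESHOLDS = {
    ("grocery", "low"): 0.85,
    ("grocery", "high"): 0.70,
    ("electronics", "low"): 0.70,
    ("electronics", "high"): 0.55,
}

def amount_band(amount, cutoff=5_000):
    """Two illustrative bands; a real deployment would likely use more."""
    return "low" if amount < cutoff else "high"

def should_hold(score, merchant_category, amount):
    key = (merchant_category, amount_band(amount))
    return score >= THRESHOLDS.get(key, DEFAULT_THRESHOLD)

print(should_hold(0.75, "grocery", 300))         # False
print(should_hold(0.75, "electronics", 40_000))  # True
```

The same score of 0.75 passes for a small grocery purchase but triggers a hold on a large electronics order, which is the behaviour a single global threshold cannot give you.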

⚠ No explanation in the hold.

Analysts who can't see "why this one" review at 30% accuracy. Always surface top-3 contributing features.
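For a log-odds blend, a first-cut explanation can rank each signal by its weighted log-odds contribution. This sketch works at the signal level only; feature-level attributions inside each model would need a separate method. The scores are hypothetical, and the weights mirror the combiner defaults shown earlier.

```python
import math

def top_contributions(scores, weights, k=3, eps=1e-6):
    """Rank signals by their weighted log-odds contribution to the blended score."""
    contributions = {}
    for name, score in scores.items():
        clipped = min(max(score, eps), 1 - eps)  # avoid log(0) at the extremes
        contributions[name] = weights[name] * math.log(clipped / (1 - clipped))
    return sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Hypothetical signal-level scores for one held transaction.
scores = {"rules": 0.40, "isolation_forest": 0.95, "autoencoder": 0.90, "graph": 0.97}
weights = {"rules": 0.35, "isolation_forest": 0.25, "autoencoder": 0.20, "graph": 0.20}

top = top_contributions(scores, weights)
for name, value in top:
    print(f"{name}: {value:+.2f}")
```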

Expected ROI

1.8–2.5× lift in fraud catch rate at the same false-positive budget

Honest framing: pilot benchmarks against rule-only systems; effect size depends on how mature your existing rules are.

Run this on your own data?

A 30-minute demo shows the agents working against your warehouse — not a pre-baked sandbox.

Book a demo →

Mock data, real patterns. Every visualization is synthetic to preserve client confidentiality.