Fintech

Detect transaction anomalies before they become fraud cases

Rule-based fraud systems miss novel attack patterns and drown your ops team in false positives. Tuning them is a full-time job and the fraudsters always move first.

⚠️ Charts, numbers, and "client" examples on this page use illustrative mock data. Real client deployments are confidential.

The Problem

Rule-based fraud systems work brilliantly for the attack patterns you've already seen — and miss every novel one. They drown the ops team in false positives and cost a fraud analyst's full week to tune. Meanwhile, the fraudsters iterate weekly: new device fingerprints, new mule networks, new card-testing flows.

Three concrete failure modes show up in nearly every fintech we talk to:

⚠ All transaction examples below use synthetic data.

The DataCraves Approach

We don't replace your rules engine. We layer an unsupervised anomaly model on top of it, then a graph-feature scorer that catches mule networks the rules can't see. The combiner blends rule signals and ML signals into a single ranked queue, sorted by expected loss, not raw score.

Three signals, one decision

The combiner uses a Bayesian update: rule signals form the prior, ML signals form the likelihood, and the posterior is the score used for ranking. Each signal carries its own confidence, so a high-confidence rule still wins over a low-confidence ML flag.
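
One way to read this update is in odds form: the rule signals set the prior odds, and each ML signal multiplies them by a likelihood ratio that is tempered by its confidence. The sketch below is an illustration under that assumption, not the production combiner; the name bayes_update, the likelihood-ratio inputs, and the confidence exponents are all hypothetical.

```python
import math

def bayes_update(prior_p, likelihood_ratios, confidences):
    """Posterior fraud probability from a prior and confidence-weighted evidence.

    prior_p: P(fraud) implied by the rule signals, strictly between 0 and 1.
    likelihood_ratios: P(signal | fraud) / P(signal | legit), one per ML signal.
    confidences: values in [0, 1]; 0 ignores a signal, 1 applies it in full.
    All three inputs are assumptions made for this sketch.
    """
    log_odds = math.log(prior_p / (1 - prior_p))
    for lr, conf in zip(likelihood_ratios, confidences):
        # Tempering: a low-confidence signal moves the log-odds less.
        log_odds += conf * math.log(lr)
    return 1 / (1 + math.exp(-log_odds))

# Even prior, one fully trusted signal that is 4x likelier under fraud.
posterior = bayes_update(0.5, [4.0], [1.0])  # about 0.8

# A confident "legit" prior from the rules outweighs a weakly trusted ML flag.
weak_flag = bayes_update(0.05, [10.0], [0.2])  # stays well below 0.5
```

With a confidence of 0 the signal is ignored and the posterior equals the prior, which is the behaviour the paragraph above describes for low-confidence ML flags.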

The Architecture

Diagram 1 — Architecture (rules + ML ensemble). Rules engine stays in place; the ML stack runs alongside and the combiner produces the final decision.
Diagram 2 — Sequence diagram. End-to-end flow for one flagged transaction, from stream → enrichment → scoring → ops review → feedback.
Diagram 3 — Feedback loop. Every analyst decision (TP/FP/FN label) feeds back; nightly retrain compensates for class imbalance with SMOTE + class weights.
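
The class-weight half of the retrain step can be illustrated with the common "balanced" heuristic, where each class is weighted by n_samples / (n_classes * class_count). This is a minimal sketch assuming binary fraud labels; the function name and the 990/10 label split are hypothetical, and SMOTE oversampling itself would come from a separate library step that is not shown here.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical nightly batch: 990 legitimate labels (0), 10 fraud labels (1).
labels = [0] * 990 + [1] * 10
weights = balanced_class_weights(labels)
# The fraud class gets weight 50.0; the legitimate class gets roughly 0.505.
```

The rarer the fraud class, the larger its weight, so each confirmed fraud label contributes more to the training loss than a legitimate one.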

The Dashboard

Catch rate: 88.4% (▲ 21 pp)
False positives: 1.7% (▼ 1.2 pp)
Loss prevented (30d): ₹47 L (▲ 38%)
Median review time: 38 s (▼ 22 s)

Catch-rate at fixed FP budget: same FP budget; ML signals lift catch-rate from ~43% to ~88%.
Anomaly score distribution: clear separation above ~0.6.
Loss prevented vs threshold: sweet spot near 0.65; beyond it, you lose more catch than you save in friction.
Top fraud patterns this month: card-testing dominant; ATO via OAuth a new entrant this quarter.
Latency profile (p50 / p95 / p99): sub-150 ms p99 budget for the full pipeline.
Model precision: precision @ recall=0.85, mock pilot data.

The Math / The Logic

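The rule score and each of the three ML scores (iso_score, ae_score, graph_score) is a value in [0, 1]. The combiner uses a calibrated logistic blend, a weighted sum of the four scores in log-odds space: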

# combiner.py
import numpy as np

EPS = 1e-9  # keeps scores away from 0 and 1 so the logit stays finite

def combine(rule_score, iso_score, ae_score, graph_score,
            weights=(0.35, 0.25, 0.20, 0.20)):
    # Logistic blend, calibrated on a held-out validation week.
    # Weighted sum of per-signal log-odds, mapped back to [0, 1].
    scores = np.clip([rule_score, iso_score, ae_score, graph_score],
                     EPS, 1 - EPS)
    z = np.dot(weights, np.log(scores / (1 - scores)))
    return 1 / (1 + np.exp(-z))

def expected_loss(score, amount, recovery_rate=0.18):
    # Rank by E[loss], not by score, so high-value low-prob fraud wins.
    p_fraud = score
    return p_fraud * amount * (1 - recovery_rate)
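
To see why ranking by expected loss differs from ranking by raw score, here is a small sketch that sorts a review queue with the expected_loss formula above. The three transactions, their scores, and their amounts are hypothetical values chosen for illustration.

```python
def expected_loss(score, amount, recovery_rate=0.18):
    # Same formula as in combiner.py: P(fraud) * amount * unrecovered share.
    return score * amount * (1 - recovery_rate)

# Hypothetical queue: (txn_id, combined score, amount in rupees).
queue = [
    ("t1", 0.90, 1_000),    # high score, low value
    ("t2", 0.30, 50_000),   # low score, high value
    ("t3", 0.60, 5_000),
]

ranked = sorted(queue, key=lambda t: expected_loss(t[1], t[2]), reverse=True)
ranked_ids = [txn_id for txn_id, _, _ in ranked]
# ranked_ids is ["t2", "t3", "t1"]: the low-probability, high-value transfer leads.
```

Sorting by score alone would put t1 first, even though its expected loss (₹738) is far below that of t2 (₹12,300).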

Geo-velocity feature. A classic but still useful signal:

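def geo_velocity_kmh(t1_loc, t1_time, t2_loc, t2_time):
    # Implied travel speed between two consecutive transactions.
    # haversine_km is assumed to return the great-circle distance in km.
    dist = haversine_km(t1_loc, t2_loc)
    # abs() keeps the speed meaningful if events arrive out of order.
    hours = abs((t2_time - t1_time).total_seconds()) / 3600
    return dist / max(hours, 0.0001)  # floor avoids division by zero
# > 800 km/h → impossible without a flight, contributes +0.4 to anomaly score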
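
The snippet above calls haversine_km, which this page does not define. A minimal sketch of such a helper follows, assuming locations are (latitude, longitude) tuples in degrees and a mean Earth radius of 6371 km. The coordinates and timestamps in the usage lines are illustrative only.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(loc_a, loc_b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, loc_a)
    lat2, lon2 = map(math.radians, loc_b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

# Illustrative pair: Mumbai, then Delhi 30 minutes later.
mumbai, delhi = (19.0760, 72.8777), (28.6139, 77.2090)
t1 = datetime(2024, 1, 1, 10, 0)
t2 = datetime(2024, 1, 1, 10, 30)
hours = (t2 - t1).total_seconds() / 3600
speed_kmh = haversine_km(mumbai, delhi) / hours
# speed_kmh is far above the 800 km/h cutoff, so the geo-velocity signal fires.
```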

Graph-feature signal. Build a bipartite graph of (user, device) edges over a 90-day window. For each new transaction, compute the size of the connected component that contains the user. Components larger than 14 with more than 3 historical chargebacks get a +0.3 graph_score boost.
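
The component lookup can be sketched with a small union-find over the (user, device) edges. This is an illustration, not the production scorer: the function name, the chargebacks_by_user mapping, and the choice to measure component size in user nodes are assumptions, and the edge list is taken to be already filtered to the 90-day window.

```python
def graph_score_boost(edges, chargebacks_by_user, user,
                      min_size=14, min_chargebacks=3, boost=0.3):
    """Return the boost if the user's component is large and chargeback-heavy."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Union each user with the devices it was seen on.
    for u, d in edges:
        root_u, root_d = find(("user", u)), find(("device", d))
        if root_u != root_d:
            parent[root_d] = root_u

    root = find(("user", user))
    # Assumption: component size counts user nodes only.
    members = {u for u, _ in edges if find(("user", u)) == root}
    members.add(user)
    chargebacks = sum(chargebacks_by_user.get(u, 0) for u in members)
    if len(members) > min_size and chargebacks > min_chargebacks:
        return boost
    return 0.0

# Hypothetical ring: 16 users sharing one device, 4 chargebacks between them.
ring_edges = [(f"u{i}", "d1") for i in range(16)]
ring_boost = graph_score_boost(ring_edges, {"u0": 2, "u1": 2}, "u5")
```

Rebuilding the structure per transaction is fine for a sketch; a streaming deployment would more likely maintain the components incrementally.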

Sample Output / Insight

📨 Ops console — Insights Agent · streaming

HOLD — txn_id 0x3F2A1C · score 0.91 · expected loss ₹3,134 · rank #3 today

User u_881271 initiated ₹4,200 transfer to a beneficiary added 11 minutes ago. Three signals fired simultaneously:

  • Anomaly: amount is 4.2× user's 90-day median (z = +3.1)
  • Behavioural: autoencoder reconstruction error 8.7 (baseline 1.2)
  • Graph: beneficiary shares device fingerprint with 3 chargeback accounts

Suggested action: hold for step-up auth (OTP + selfie). 78% of similar holds last quarter were confirmed fraud.

ROI Math

Assumptions to challenge: the 1.8× catch-rate lift is an apples-to-apples ML uplift over a mature rules engine. If your rules engine is less than 6 months old, expect a 2.5–3× lift. If it has been hand-tuned for years, expect closer to 1.4×.

Common Pitfalls

⚠ Training only on confirmed fraud.
Confirmed-fraud labels are biased toward what your old rules already catch. Add weak labels (chargebacks, friendly-fraud disputes) and PU-learning to capture the missed class.
⚠ Ignoring concept drift.
Fraud patterns shift weekly. A model that hasn't been retrained in 30 days is already 15-20% degraded.
⚠ Throwing the rules engine away.
Rules give you regulatory defensibility. Keep them; let the ML layer sit on top of the rules, not replace them.
⚠ One global threshold.
Set thresholds per merchant category and per amount band, or you'll over-block low-value transactions.
⚠ No explanation in the hold.
Analysts who can't see "why this one" review at 30% accuracy. Always surface top-3 contributing features.
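
The per-segment threshold pitfall can be made concrete with a small lookup keyed by merchant category and amount band. This is a sketch only: the band edges, category names, and threshold values are hypothetical, and the 0.65 fallback borrows the sweet spot shown on the dashboard.

```python
from bisect import bisect_right

AMOUNT_BAND_EDGES = [1_000, 10_000]   # rupees; hypothetical band boundaries
BAND_NAMES = ["low", "mid", "high"]
DEFAULT_THRESHOLD = 0.65              # global fallback

# Hypothetical overrides: stricter holds for high-value, laxer for low-value.
THRESHOLDS = {
    ("grocery", "low"): 0.80,
    ("grocery", "high"): 0.60,
    ("electronics", "high"): 0.50,
}

def amount_band(amount):
    """Map an amount to its band; each edge belongs to the band above it."""
    return BAND_NAMES[bisect_right(AMOUNT_BAND_EDGES, amount)]

def should_hold(score, merchant_category, amount):
    key = (merchant_category, amount_band(amount))
    return score >= THRESHOLDS.get(key, DEFAULT_THRESHOLD)

# The same 0.70 score is released for a small grocery basket
# but held for a large electronics purchase.
small_grocery = should_hold(0.70, "grocery", 500)          # False
large_electronics = should_hold(0.70, "electronics", 25_000)  # True
```
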
Expected ROI
1.8–2.5× lift in fraud catch rate at the same false-positive budget

Honest framing: pilot benchmarks against rule-only systems; effect size depends on how mature your existing rules are.

Run this on your own data?

A 30-minute demo shows the agents working against your warehouse — not a pre-baked sandbox.

Book a demo →
