AI-in-the-Loop: Privacy-Preserving Real-Time Scam Detection Leveraging LLMs and Federated Learning

Ismail Hossain1, Sai Puppala2, Md Jahangir Alam1, Sajedul Talukder1

1Department of Computer Science, University of Texas at El Paso
2School of Computing, Southern Illinois University Carbondale

Abstract

Scams exploiting real-time social engineering—such as phishing, impersonation, and phone fraud—are a persistent and evolving threat across digital platforms. Existing detection systems are typically reactive, offering limited protection during active scammer interactions. We propose a novel, privacy-preserving, AI-in-the-loop framework that proactively detects and disrupts scam conversations in real time. Our system integrates instruction-tuned large language models (LLMs) with a privacy-aware utility function that selects responses balancing conversational engagement and harm minimization. To prevent disclosure of personal or sensitive information, we apply rigorous PII scoring and enforce hard safety thresholds. Crucially, we couple this mechanism with federated learning, enabling on-device model updates without raw data aggregation—preserving user privacy while supporting continual learning from diverse, real-world interactions. Experimental results show that our approach improves scammer deception, sustains engagement, and adapts dynamically across scenarios. This is the first system to combine real-time conversational scam-baiting with privacy-preserving federated learning, offering a new paradigm for proactive, adaptive, and trustworthy scam defense.

⚔️ Attack Model & Defense Framework

Figure: Scammer tactics and AI-in-the-Loop defense workflow.

Scammers exploit digital traces such as posts, contacts, and behaviors to craft personalized attacks. Our system detects and mitigates these threats in real time.

📌 Scam Tactics

  • Harvest social media posts, contacts, and behaviors
  • Post fake ads or offers (e.g., free services)
  • Engage victims via comments & direct messages
  • Escalate into phishing, fraud, or data theft

🛡 Real-Time AI Monitoring

An AI module monitors conversations, assigning each exchange a risk score R_t ∈ [0, 1]. If the risk exceeds a threshold τ, the system flags the conversation as suspicious.
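A minimal sketch of this monitoring loop follows. It is illustrative only: `score_message` is a hypothetical placeholder for the on-device scam classifier, and the moving-average aggregation and threshold value are assumptions, not the paper's exact design.

```python
# Illustrative monitoring loop: score each message, aggregate into R_t,
# and flag the conversation once R_t crosses the threshold tau.
from typing import List

TAU = 0.7  # detection threshold (illustrative value)

SCAM_CUES = ("wire transfer", "gift card", "urgent", "verify your account")

def score_message(text: str) -> float:
    """Placeholder per-message scam score in [0, 1]; the deployed system
    would use a fine-tuned model (e.g., a BiGRU or an LLM judge) instead."""
    hits = sum(cue in text.lower() for cue in SCAM_CUES)
    return min(1.0, hits / 2)

def conversation_risk(messages: List[str]) -> float:
    """Aggregate per-message scores into a conversation-level R_t; here,
    an exponential moving average that weights recent turns more heavily."""
    r_t, alpha = 0.0, 0.6
    for text in messages:
        r_t = alpha * score_message(text) + (1 - alpha) * r_t
    return r_t

def is_suspicious(messages: List[str]) -> bool:
    return conversation_risk(messages) > TAU
```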

🤖 AI Scambaiting Agent

Instead of cutting off the dialogue, an AI agent impersonates the victim, generating safe, strategic replies that waste the scammer’s time while avoiding personal data exposure.

🔒 Privacy Protection

User data is protected with Federated Learning and Differential Privacy (DP-SGD), defending against gradient-leakage attacks such as DLG and iDLG.
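The core DP-SGD mechanics can be sketched in a few lines: per-example gradients are clipped to an L2 bound and Gaussian noise is added before any update leaves the device, which is what blunts reconstruction attacks like DLG and iDLG. The constants below are illustrative, not the paper's settings.

```python
# DP-SGD step sketch: clip per-example gradients, average, add noise.
import numpy as np

CLIP_NORM = 1.0         # per-example L2 clipping bound C
NOISE_MULTIPLIER = 0.8  # sigma; noise std on the mean is sigma * C / B
LR = 0.01               # learning rate

def dp_sgd_step(weights, per_example_grads, rng):
    clipped = [g * min(1.0, CLIP_NORM / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, NOISE_MULTIPLIER * CLIP_NORM / len(clipped),
                       size=mean_grad.shape)
    return weights - LR * (mean_grad + noise)
```

Because an eavesdropper on the federated channel only ever sees clipped, noised aggregates, recovering individual messages from shared gradients becomes much harder.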

⚖️ Multi-Threshold Risk Control

  • θ₁: Detect scam & alert user
  • θ₂: Ensure scambaiting remains safe
  • δ: Halt engagement if the privacy-risk or harm score is exceeded (see the decision sketch below)
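One plausible wiring of the three thresholds, written as a small decision policy; the numeric values and action names are assumptions for exposition, not the paper's exact rules.

```python
# Hypothetical multi-threshold control policy.
from dataclasses import dataclass

@dataclass
class Thresholds:
    theta1: float = 0.7  # scam-detection / user-alert threshold
    theta2: float = 0.3  # max harm/PII score allowed in an outgoing reply
    delta: float = 0.5   # hard stop on cumulative privacy risk or harm

def decide(scam_risk, reply_harm, cumulative_risk, th=Thresholds()):
    if cumulative_risk > th.delta:
        return "halt"          # stop engaging entirely
    if scam_risk > th.theta1:
        if reply_harm <= th.theta2:
            return "scambait"  # alert the user and send the safe reply
        return "regenerate"    # candidate too risky; sample a new one
    return "observe"           # keep monitoring silently
```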

🛡️ Prototype of the Proposed System

The following example illustrates how our system handles a Family Emergency Scam conversation. It shows how a normal exchange escalates into a scam, and how the AI intervenes in real time.

📱 Conversation Prototype

Figure: Prototype of a Family Emergency Scam conversation.

🔄 Step-by-Step Process

  1. Normal Start: A grandchild greets their grandparent.
  2. Financial Distress: The grandchild mentions money problems.
  3. Escalation: Repeated requests for money emerge.
  4. AI Detection: The system flags this as a potential scam.
  5. Scambaiting: The AI continues the chat, gathering info safely.
  6. User Control: User decides to continue or end the conversation.

System Architecture

Figure: Overview of the proposed real-time scam prevention system architecture. The pipeline includes four primary stages: (1) message monitoring and role identification, (2) scam detection using local LLMs, (3) AI-based scambaiting upon threshold breach, and (4) federated learning-based model aggregation on a global server to enhance detection while preserving privacy.

Our system transforms scam defense from passive detection into active, AI-powered engagement that is safe, adaptive, and privacy-preserving.

🚨 Problem Context

Scammers exploit personal information and psychological pressure (urgency, fear, authority) across social media, phone calls, and messaging apps.

💡 Core Idea

An AI-in-the-loop system uses LLMs for conversational scambaiting to delay, disrupt, and study scams in real time.

⚙️ Design Features

  • Instruction-tuned LLMs generate safe, victim-like responses.
  • Utility function balances engagement vs. privacy risk (see the sketch after this list).
  • Federated Learning ensures privacy via local training.
  • Dynamic thresholds control detection & safe responses.
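A minimal sketch of such a utility function, assuming a linear engagement-vs-risk trade-off with a hard PII cutoff; the weights are illustrative, not the paper's exact formulation.

```python
# Privacy-weighted utility for ranking candidate replies (illustrative).
ALPHA, BETA = 1.0, 2.0  # engagement reward vs. privacy-risk penalty
PII_HARD_LIMIT = 0.3    # hard safety threshold: above this, never send

def utility(engagement: float, pii_risk: float) -> float:
    if pii_risk > PII_HARD_LIMIT:
        return float("-inf")  # hard rejection
    return ALPHA * engagement - BETA * pii_risk

def best_reply(candidates):
    """candidates: iterable of (text, engagement, pii_risk) triples."""
    return max(candidates, key=lambda c: utility(c[1], c[2]))[0]
```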

📈 Pipeline Flow

  1. Monitor conversations & compute scam score.
  2. Flag risks & activate AI assistant with user consent.
  3. Generate & rank multiple candidate responses (see the pipeline sketch after this list).
  4. Send safest, most engaging reply to scammer.
  5. Continuously adapt context-aware engagement.
  6. Models updated securely via Federated Learning.
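Putting the steps together, the flow could look like the sketch below, which reuses `conversation_risk`, `TAU`, and `best_reply` from the earlier sketches; `generate_candidates` (the instruction-tuned LLM) and `score` (returning an engagement score and a PII-risk score) are hypothetical helpers.

```python
def handle_turn(history, user_consented):
    """One pass through steps 1-6 for a newly received message."""
    risk = conversation_risk(history)               # 1. scam score
    if risk <= TAU or not user_consented:           # 2. flag + consent gate
        return None                                 #    keep monitoring
    candidates = generate_candidates(history, k=5)  # 3. k candidate replies
    scored = [(c, *score(c, history)) for c in candidates]
    reply = best_reply(scored)                      # 4. safest engaging reply
    history.append(reply)                           # 5. context carried over
    return reply                                    # 6. FL updates run
                                                    #    asynchronously on-device
```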

📊 Dataset Statistics

We compiled a range of real and synthetic datasets to evaluate scam detection and response generation. These datasets cover both classification (scam vs. non-scam detection) and generation (safe scambaiting responses) tasks.

To strengthen training, we also generated a large synthetic multitask dataset (ChatGPT-4o) spanning scam types (appointment, delivery, refund, support, telemarketing, etc.) and a general non-scam class. Each conversation was expanded into three tasks: PII Evaluation, Scam Risk Scoring, and Scam-Baiting Response Generation.
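For concreteness, one synthesized conversation expanded into the three tasks might be stored as a record like the one below; the field names are hypothetical, and the risk values simply echo the refund row of the score table further down.

```python
# Hypothetical schema for one multitask training record.
sample = {
    "scam_type": "refund",
    "conversation": [
        {"role": "scammer", "text": "Your refund of $249 is pending..."},
        {"role": "victim",  "text": "Oh? I don't remember ordering that."},
    ],
    "tasks": {
        "pii_evaluation":    {"pii_risk": 0.80},   # Task 1
        "scam_risk_scoring": {"scam_risk": 0.87},  # Task 2
        "scambait_response": {                     # Task 3
            "reply": "Could you remind me which card that was on?"
        },
    },
}
```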

Classification & Generation Datasets
| Type | SSC | SASC | MASC | SSD | YTSC | ASB | SBC |
|---|---|---|---|---|---|---|---|
| appointment | - | 200 | 200 | 0 | - | - | - |
| delivery | - | 200 | 200 | 200 | - | - | - |
| insurance | - | 200 | 200 | 200 | - | - | - |
| wrong | - | 200 | 200 | 200 | - | - | - |
| refund | - | 200 | 200 | 200 | 4 | - | - |
| reward | - | 200 | 200 | 200 | 7 | - | - |
| ssn | - | 200 | 200 | 200 | 4 | - | - |
| support | - | 200 | 200 | 200 | 5 | - | - |
| telemarketing | - | 0 | 0 | 200 | - | - | - |
| Max conv. length | 13 | 28 | 30 | 28 | 67 | 871 | 73 |
| Min conv. length | 6 | 4 | 3 | 6 | 13 | 2 | 3 |
| Avg conv. length | 10 | 14 | 12 | 13 | 28 | 56 | 10 |
Synthesized Multi-Task Dataset
| Type | #Conversations | #Samples |
|---|---|---|
| appointment | 500 | 1500 |
| delivery | 500 | 1500 |
| insurance | 500 | 1500 |
| wrong | 500 | 1500 |
| refund | 500 | 1500 |
| reward | 500 | 1500 |
| support | 500 | 1500 |
| telemarketing | 500 | 1500 |
| gift_card | 500 | 1500 |
| account_suspension | 500 | 1500 |
| identity_verification | 500 | 1500 |
| general | 2000 | 2000 |
| Total | 8000 | 20000 |

⚖️ Risk & Engagement Scores

| Type | Engagement | PII Risk | Scam Risk |
|---|---|---|---|
| account_suspension | 0.64 | 0.80 | 0.87 |
| appointment | 0.66 | 0.80 | 0.88 |
| delivery | 0.65 | 0.81 | 0.87 |
| gift_card | 0.66 | 0.80 | 0.88 |
| identity_verification | 0.65 | 0.80 | 0.87 |
| insurance | 0.66 | 0.80 | 0.87 |
| refund | 0.66 | 0.80 | 0.87 |
| reward | 0.66 | 0.79 | 0.88 |
| support | 0.65 | 0.80 | 0.87 |
| telemarketing | 0.65 | 0.79 | 0.88 |
| wrong | 0.65 | 0.79 | 0.88 |
| general | 0.65 | 0.00 | 0.03 |

Results

We evaluated our models across multiple tasks — scam detection, PII risk scoring, response generation, human evaluation, multi-turn engagement, federated learning, and safeness assessment.

1. Baseline Model Performance

We compared BERT, RoBERTa, DistilBERT, BiLSTM, and BiGRU on the MASC, SASC, SSC, and SSD datasets. The recurrent models (BiLSTM and BiGRU) achieved near-perfect F1 scores, outperforming the transformer baselines.

| Model | Dataset | F1 | FPR | FNR | AUPRC |
|---|---|---|---|---|---|
| BERT-Base | MASC | 0.9812 | 0.218 | 0.124 | 0.9784 |
| BERT-Base | SASC | 0.9756 | 0.231 | 0.240 | 0.9713 |
| BERT-Base | SSC | 0.9874 | 0.207 | 0.116 | 0.9849 |
| BERT-Base | SSD | 0.9625 | 0.275 | 0.282 | 0.9612 |
| BERT-Large | MASC | 0.9883 | 0.208 | 0.113 | 0.9861 |
| BERT-Large | SASC | 0.9731 | 0.236 | 0.244 | 0.9695 |
| BERT-Large | SSC | 0.9674 | 0.251 | 0.260 | 0.9652 |
| BERT-Large | SSD | 0.9925 | 0.111 | 0.104 | 0.9873 |
| RoBERTa | MASC | 0.9932 | 0.101 | 0.209 | 0.9908 |
| RoBERTa | SASC | 0.9916 | 0.112 | 0.211 | 0.9897 |
| RoBERTa | SSC | 0.9901 | 0.107 | 0.213 | 0.9881 |
| RoBERTa | SSD | 1.0000 | 0.000 | 0.000 | 1.0000 |
| DistilBERT | MASC | 0.9697 | 0.262 | 0.270 | 0.9678 |
| DistilBERT | SASC | 0.9724 | 0.240 | 0.248 | 0.9701 |
| DistilBERT | SSC | 0.9682 | 0.259 | 0.267 | 0.9657 |
| DistilBERT | SSD | 0.9651 | 0.271 | 0.280 | 0.9636 |
| BiLSTM | MASC | 0.9988 | 0.0017 | 0.0008 | 0.9979 |
| BiLSTM | SASC | 0.9889 | 0.0175 | 0.0050 | 0.9806 |
| BiLSTM | SSC | 0.9994 | 0.000 | 0.0013 | 0.9994 |
| BiLSTM | SSD | 0.9945 | 0.0033 | 0.0075 | 0.9930 |
| BiGRU | MASC | 0.9992 | 0.0008 | 0.0008 | 0.9988 |
| BiGRU | SASC | 0.9979 | 0.0033 | 0.0008 | 0.9963 |
| BiGRU | SSC | 0.9994 | 0.000 | 0.0013 | 0.9994 |
| BiGRU | SSD | 0.9996 | 0.0008 | 0.000 | 0.9992 |

2. Instruction-Tuned LLMs for Scam Detection

| Model | Dataset | F1 | AUPRC | FPR | FNR |
|---|---|---|---|---|---|
| LlamaGuard | MASC | 0.5829 | 0.5895 | 0.7299 | 0.2383 |
| LlamaGuard | SASC | 0.6621 | 0.7531 | 0.9532 | 0.0000 |
| LlamaGuard | SSC | 0.6761 | 0.6754 | 0.6996 | 0.1525 |
| LlamaGuard | SSD | 0.6610 | 0.7409 | 0.9334 | 0.0000 |
| LlamaGuard2 | MASC | 0.7275 | 0.7580 | 0.7269 | 0.0 |
| LlamaGuard2 | SASC | 0.6833 | 0.7139 | 0.8426 | 0.0015 |
| LlamaGuard2 | SSC | 1.0000 | 1.0000 | 0.0000 | 0.0000 |
| LlamaGuard2 | SSD | 0.7253 | 0.7716 | 0.6965 | 0.0015 |
| LlamaGuard3 | MASC | 0.8200 | 0.7095 | 0.3368 | 0.0567 |
| LlamaGuard3 | SASC | 0.7074 | 0.6877 | 0.6637 | 0.0559 |
| LlamaGuard3 | SSC | 0.9934 | 0.9962 | 0.0126 | 0.0019 |
| LlamaGuard3 | SSD | 0.7295 | 0.7189 | 0.5854 | 0.0644 |
| MD-Judge | MASC | 0.8306 | 0.8992 | 0.2038 | 0.1450 |
| MD-Judge | SASC | 0.8496 | 0.8808 | 0.3150 | 0.0288 |
| MD-Judge | SSC | 0.9735 | 1.0000 | 0.0000 | 0.0515 |
| MD-Judge | SSD | 0.8985 | 0.9320 | 0.1978 | 0.0453 |

3. Scam-Baiting Response Generation

| Dataset | Perplexity ↓ | Dist-1 ↑ | Dist-2 ↑ | DialogRPT ↑ |
|---|---|---|---|---|
| MASC | 26.51 | 0.18 | 0.53 | 0.35 |
| SASC | 28.37 | 0.21 | 0.56 | 0.28 |
| SSC | 22.30 | 0.69 | 0.54 | 0.80 |
| SSD | 27.84 | 0.15 | 0.47 | 0.36 |
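Dist-n measures lexical diversity as the ratio of unique n-grams to total n-grams across generated replies (higher is more diverse). A minimal reference implementation:

```python
# Distinct-n: unique n-grams / total n-grams over a set of generations.
def distinct_n(texts, n=1):
    total, unique = 0, set()
    for text in texts:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / max(total, 1)

replies = ["happy to help with that", "happy to send it tomorrow"]
print(distinct_n(replies, n=1), distinct_n(replies, n=2))
```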

4. Human Evaluation

| Metric | MD-Judge | LlamaGuard3 | p-value |
|---|---|---|---|
| Realism (1–5) | 4.31 ± 0.52 | 3.92 ± 0.61 | <0.01 |
| Engagement (1–5) | 4.05 ± 0.60 | 3.31 ± 0.65 | <0.01 |
| Safety (%) | 96.0 | 92.0 | <0.05 |
| Effectiveness (1–5) | 4.12 ± 0.55 | 3.43 ± 0.57 | <0.01 |
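One common way to obtain p-values of this kind (an assumption here, not a claim about the study's exact protocol) is a two-sample t-test over per-annotator ratings:

```python
# Comparing annotator ratings for two models with a two-sample t-test.
from scipy.stats import ttest_ind

md_judge_realism    = [4.0, 4.5, 4.5, 4.0, 4.5]  # toy ratings, not real data
llamaguard3_realism = [4.0, 3.5, 4.0, 4.0, 3.5]
t_stat, p_value = ttest_ind(md_judge_realism, llamaguard3_realism)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```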

5. Multi-Turn Engagement

| Model | Count | Mean Time (s) | μE | μPII | μS | μL |
|---|---|---|---|---|---|---|
| LlamaGuard | 7 ± 2 | 6.50 ± 5.59 | 0.30 ± 0.30 | 0.17 ± 0.24 | 0.39 ± 9.19 | 275 ± 106 |
| LlamaGuard2 | 9 ± 0 | 5.68 ± 1.65 | 0.78 ± 0.05 | 0.81 ± 0.11 | 0.11 ± 6.11 | 163 ± 97 |
| LlamaGuard3 | 8 ± 2 | 7.47 ± 3.83 | 0.74 ± 0.04 | 0.38 ± 0.42 | 0.92 ± 0.06 | 245 ± 145 |
| MD-Judge | 9 ± 1 | 8.42 ± 2.01 | 0.79 ± 0.04 | 0.57 ± 0.30 | 0.53 ± 4.04 | 228 ± 17 |

6. Federated Learning (with/without DP)

Performance comparison of aggregated models using FedAvg with and without Differential Privacy.
| Round | Method | Novelty ↑ | Rel. (Sc) ↑ | Scam Risk ↓ | Engage. ↑ | PII Risk ↓ |
|---|---|---|---|---|---|---|
| 5 | No DP | 0.5804 | 0.7399 | 0.5417 | 0.7966 | 0.0050 |
| 5 | 0.1-DP | 0.5991 | 0.7474 | 0.4998 | 0.7984 | 0.0074 |
| 5 | 0.8-DP | 0.5049 | 0.7425 | 0.5407 | 0.7014 | 0.0064 |
| 10 | No DP | 0.5906 | 0.7377 | 0.5415 | 0.7928 | 0.0050 |
| 10 | 0.1-DP | 0.6062 | 0.7451 | 0.4998 | 0.7983 | 0.0074 |
| 10 | 0.8-DP | 0.5849 | 0.7448 | 0.5392 | 0.7927 | 0.0037 |
| 15 | No DP | 0.5986 | 0.7409 | 0.5413 | 0.7969 | 0.0050 |
| 15 | 0.1-DP | 0.5963 | 0.7455 | 0.4998 | 0.8009 | 0.0074 |
| 15 | 0.8-DP | 0.5978 | 0.7450 | 0.5344 | 0.8003 | 0.0085 |
| 20 | No DP | 0.5961 | 0.7425 | 0.5415 | 0.7960 | 0.0050 |
| 20 | 0.1-DP | 0.6024 | 0.7476 | 0.4998 | 0.7987 | 0.0074 |
| 20 | 0.8-DP | 0.5982 | 0.7426 | 0.5342 | 0.7954 | 0.0085 |
| 25 | No DP | 0.6006 | 0.7427 | 0.5415 | 0.7974 | 0.0051 |
| 25 | 0.1-DP | 0.6048 | 0.7470 | 0.4998 | 0.7982 | 0.0074 |
| 25 | 0.8-DP | 0.6055 | 0.7396 | 0.5342 | 0.7969 | 0.0085 |
| 30 | No DP | 0.5986 | 0.7459 | 0.5413 | 0.8054 | 0.0052 |
| 30 | 0.1-DP | 0.5956 | 0.7491 | 0.4997 | 0.8003 | 0.0074 |
| 30 | 0.8-DP | 0.6071 | 0.7460 | 0.5421 | 0.7972 | 0.0085 |
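For reference, the FedAvg aggregation behind these rounds can be sketched in a few lines: the server averages client weights in proportion to local dataset size, so raw conversations never leave devices. With DP enabled, clients would first apply a DP-SGD-style step (see the earlier sketch). This is a minimal illustration, not the deployed aggregator.

```python
# FedAvg: dataset-size-weighted average of per-client model weights.
import numpy as np

def fedavg(client_weights, client_sizes):
    """client_weights: one list of numpy arrays (layers) per client."""
    total = sum(client_sizes)
    return [
        sum(w[layer] * (n / total)
            for w, n in zip(client_weights, client_sizes))
        for layer in range(len(client_weights[0]))
    ]
```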

7. Safeness & Risk Awareness

Evaluation results for four guard models across moderation labels.
| Model | Moderation Label | Engagement Score | PII Risk Score | Scam Detection |
|---|---|---|---|---|
| LlamaGuard | safe | 0.512450 | 0.120861 | 0.746914 |
| LlamaGuard | unsafe_o1 | 0.756410 | 0.038462 | 0.935128 |
| LlamaGuard | unsafe_o2 | 0.833333 | 0.000000 | 0.933333 |
| LlamaGuard | unsafe_o5 | 0.900000 | 0.000000 | 1.000000 |
| LlamaGuard | unsafe_o6 | 0.900000 | 0.000000 | 1.000000 |
| LlamaGuard2 | safe | 0.723525 | 0.290834 | 0.700517 |
| LlamaGuard2 | unsafe_o1 | 1.100000 | 0.800000 | 0.960000 |
| LlamaGuard2 | unsafe_o3 | 0.741386 | 0.802070 | 0.960281 |
| LlamaGuard2 | unsafe_o5 | 0.749597 | 0.799329 | 0.966711 |
| LlamaGuard2 | unsafe_o9 | 0.703636 | 0.490909 | 0.904545 |
| LlamaGuard2 | unsafe_s1 | 0.739167 | 0.758333 | 0.937500 |
| LlamaGuard2 | unsafe_s3 | 0.760000 | 0.800000 | 0.953333 |
| LlamaGuard3 | safe | 0.914468 | 0.240787 | 0.461707 |
| LlamaGuard3 | unsafe_o1 | 0.963778 | 0.477778 | 0.906000 |
| LlamaGuard3 | unsafe_s1 | 0.970483 | 0.750345 | 0.942000 |
| LlamaGuard3 | unsafe_s2 | 0.974372 | 0.685681 | 0.957775 |
| LlamaGuard3 | unsafe_s9 | 0.860000 | 0.000000 | 1.000000 |
| MD-Judge | safe | 0.775891 | 0.299076 | 0.404701 |
| MD-Judge | unsafe_o1 | 0.769355 | 0.575645 | 0.748952 |
| MD-Judge | unsafe_o3 | 0.800000 | 0.800000 | 0.847500 |
| MD-Judge | unsafe_o4 | 0.752763 | 0.721171 | 0.823093 |
| MD-Judge | unsafe_o5 | 0.739103 | 0.333333 | 0.845128 |

Key Contributions

  • We introduce the problem of privacy-preserving, AI-driven conversational scambaiting using instruction-tuned LLMs.
  • We design a novel utility function that balances scammer engagement against personally identifiable information (PII) and behavioral risk.
  • We implement a real-time response filtering mechanism that enforces safety via harm scoring and strict thresholds.
  • We propose a federated learning architecture that enables decentralized model training without requiring raw data collection.

Discussion & Future Directions

Our system provides real-time scam detection and safe scambaiting on messaging platforms while preserving privacy. Despite high scam awareness, users remain vulnerable, so we combine synthetic and real-world scam dialogues with non-scam data to improve class balance and robustness.

Key innovations include the joint optimization of detection, risk scoring, and response generation through a privacy-weighted utility function, which outperforms prior classifiers and LLM-based systems.

We use federated learning with differential privacy to adapt against evolving scams without leaking user data. Local fine-tuning improves detection over time. Future enhancements will address adversarial prompt tuning, voice-scam defenses, and cross-cultural studies.

Finally, role identification ensures the AI engages only with the higher-risk participant. Users retain full control through opt-out options, safety thresholds, and harm filtering, ensuring safe, responsive, and trustworthy interventions.