Multilingual Transcription

0:00 Speaker 0 Calm American accent Authentic

Welcome to today's demo. Could you start by telling us a bit about yourself and your background?

0:06 Speaker 1 Happy British accent Authentic

Of course! I've been working in audio processing for about seven years now, mostly focused on real-time voice analysis.

0:18 Speaker 0 Neutral Authentic

And what drew you specifically to deepfake detection as a research area?

0:24 Speaker 1 Interested Deepfake

Well, it started when I noticed that existing systems were struggling with prosody-preserving synthesis. The gap between human and synthetic speech keeps narrowing.

0:42 Speaker 0 Confident Authentic

That's a great point. The Modulate API combines frame-level analysis with utterance-level classification, giving you both granular and holistic verdicts.
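To illustrate what "granular and holistic verdicts" can mean, here is a minimal sketch of one common way to pool per-frame scores into a single utterance-level score. It is not Modulate's documented method. The function name `utterance_verdict`, the mean pooling, and the 0.5 threshold are all hypothetical choices made for this example.

```python
from statistics import fmean


def utterance_verdict(frame_scores: list[float], threshold: float = 0.5) -> dict:
    """Pool hypothetical per-frame deepfake scores into one utterance verdict.

    Assumptions (not from the Modulate API): scores lie in [0, 1],
    the utterance score is the mean of the frame scores, and 0.5 is
    the cut-off between "Authentic" and "Deepfake".
    """
    if not frame_scores:
        raise ValueError("need at least one frame score")
    score = fmean(frame_scores)  # holistic: one number for the whole utterance
    # granular: indices of individual frames that cross the threshold
    flagged = [i for i, s in enumerate(frame_scores) if s >= threshold]
    return {
        "deepfake_score": round(score, 4),
        "label": "Deepfake" if score >= threshold else "Authentic",
        "flagged_frames": flagged,
    }


print(utterance_verdict([0.2, 0.9, 0.95, 0.85]))
```

Mean pooling is the simplest option. A real detector might weight frames or use max pooling so that a short synthetic splice is not averaged away.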

Raw JSON

{
  "deepfake_score": 0.9788,
  "utterances": [
    {
      "utterance_uid": "ec69b9e7-8fac-4b3e-a4da-ea9773d56aed",
      "text": "Track package.",
      "start_ms": 91620,
      "duration_ms": 960,
      "speaker": 2,
      "language": "en",
      "emotion": "Excited",
      "accent": "American",
      "deepfake_score": 1.1000000000000001
    },
    {
      "utterance_uid": "4003b94d-b15d-46b1-9276-80366dc178fc",
      "text": "Thank you. Did you say you'd like to place an order?",
      "start_ms": 3000,
      "duration_ms": 3800,
      "speaker": 1,
      "language": "en",
      "emotion": "Interested",
      "accent": "American",
      "deepfake_score": 0.9723
    },
    {
      "utterance_uid": "a1c82f3d-09e2-4f7a-b831-22de94a05c61",
      "text": "I need to check on my delivery, it's been two weeks.",
      "start_ms": 8200,
      "duration_ms": 4100,
      "speaker": 2,
      "language": "en",
      "emotion": "Frustrated",
      "accent": "American",
      "deepfake_score": 0.1042
    },
    {
      "utterance_uid": "7fd301ea-c44b-48d2-a96e-5b3c71d0f882",
      "text": "I understand your frustration. Let me pull up your order right now.",
      "start_ms": 12400,
      "duration_ms": 3600,
      "speaker": 1,
      "language": "en",
      "emotion": "Calm",
      "accent": "American",
      "deepfake_score": 0.9812
    }
  ]
}
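The response above can be turned back into transcript lines like the ones at the top of this page. The sketch below sorts utterances by `start_ms`, because they are not returned in time order, and formats each start time as m:ss. It also assumes deepfake scores should lie in [0, 1] and reports any score outside that range. The first utterance's score of 1.1000000000000001 would be reported. The 0.5 `THRESHOLD` and the rule that unanalyzed utterances simply lack a `deepfake_score` key are assumptions, and `fmt_ts` and `render` are hypothetical helper names.

```python
import json

# Trimmed copy of the response above (only the fields this sketch reads).
RAW = """
{"deepfake_score": 0.9788,
 "utterances": [
  {"text": "Track package.", "start_ms": 91620, "speaker": 2,
   "emotion": "Excited", "accent": "American", "deepfake_score": 1.1000000000000001},
  {"text": "Thank you. Did you say you'd like to place an order?", "start_ms": 3000,
   "speaker": 1, "emotion": "Interested", "accent": "American", "deepfake_score": 0.9723},
  {"text": "I need to check on my delivery, it's been two weeks.", "start_ms": 8200,
   "speaker": 2, "emotion": "Frustrated", "accent": "American", "deepfake_score": 0.1042}
 ]}
"""

THRESHOLD = 0.5  # hypothetical cut-off between "Authentic" and "Deepfake"


def fmt_ts(ms: int) -> str:
    """Format milliseconds as m:ss, like the transcript timestamps."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes}:{seconds:02d}"


def render(response: dict) -> tuple[list[str], list[str]]:
    """Return (transcript lines, problems found) for a parsed response."""
    lines, problems = [], []
    for u in sorted(response["utterances"], key=lambda item: item["start_ms"]):
        score = u.get("deepfake_score")  # assumed absent if not analyzed
        if score is None:
            label = "Not analyzed"
        elif not 0.0 <= score <= 1.0:
            problems.append(f"{u['start_ms']} ms: score {score} outside [0, 1]")
            label = "Invalid score"
        else:
            label = "Deepfake" if score >= THRESHOLD else "Authentic"
        lines.append(f"{fmt_ts(u['start_ms'])} Speaker {u['speaker']} "
                     f"{u['emotion']} {u['accent']} accent {label}")
    return lines, problems


transcript, issues = render(json.loads(RAW))
print("\n".join(transcript))
print(issues)
```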

General Statistics

Speakers: 2
Languages: en
Deepfake analyzed: 17 / 19 utterances
Avg deepfake score: 0.6368
Max deepfake score: 0.9810
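These figures cover the whole recording (19 utterances), while the JSON above is only an excerpt. Recomputing them from the excerpt therefore will not reproduce the numbers shown. The sketch below shows how such a summary could be derived from an `utterances` list. It assumes that average and maximum are taken over analyzed utterances only, and that an unanalyzed utterance has no `deepfake_score` key. `summarize` and the sample data are hypothetical.

```python
from statistics import fmean


def summarize(utterances: list[dict]) -> dict:
    """Recompute General Statistics-style figures from utterance records."""
    # Assumption: utterances that were not analyzed carry no deepfake_score.
    scored = [u["deepfake_score"] for u in utterances if "deepfake_score" in u]
    return {
        "speakers": len({u["speaker"] for u in utterances}),
        "languages": sorted({u["language"] for u in utterances}),
        "analyzed": f"{len(scored)} / {len(utterances)}",
        "avg_score": round(fmean(scored), 4) if scored else None,
        "max_score": round(max(scored), 4) if scored else None,
    }


sample = [
    {"speaker": 1, "language": "en", "deepfake_score": 0.9723},
    {"speaker": 2, "language": "en", "deepfake_score": 0.1042},
    {"speaker": 1, "language": "en", "deepfake_score": 0.9812},
    {"speaker": 2, "language": "en"},  # hypothetical utterance that was not analyzed
]
print(summarize(sample))
```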

Audio

File Name: AIAgentFrustration.mp3
File Size: 1.87 MB
File Type: audio/mpeg
Audio Duration: 1m 37.3s

Request

HTTP: 200 OK
Endpoint: /api/velma-2-stt-batch
Response Size: 5.8 KB

Performance

Processing Time: 2.66s
Processing Factor: 36.6x real-time
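The reported factor matches audio duration divided by processing time: 1m 37.3s is 97.3 s, and 97.3 / 2.66 ≈ 36.6. The sketch below reproduces that arithmetic from the display strings. The helper names `parse_duration` and `processing_factor` are hypothetical, and the parser only handles the "1m 37.3s" / "2.66s" formats shown on this page.

```python
import re


def parse_duration(text: str) -> float:
    """Convert a display string such as '1m 37.3s' or '2.66s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m\s*)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def processing_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    return audio_seconds / processing_seconds


factor = processing_factor(parse_duration("1m 37.3s"), parse_duration("2.66s"))
print(f"Processing Factor: {factor:.1f}x real-time")  # Processing Factor: 36.6x real-time
```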

Duration (mm:ss)	Conversation	Category	Industry
07:48	Gender-role argument ends the relationship	Social	Personal Relationships
05:33	Elderly caller needs login for surgery payment	Support	Banking
06:10	Sales rep fumbles MFA setup with IT	Support	IT services
04:31	Customer fights for refund in delivery fraud	Support	E-commerce
07:52	YouTuber describes personal stalker experience	Social	Online media
01:37	AI bot can't find customer's order	Support	E-commerce
03:49	User demands MFA reset for drive access	Support	IT services
04:24	Streamer rants on politics and censorship	Social	Media & broadcasting
04:28	Angry caller demands update on late delivery	Support	E-commerce
06:25	Manager pushes IT for password reset	Support	IT services
05:14	Customer tries account recovery without security steps	Support	E-commerce

Audio researcher discusses deepfake detection


Preloaded demo recordings