Welcome to today's demo. Could you start by telling us a bit about yourself and your background?
Of course! I've been working in audio processing for about seven years now, mostly focused on real-time voice analysis.
And what drew you specifically to deepfake detection as a research area?
Well, it started when I noticed that existing systems were struggling with prosody-preserving synthesis. The gap between human and synthetic speech keeps narrowing.
That's a great point. The Modulate API combines frame-level analysis with utterance-level classification — giving you both granular and holistic verdicts.
| Timestamp | Verdict | Confidence |
|---|---|---|
| 0:00 – 0:04 | Authentic | 97.0% |
| 0:04 – 0:08 | Authentic | 91.2% |
| 0:08 – 0:12 | Authentic | 93.4% |
| 0:12 – 0:16 | Synthetic | 78.3% |
| 0:16 – 0:20 | Synthetic | 96.8% |
| 0:20 – 0:24 | Synthetic | 87.6% |
| 0:24 – 0:28 | Authentic | 86.2% |

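To illustrate how per-segment verdicts like those in the table can be rolled up into longer spans, here is a minimal Python sketch. The segment values are copied from the table; the `merge_runs` helper and the choice to average confidence within a span are assumptions made for illustration, not a description of how the Modulate API derives its utterance-level verdicts.

```python
from itertools import groupby

# Segment verdicts copied from the table: (start_s, end_s, verdict, confidence_pct)
segments = [
    (0, 4, "Authentic", 97.0),
    (4, 8, "Authentic", 91.2),
    (8, 12, "Authentic", 93.4),
    (12, 16, "Synthetic", 78.3),
    (16, 20, "Synthetic", 96.8),
    (20, 24, "Synthetic", 87.6),
    (24, 28, "Authentic", 86.2),
]


def merge_runs(segments):
    """Collapse consecutive segments that share a verdict into one span.

    Hypothetical helper: the span confidence is a plain mean of the
    segment confidences, which is an illustrative choice only.
    """
    runs = []
    for verdict, group in groupby(segments, key=lambda s: s[2]):
        group = list(group)
        mean_conf = sum(s[3] for s in group) / len(group)
        runs.append((group[0][0], group[-1][1], verdict, round(mean_conf, 1)))
    return runs


runs = merge_runs(segments)
for start, end, verdict, conf in runs:
    print(f"{start:>2}s to {end:>2}s  {verdict:<9}  mean confidence {conf}%")
```

On this data the seven segments collapse into three spans: authentic from 0 s to 12 s, synthetic from 12 s to 24 s, and authentic again from 24 s to 28 s.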

| Field | Value |
|---|---|
| Speakers | 2 |
| Languages | en |
| Deepfake analyzed | 17 / 19 utterances |
| Avg deepfake score | 0.6368 |
| Max deepfake score | 0.9810 |
| File Name | AIAgentFrustration.mp3 |
| File Size | 1.87 MB |
| File Type | audio/mpeg |
| Audio Duration | 1m 37.3s |
| Processing Time | 2.66s |
| Processing Factor | 36.6x real-time |
| HTTP | 200 OK |
| Endpoint | /api/velma-2-stt-batch |
| Response Size | 5.8 KB |
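The processing factor reported above appears to be the audio duration divided by the processing time. The Python sketch below checks that arithmetic against the listed values. The `parse_duration` helper is hypothetical and only handles the `1m 37.3s` and `2.66s` shapes shown here.

```python
import re


def parse_duration(text):
    """Convert a string such as '1m 37.3s' or '2.66s' to seconds.

    Hypothetical helper covering only the formats shown in the summary.
    """
    match = re.fullmatch(r"(?:(\d+)m\s*)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


audio_s = parse_duration("1m 37.3s")    # audio duration from the summary
processing_s = parse_duration("2.66s")  # processing time from the summary
factor = audio_s / processing_s

print(f"{factor:.1f}x real-time")  # 36.6x real-time
```

97.3 s of audio processed in 2.66 s gives roughly 36.6, which matches the reported figure.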
{
  "deepfake_score": 0.9788,
  "utterances": [
    {
      "utterance_uid": "ec69b9e7-8fac-4b3e-a4da-ea9773d56aed",
      "text": "Track package.",
      "start_ms": 91620,
      "duration_ms": 960,
      "speaker": 2,
      "language": "en",
      "emotion": "Excited",
      "accent": "American",
      "deepfake_score": 1.1000000000000001
    },
    {
      "utterance_uid": "4003b94d-b15d-46b1-9276-80366dc178fc",
      "text": "Thank you. Did you say you'd like to place an order?",
      "start_ms": 3000,
      "duration_ms": 3800,
      "speaker": 1,
      "language": "en",
      "emotion": "Interested",
      "accent": "American",
      "deepfake_score": 0.9723
    },
    {
      "utterance_uid": "a1c82f3d-09e2-4f7a-b831-22de94a05c61",
      "text": "I need to check on my delivery, it's been two weeks.",
      "start_ms": 8200,
      "duration_ms": 4100,
      "speaker": 2,
      "language": "en",
      "emotion": "Frustrated",
      "accent": "American",
      "deepfake_score": 0.1042
    },
    {
      "utterance_uid": "7fd301ea-c44b-48d2-a96e-5b3c71d0f882",
      "text": "I understand your frustration. Let me pull up your order right now.",
      "start_ms": 12400,
      "duration_ms": 3600,
      "speaker": 1,
      "language": "en",
      "emotion": "Calm",
      "accent": "American",
      "deepfake_score": 0.9812
    }
  ]
}
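A response like the one above can be post-processed on the client side. The Python sketch below uses a trimmed copy of that response, puts the utterances in chronological order, and flags each one against a cut-off. Two things here are assumptions rather than documented behavior: the 0.5 threshold, and the expectation that `deepfake_score` lies between 0 and 1. One utterance in the sample exceeds 1, so the sketch records the range check separately instead of silently correcting the value.

```python
import json

# Trimmed copy of the response; only the fields used below are kept.
raw = """
{"deepfake_score": 0.9788, "utterances": [
  {"text": "Track package.", "start_ms": 91620, "speaker": 2,
   "deepfake_score": 1.1000000000000001},
  {"text": "Thank you. Did you say you'd like to place an order?",
   "start_ms": 3000, "speaker": 1, "deepfake_score": 0.9723},
  {"text": "I need to check on my delivery, it's been two weeks.",
   "start_ms": 8200, "speaker": 2, "deepfake_score": 0.1042},
  {"text": "I understand your frustration. Let me pull up your order right now.",
   "start_ms": 12400, "speaker": 1, "deepfake_score": 0.9812}
]}
"""

THRESHOLD = 0.5  # assumed cut-off for illustration, not a documented value


def summarize(response, threshold=THRESHOLD):
    """Return utterances in chronological order with range and flag checks."""
    rows = []
    for utt in sorted(response["utterances"], key=lambda u: u["start_ms"]):
        score = utt["deepfake_score"]
        clamped = min(max(score, 0.0), 1.0)  # used only for the comparison
        rows.append({
            "start_ms": utt["start_ms"],
            "speaker": utt["speaker"],
            "score": score,                       # reported value, unchanged
            "in_range": 0.0 <= score <= 1.0,      # assumes a 0 to 1 scale
            "flagged": clamped >= threshold,
        })
    return rows


rows = summarize(json.loads(raw))
for row in rows:
    note = "" if row["in_range"] else "  (score outside 0 to 1)"
    label = "flagged" if row["flagged"] else "ok"
    print(f"{row['start_ms']:>6} ms  speaker {row['speaker']}  "
          f"{row['score']:.4f}  {label}{note}")
```

In this sample both speaker 1 utterances are flagged, the speaker 2 utterance at 8200 ms is not, and the utterance at 91620 ms is flagged and marked as out of range.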
Velma-2 handles transcription, emotion, accent, and engagement detection across 70+ languages. Choose the options that fit your use case, then upload your audio.
Upload an audio or video file of up to 50 MB, or stream live voice.
| Conversation | Category | Industry |
|---|---|---|
| Gender-role argument ends the relationship | Social | Personal Relationships |
| Elderly caller needs login for surgery payment | Support | Banking |
| Sales rep fumbles MFA setup with IT | Support | IT services |
| Customer fights for refund in delivery fraud | Support | E-commerce |
| YouTuber describes personal stalker experience | Social | Online media |
| AI bot can't find customer's order | Support | E-commerce |
| User demands MFA reset for drive access | Support | IT services |
| Streamer rants on politics and censorship | Social | Media & broadcasting |
| Angry caller demands update on late delivery | Support | E-commerce |
| Manager pushes IT for password reset | Support | IT services |
| Customer tries account recovery without security steps | Support | E-commerce |