Audio researcher discusses deepfake detection
Welcome to today's demo. Could you start by telling us a bit about yourself and your background?
Of course! I've been working in audio processing for about seven years now, mostly focused on real-time voice analysis.
And what drew you specifically to deepfake detection as a research area?
Well, it started when I noticed that existing systems were struggling with prosody-preserving synthesis. The gap between human and synthetic speech keeps narrowing.
That's a great point. The Modulate API combines frame-level analysis with utterance-level classification — giving you both granular and holistic verdicts.
Raw JSON
{
  "deepfake_score": 0.9788,
  "utterances": [
    {
      "utterance_uid": "ec69b9e7-8fac-4b3e-a4da-ea9773d56aed",
      "text": "Track package.",
      "start_ms": 91620,
      "duration_ms": 960,
      "speaker": 2,
      "language": "en",
      "emotion": "Excited",
      "accent": "American",
      "deepfake_score": 1.1000000000000001
    },
    {
      "utterance_uid": "4003b94d-b15d-46b1-9276-80366dc178fc",
      "text": "Thank you. Did you say you'd like to place an order?",
      "start_ms": 3000,
      "duration_ms": 3800,
      "speaker": 1,
      "language": "en",
      "emotion": "Interested",
      "accent": "American",
      "deepfake_score": 0.9723
    },
    {
      "utterance_uid": "a1c82f3d-09e2-4f7a-b831-22de94a05c61",
      "text": "I need to check on my delivery, it's been two weeks.",
      "start_ms": 8200,
      "duration_ms": 4100,
      "speaker": 2,
      "language": "en",
      "emotion": "Frustrated",
      "accent": "American",
      "deepfake_score": 0.1042
    },
    {
      "utterance_uid": "7fd301ea-c44b-48d2-a96e-5b3c71d0f882",
      "text": "I understand your frustration. Let me pull up your order right now.",
      "start_ms": 12400,
      "duration_ms": 3600,
      "speaker": 1,
      "language": "en",
      "emotion": "Calm",
      "accent": "American",
      "deepfake_score": 0.9812
    }
  ]
}
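To make the utterance-level verdicts concrete, here is a minimal Python sketch that reads a response shaped like the one above and lists the utterances whose score crosses a threshold, in time order. The field names are taken from the response itself. The 0.5 threshold, the helper name `flag_utterances`, and the choice to clamp scores into [0, 1] are assumptions made for illustration, not documented behavior of the API.

```python
import json

# Trimmed copy of the response shown above; only the fields used here are kept.
RESPONSE = json.loads("""
{
  "deepfake_score": 0.9788,
  "utterances": [
    {"text": "Track package.", "start_ms": 91620, "speaker": 2,
     "deepfake_score": 1.1000000000000001},
    {"text": "Thank you. Did you say you'd like to place an order?",
     "start_ms": 3000, "speaker": 1, "deepfake_score": 0.9723},
    {"text": "I need to check on my delivery, it's been two weeks.",
     "start_ms": 8200, "speaker": 2, "deepfake_score": 0.1042},
    {"text": "I understand your frustration. Let me pull up your order right now.",
     "start_ms": 12400, "speaker": 1, "deepfake_score": 0.9812}
  ]
}
""")

THRESHOLD = 0.5  # assumed cut-off, not a value taken from the API


def flag_utterances(response, threshold=THRESHOLD):
    """Return (start_ms, speaker, score) for utterances at or above threshold."""
    flagged = []
    # The response is not ordered by time, so sort on start_ms first.
    for utt in sorted(response["utterances"], key=lambda u: u["start_ms"]):
        # Assumption: scores are meant to lie in [0, 1], so clamp outliers.
        score = min(max(utt["deepfake_score"], 0.0), 1.0)
        if score >= threshold:
            flagged.append((utt["start_ms"], utt["speaker"], score))
    return flagged


if __name__ == "__main__":
    for start_ms, speaker, score in flag_utterances(RESPONSE):
        print(f"{start_ms / 1000:7.2f}s  speaker {speaker}  score {score:.4f}")
```

Sorting by `start_ms` matters here because the utterances in the response are not listed in chronological order.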
General Statistics
- Speakers: 2
- Languages: en
- Deepfake analyzed: 17 / 19 utterances
- Avg deepfake score: 0.6368
- Max deepfake score: 0.9810
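Summary figures like those in the General Statistics panel could be derived from the per-utterance scores along the following lines. This is only a sketch: it assumes that an utterance which was not analyzed carries a missing or null `deepfake_score`, which the response does not confirm, and `summarize_scores` is a hypothetical helper name. The sample list reuses three scores from the response and adds one hypothetical unanalyzed entry, so its output is not expected to match the panel, which covers all 19 utterances.

```python
from statistics import mean


def summarize_scores(utterances):
    """Count analyzed utterances and compute average and maximum score."""
    # Assumption: unanalyzed utterances have no score or a null score.
    scores = [
        u["deepfake_score"]
        for u in utterances
        if u.get("deepfake_score") is not None
    ]
    return {
        "analyzed": len(scores),
        "total": len(utterances),
        "avg": round(mean(scores), 4) if scores else None,
        "max": round(max(scores), 4) if scores else None,
    }


# Three scores from the response plus one hypothetical unanalyzed utterance.
sample = [
    {"deepfake_score": 0.9723},
    {"deepfake_score": 0.1042},
    {"deepfake_score": 0.9812},
    {"deepfake_score": None},
]

if __name__ == "__main__":
    stats = summarize_scores(sample)
    print(f"Deepfake analyzed: {stats['analyzed']} / {stats['total']} utterances")
    print(f"Avg deepfake score: {stats['avg']}")
    print(f"Max deepfake score: {stats['max']}")
```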
Audio
- File Name: AIAgentFrustration.mp3
- File Size: 1.87 MB
- File Type: audio/mpeg
- Audio Duration: 1m 37.3s
Request
- HTTP: 200 OK
- Endpoint: /api/velma-2-stt-batch
- Response Size: 5.8 KB
Performance
- Processing Time: 2.66s
- Processing Factor: 36.6x real-time
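The processing factor appears to be the audio duration divided by the processing time. That definition is an assumption, but it agrees with the figures shown: 97.3 s of audio over 2.66 s of processing is roughly 36.6. A short Python sketch of the arithmetic, with `processing_factor` as a hypothetical helper name:

```python
def processing_factor(audio_seconds, processing_seconds):
    """Ratio of audio duration to processing time (assumed definition)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Figures from the Audio and Performance panels: 1m 37.3s and 2.66s.
audio_seconds = 1 * 60 + 37.3
factor = processing_factor(audio_seconds, 2.66)

if __name__ == "__main__":
    print(f"{factor:.1f}x real-time")
```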