Multi-Layer Knowledge Distillation with Custom Temperature Scaling for Deepfake Detection

  • Eva Hemantkumar Shah
  • Utkarsh Dubey
  • Arshaq Shagihan Abdul Rahiman
  • Nisha P. Shetty*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

Abstract

The swift emergence of highly convincing deepfakes on social media and digital platforms poses considerable challenges to digital security, privacy, and public confidence, as these fabrications are increasingly used for financial deception and social manipulation. The computational demands of high-capacity models impede the deployment of effective defenses, preventing real-time authentication on the resource-constrained edge devices where these forgeries are most frequently encountered. Current detection methods rely on machine learning and deep learning, particularly high-capacity convolutional neural networks (CNNs) designed to examine temporal irregularities and facial textures. These approaches face a crucial trade-off: larger models achieve high accuracy but are too slow for mobile deployment, while naive model compression frequently causes a significant loss in forensic detection performance and fails to retain fine-grained feature representations. To close this gap, our study develops a multi-layer Knowledge Distillation (KD) system in which a lightweight student, EfficientNet-B0, receives decision-level, feature-level, and attention-level knowledge from a high-capacity Xception teacher. Using a bespoke "gentle" temperature scaling schedule, we allow the student to imitate the teacher's internal representations and spatial focus areas while preserving crucial forensic cues. With 4x fewer parameters and 20x fewer computational operations, the proposed Student Temp 2 model achieved an accuracy of 97.23% and an AUC of 0.9942, slightly surpassing the teacher's accuracy. For the most severe resource constraints, the results suggest using the ShuffleNetV2-KD variant, which retains an accuracy of 95.45% with only 1.25M parameters. This framework matters for practical deployment because it offers a scalable way to build reliable, real-time deepfake detection directly into edge-based communication platforms and smartphones.
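
The three knowledge levels named in the abstract (decision, feature, attention) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the loss weights, the default temperature of 2.0, and the specific choices of KL divergence, mean squared error, and channel-summed squared activations are all assumptions made for illustration, and the "gentle" temperature schedule itself is not specified in the abstract, so the temperature is passed in as a plain parameter.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def decision_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened outputs, scaled by T^2 (a common KD convention)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def feature_loss(student_feat, teacher_feat):
    """Mean squared error between two flattened feature vectors of equal length."""
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(teacher_feat)


def attention_map(feature_maps):
    """Sum squared activations over channels per spatial position, then L2-normalise.

    feature_maps is a list of channels, each a flat list of spatial positions.
    """
    n_positions = len(feature_maps[0])
    raw = [sum(channel[i] ** 2 for channel in feature_maps) for i in range(n_positions)]
    norm = math.sqrt(sum(r * r for r in raw)) or 1.0  # avoid dividing by zero
    return [r / norm for r in raw]


def attention_loss(student_maps, teacher_maps):
    """MSE between normalised spatial attention maps; channel counts may differ."""
    a_s, a_t = attention_map(student_maps), attention_map(teacher_maps)
    return sum((s - t) ** 2 for s, t in zip(a_s, a_t)) / len(a_t)


def multi_layer_kd_loss(student, teacher, temperature=2.0,
                        w_decision=1.0, w_feature=1.0, w_attention=1.0):
    """Weighted sum of the three terms; the weights are hypothetical placeholders."""
    return (w_decision * decision_loss(student["logits"], teacher["logits"], temperature)
            + w_feature * feature_loss(student["features"], teacher["features"])
            + w_attention * attention_loss(student["maps"], teacher["maps"]))


# Toy example: two classes (real/fake), 3-dim features, 2 spatial positions.
teacher_out = {"logits": [3.0, -1.0], "features": [0.5, 0.1, 0.9],
               "maps": [[1.0, 0.2], [0.8, 0.1], [0.9, 0.3]]}   # 3 channels
student_out = {"logits": [1.5, 0.0], "features": [0.4, 0.2, 0.7],
               "maps": [[0.7, 0.4], [0.6, 0.5]]}               # 2 channels
total = multi_layer_kd_loss(student_out, teacher_out, temperature=2.0)
```

Because the attention map collapses the channel dimension before comparison, the student and teacher in this sketch may have different channel counts as long as their spatial sizes match. In a real training loop these terms would normally be combined with the ordinary classification loss on the ground-truth labels.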

Original language: English
Pages (from-to): 46763-46787
Number of pages: 25
Journal: IEEE Access
Volume: 14
Publication status: Accepted/In press - 2026

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Materials Science
  • General Engineering
