
Deepfake defense: Combining spatial and temporal cues with CNN–BiLSTM–transformer architecture

Research output: Contribution to journal › Article › peer-review

Abstract

The proliferation of deepfakes is a major threat to the credibility of online media and the stability of public discourse. These hyper-realistic fake videos, nearly indistinguishable from genuine content, can be misused to spread disinformation, commit identity theft, and manipulate political narratives. Most existing deepfake detectors analyze spatial or temporal features in isolation; however, in real-world scenarios involving video compression, occlusions, or frame instability, such approaches are inadequate. Convolutional neural networks (CNNs) effectively capture spatial artifacts but fail to model temporal dynamics, while recurrent neural networks (RNNs) and long short-term memory (LSTM) units handle short-range temporal signals but struggle with long-term dependencies. To address these limitations, we propose a hybrid deep learning architecture that integrates a CNN, bidirectional LSTMs (BiLSTMs), and transformer encoders within a unified framework. The CNN module extracts fine-grained spatial information from each frame, the BiLSTM branch captures local temporal motion, and the transformer encoder models global temporal relationships across video sequences. This dual-path temporal modeling framework combines the strengths of sequential learning and attention mechanisms to enable comprehensive spatiotemporal analysis. The model is implemented in TensorFlow using MobileNetV2 as its CNN backbone and evaluated on the FaceForensics++ and DeepFake Detection Challenge (DFDC) datasets. The proposed architecture outperforms baseline models such as XceptionNet, CNN–LSTM, and CNN–Transformer, achieving an F1-score of 90.6% and an AUC of 98.5%. In addition to high detection accuracy, the model is robust to video quality degradation, making it a practical and scalable solution for detecting deepfakes in critical and sensitive applications.
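
The architecture described in the abstract can be illustrated with a minimal Keras sketch. The abstract confirms only the components (a MobileNetV2 backbone, a BiLSTM branch, a transformer encoder) and the framework (TensorFlow). Everything else here is an assumption made for illustration: the function name `build_hybrid_detector`, the layer sizes, the use of a single encoder block without positional encoding, the parallel arrangement of the two temporal branches, and fusion by concatenation. It should not be read as the authors' implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model


def build_hybrid_detector(num_frames=8, frame_size=96, lstm_units=64,
                          num_heads=4, ff_dim=128):
    """Hypothetical CNN + BiLSTM + transformer-encoder video classifier."""
    # Input: a clip of RGB frames, assumed already scaled to [-1, 1].
    frames = layers.Input(shape=(num_frames, frame_size, frame_size, 3))

    # Spatial path: MobileNetV2 applied to each frame independently.
    # weights=None keeps the sketch offline; pretrained weights are optional.
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=(frame_size, frame_size, 3),
        include_top=False, weights=None, pooling="avg")
    feats = layers.TimeDistributed(backbone)(frames)  # (batch, T, 1280)

    # Local temporal branch: BiLSTM over the per-frame features.
    local = layers.Bidirectional(layers.LSTM(lstm_units))(feats)

    # Global temporal branch: one transformer encoder block
    # (self-attention + feed-forward, each with a residual connection).
    d_model = 2 * lstm_units  # assumed width, chosen to match the BiLSTM output
    x = layers.Dense(d_model)(feats)
    attn = layers.MultiHeadAttention(
        num_heads=num_heads, key_dim=d_model // num_heads)(x, x)
    x = layers.LayerNormalization()(layers.Add()([x, attn]))
    ff = layers.Dense(ff_dim, activation="relu")(x)
    ff = layers.Dense(d_model)(ff)
    x = layers.LayerNormalization()(layers.Add()([x, ff]))
    global_ctx = layers.GlobalAveragePooling1D()(x)

    # Fusion (assumed): concatenate both branches, then a sigmoid fake score.
    merged = layers.Concatenate()([local, global_ctx])
    score = layers.Dense(1, activation="sigmoid")(merged)
    return Model(frames, score)
```

Calling `build_hybrid_detector()` returns a model that maps a batch of clips with shape `(batch, 8, 96, 96, 3)` to one score per clip in the range 0 to 1. Training settings, frame sampling, and face cropping are not specified in the abstract and are left out.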

Original language: English
Article number: e0334980
Journal: PLoS One
Volume: 20
Issue number: 11 November
Publication status: Published - 11-2025

All Science Journal Classification (ASJC) codes

  • General
