WaViT-CDC: wavelet vision transformer with central difference convolutions for spatial-frequency deepfake detection

    Research output: Contribution to journal > Article > peer-review

    Abstract

    The increasing popularity of generative AI has led to a significant rise in deepfake content, creating an urgent need for generalized and reliable deepfake detection methods. Because existing approaches rely on either spatial-domain or frequency-domain features alone, they struggle to generalize to unseen datasets, especially those with subtle manipulations. To address these challenges, a novel end-to-end Wavelet Central Difference Convolutional Vision Transformer framework (WaViT-CDC) is designed to enhance spatial-frequency deepfake detection. Unlike previous methods, this approach applies the Discrete Wavelet Transform (DWT) for multi-level frequency decomposition and Central Difference Convolution (CDC) to capture local fine-grained discrepancies and focus on texture variances, while also incorporating Vision Transformers for global contextual understanding. The Frequency-Spatial Feature Fusion Attention module integrates these features, enabling the effective detection of fake artifacts. Moreover, in contrast to earlier work, subtle perturbations are introduced in both the spatial and frequency domains to further improve generalization. Cross-dataset generalization evaluations demonstrate that WaViT-CDC outperforms state-of-the-art methods when trained on both low-quality and high-quality face images, achieving average performance increases of 2.5% and 4.5% on challenging high-resolution, real-world datasets such as Celeb-DF and WildDeepfake.
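
    The two local operators named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the Haar wavelet, the single decomposition level, the 3x3 kernel, the value of theta, and the function names haar_dwt2 and central_difference_conv are all assumptions made for illustration, and the central difference form used here is the commonly cited one (vanilla convolution minus theta times the centre pixel times the kernel sum).

```python
def haar_dwt2(img):
    """One level of a 2D Haar DWT on a list-of-lists image (even height and width).

    Assumption: Haar is chosen only for simplicity; the abstract does not name a wavelet.
    Returns four half-resolution bands: approximation plus three detail bands.
    """
    h, w = len(img), len(img[0])
    approx = [[0.0] * (w // 2) for _ in range(h // 2)]
    rows_diff = [[0.0] * (w // 2) for _ in range(h // 2)]
    cols_diff = [[0.0] * (w // 2) for _ in range(h // 2)]
    diag = [[0.0] * (w // 2) for _ in range(h // 2)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            approx[i // 2][j // 2] = (a + b + c + d) / 2      # low-pass in both directions
            rows_diff[i // 2][j // 2] = (a + b - c - d) / 2   # top row minus bottom row
            cols_diff[i // 2][j // 2] = (a - b + c - d) / 2   # left column minus right column
            diag[i // 2][j // 2] = (a - b - c + d) / 2        # diagonal difference
    return approx, rows_diff, cols_diff, diag


def central_difference_conv(img, kernel, theta=0.7):
    """Valid-mode 3x3 central difference convolution on a list-of-lists image.

    Computes sum(w * x) - theta * x_centre * sum(w); theta=0 gives a vanilla convolution.
    Assumption: theta=0.7 is an illustrative default, not a value taken from the paper.
    """
    h, w = len(img), len(img[0])
    kernel_sum = sum(sum(row) for row in kernel)
    out = []
    for i in range(1, h - 1):
        out_row = []
        for j in range(1, w - 1):
            vanilla = sum(
                kernel[u][v] * img[i - 1 + u][j - 1 + v]
                for u in range(3)
                for v in range(3)
            )
            out_row.append(vanilla - theta * img[i][j] * kernel_sum)
        out.append(out_row)
    return out


if __name__ == "__main__":
    flat = [[4.0] * 4 for _ in range(4)]   # a texture-free patch
    ones = [[1.0] * 3 for _ in range(3)]
    print(haar_dwt2(flat)[0])                              # approximation band
    print(central_difference_conv(flat, ones, theta=1.0))  # pure central difference
```

    On a perfectly flat patch the three detail bands and the pure central difference response (theta = 1) are all zero, which is why operators of this kind respond to local texture variation rather than absolute intensity.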
    Original language: English
    Journal: IEEE Open Journal of Signal Processing
    Publication status: E-pub ahead of print - 20 May 2025

    Keywords

    • Computer science and informatics
