Patch-Discontinuity Mining for Generalized Deepfake Detection

1Shanghai Jiao Tong University, 2Chinese Academy of Military Science
Code (Coming Soon) arXiv

Abstract

Overview image

The advancement of generative artificial intelligence has led to increasingly diverse and realistic fake facial images, posing serious threats to personal privacy and contributing to the spread of misinformation. Existing deepfake detection methods usually exploit prior knowledge about forgery clues to design complex modules, achieving excellent performance in intra-domain settings. However, their performance usually declines significantly on unseen forgery patterns. It is thus desirable to develop a generalized deepfake detection method with a neat network structure. In this paper, we propose a simple yet efficient framework, namely the generalized deepfake detection framework (GenDF), to transfer a powerful large-scale vision model such as ViT to the downstream deepfake detection task. Concretely, we first propose a deepfake-specific representation learning (DSRL) scheme to learn, in a low-dimensional space, the discontinuity patterns across patches within a fake facial image and the continuity between patches within a real counterpart. To further alleviate the distribution mismatch between generic real images and human facial images consisting of both real and fake samples, we introduce a feature space redistribution (FSR) scheme that separately optimizes the distributions of the real and fake feature spaces, enabling the model to learn more distinctive representations. Furthermore, to enhance generalization to unseen forgery patterns produced by constantly evolving facial manipulation techniques and by diverse variations of real faces, we propose a class-invariant feature augmentation (CIFAug) function without trainable parameters. Extensive experiments demonstrate that our method achieves state-of-the-art generalization performance in cross-domain and cross-manipulation settings with only 0.28M trainable parameters.
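To make the idea behind CIFAug more concrete, the sketch below shows one way a parameter-free, class-invariant feature augmentation can be realized in PyTorch: each feature vector is interpolated with a randomly selected feature of the same class, so the label is preserved while the representation is perturbed. This is only an illustration under our own assumptions (the function name cif_aug, the Beta-distributed mixing coefficient, and the (batch, dim) feature layout are not taken from the paper), not the exact CIFAug formulation.

import torch

def cif_aug(features: torch.Tensor, labels: torch.Tensor, alpha: float = 0.2) -> torch.Tensor:
    """Illustrative parameter-free, class-invariant feature augmentation.

    `features` is a (batch, dim) tensor and `labels` a (batch,) tensor of
    class ids. Each feature is mixed with a randomly chosen feature that
    shares its label, so the class assignment stays valid while the
    representation varies. A sketch of the general idea, not the paper's
    exact CIFAug.
    """
    augmented = features.clone()
    for cls in labels.unique():
        idx = (labels == cls).nonzero(as_tuple=True)[0]
        if idx.numel() < 2:
            continue  # nothing to mix within a singleton class
        perm = idx[torch.randperm(idx.numel(), device=idx.device)]
        lam = torch.distributions.Beta(alpha, alpha).sample((idx.numel(), 1)).to(features.device)
        augmented[idx] = lam * features[idx] + (1.0 - lam) * features[perm]
    return augmented

Because the mixing uses only random coefficients and existing features, this kind of augmentation introduces no trainable parameters, matching the abstract's description of CIFAug as parameter-free.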

Method

Method overview

The pipeline of the proposed GenDF framework. The facial images (real or fake) first pass through a ViT backbone to obtain representation embeddings in a low-dimensional space. Then, we optimize the feature distributions to learn more discriminative features. Next, the real and fake features go through the class-invariant feature augmentation (CIFAug) procedure to improve the generalization ability of our method.
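For a more concrete picture of this pipeline, the sketch below shows one plausible arrangement in PyTorch: a frozen pre-trained ViT backbone, a small trainable projection into a low-dimensional space, and a binary real/fake classifier, consistent with the small trainable parameter budget reported in the abstract. The module name GenDFHead, the use of timm, and the layer sizes are our own assumptions for illustration; the feature-distribution optimization and CIFAug steps defined in the paper would act on the low-dimensional features produced here.

import timm
import torch
import torch.nn as nn

class GenDFHead(nn.Module):
    """Illustrative head on top of a frozen ViT backbone (not the official code)."""

    def __init__(self, backbone_name: str = "vit_base_patch16_224",
                 embed_dim: int = 768, low_dim: int = 128):
        super().__init__()
        # Transfer a large-scale pre-trained ViT without fine-tuning it.
        self.backbone = timm.create_model(backbone_name, pretrained=True, num_classes=0)
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.project = nn.Linear(embed_dim, low_dim)  # low-dimensional embedding
        self.classifier = nn.Linear(low_dim, 2)       # real vs. fake

    def forward(self, images: torch.Tensor):
        with torch.no_grad():
            feats = self.backbone(images)             # pooled ViT features
        z = self.project(feats)                       # low-dimensional representation
        # During training, the distribution optimization and the class-invariant
        # feature augmentation described above would operate on `z` here.
        return self.classifier(z), z

With this layout only the projection and classifier are trained, so the backbone's general visual knowledge is preserved while the lightweight head specializes to deepfake detection.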

Experimental Results

Quantitative results table

Comparison of intra-domain performance on the FF++, Celeb-DF, DFDC, and DFD datasets. GenDF consistently surpasses all existing competitors, whether they are custom-made, pre-trained, or fine-tuned methods.

Quantitative results table

Cross-domain evaluations by training on FF++(HQ) and evaluating on other datasets. The results demonstrate that GenDF achieves state-of-the-art generalization performance with the fewest trainable parameters among all existing deepfake detection methods, validating that our framework is remarkably simple yet highly effective.

Qualitative results

Grad-CAM activation maps of the real and fake representations learned by the ViT backbone and our GenDF on the FF++(HQ) dataset. GenDF precisely captures the discriminative patterns of both authentic facial images and forged ones generated by various manipulation techniques, outperforming the vanilla ViT model.

Qualitative results

t-SNE visualization of the feature distributions of real and forged facial images obtained by the baseline model (ViT) and the proposed method (GenDF) under the intra-testing and cross-testing settings. (a) ViT on the intra-testing dataset (FF++). (b) Ours+ViT on the intra-testing dataset (FF++). (c) Ours+ViT on the cross-testing dataset (DFD). (d) Ours+ViT without the class-invariant feature augmentation (CIFAug) function on the cross-testing dataset (DFD).

BibTeX

@article{Yuan2025Patch,
  title={Patch-Discontinuity Mining for Generalized Deepfake Detection},
  author={Huanhuan Yuan and Yang Ping and Zhengqin Xu and Junyi Cao and Shuai Jia and Chao Ma},
  journal={IEEE Transactions on Multimedia},
  year={2025},
  url={https://arxiv.org/abs/2512.22027}
}