Deep learning enabled pseudonymization for preserving data privacy of financial identifiers in public documents in India

  • R. Roopalakshmi*
  • Saurabh Kailas
  • R. Sreelatha
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The increasing digitization and transmission of government-issued electronic documents have intensified the need to protect handwritten signatures, which are recognized as critical biometric identifiers, from identity-related data breaches. For instance, according to the 2025 RSA ID IQ Report, 40% of respondents reported identity-related data breaches, and 66% emphasized the significant damage these breaches caused to their organizations. Existing privacy-preserving anonymization research focuses primarily on facial features and fingerprints, whereas pseudonymization of handwritten signatures in publicly accessible documents remains largely underexplored in the literature. This study proposes a new pseudonymization framework based on a fully convolutional neural network (CNN). The framework uses the SuperPoint architecture integrated with differentiable output decoding to identify and pseudonymize handwritten signatures in public-domain documents, specifically Permanent Account Number (PAN) cards issued by the Government of India. In contrast to traditional anonymization approaches, this pseudonymization technique preserves document utility while securing sensitive data. It thereby enables traceable identity protection without compromising the structural integrity of the input documents. Extensive evaluations on a curated dataset of more than 500 real-world PAN cards establish the model's robustness and its applicability to large-scale deployments. A comparative analysis against baseline techniques, including ORB, FAST, SIFT, and a deep CNN, shows that the proposed method outperforms them on several metrics: precision, recall, SSIM, runtime efficiency, and spatial overhead.
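The abstract does not specify what the differentiable output decoding involves. For reference, the original SuperPoint detector head emits 65 channels per 8×8 cell. These are 64 pixel positions plus one "no keypoint" dustbin channel, and they are decoded with a per-cell softmax followed by a depth-to-space reshape into a full-resolution heatmap. The minimal pure-Python sketch below shows only that standard decoding step. It is an assumption that the framework's decoding resembles this step. Non-maximum suppression is omitted, and the function names and threshold value are illustrative.

```python
import math
from typing import List, Tuple

CELL = 8  # SuperPoint-style 8x8 cells: 64 position channels + 1 dustbin channel


def decode_detector_head(logits: List[List[List[float]]]) -> List[List[float]]:
    """Decode a 65 x Hc x Wc logit tensor into an (8*Hc) x (8*Wc) keypoint heatmap.

    Illustrative sketch of standard SuperPoint-style decoding, not the paper's code.
    """
    channels = len(logits)
    assert channels == CELL * CELL + 1, "expected 64 position channels + 1 dustbin"
    hc, wc = len(logits[0]), len(logits[0][0])
    heatmap = [[0.0] * (wc * CELL) for _ in range(hc * CELL)]
    for i in range(hc):
        for j in range(wc):
            cell = [logits[c][i][j] for c in range(channels)]
            m = max(cell)  # subtract the max for a numerically stable softmax
            exps = [math.exp(v - m) for v in cell]
            total = sum(exps)
            # Drop the dustbin (last channel) and scatter the 64 probabilities
            # back to their pixel positions inside the 8x8 cell (depth-to-space).
            for k in range(CELL * CELL):
                heatmap[i * CELL + k // CELL][j * CELL + k % CELL] = exps[k] / total
    return heatmap


def keypoints(heatmap: List[List[float]], threshold: float = 0.015) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for every pixel at or above the (illustrative) threshold."""
    return [(r, c, s) for r, row in enumerate(heatmap)
            for c, s in enumerate(row) if s >= threshold]


# Usage: a single cell whose logits strongly favour position k = 9 (row 1, col 1).
logits = [[[0.0]] for _ in range(65)]
logits[9][0][0] = 10.0
hm = decode_detector_head(logits)
print(len(hm), len(hm[0]))  # 8 8
print([(r, c) for r, c, _ in keypoints(hm, threshold=0.5)])
```

Because the softmax and reshape are both differentiable, a loss on the heatmap can be backpropagated to the network. A thresholding or NMS step applied afterwards would not be differentiable.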
In addition, the findings have practical implications for embedding CNN-based pseudonymization into public-sector document processing pipelines. Such pipelines would support secure, utility-preserving digital archiving in compliance with modern privacy standards such as the GDPR.
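The following is one hypothetical way to place a pseudonymization stage in such a pipeline. The sketch replaces a detected signature region with a blank fill and derives a keyed pseudonym (HMAC-SHA256) from the original pixels. It keeps the original crop in a separate vault, so that an authorized key holder can re-link it. This is not the authors' published method. The box format, 16-character token, fill value, and vault dictionary are all assumptions, and the image is assumed to be 8-bit grayscale.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (top, left, bottom, right); bottom/right exclusive
Image = List[List[int]]          # assumed 8-bit grayscale, values 0..255


def pseudonymize_region(image: Image, box: Box, key: bytes,
                        vault: Dict[str, Image], fill: int = 255) -> Tuple[Image, str]:
    """Hypothetical stage: blank a signature box and return a keyed pseudonym for it."""
    top, left, bottom, right = box
    crop = [row[left:right] for row in image[top:bottom]]
    # Keyed hash of the crop: the same key and pixels give the same token,
    # but the token cannot be recomputed without the secret key.
    payload = bytes(v for row in crop for v in row)
    token = hmac.new(key, payload, hashlib.sha256).hexdigest()[:16]
    vault[token] = crop  # stored apart from the published document
    out = [list(row) for row in image]  # copy; page size and layout are unchanged
    for r in range(top, bottom):
        for c in range(left, right):
            out[r][c] = fill
    return out, token


def restore_region(image: Image, box: Box, token: str, vault: Dict[str, Image]) -> Image:
    """Re-identify: put the vaulted crop back into the pseudonymized page."""
    top, left, right = box[0], box[1], box[3]
    out = [list(row) for row in image]
    for dr, row in enumerate(vault[token]):
        out[top + dr][left:right] = row
    return out


# Usage on a tiny synthetic 4 x 6 "page".
page = [[10 * r + c for c in range(6)] for r in range(4)]
vault: Dict[str, Image] = {}
masked, tok = pseudonymize_region(page, (1, 2, 3, 5), b"secret-key", vault)
print(masked[1][2:5])  # [255, 255, 255]
print(restore_region(masked, (1, 2, 3, 5), tok, vault) == page)  # True
```

Keeping the vault separate from the released page is what distinguishes this from anonymization. The published document carries only the token, while re-identification requires both the key and the vault.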

Original language: English
Article number: 8120
Journal: Scientific Reports
Volume: 16
Issue number: 1
Publication status: Published - 12-2026

All Science Journal Classification (ASJC) codes

  • General
