Deepak Motwani

Work place: Amity University/ Department of Computer Science, Gwalior, 474005, India

E-mail: dmotwani@gwa.amity.edu

Biography

Deepak Motwani is an Associate Professor at Amity University. He received the Ph.D. degree in computer science and engineering, the M.E. degree in software system, the M.C.A. degree, and the B.E. degree in electronics. His major field of study is computer science and engineering. He has over 20 years of academic and research experience. He has also worked with the Education and Research Department of Infosys Ltd. as a Corporate Trainer. He has supervised a number of Master of Technology research scholars. His research interests include data mining, information retrieval and information extraction, big data analysis (Hadoop and cloud computing), DBMS, Python, machine learning, and data science. He has authored a total of 52 research publications. Dr. Motwani is actively engaged in research in the areas of data mining, information retrieval, big data analytics, and machine learning.

Author Articles
PARSeq-GeoAware: Explicit Geometric Modeling for Robust Scene Text Recognition in the Wild

By Shilpi Goyal, Deepak Motwani

DOI: https://doi.org/10.5815/ijigsp.2026.03.08, Pub. Date: 8 Jun. 2026

Scene text recognition in unconstrained environments remains challenging due to geometric distortions including arbitrary orientations, curved baselines, and perspective deformations. Transformer-based methods achieve strong performance on regular benchmarks through implicit spatial learning but suffer accuracy drops of 8–12% on heavily curved text, where attention weights become diffuse and fail to capture explicit geometric structure. No prior work quantifies the isolated contribution of explicit geometric modeling within transformer architectures. To address this, we propose PARSeq-GeoAware, a dual-branch scene text recognition framework integrating an Enhanced Geometric Feature Extractor (GFE), adaptive coarse-to-fine rectification (affine + TPS), and a cross-attention fusion module combining explicit geometric representations with ViT-based visual features decoded by a CTC head. Trained on 176,630 image-label pairs across three progressive stages and evaluated on six standard benchmarks, PARSeq-GeoAware achieves word accuracies of 89.87% on IIIT5K, 82.07% on SVT, 84.55% on ICDAR13, 68.90% on ICDAR15, 71.26% on ArT, and 81.27% on Total-Text. On irregular and curved text benchmarks, the primary target of this work, our ±1 character accuracy reaches 84.10% on ArT and 90.05% on Total-Text, exceeding PARSeq's published word accuracy of 79.3% and 87.1%, respectively, by 4.8 pp and 2.95 pp, without a language model. Ablation studies confirm that disabling all geometric components reduces ArT word accuracy from 71.26% to 42.89% (−28.37 pp), establishing the GFE as the primary driver of irregular text performance. With the adaptive rectification module, full-pipeline inference takes 11.9 ± 1.4 ms on a Tesla T4, which is 6.5× faster than DAN (78 ms). A three-stage progressive training curriculum prevents catastrophic forgetting, retaining 89.87% regular accuracy after irregular specialization versus 80.6% with joint training (+14.8 pp). These results demonstrate that explicit geometric modeling enables a single architecture to handle synthetic, regular, and irregular scene text without specialized language model post-processing. The code is available at https://github.com/Arni-123/PARSeq-GeoAware.
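
The abstract reports both word accuracy and ±1 character accuracy for a model with a CTC head. The following is a minimal sketch of how the two metrics could differ on CTC-decoded output. It is not taken from the authors' repository. It assumes that "±1 character accuracy" counts a prediction as correct when its Levenshtein distance to the ground truth is at most 1; the abstract does not define the metric, so this reading is an assumption. The function names, the lowercase charset, the blank index of 0, and the toy frame sequences are all illustrative.

```python
from typing import List, Sequence

# Assumption: index 0 is the CTC blank; indices 1..26 map to 'a'..'z'.
CHARSET = "abcdefghijklmnopqrstuvwxyz"
BLANK = 0


def ctc_greedy_decode(frame_ids: Sequence[int]) -> str:
    """Standard greedy CTC decoding: collapse repeated labels, then drop blanks."""
    chars: List[str] = []
    prev = BLANK
    for idx in frame_ids:
        if idx != BLANK and idx != prev:
            chars.append(CHARSET[idx - 1])
        prev = idx
    return "".join(chars)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1,                # deletion
                           row[j - 1] + 1,                 # insertion
                           prev_row[j - 1] + (ca != cb)))  # substitution or match
        prev_row = row
    return prev_row[-1]


def accuracy(preds: Sequence[str], targets: Sequence[str], tolerance: int = 0) -> float:
    """Percentage of predictions within `tolerance` edits of the target.

    tolerance=0 is exact-match word accuracy; tolerance=1 is the assumed
    reading of "±1 character accuracy".
    """
    hits = sum(edit_distance(p, t) <= tolerance for p, t in zip(preds, targets))
    return 100.0 * hits / len(targets)


# Toy per-frame argmax outputs (hypothetical, for illustration only).
frames = [
    [3, 3, 0, 1, 0, 6, 6, 5],   # decodes to "cafe"
    [19, 0, 8, 8, 1, 16],       # decodes to "shap"
    [20, 1, 24, 9],             # decodes to "taxi"
]
targets = ["cafe", "shop", "exit"]

preds = [ctc_greedy_decode(f) for f in frames]
word_acc = accuracy(preds, targets, tolerance=0)
pm1_acc = accuracy(preds, targets, tolerance=1)
print(preds, round(word_acc, 2), round(pm1_acc, 2))
```

Under this reading, the ±1 metric can never be lower than exact-match word accuracy on the same predictions, which is worth keeping in mind when a ±1 figure is set beside another model's word accuracy.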

Other Articles