Shilpi Goyal

Workplace: Amity University, Department of Computer Science, Gwalior, 474005, India

E-mail: agarwal.shilpi1@gmail.com

Research Interests: Machine learning, image processing, computer vision

Biography

Shilpi Goyal is a researcher at Amity University, Gwalior, India. She received B.Tech. and M.Tech. degrees in computer science and engineering, her major field of study. She has over 14 years of teaching experience. Her research interests include machine learning, image processing, and computer vision, and she has authored seven research publications in these areas.

Author Articles
PARSeq-GeoAware: Explicit Geometric Modeling for Robust Scene Text Recognition in the Wild

By Shilpi Goyal, Deepak Motwani

DOI: https://doi.org/10.5815/ijigsp.2026.03.08, Pub. Date: 8 Jun. 2026

Scene text recognition in unconstrained environments remains challenging due to geometric distortions, including arbitrary orientations, curved baselines, and perspective deformations. Transformer-based methods achieve strong performance on regular benchmarks through implicit spatial learning but suffer accuracy drops of 8–12% on heavily curved text, where attention weights become diffuse and fail to capture explicit geometric structure. No prior work quantifies the isolated contribution of explicit geometric modeling within transformer architectures. To address this, we propose PARSeq-GeoAware, a dual-branch scene text recognition framework that integrates an Enhanced Geometric Feature Extractor (GFE), adaptive coarse-to-fine rectification (affine + TPS), and a cross-attention fusion module that combines explicit geometric representations with ViT-based visual features, which are then decoded by a CTC head. Trained on 176,630 image-label pairs across three progressive stages and evaluated on six standard benchmarks, PARSeq-GeoAware achieves 89.87% on IIIT5K, 82.07% on SVT, 84.55% on ICDAR13, 68.90% on ICDAR15, 71.26% on ArT, and 81.27% on Total-Text. On irregular and curved text benchmarks, the primary target of this work, our ±1 character accuracy reaches 84.10% on ArT and 90.05% on Total-Text, exceeding PARSeq's published word accuracy of 79.3% and 87.1%, respectively, by +4.8pp and +2.95pp, without a language model. Ablation studies confirm that disabling all geometric components reduces ArT word accuracy from 71.26% to 42.89% (−28.37pp), establishing the GFE as the primary driver of irregular text performance. With adaptive rectification, full-pipeline inference takes 11.9 ± 1.4 ms on a Tesla T4, which is 6.5× faster than DAN (78 ms). A three-stage progressive training curriculum prevents catastrophic forgetting, retaining 89.87% regular accuracy after irregular specialization versus 80.6% with joint training (+14.8pp).
These results demonstrate that explicit geometric modeling enables a single architecture to handle synthetic, regular, and irregular scene text without specialized language model post-processing. The code is available at https://github.com/Arni-123/PARSeq-GeoAware.
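
The abstract reports two different metrics, word accuracy and ±1 character accuracy, for the output of a CTC head. The sketch below illustrates how they can differ on the same predictions. It is not taken from the PARSeq-GeoAware repository. It assumes that "±1 character accuracy" means a prediction counts as correct when its edit distance to the label is at most 1, and the function names, the blank index of 0, and the toy charset are all hypothetical.

```python
from typing import Sequence


def ctc_greedy_decode(frame_ids: Sequence[int], charset: str, blank: int = 0) -> str:
    """Standard greedy CTC decoding: collapse repeated labels, then drop blanks."""
    chars = []
    prev = None
    for idx in frame_ids:
        if idx != prev and idx != blank:
            chars.append(charset[idx - 1])  # assumption: index 0 is the blank
        prev = idx
    return "".join(chars)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a single rolling row."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        prev_diag, row[0] = row[0], i
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            prev_diag, row[j] = row[j], min(
                row[j] + 1,        # delete a character from a
                row[j - 1] + 1,    # insert a character into a
                prev_diag + cost,  # substitute, or match at no cost
            )
    return row[len(b)]


def word_accuracy(preds: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of predictions that match the label exactly."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def within_one_accuracy(preds: Sequence[str], targets: Sequence[str]) -> float:
    """Assumed reading of '±1 character accuracy': edit distance <= 1."""
    return sum(edit_distance(p, t) <= 1 for p, t in zip(preds, targets)) / len(targets)


if __name__ == "__main__":
    charset = "abcdefghijklmnopqrstuvwxyz"
    # Per-frame argmax labels for one image; 0 is the blank.
    print(ctc_greedy_decode([0, 3, 3, 0, 1, 1, 0, 0, 20], charset))

    preds = ["cat", "hause", "tre"]
    targets = ["cat", "house", "street"]
    print(word_accuracy(preds, targets), within_one_accuracy(preds, targets))
```

Under this reading, the ±1 metric is never lower than word accuracy on the same predictions, which is worth keeping in mind when comparing the 84.10% and 90.05% figures against word accuracies published for other models.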
