Ahsan Habib

Work place: Department of Software Engineering, University of Frontier Technology, Bangladesh, Gazipur, Bangladesh

E-mail: ahsan0001@uftb.ac.bd

Website: https://orcid.org/0000-0002-5498-8339

Research Interests:

Biography

Md. Ahsan Habib is currently a Lecturer in the Department of Software Engineering at the University of Frontier Technology, Bangladesh. He holds both an M.Sc. and a B.Sc. (Engg.) in Computer Science and Engineering from Mawlana Bhashani Science and Technology University (MBSTU), where he graduated with distinction.

Previously, he served as a Senior Lecturer (July–August 2024) and Lecturer (2020–2024) in the Department of Computer Science and Engineering at Bangladesh University, where he taught undergraduate courses, supervised final-year research projects, and actively contributed to academic and curriculum development initiatives.

Mr. Habib’s research interests span Machine Learning, Deep Learning, Computer Vision, Image Processing, and Natural Language Processing, with a particular focus on generative adversarial networks (GANs), contrastive learning, and multimodal AI systems. His recent work explores the intersection of vision and language, especially text-to-image synthesis, image-text alignment, and explainable artificial intelligence (XAI). He has published in journals and conference proceedings, including IEEE Access and venues from Wiley, Elsevier, and Tech Science Press.

In addition to his academic and research contributions, Mr. Habib has served as a Master Trainer in multiple cybersecurity training programs conducted in collaboration with JICA Bangladesh, delivering national-level workshops on cybersecurity awareness and risk mitigation. He has also conducted specialized training sessions with Universitas Indonesia and supported ICT skill development programs under the ICT Division of Bangladesh.

He is committed to research that bridges theory and real-world application and aims to advance the field of intelligent systems through interdisciplinary collaboration.

Author Articles
Text-to-Image Synthesis Using MoCoGAN with Attention Mechanisms: A Unified Approach to Semantic and Dynamic Visual Representation

By Ahsan Habib, Deloara Khushi, Masud Rana

DOI: https://doi.org/10.5815/ijem.2026.03.10, Pub. Date: 8 Jun. 2026

Generating realistic images from textual descriptions remains a core challenge in artificial intelligence, with broad applications in assistive technology, virtual environments, and creative media. Existing text-to-image synthesis models often struggle with fine-grained semantic alignment and motion-aware scene generation, particularly for dynamic or complex prompts. This paper presents MoCoGAN+ATT, an enhanced framework that extends the MoCoGAN architecture by integrating attention mechanisms and Bidirectional Encoder Representations from Transformers (BERT) to extract and align rich semantic features from text. The attention module establishes precise correspondence between textual concepts and visual components, leading to semantically faithful and visually coherent image generation. We evaluate MoCoGAN+ATT on five benchmark datasets (COCO, CUB-200-2011, Oxford-102 Flowers, MSR-VTT, and Visual Genome) and demonstrate notable improvements over existing baselines. Specifically, on COCO the proposed model achieved an Inception Score (IS) of 28.71, a Fréchet Inception Distance (FID) of 11.91, and an R-Precision of 94.92; on CUB-200-2011 it obtained 27.36 (IS), 12.72 (FID), and 93.53 (R-Precision); on Oxford-102 Flowers, 28.63, 14.53, and 73.78; on MSR-VTT, 28.01, 12.62, and 96.43; and on Visual Genome, 28.15, 17.93, and 94.52, respectively. The key novelty of this work lies in fusing motion-aware generative modeling with fine-grained, attention-guided textual conditioning for dynamic image synthesis. These results highlight the effectiveness of this combination and point toward promising directions for advancing multimodal image generation.
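The abstract does not specify the exact form of the attention module. As a rough illustration only, the sketch below assumes a word-to-region cross-attention similar in spirit to AttnGAN: each image-region feature acts as a query over word embeddings (standing in for BERT outputs), and softmax-normalized scaled dot-product scores decide which words condition that region. The function names, the toy 2-D features, and the scoring choice are all hypothetical and are not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(region: Vector, words: List[Vector]) -> Vector:
    """Scaled dot-product weights of one image region over all words (assumed scoring)."""
    scale = 1.0 / math.sqrt(len(region))
    scores = [scale * sum(r * w for r, w in zip(region, word)) for word in words]
    return softmax(scores)


def word_context(regions: List[Vector], words: List[Vector]) -> List[Vector]:
    """For each region, return the attention-weighted sum of word vectors.

    regions: image-region features (queries), each of dimension d.
    words:   word features (keys and values), each of dimension d; here they
             stand in for BERT token embeddings projected to d dims (hypothetical).
    """
    d = len(words[0])
    contexts = []
    for region in regions:
        a = attention_weights(region, words)
        # Context vector: words this region attends to, mixed by their weights.
        contexts.append([sum(a_t * word[j] for a_t, word in zip(a, words)) for j in range(d)])
    return contexts


if __name__ == "__main__":
    # Toy features: word 0 ~ "bird", word 1 ~ "red" (hypothetical example).
    words = [[1.0, 0.0], [0.0, 1.0]]
    regions = [[4.0, 0.0], [0.0, 4.0]]  # region 0 resembles "bird", region 1 "red"
    for region, ctx in zip(regions, word_context(regions, words)):
        print(attention_weights(region, words), ctx)
```

Running it shows region 0 placing most of its weight on word 0 and region 1 on word 1. In an attention-based generator, such context vectors would typically be combined with the region features before the next generation stage; whether MoCoGAN+ATT does exactly this is an assumption.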

Other Articles