Hybrid TCN-transformer Model with Multi-head Attention for Stock Price Forecasting

pp. 173-187

Author(s)

Velaga Sai Krishna Kowshik 1, Desu Venkata Sai Manoj Kumar 1, Padarthi J. N. D. M. Prakash 1, Yanaganthi Sathwik 1, Jeethu V. Devasia 2,*

1. School of Computer Science and Engineering, VIT-AP University, India

2. School of Computer Science and Engineering, RV University, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2026.03.11

Received: 4 Jan. 2026 / Revised: 11 Mar. 2026 / Accepted: 12 Apr. 2026 / Published: 8 Jun. 2026

Index Terms

Stock Price Forecasting, Temporal Convolutional Networks, Transformer, Multi-Head Attention, Time-Series Analysis, Deep Learning, Financial Prediction

Abstract

This research presents a novel approach to stock price forecasting that combines a Temporal Convolutional Network (TCN) with a Transformer model using multi-head attention. The primary objective is to capture the complex patterns and long-term dependencies found in volatile financial time series data. The hybrid model pairs the sequential processing capabilities of TCNs with the attention mechanisms of Transformers. In extensive testing on historical stock market data, it performs better than conventional deep learning models, including Long Short-Term Memory (LSTM) networks and standalone TCNs. The results show notable gains in prediction accuracy and model stability, which supports the approach for reliable stock market forecasting.
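The abstract does not give layer sizes, training details, or the exact wiring between the two components, so the following is only a minimal illustrative sketch of the general idea: causal dilated convolutions (the TCN part) produce per-step features, and causally masked multi-head scaled dot-product attention (the Transformer part) mixes them before a linear readout. All function names, the kernel size, the dilation schedule, the head count, and the random untrained weights are assumptions made for illustration, not the authors' implementation. Positional encoding, layer normalization, and feed-forward sublayers are omitted for brevity.

```python
import math
import random

def causal_dilated_conv(x, kernel, dilation):
    """1-D causal convolution: out[t] uses only x[t], x[t-d], x[t-2d], ..."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding keeps the filter causal
                acc += w * x[idx]
        out.append(acc)
    return out

def tcn_features(series, kernels, dilations=(1, 2, 4)):
    """Stack dilated layers per channel; returns a T x C list of feature vectors.
    Simplification: each channel reuses one kernel across all layers."""
    channels = []
    for kernel in kernels:
        h = list(series)
        for d in dilations:
            conv = causal_dilated_conv(h, kernel, d)
            h = [max(0.0, c) + r for c, r in zip(conv, h)]  # ReLU + residual
        channels.append(h)
    return [list(step) for step in zip(*channels)]

def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def attention_head(x, wq, wk, wv):
    """Scaled dot-product self-attention with a causal mask (no peeking ahead)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    weights = []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / scale if j <= i else -math.inf
            for j, kj in enumerate(k)
        ]
        weights.append(softmax(scores))
    return matmul(weights, v), weights

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def hybrid_forecast(series, n_channels=4, n_heads=2, seed=0):
    """Forward pass only, with random (untrained) weights; n_channels must be
    divisible by n_heads. The returned number is not a meaningful prediction."""
    rng = random.Random(seed)
    kernels = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(n_channels)]
    feats = tcn_features(series, kernels)
    d_head = n_channels // n_heads
    heads = []
    for _ in range(n_heads):
        wq, wk, wv = (rand_matrix(n_channels, d_head, rng) for _ in range(3))
        out, _ = attention_head(feats, wq, wk, wv)
        heads.append(out)
    # Concatenate head outputs per time step, then read out the last step.
    concat = [sum((h[t] for h in heads), []) for t in range(len(series))]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(len(concat[0]))]
    return sum(w * c for w, c in zip(w_out, concat[-1]))

# Hypothetical input: a short window of daily returns (made-up numbers).
window = [0.01, -0.02, 0.015, 0.0, 0.005, -0.01, 0.02, 0.01]
next_step = hybrid_forecast(window)
```

In this sketch the convolutional stack supplies local, multi-scale context and the attention heads let the last time step weigh any earlier step directly. A practical model would learn all weights by gradient descent in a deep learning framework rather than use random values.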

Cite This Paper

Velaga Sai Krishna Kowshik, Desu Venkata Sai Manoj Kumar, Padarthi J. N. D. M. Prakash, Yanaganthi Sathwik, Jeethu V. Devasia, "Hybrid TCN-transformer Model with Multi-head Attention for Stock Price Forecasting", International Journal of Intelligent Systems and Applications (IJISA), Vol. 18, No. 3, pp. 173-187, 2026. DOI: 10.5815/ijisa.2026.03.11
