IJITCS Vol. 18, No. 2, 8 Apr. 2026
Natural Language Processing, Deep Learning, Text Processing, Classification, Transfer Learning, Calibration, Explainable Artificial Intelligence (XAI), Movie Scripts
Automated film certification remains an underexplored regulatory challenge, requiring scalable yet transparent models capable of handling full-length multilingual scripts. This paper presents a unified framework that delivers lightweight, explainable, and calibrated neural classifiers for multilingual movie script certification in English, Hindi, and Marathi. Unlike prior studies that operate on short snippets or monolingual text, our approach models entire scripts through chunk-level transformer encoding, knowledge distillation, and file-level temperature calibration, coupled with explainability-guided rule mapping for interpretable decision refinement. The proposed pipeline systematically integrates six stages—baselines, teacher modeling, distillation, calibration, explainability, and rule enrichment—yielding a compact yet trustworthy system. Experiments show that the distilled students retain over 85% of teacher accuracy while being 3× smaller, and temperature scaling substantially improves reliability (English Expected Calibration Error 0.303→0.086, Brier 0.684→0.540). Faithfulness analysis using deletion Area Under Curve confirms interpretable token attributions (0.157, 0.239, and 0.258 for English, Hindi, and Marathi respectively). Moreover, rule integration improves accuracy (English 0.581→0.587) while offering human-auditable rationales. All models are deployment-feasible, exported to ONNX/TorchScript with 3.5× compression (545 MB→150 MB) and no performance loss. Together, these results establish a reproducible, end-to-end pipeline that unifies multilingual long-document modeling, calibration, and interpretability for film certification—advancing trustworthy Artificial Intelligence in regulatory Natural Language Processing. To our knowledge, this is the first work to build a unified, multilingual, and explainable pipeline for movie script certification using full-length scripts across MPAA and CBFC regulatory settings.
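The calibration stage summarized above can be illustrated with a minimal sketch: temperature scaling divides logits by a scalar T (fitted on held-out data) before the softmax, and Expected Calibration Error measures the gap between predicted confidence and observed accuracy. The equal-width binning scheme and example values below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Softmax with temperature T; T > 1 softens (flattens) the distribution."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average |accuracy - confidence| over equal-width bins."""
    ece, n = 0.0, len(confidences)
    for lo in np.linspace(0.0, 1.0, n_bins, endpoint=False):
        hi = lo + 1.0 / n_bins
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.sum() / n * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Overconfident logits: a temperature T > 1 shrinks the winning probability.
logits = np.array([[4.0, 1.0, 0.5], [3.5, 3.0, 0.2]])
p_raw = softmax(logits, T=1.0)
p_cal = softmax(logits, T=2.0)
assert p_cal.max(axis=1).mean() < p_raw.max(axis=1).mean()
```

In practice T is chosen to minimize negative log-likelihood on a validation split; since it rescales all logits monotonically, it changes confidences without changing the predicted class.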
Pratik N. Kalamkar, Prasadu Peddi, Yogesh K. Sharma, "Lightweight and Explainable Neural Models for Multilingual Movie Script Certification", International Journal of Information Technology and Computer Science (IJITCS), Vol.18, No.2, pp.146-160, 2026. DOI:10.5815/ijitcs.2026.02.09
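The knowledge-distillation stage of the pipeline follows the standard Hinton-style objective: the student matches the teacher's temperature-softened output distribution while also fitting the hard labels. The blended loss below is a sketch under that assumption; the hyperparameters `T` and `alpha` are illustrative, not the paper's settings.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft cross-entropy against the teacher's softened distribution
    with hard-label cross-entropy; T**2 restores the soft-term gradient scale."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student_T = np.log(softmax(student_logits, T) + 1e-12)
    soft = -(p_teacher * log_p_student_T).sum(axis=-1).mean() * (T ** 2)
    log_p_student = np.log(softmax(student_logits) + 1e-12)
    hard = -log_p_student[np.arange(len(labels)), labels].mean()
    return alpha * soft + (1 - alpha) * hard

# A student that copies the teacher scores lower than an uninformative one.
teacher = np.array([[3.0, 0.2, 0.1]])
labels = np.array([0])
assert distillation_loss(teacher, teacher, labels) <= \
       distillation_loss(np.zeros_like(teacher), teacher, labels)
```

At chunk level this loss would be averaged over script chunks before backpropagation; here plain NumPy stands in for the training framework to keep the sketch self-contained.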