Debabrata Datta

Work place: Department of Computer Science, St. Xavier’s College (Autonomous), Kolkata, India



Research Interests: Data Structures and Algorithms, Data Mining


Debabrata Datta received his Master of Technology from the University of Calcutta, India, and is currently pursuing a Ph.D. in Technology at the same university. He is an Assistant Professor in the Department of Computer Science, St. Xavier’s College (Autonomous), Kolkata, India, and a life member of IETE. He has published more than 20 research papers in international journals and conferences. His main research work focuses on Data Analysis.

Author Articles
Text Classification Using SVM Enhanced by Multithreading and CUDA

By Soumick Chatterjee, Pramod George Jose, Debabrata Datta

DOI:, Pub. Date: 8 Jan. 2019

With the rapid growth of the internet and of digital documents available on the web, organizing text data has become a major problem. Text classification, which assigns a given piece of text to a predefined class or category, has recently become one of the main techniques for this task. In the present research work, an SVM with a linear kernel has been used with the One-vs-Rest strategy. The SVM is trained on data sets collected from various sources. Some words that were uncommon 5-6 years ago may now be prevalent because of recent trends, and new discoveries may lead to the coinage of new words. The process can also be applied to text blogs, which can be crawled and then analyzed. In theory, the technique should be able to classify blogs, tweets, or any other document with significant accuracy. In any text classification process, the preprocessing phase (cleaning, stemming, lemmatization, etc.) takes the most time. Hence, the authors have used a multithreading approach to speed up the process. The authors further tried to reduce the processing time of the algorithm through GPU parallelism using CUDA.
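As a rough illustration of the multithreaded preprocessing idea mentioned in the abstract above, the following Python sketch cleans and stems documents concurrently. It is not the authors' implementation: the stop-word list, the suffix rules, and the names `crude_stem`, `preprocess`, and `preprocess_corpus` are hypothetical, and the stemmer is a deliberately crude stand-in for a real one.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stop-word list and suffix rules, chosen only for illustration.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}
SUFFIXES = ("ing", "ed", "s")


def crude_stem(token: str) -> str:
    """Strip one common suffix; a stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(document: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, and stem one document."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def preprocess_corpus(documents: list[str], workers: int = 4) -> list[list[str]]:
    """Preprocess documents concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess, documents))


if __name__ == "__main__":
    corpus = ["The cats are running", "Text classification of blogs"]
    for tokens in preprocess_corpus(corpus):
        print(tokens)
```

Note that in CPython, threads mainly help when the work releases the global interpreter lock (I/O or native code); for pure-Python, CPU-bound cleaning like this toy example, a process pool may be needed to see a real speed-up.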

Apriori Algorithm using Hashing for Frequent Itemsets Mining

By Debabrata Datta, Atindriya De, Deborupa Roy, Soumodeep Dutta

DOI:, Pub. Date: 8 Nov. 2018

Data warehousing, data mining, and data analysis play a very important role in decision support, and various commercial organisations use tools based on these techniques in their decision support systems. Apriori is a classic algorithm that works on a set of transactions in a database and returns the most frequent itemsets, from which association rules can be derived. The frequent subsets are extended one item at a time. In this paper, a hash-based technique combined with the Apriori algorithm has been designed for data analysis. Hashing reduces the space requirements and also makes the process faster. The main purpose of the work is to help in decision making. The user selects an item to purchase, and this selection is analysed to offer the user two-item and three-item sets. The user can choose one of the suggested two-item or three-item combinations, or keep the original purchase. Either way, the algorithm helps the user make a decision.
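The abstract above does not describe the hashing scheme itself, so the following Python sketch only illustrates one common way to combine hashing with Apriori: candidate itemsets are hashed into a small table of bucket counters, and a candidate is dropped early if its bucket count is below the minimum support. The bucket count, the toy hash function, and the names `bucket_of`, `frequent_itemsets`, and `suggest` are assumptions made for illustration, not the authors' design.

```python
from collections import Counter
from itertools import combinations

NUM_BUCKETS = 7  # hypothetical table size, chosen only for illustration


def bucket_of(itemset: tuple) -> int:
    """Deterministic toy hash: sum of character codes modulo the table size."""
    return sum(ord(ch) for item in itemset for ch in item) % NUM_BUCKETS


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {sorted item tuple: support count} for itemsets up to max_size."""
    transactions = [frozenset(t) for t in transactions]
    counts = Counter(item for t in transactions for item in t)
    current = {(i,): c for i, c in counts.items() if c >= min_support}
    result = dict(current)
    for k in range(2, max_size + 1):
        # Hash every k-subset of every transaction into a bucket counter.
        # A bucket count is an upper bound on the support of its itemsets.
        buckets = Counter()
        for t in transactions:
            for subset in combinations(sorted(t), k):
                buckets[bucket_of(subset)] += 1
        # Extend each frequent (k-1)-itemset by one larger frequent item.
        items = sorted({i for s in current for i in s})
        candidates = set()
        for s in current:
            for item in items:
                if item <= s[-1]:
                    continue
                cand = s + (item,)
                subsets_ok = all(sub in current
                                 for sub in combinations(cand, k - 1))
                if subsets_ok and buckets[bucket_of(cand)] >= min_support:
                    candidates.add(cand)
        # Exact support counting for the surviving candidates.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if t.issuperset(cand))
            if support >= min_support:
                current[cand] = support
        if not current:
            break
        result.update(current)
    return result


def suggest(item, itemsets):
    """List frequent two-item and three-item sets containing the chosen item."""
    return sorted(s for s in itemsets if item in s and len(s) in (2, 3))


if __name__ == "__main__":
    baskets = [["bread", "milk"], ["bread", "butter", "milk"],
               ["bread", "butter"], ["butter", "milk", "bread"], ["milk"]]
    found = frequent_itemsets(baskets, min_support=2)
    print(suggest("butter", found))
```

Because a bucket count can only overestimate an itemset's support, the hash filter never discards a truly frequent itemset; it only removes some infrequent candidates before the exact count. The `suggest` helper mirrors the decision-support use described in the abstract, where a selected item is matched against frequent two-item and three-item sets.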

Application of Materialized View in Incremental Data Mining Operation

By Debabrata Datta, Kashi Nath Dey

DOI:, Pub. Date: 8 Jun. 2017

A materialized view is a database object used to store the results of a query. It avoids the costly processing time required to execute complex queries involving aggregation and join operations. A materialized view may be associated with the operations of a data warehouse. Data mining is a technique to extract knowledge from a data warehouse, and incremental data mining is a process that periodically updates the knowledge already identified by a data mining process when a new set of data is added to the existing set. This paper proposes a method to apply the materialized view in incremental data mining.

A Frequency Based Approach to Multi-Class Text Classification

By Anurag Sarkar, Debabrata Datta

DOI:, Pub. Date: 8 May 2017

Text classification is a method for managing and processing information in a collection of text data that can be categorized into predefined classes. It plays a vital role in information processing and information retrieval. Different approaches to text classification, specifically those based on machine learning algorithms, have been discussed and proposed in various research works. This paper discusses a classification approach that is based on the frequencies of some important text parameters and that assigns a given text to one of multiple categories. Using a newly defined parameter called wf-icf, the classification accuracy obtained in a previous work was significantly improved.
