A. S. Alvi

Work place: Department of CSE, PRMIT &R, Badnera, Amravati, India

E-mail: abrar_alvi@rediffmail.com


Research Interests: Natural Language Processing


Dr. A. S. Alvi graduated from Sant Gadge Baba Amravati University, Amravati, in Computer Science and Engineering. He received his Master's and Ph.D. degrees from the same university. He is currently a Professor in Computer Science and Engineering at PRMIT &R, Badnera, Amravati. He has more than 20 years of teaching experience and has published more than 25 papers in international journals and conferences. His areas of interest are Artificial Intelligence, Algorithms, and Natural Language Processing. He is a lifetime member of the ISTE and IET professional bodies. He is also a research guide at SGB Amravati University, Amravati.

Author Articles
Optimized Time Efficient Data Cluster Validity Measures

By Anand Khandare, A. S. Alvi

DOI: https://doi.org/10.5815/ijitcs.2018.04.05, Pub. Date: 8 Apr. 2018

The main task of any clustering algorithm is to produce compact and well-separated clusters. In practice, perfectly compact and well-separated clusters cannot be achieved. Clustering validation measures are used to evaluate the quality of the clusters an algorithm generates, and they are key elements in the success of clustering. Different clustering approaches require different types of validity measures. For example, unsupervised algorithms require different evaluation measures than supervised algorithms. Clustering validity measures fall into two categories: external and internal validation. The main difference is that external validity measures use information from outside the dataset, whereas internal validity measures use only information contained in the dataset. A well-known example of an external validation measure is entropy, which measures the purity of the clusters using the given class labels. Internal measures validate the quality of the clustering without using any external information. External measures require the correct number of clusters in advance. Therefore, these measures are used mainly for selecting the optimal clustering algorithm for a specific type of dataset. Internal validation measures are used not only to select the best clustering algorithm but also to select the optimal number of clusters. External validity measures are difficult to apply because predefined class labels are unavailable in many applications. For these reasons, internal validation measures are the only option when no external information is available.
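As an illustration of the external measure mentioned above, the following is a minimal sketch of cluster entropy computed from given class labels. It is not taken from the paper: the function name `cluster_entropy`, the size-weighted average across clusters, and the base-2 logarithm are assumptions chosen for illustration.

```python
import math
from collections import Counter


def cluster_entropy(cluster_ids, class_labels):
    """Size-weighted average entropy of the class labels inside each cluster.

    Illustrative helper (hypothetical name). Lower values mean purer
    clusters; 0.0 means every cluster holds a single class.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    n = len(cluster_ids)
    if n == 0:
        raise ValueError("at least one data point is required")

    # Group the known class labels by the cluster each point was assigned to.
    members = {}
    for cluster, label in zip(cluster_ids, class_labels):
        members.setdefault(cluster, []).append(label)

    total = 0.0
    for labels in members.values():
        size = len(labels)
        counts = Counter(labels)
        # Shannon entropy of the class distribution within this cluster.
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        # Weight each cluster by its share of all points.
        total += (size / n) * h
    return total
```

With two clusters that each contain a single class, the sketch returns 0.0; with two clusters that each mix two classes evenly, it returns 1.0 bit.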

All clustering validity measures currently in use are time-consuming and add extra computation time. None of them can be used while the clustering process is running.

This paper surveys existing and improved cluster validity measures and then proposes time-efficient, optimized cluster validity measures. These measures use the concept of cluster representatives and random sampling. The work proposes optimized measures for cluster compactness, separation, and cluster validity. These three measures are simpler and more time-efficient than existing cluster validity measures, and they are used to monitor clustering algorithms on large data while the clustering process is running.
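The abstract does not give the formulas for the proposed measures, so the following is only a rough sketch of how cluster representatives and random sampling could reduce the cost of such measures. Everything here is an assumption for illustration: the centroid is used as the cluster representative, compactness is the mean distance from a random sample of points to their representative, separation is the smallest distance between representatives, and the combined validity score is their ratio. The function names are hypothetical.

```python
import math
import random


def centroid(points):
    """Mean point of a cluster, used here as its representative (assumption)."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def sampled_compactness(clusters, sample_size, rng):
    """Mean distance from a random sample of each cluster to its representative."""
    total, count = 0.0, 0
    for points in clusters:
        rep = centroid(points)
        # Only a sample is visited, so the cost does not grow with cluster size.
        sample = rng.sample(points, min(sample_size, len(points)))
        total += sum(math.dist(p, rep) for p in sample)
        count += len(sample)
    return total / count


def separation(clusters):
    """Smallest distance between any two cluster representatives.

    Requires at least two clusters.
    """
    reps = [centroid(points) for points in clusters]
    return min(
        math.dist(a, b) for i, a in enumerate(reps) for b in reps[i + 1:]
    )


def validity(clusters, sample_size, rng):
    """Compactness divided by separation; lower suggests better clustering."""
    return sampled_compactness(clusters, sample_size, rng) / separation(clusters)


if __name__ == "__main__":
    toy = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
    print(validity(toy, sample_size=2, rng=random.Random(0)))
```

In a setting where the measure is evaluated while clustering is still running, the representatives would normally be the ones the clustering algorithm already maintains, so they would not need to be recomputed as they are in this sketch.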

Performance Analysis of Improved Clustering Algorithm on Real and Synthetic Data

By Anand Khandare, A. S. Alvi

DOI: https://doi.org/10.5815/ijcnis.2017.10.07, Pub. Date: 8 Oct. 2017

Clustering is an important data mining technique that partitions data objects into groups, or clusters. Various data clustering methods and algorithms are discussed in the literature. Some of these are efficient for large data, while others are not. Clustering algorithms include k-means, Partitioning Around Medoids (PAM) or k-medoids, hierarchical clustering, and DBSCAN. The k-means algorithm, which partitions data into k clusters, is more popular than the others. For this algorithm, k must be provided explicitly. Also, the initial means are chosen randomly, which may generate clusters of poor quality. This paper studies and implements an improved clustering algorithm that automatically predicts the value of k and uses a new technique to choose the initial means. It presents a performance analysis of the improved algorithm and other algorithms on real and synthetic datasets. To measure performance, the paper uses the running time of the algorithms and various cluster validity measures: sum of squared error, silhouette score, compactness, separation, Dunn index, and DB index. The k predicted by the improved algorithm is also compared with the optimal k suggested by the elbow method, and the two values are found to be nearly the same. Most of the validity measure values for the improved algorithm are found to be optimal.
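Two of the validity measures named above can be sketched in a few lines. This is not the paper's implementation: it assumes the standard textbook definitions, with the sum of squared error measured from each point to its cluster mean and the Dunn index taken as the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The function names are hypothetical.

```python
import math
from itertools import combinations


def sum_squared_error(clusters):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    sse = 0.0
    for points in clusters:
        dim = len(points[0])
        center = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
        sse += sum(math.dist(p, center) ** 2 for p in points)
    return sse


def dunn_index(clusters):
    """Minimum between-cluster distance over maximum within-cluster diameter.

    Higher is better. Requires at least two clusters and at least one
    cluster with two or more distinct points.
    """
    min_between = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        math.dist(p, q)
        for points in clusters
        for p, q in combinations(points, 2)
    )
    return min_between / max_diameter
```

For the elbow method, the sum of squared error would be computed for a range of k values and plotted; the k at which the curve stops dropping sharply is taken as the suggested number of clusters.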

Other Articles