IJISA Vol. 13, No. 4, Aug. 2021
Social media has become incredibly popular for communicating with friends and sharing opinions. According to current statistics, almost 2.22 billion people used social media in 2016, which is roughly one third of the world population and three times the entire population of Europe. On social media people share their likes, dislikes, opinions, interests, etc., so it is possible to learn a person’s thoughts on a specific topic from the data they share. Since Twitter is one of the most popular social media platforms in the world, it is a very good source for opinion mining and sentiment analysis on different topics. In this research, SVM with different kernel functions and Adaboost are evaluated using CPD and Chi-square feature extraction techniques to find the best sentiment classification model. The reported average accuracies of Adaboost with Chi-square and CPD are 70.2% and 66.9%, respectively. The SVM radial basis kernel and polynomial kernel with Chi-square n-grams reported average accuracies of 73.73% and 68.67%, respectively. Among the experiments performed, the SVM sigmoid kernel with Chi-square n-grams provided the maximum accuracy, 74.4%.
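To make the Chi-square n-gram scoring mentioned in the abstract above more concrete, the following is a minimal sketch of how an n-gram can be scored against a binary sentiment label using a 2x2 contingency table. It is not the paper's pipeline: the helper names `ngrams` and `chi_square_score`, the four toy tweets and their labels are all assumptions made for illustration.

```python
def ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) in a text."""
    words = text.lower().split()
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            grams.add(" ".join(words[i:i + n]))
    return grams


def chi_square_score(docs, labels, term):
    """Chi-square statistic between presence of `term` and a binary label."""
    # 2x2 contingency counts:
    # a = term present, label 1    b = term present, label 0
    # c = term absent,  label 1    d = term absent,  label 0
    a = b = c = d = 0
    for grams, label in zip(docs, labels):
        present = term in grams
        if present and label == 1:
            a += 1
        elif present:
            b += 1
        elif label == 1:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # term (or class) never varies, so it carries no signal
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical toy data: 1 = positive sentiment, 0 = negative sentiment.
tweets = [
    "great phone love it",
    "love it so much",
    "bad battery hate it",
    "hate it so much",
]
labels = [1, 1, 0, 0]

docs = [ngrams(t) for t in tweets]
vocab = sorted(set().union(*docs))
scores = {g: chi_square_score(docs, labels, g) for g in vocab}

# Keep the highest-scoring n-grams as features for a downstream classifier.
top_features = sorted(vocab, key=lambda g: scores[g], reverse=True)[:4]
```

In this toy data, n-grams such as "love it" and "hate it" occur in only one class and receive the highest score, while "it" and "so much" occur equally in both classes and score zero. The selected n-grams would then feed a classifier such as an SVM or Adaboost.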
Electromagnetic signals propagated over cellular radio communication channels and interfaces are usually highly stochastic and complex, with unequal noise variation patterns. This is due to the multipath nature of the propagation channels and the diverse radio propagation mechanisms that affect the signal strength en route from the transmitter to the receiver, and vice versa. This also makes measurement, predictive modelling and estimation-based analysis of such signals very challenging. One important and popular parametric modelling and estimation technique in mathematics and engineering science, especially for signal processing applications, is least square regression (LSR). The dominance and popularity of the LSR approach may be attributed to its simple supporting theory, relatively fast application procedure and ubiquitous application packages. However, LSR is known to be very sensitive to outliers and unusual stochastic signal data. In this work, we propose the application of the weighted least square regression (WLSR) method for enhanced practical field strength estimation modelling over cellular radio communication network interfaces. The signal data was collected from a commercial LTE network service provider. We also provide statistical computational analyses to compare the estimation outcomes of the weighted least square and the standard least square approaches. The results show that the WLSR approach is reliably better than the more popular standard least square method. The significance and academic value of this paper is that our proposed and implemented WLSR method can be used as a replacement for the standard LSR approach for robust mobile signal processing in future communication system networks.
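The contrast between standard and weighted least squares described in the abstract above can be sketched for a straight-line fit. This is only an illustration under stated assumptions: the function name `weighted_line_fit`, the made-up signal readings and the hand-picked weights are hypothetical, and the abstract does not say how the authors chose their weights.

```python
def weighted_line_fit(x, y, w=None):
    """Fit y = b0 + b1*x by minimising sum(w_i * (y_i - b0 - b1*x_i)**2)."""
    if w is None:
        w = [1.0] * len(x)  # unit weights reduce this to standard least squares
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up readings: the underlying line is y = -70 - 10*x,
# but the last reading is an outlier (-80 instead of -120).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [-80.0, -90.0, -100.0, -110.0, -80.0]

# Assumed weights: the suspect reading is down-weighted by hand.
weights = [1.0, 1.0, 1.0, 1.0, 0.05]

ols_intercept, ols_slope = weighted_line_fit(x, y)
wls_intercept, wls_slope = weighted_line_fit(x, y, weights)
```

With unit weights the single outlier pulls the fitted slope far from the underlying value of -10, while the down-weighted fit stays much closer to it. In practice the weights would come from an estimate of each reading's noise variance rather than being set by hand.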
The World Health Organisation has declared breast cancer (BC) the most frequent affliction among women, and it accounts for 15 percent of all cancer deaths. Its accurate prediction is of utmost significance, as it not only prevents deaths but also avoids mistreatment. The conventional way of diagnosis includes estimating the tumor size as a sign of plausible cancer. Machine learning (ML) techniques have proven effective at predicting disease. However, ML methods have been method-centric rather than dataset-centric. In this paper, the authors introduce a dataset-centric approach (DCA) that deploys a genetic algorithm (GA) to identify the features and an ensemble learning classifier to predict using the right features. Adaboost is one such approach: it trains the model by assigning weights to individual records, rather than only experimenting with dataset splits and performing hyper-parameter optimization. The authors simulate the results by varying the base classifiers, i.e., logistic regression (LR), decision tree (DT), support vector machine (SVM), naive Bayes (NB), and random forest (RF), and by using 10-fold cross-validation with different splits of the dataset into training and testing sets. The proposed DCA model with RF and 10-fold cross-validation demonstrated its potential with almost 100% performance in the classification results, which no prior research has reported so far. The DCA satisfies the underlying principles of data mining: the principle of parsimony, the principle of inclusion, the principle of discrimination, and the principle of optimality. This DCA is a democratic and unbiased ensemble approach, as it allows all features and methods to compete at the start and then filters down to the most reliable chain (of steps and combinations) that gives the highest accuracy. With fewer features and splits of 50-50, 66-34, and 10-fold cross-validation, the Stacked model achieves 97% accuracy.
These values and the reduction of features improve upon prior research works.
Further, the proposed classifier is compared with some state-of-the-art machine-learning classifiers, namely random forest, naive Bayes, support-vector machine with radial basis function kernel, and decision tree. For testing the classifiers, different performance metrics have been employed – accuracy, detection rate, sensitivity, specificity, receiver operating characteristic, area under the curve, and some statistical tests such as the Wilcoxon signed-rank test and kappa statistics – to check the strength of the proposed DCA classifier. Various splits of training and testing data – namely, 50–50%, 66–34%, 80–20% and 10-fold cross-validation – have been incorporated in this research to test the credibility of the classification models in handling the unbalanced data. Finally, the proposed DCA model demonstrated its potential with almost 100% performance in the classification results. The output results have also been compared with other research on the same dataset where the proposed classifiers were found to be best across all the performance dimensions.
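The abstract above notes that Adaboost trains by assigning weights to individual records. A minimal sketch of one round of the standard discrete AdaBoost weight update is shown here, assuming labels in {-1, +1} and weights that sum to 1. The function name `adaboost_reweight` and the five toy records are hypothetical, and this is the textbook update rather than the authors' implementation.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One round of the discrete AdaBoost update.

    Returns the base classifier's vote (alpha) and the new record weights.
    Labels and predictions are assumed to be -1 or +1.
    """
    # Weighted error: total weight of the misclassified records.
    eps = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0.0 < eps < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    # t * p is +1 for a correct prediction and -1 for a mistake, so
    # correct records shrink and misclassified records grow.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [u / total for u in updated]


# Toy example: five records with uniform weights; the base classifier
# misclassifies only the last record.
weights = [0.2] * 5
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]

alpha, new_weights = adaboost_reweight(weights, y_true, y_pred)
```

After the update the misclassified record carries half of the total weight, so the next base classifier (LR, DT, SVM, NB or RF in the study's setup) is pushed to concentrate on it.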
Classification is a data mining term for sorting data of different kinds into particular classes. Social media is an immense platform that allows billions of people to share their thoughts, updates and multimedia information as status, photo, video, link, audio and graphics. Because of this flexibility, the cloud holds an enormous amount of data. Most of the time, this data is complicated to retrieve and to understand. The data may also contain a lot of noise and is often incomplete. To reduce this complexity, the data in the cloud has to be classified with labels, which is feasible through data mining classification techniques. In the present work, we consider a Facebook dataset that holds metadata of a cosmetic company’s Facebook page. Nineteen different metadata fields are used as the main attributes. Of those, the field ‘Type’ is the target of classification; it takes four classes: link, status, photo and video. We use two popular families of data mining classifiers, the Bayes classifier and the decision tree classifier, each of which contains several classification algorithms. A few algorithms from the Bayes and decision tree families have been chosen for the experiment and are explained in detail in the present work. The percentage split method is used to split the dataset into training and testing data, which helps in calculating the classification accuracy and forming the confusion matrix. The accuracy, kappa statistic, root mean squared error, relative absolute error, root relative squared error and confusion matrix of all the algorithms are compared and analyzed in depth to identify the best classifier for labelling the company’s Facebook data into appropriate classes. Knowledge discovery is the ultimate goal of this experiment.
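Two of the evaluation measures named in the abstract above, accuracy and the kappa statistic, can both be derived from the confusion matrix over the four ‘Type’ classes. The sketch below shows one way to compute them on a held-out test split. The function name `evaluate` and the eight actual/predicted labels are made up for illustration and are not results from the paper.

```python
CLASSES = ["link", "status", "photo", "video"]


def evaluate(actual, predicted, classes=CLASSES):
    """Return (confusion matrix, accuracy, Cohen's kappa) for a test split."""
    # matrix[a][p] counts records of actual class a predicted as class p.
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    n = len(actual)
    # Observed agreement: the diagonal of the confusion matrix.
    accuracy = sum(matrix[c][c] for c in classes) / n
    # Agreement expected by chance, from the row and column totals.
    expected = sum(
        sum(matrix[c].values()) * sum(matrix[a][c] for a in classes)
        for c in classes
    ) / (n * n)
    kappa = (accuracy - expected) / (1.0 - expected)
    return matrix, accuracy, kappa


# Hypothetical test-split labels (not taken from the Facebook dataset).
actual = ["photo", "photo", "photo", "link",
          "status", "video", "link", "photo"]
predicted = ["photo", "photo", "link", "link",
             "status", "photo", "link", "photo"]

matrix, accuracy, kappa = evaluate(actual, predicted)
```

Kappa discounts the agreement a classifier would reach by chance. This matters when one class dominates, since plain accuracy can then look high even for a classifier that mostly predicts the majority class.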
Safety information dissemination plays a vital role in VANET communication. It is a technique for transmitting information at scheduled intervals or during road hazards by detecting events using onboard systems and interfaces. Information is shared between vehicles and roadside units and is further used to predict vehicle collisions, road line crossings, environmental warnings, traffic data and road hazards. The risk of lateral collisions and dense traffic can be avoided by fast data dissemination, i.e., warning alerts triggered by event detection. Vehicular technology that supports safe transportation is growing faster due to the deployment of new automated technology in the intelligent transportation system (ITS). The scenarios used in vehicular communication are Vehicle to Vehicle (V-V), Vehicle to Infrastructure (V-I) and Vehicle to Internet. Some of the important characteristics of vehicular communications are mobility, frequent changes in topology, varying transmission power of antennas, and intermittent connectivity. ITS provides solutions for the most critical transportation issues and inspires researchers to improve road safety. In this paper, we propose a multi-agent based safety information dissemination scheme for vehicle-to-vehicle communication. The proposed algorithm performs safety information dissemination with the help of intelligent agents by optimizing the channel access techniques, message encoding and selection of intermediate nodes. Communication between source and destination is achieved with fewer intermediate links by selecting nodes in the special zone. Short interval codes that represent safety information are transmitted effectively despite the intermittent nature of wireless connectivity.
This work describes the details of the algorithm with the associated network environment, multi-agent functions and dissemination mechanism to illustrate the improvement in end-to-end delay, PDR, energy constraints, etc. This method reduces the broadcast storm problem by delivering the information to the intended node. Simulation of the proposed work shows improved results for PDR, latency and connection overhead.