IJITCS Vol. 11, No. 9, Sep. 2019
This paper studies the forecasting performance of ten widely used machine learning algorithms, namely linear discriminant analysis, logistic regression, k-nearest neighbors, random forests, artificial neural networks, naive Bayes, classification and regression trees, support vector machines, adaptive boosting, and a stacking ensemble model, in forecasting first-generation college students’ six-year graduation using data from the first college year. Five standard evaluation metrics are used to assess these models. The results show that these machine learning models can predict first-generation college students’ six-year graduation with mean forecasting accuracy ranging from 69.58% to 75.17% and median forecasting accuracy ranging from 70.37% to 74.52%. Among these algorithms, the stacking ensemble model, logistic regression, and linear discriminant analysis are the top three in terms of mean forecasting accuracy. Furthermore, the results of the repeated ten-fold cross-validation process reveal that the variations of the five evaluation metrics exhibit remarkably different patterns across the ten machine learning algorithms.[...]
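As an illustration of the repeated ten-fold cross-validation mentioned in the abstract above, the following is a minimal sketch in plain Python. The function names `repeated_kfold` and `summarize`, the number of repeats, and the seed are assumptions made for this example; the paper's actual evaluation setup is not given in the abstract.

```python
import random
import statistics

def repeated_kfold(n_samples, k=10, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation.

    Each repeat reshuffles the sample indices and splits them into k folds;
    every fold serves as the test set exactly once per repeat.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]          # every k-th shuffled index
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test

def summarize(scores):
    """Return the (mean, median) of per-fold accuracy scores."""
    return statistics.mean(scores), statistics.median(scores)
```

A model would be fitted on `train` and scored on `test` for each pair; `summarize` then gives the mean and median accuracy of the kind the abstract reports.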
A systematic mapping study (SMS) of proposed EA measurement solutions was undertaken to provide an in-depth understanding of the claimed achievements and limitations in evidence-based research on enterprise architecture (EA). This SMS reports on 22 primary studies on EA measurement solutions published up to the end of 2018. The primary studies were analyzed thematically and classified according to ten mapping questions including, but not limited to, the positioning of EA measurement solutions within EA schools of thought, an analysis of the consistency of the terms used by authors in EA measurement research, and an analysis of the references to the ISO 15939 measurement information model. Key findings reveal that current research on EA measurement solutions is focused on the “enterprise IT architecting” school of thought, does not use rigorous terminology as found in science and engineering, and shows limited adoption of knowledge from other disciplines. The paper concludes with new perspectives for future research avenues in EA measurement.[...]
In investigating the consequences of atmospheric and commutating striking voltages, overvoltages are simulated with generator models whose RC circuits have standard passive element parameters, on which the shape of the striking overvoltage depends.
According to the IEC 62305-1 standard, these formulas in the theoretical model serve for dimensioning the RC circuit of the striking-voltage generator, although the definitions of the time constants and passive parameters are only axiomatic in character. In contrast to the classical solution, this paper presents a model formed by a mathematical procedure whose solutions give sufficiently accurate values of the time constants and essential parameters of the RC circuit, as well as the shape of the striking voltage wave. The formulas for the voltages and currents in the model contain the parameters of the passive elements, and their accuracy has been confirmed by diagrams obtained in simulation using an adapted psbtrnsrg.mdl part of the MATLAB program. The theoretical model is suitable for simulating standard waveforms of striking atmospheric and commutating overvoltages, which can replace laboratory testing.[...]
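For orientation, the shape of such a wave is commonly approximated by the classical double-exponential form u(t) = U0 (exp(-t/tau1) - exp(-t/tau2)) with tau1 > tau2. The sketch below uses that textbook approximation, not the model proposed in the paper; the function names and the default time constants (illustrative values of the order often quoted for a 1.2/50 µs wave) are assumptions for this example.

```python
import math

def impulse_voltage(t, u0=1.0, tau1=68.2e-6, tau2=0.405e-6):
    """Double-exponential impulse u(t) = u0 * (exp(-t/tau1) - exp(-t/tau2)).

    tau1 (tail) and tau2 (front) are assumed, illustrative time constants
    in seconds; u0 is a scaling factor, not the peak value.
    """
    return u0 * (math.exp(-t / tau1) - math.exp(-t / tau2))

def peak_time(tau1=68.2e-6, tau2=0.405e-6):
    """Time at which du/dt = 0, i.e. where the wave reaches its crest."""
    return tau1 * tau2 / (tau1 - tau2) * math.log(tau1 / tau2)
```

Setting the derivative of u(t) to zero gives the crest time returned by `peak_time`; evaluating `impulse_voltage` there yields the crest value relative to `u0`.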
As a result of the rapid development of technology, data containing a large number of features are produced by various applications such as biomedicine, social media, and face recognition. Processing these data for decision making is a challenging task for existing data mining and machine learning algorithms. To reduce the size of the data to be processed, a feature selection technique is needed. Feature selection is also known as attribute selection or variable selection. Its objective is to minimize the number of attributes contained in the dataset by eliminating unwanted and redundant attributes, in order to improve classification accuracy and reduce computation cost. Although various feature selection methods have been proposed in the literature to classify healthcare data, especially for cancer diagnosis, finding informative features in medical datasets remains a challenging issue in the data mining and machine learning domain. Therefore, this paper presents a wrapper feature selection (WFS) approach using particle swarm optimization (PSO) search to improve the accuracy of healthcare data classification. The work is evaluated on five benchmark medical datasets publicly available from the UCI machine learning repository. The experimental results show that the WFS-PSO approach yields higher classification accuracy when applied to different classification algorithms.[...]
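The idea of a wrapper search driven by PSO can be sketched as a binary PSO over feature masks. This is a generic illustration under stated assumptions, not the authors' WFS-PSO implementation: the function name `binary_pso`, all hyperparameters, and the toy fitness function are hypothetical. In a real wrapper, `fitness` would return the cross-validated accuracy of a classifier trained on the selected features.

```python
import math
import random

def binary_pso(fitness, n_features, n_particles=10, n_iter=30, seed=0,
               w=0.7, c1=1.5, c2=1.5, v_max=6.0):
    """Maximize fitness(mask) over 0/1 feature masks with a binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best masks
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # global best mask
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                vel[i][d] = v
                # Sigmoid transfer: velocity -> probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

def toy_fitness(mask):
    """Hypothetical stand-in for classifier accuracy: features 0 and 2
    are 'informative'; every selected feature carries a small cost."""
    return mask[0] + mask[2] - 0.1 * sum(mask)
```

Calling `binary_pso(toy_fitness, n_features=6)` returns the best mask found and its fitness; the small per-feature cost plays the role of the pressure toward fewer attributes.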
Multiple protein sequence alignment (MPSA) aims to identify the similarity between multiple protein sequences and to increase alignment accuracy. MPSA becomes a critical bottleneck for large-scale protein sequence datasets, so it is vital for existing MPSA tools to run in a parallelized design. Combining MPSA tools with cloud computing improves speed and accuracy for large-scale datasets. PROBCONS is a probabilistic consistency-based tool for progressive MPSA built on hidden Markov models. It achieves the maximum expected accuracy, but it is time-consuming. In the proposed approach, the large set of protein sequences is first clustered into groups of structurally similar sequences. This classification is based on secondary structure, LCS, and amino acid features. The PROBCONS MPSA tool is then run on the clusters in parallel. The last step is to merge the final PROBCONS alignments of the clusters. The proposed algorithm runs on the Amazon Elastic Compute Cloud (EC2) and achieved the highest alignment accuracy. Feature-based classification captures protein sequence, structure, and function; all these features strongly affect accuracy and reduce the running time of the search that produces the final alignment result.[...]
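One of the clustering features named above is LCS. Assuming this denotes the longest common subsequence (the abstract does not expand the acronym), a minimal dynamic-programming sketch is shown below; the function names and the normalization in `lcs_similarity` are illustrative choices, not taken from the paper.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)        # DP row for the previous prefix of a
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcs_similarity(a, b):
    """LCS length normalized by the longer sequence (0.0 for two empties)."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0
```

A pairwise score such as `lcs_similarity` could serve as one input to a clustering step that groups similar sequences before alignment.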
The exponential growth in the number of camera-equipped smartphones around the world has magnified the importance of computer vision tasks and given rise to a vast number of opportunities in the field. One of the major research areas in this field is the extraction of text embedded in natural scene images. Natural scene images are images taken with a camera, where the background is arbitrary and the colors in the image may be diverse. When text is present in such images, it is usually difficult for a machine to detect and extract it because of a number of factors. This paper presents a technique that combines the Open Source Computer Vision Library (OpenCV) and Convolutional Neural Networks (CNN) to extract English text from images efficiently. The CNN model is based on a two-stage pipeline that uses a single neural network to directly detect the characters in the scene images. It eliminates the unnecessary intermediate steps that made previous approaches to this task slower and less accurate, thereby improving the time complexity and the performance of the algorithm.[...]