Workplace: Center for Systems Development, Brazilian Army, Brazil
Wallace A. Pinheiro graduated in Electronic Engineering from the Federal University of Rio de Janeiro (UFRJ) in 1998. He received a Master's degree in Computer Systems from the Military Engineering Institute (IME) in 2004 and a Doctorate in Systems and Computer Engineering from UFRJ in 2010. From 2010 to 2014, he was a professor at IME, working on command and control, databases, data quality, and information retrieval. From 2014 to 2017, he worked at the Systems Development Center (CDS), an organization responsible for developing various systems used by the Brazilian Army. In 2018, he carried out postdoctoral research in Systems and Computer Engineering at UFRJ. From 2019 to 2022, he returned to CDS, where he was able to apply his postdoctoral research. He is currently interested in data mining and artificial intelligence.
DOI: https://doi.org/10.5815/ijitcs.2023.05.01, Pub. Date: 8 Oct. 2023
Some problems involving the selection of samples from undisclosed groups are relevant in various areas, such as health, statistics, economics, and computer science. For instance, well-known strategies for selecting a sample from a population include simple random and stratified random selection. A related problem is selecting the initial points, corresponding to samples, for the K-means clustering algorithm. Many studies propose different strategies for choosing these samples; however, there is no consensus on the best or most effective approaches, even for specific datasets or domains. In this work, we present a new strategy called the Sample of Groups (SOG) Algorithm, which combines concepts from grid, density, and maximum-distance clustering algorithms to identify representative points, or samples, located near each cluster's center of mass. To achieve this, we partition the data into appropriately sized boxes and select representatives of the most relevant boxes. The main goal of this work is thus to find quality samples, or seeds, that represent the different clusters in the data. To compare our approach with other algorithms, we use not only indirect measures related to K-means but also two direct measures that allow a fairer comparison among these strategies. The results indicate that our proposal outperforms the most commonly used algorithms. [...]
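To make the grid, density, and maximum-distance ideas in the abstract concrete, the sketch below shows one simple way they can be combined to pick K-means seeds. It is a hypothetical illustration, not the SOG algorithm from the paper: the function name `grid_density_seeds`, the `box_size` and `min_count` parameters, and the rule "start from the densest box, then add the farthest remaining box representative" are all assumptions made for this example.

```python
import math
from collections import defaultdict


def grid_density_seeds(points, k, box_size, min_count=2):
    """Pick up to k seeds using grid boxes, box density, and maximum distance.

    Hypothetical illustration only: box_size, min_count, and the selection
    rule are assumptions, not the SOG algorithm described in the paper.
    """
    # Partition the data: map each point to the grid box (cell) that contains it.
    boxes = defaultdict(list)
    for p in points:
        boxes[tuple(math.floor(c / box_size) for c in p)].append(p)

    # For each sufficiently dense box, keep the member nearest its center of mass.
    candidates = []
    for members in boxes.values():
        if len(members) < min_count:
            continue  # sparse boxes (e.g. isolated outliers) are ignored
        dim = len(members[0])
        mean = [sum(m[i] for m in members) / len(members) for i in range(dim)]
        rep = min(members, key=lambda m: math.dist(m, mean))
        candidates.append((len(members), rep))

    if not candidates:
        return []

    # First seed: the representative of the densest box.
    candidates.sort(key=lambda c: -c[0])
    seeds = [candidates[0][1]]
    remaining = [rep for _, rep in candidates[1:]]

    # Further seeds (farthest-first): take the candidate whose distance to
    # its nearest already-chosen seed is largest.
    while remaining and len(seeds) < k:
        best = max(remaining, key=lambda r: min(math.dist(r, s) for s in seeds))
        seeds.append(best)
        remaining.remove(best)
    return seeds


# Three small groups plus one outlier at (100, 100).
data = [(0.5, 0.5), (1, 1), (1.5, 1.5), (1, 0.5),
        (11, 11), (11.5, 11.5), (12, 12),
        (21, 1), (21.5, 1.5), (22, 1),
        (100, 100)]
print(grid_density_seeds(data, k=3, box_size=5))
# → [(1, 1), (21.5, 1.5), (11.5, 11.5)]
```

In this sketch, the density filter (`min_count`) keeps the farthest-first step from selecting the isolated point at (100, 100), which a pure maximum-distance rule would tend to pick. Choosing the member nearest each box's mean keeps every seed close to a local center of mass rather than on a cluster's edge.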