Workplace: University of Brasília, Department of Tropical Medicine, Brazil
Ana B. S. Pinheiro graduated in Maternal and Child Nursing from the Federal University of Rio de Janeiro (2000). She completed postgraduate studies in Neonatal Intensive Care at Universidade Federal Fluminense (2003) and holds a master's degree in Tropical Medicine, in the area of Molecular Biology, from the University of Brasília (2019). From 2001 to 2004, she worked at the Army Health School, and from 2004 to 2008 at the Army Central Hospital. From 2008 to 2014, she worked in the following areas of the Military Polyclinic of Rio de Janeiro: Center for Study in Integrated Therapies; Commission for Infection Control, Control, and Combat against Dengue. She later worked in the Preventive and Assistance Health Section of the Army Health Directorate (2015 to 2017) and in the physiotherapy area of the Brasília Military Hospital (2017 to 2019). Her current interests are epidemiology and information technologies applied to health.
DOI: https://doi.org/10.5815/ijitcs.2023.05.01, Pub. Date: 8 Oct. 2023
Some problems involving the selection of samples from undisclosed groups are relevant in various areas such as health, statistics, economics, and computer science. For instance, when selecting a sample from a population, well-known strategies include simple random and stratified random selection. Another related problem is selecting the initial points corresponding to samples for the K-means clustering algorithm. In this regard, many studies propose different strategies for choosing these samples. However, there is no consensus on the best or most effective approaches, even when considering specific datasets or domains. In this work, we present a new strategy called the Sample of Groups (SOG) Algorithm, which combines concepts from grid, density, and maximum distance clustering algorithms to identify representative points or samples located near the center of the cluster mass. To achieve this, we create boxes with the right size to partition the data and select the representatives of the most relevant boxes. Thus, the main goal of this work is to find quality samples or seeds of data that represent different clusters. To compare our approach with other algorithms, we not only utilize indirect measures related to K-means but also employ two direct measures that facilitate a fairer comparison among these strategies. The results indicate that our proposal outperforms the most commonly used algorithms. [...]
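The abstract names three ingredients (a grid of boxes, box density, and maximum distance) without giving the procedure itself. The following is a minimal sketch of how those ideas can be combined to pick seeds; it is an illustration under stated assumptions, not the published SOG Algorithm. The function name `grid_density_seeds`, the caller-supplied `box_size`, and the `candidates_per_seed` cutoff are all hypothetical, whereas the paper determines "the right size" of the boxes itself.

```python
import math
from collections import defaultdict


def grid_density_seeds(points, k, box_size, candidates_per_seed=3):
    """Pick k seeds from box centroids (illustrative, not the paper's SOG).

    Assumptions: box_size is fixed by the caller, and only the
    k * candidates_per_seed densest boxes are considered.
    """
    if k < 1 or box_size <= 0:
        raise ValueError("k must be >= 1 and box_size must be > 0")

    # Grid step: map each point to the integer index of the box containing it.
    boxes = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / box_size) for c in p)
        boxes[key].append(p)

    # Density step: rank boxes by point count and keep the densest ones.
    ranked = sorted(boxes.values(), key=len, reverse=True)
    kept = ranked[: k * candidates_per_seed]

    # Represent each kept box by the centroid of the points inside it.
    centroids = [tuple(sum(c) / len(b) for c in zip(*b)) for b in kept]
    if len(centroids) < k:
        raise ValueError("fewer occupied boxes than requested seeds")

    # Maximum-distance step: start from the densest box, then repeatedly add
    # the candidate whose nearest already-chosen seed is farthest away.
    seeds = [centroids[0]]
    while len(seeds) < k:
        remaining = [c for c in centroids if c not in seeds]
        seeds.append(
            max(remaining, key=lambda c: min(math.dist(c, s) for s in seeds))
        )
    return seeds
```

The returned centroids could then be passed to a K-means implementation as its initial centers. Restricting the farthest-first pass to dense boxes is what keeps isolated outliers from being chosen as seeds, which is a known weakness of plain maximum-distance initialization.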