TY - GEN
T1 - Towards Data Normalization Task for the Efficient Mining of Medical Data
AU - Izonin, Ivan
AU - Ilchyshyn, Bohdan
AU - Tkachenko, Roman
AU - Gregus, Michal
AU - Shakhovska, Natalya
AU - Strauss, Christine
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - The paper investigates data normalization in medical diagnostics tasks solved by machine learning algorithms. The authors describe the operation, advantages, and disadvantages of five data normalization methods. The effectiveness of these methods was evaluated on two data sets with different imbalance ratios, a situation typical for medical tasks. The modeling consisted of solving a binary classification task with three machine learning methods based on decision trees. The experiments establish that the ScalerOnCircle normalization method, unlike the others, increases the efficiency of medical data analysis with the studied machine learning methods: the F1-score increased significantly when this method was used. The reason is that ScalerOnCircle, in addition to normalizing by columns, makes it possible to take into account relationships between the attributes of each vector in a given dataset. This matters particularly in the medical field, where data sets intended for intelligent analysis are characterized by many attributes and complex nonlinear relationships between them, which must be taken into account when mining such datasets. ScalerOnCircle thus offers several benefits for the efficient mining of medical data.
KW - binary classification
KW - data normalization
KW - data preprocessing
KW - decision trees
KW - imbalanced ratio
KW - machine learning
KW - medical diagnostics
KW - small data
UR - http://www.scopus.com/inward/record.url?scp=85141190141&partnerID=8YFLogxK
U2 - 10.1109/ACIT54803.2022.9913112
DO - 10.1109/ACIT54803.2022.9913112
M3 - Contribution to proceedings
AN - SCOPUS:85141190141
SN - 978-1-66541-049-6
SN - 978-1-66546-647-9
T3 - International Conference on Advanced Computer Information Technologies
SP - 480
EP - 484
BT - 2022 12th International Conference on Advanced Computer Information Technologies
PB - IEEE
CY - Piscataway
T2 - 12th International Conference on Advanced Computer Information Technologies, ACIT 2022
Y2 - 26 September 2022 through 28 September 2022
ER -