Not seeing the wood for the trees? The effect of class imbalance and noise on random forests classification accuracy

    Research output: Contribution to conference › Poster › peer-review

    Abstract

    Machine learning algorithms are increasingly attracting attention from management and marketing researchers because of their predictive accuracy. There is, however, growing awareness of the limitations of these methods, particularly when they are faced with unbalanced samples and noisy data. Random Forests (RF), a machine learning classification algorithm, has grown in popularity due to its learning capacity and has even been described as the best “off-the-shelf” algorithm. It is therefore increasingly important for researchers to know how class imbalance (i.e. one of the categories in the target variable being much less prevalent than the others) and the amount of noise in the data affect RF classification accuracy. The aim of this study is to determine whether these influences operate independently or whether the incidence of one affects the severity of the other. Our results show that, as expected, both noise and sample imbalance affect classification accuracy, in particular the classification accuracy of the minority class. However, the results also show that these two effects are not independent of each other: classification accuracy is worse when the algorithm is faced with data that are both noisy and unbalanced than when it deals with data that are either noisy or unbalanced. The findings have implications for evaluating random forest performance and for strategies for reducing the effects of sample imbalance.
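    Because the abstract stresses minority-class accuracy, it may help to see why overall accuracy alone can hide the effect of imbalance, and what "noise" could look like in practice. The following is a minimal, stdlib-only Python sketch, not the authors' method: the helper names (`overall_accuracy`, `per_class_accuracy`, `inject_label_noise`), the 95:5 class split and the uniform label-flipping noise model are all illustrative assumptions.

```python
import random
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Share of all examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately within each true class (per-class recall)."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Counter returns 0 for classes with no correct predictions.
    return {c: correct[c] / totals[c] for c in totals}


def inject_label_noise(labels, rate, classes, rng):
    """Hypothetical noise model (an assumption, not the study's procedure):
    each label is replaced, with probability `rate`, by a different class
    chosen uniformly at random."""
    noisy = []
    for y in labels:
        if rng.random() < rate:
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy


# Illustrative 95:5 imbalance and a degenerate classifier that always
# predicts the majority class.
y_true = ["majority"] * 95 + ["minority"] * 5
y_pred = ["majority"] * 100

print(overall_accuracy(y_true, y_pred))    # 0.95
print(per_class_accuracy(y_true, y_pred))  # {'majority': 1.0, 'minority': 0.0}

# Corrupt 5% of the labels under the assumed noise model.
noisy_labels = inject_label_noise(y_true, 0.05, ["majority", "minority"], random.Random(0))
```

    Here the classifier scores 95% overall while getting none of the minority class right, which is why minority-class accuracy is reported separately. Under the assumed two-class flipping model, a 5% noise rate would on average relabel about 95 × 0.05 = 4.75 majority examples as minority, comparable to the 5 genuine minority examples, while only about 0.25 minority labels would move the other way. This is offered as one possible intuition for why noise and imbalance might compound, not as the authors' analysis.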
    Original language: English
    Publication status: Published - 30 Jun 2023
    Event: FBSS Research Conference 2023 - Kingston upon Thames, U.K.
    Duration: 30 Jun 2023 → 30 Jun 2023

    Conference

    Conference: FBSS Research Conference 2023
    Period: 30/06/23 → 30/06/23

    Bibliographical note

    Organising Body: Kingston University

    Keywords

    • Computer science and informatics
