
StatAvg: mitigating data heterogeneity in federated learning for intrusion detection systems

  • Pavlos S. Bouzinis
  • Panagiotis Radoglou-Grammatikis
  • Ioannis Makris
  • Thomas Lagkas
  • Vasileios Argyriou
  • Georgios Th Papadopoulos
  • Panagiotis Sarigiannidis
  • George K. Karagiannidis
    • MetaMind Innovations P.C.
    • University of Western Macedonia
    • K3Y Limited
    • Democritus University of Thrace
    • Kingston University
    • Harokopio University
    • Aristotle University of Thessaloniki

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Federated learning (FL) enables devices to collaboratively build a shared machine learning (ML) or deep learning (DL) model without exposing raw data. Its privacy-preserving nature has made it popular for intrusion detection systems (IDS) in the field of cybersecurity. However, data heterogeneity across participants poses challenges for FL-based IDS. This paper proposes the statistical averaging (StatAvg) method to alleviate non-independent and identically distributed (non-iid) features across the clients' local data in FL. Specifically, StatAvg has the FL clients share their local data statistics with the server, namely the mean and variance of each client's feature vectors. The server aggregates this information into global statistics, which are sent back to the clients and used for universal data normalization, i.e., a common scaling of the input features by all clients. Because this step takes place before FL training begins, StatAvg can be combined with any FL aggregation strategy. The proposed method is evaluated against well-known baselines that address non-iid features in FL through batch and layer normalization, such as FedBN. Experiments were conducted on the TON-IoT and CIC-IoT-2023 datasets, which are relevant to the design of host-based and network-based IDS, respectively. The experimental results demonstrate the effectiveness of StatAvg in mitigating non-iid feature distributions across FL clients compared to the baseline methods, improving IDS accuracy by 4% to 17%.
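
    The abstract describes the StatAvg workflow only at a high level: clients compute local statistics, the server aggregates them, and every client normalizes with the shared result. The Python sketch below illustrates those three steps. It assumes NumPy and assumes that each client also reports its sample count. It further assumes that the server pools the statistics with sample-count weights using the law of total variance. The function names (`local_stats`, `aggregate_stats`, `normalize`) are hypothetical, and the paper's actual aggregation rule may differ.

```python
import numpy as np

def local_stats(x):
    """Client side: per-feature mean, population variance, and sample count."""
    x = np.asarray(x, dtype=float)
    return x.mean(axis=0), x.var(axis=0), x.shape[0]

def aggregate_stats(stats):
    """Server side: pool client statistics into global per-feature mean/variance.

    Assumption (not taken from the paper): sample-count-weighted pooling via the
    law of total variance, i.e. global var = weighted mean of client variances
    + weighted spread of client means around the global mean.
    """
    means = np.stack([m for m, _, _ in stats])        # shape (clients, features)
    variances = np.stack([v for _, v, _ in stats])    # shape (clients, features)
    counts = np.array([n for _, _, n in stats], dtype=float)
    weights = counts / counts.sum()                   # shape (clients,)
    g_mean = weights @ means
    g_var = weights @ (variances + (means - g_mean) ** 2)
    return g_mean, g_var

def normalize(x, g_mean, g_var, eps=1e-8):
    """Client side: z-score scaling with the shared global statistics."""
    return (np.asarray(x, dtype=float) - g_mean) / np.sqrt(g_var + eps)

# Usage: three hypothetical clients with differently shifted/scaled features (non-iid).
rng = np.random.default_rng(0)
clients = [rng.normal(loc=mu, scale=s, size=(n, 4))
           for mu, s, n in [(0.0, 1.0, 200), (5.0, 2.0, 50), (-3.0, 0.5, 120)]]
g_mean, g_var = aggregate_stats([local_stats(x) for x in clients])
pooled = np.concatenate(clients)
print(np.allclose(g_mean, pooled.mean(axis=0)), np.allclose(g_var, pooled.var(axis=0)))  # True True
scaled = [normalize(x, g_mean, g_var) for x in clients]  # common scaling on every client
```

    Under this weighting, the global statistics match those that would be computed on the union of all clients' data. Only one mean vector, one variance vector and one count per client are transmitted, never raw records. Because this happens once before training, any aggregation strategy (e.g., FedAvg) can follow unchanged.
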
    Original language: English
    Pages (from-to): 2944-2955
    Number of pages: 12
    Journal: IEEE Transactions on Network and Service Management
    Volume: 22
    Issue number: 4
    Early online date: 25 Apr 2025
    Publication status: Published - 2025

    Bibliographical note

    Note: This work was supported by the European Union's Horizon Europe research and innovation programme under grant agreement No 101070450 (AI4CYBER).

    Keywords

    • Computer science and informatics
    • intrusion detection systems
    • data heterogeneity
    • Cybersecurity
    • statistical averaging
    • federated learning
