Data Science, Thesis

Reinforcement Learning and Federated Learning-based Multi-Band Assignment for IoT Short Packet Communications

Thesis co-supervised by Lila BOUKHATEM, Professor, Université Paris-Saclay, LISN, and Megumi KANEKO, Professor, National Institute of Informatics / The University of Tokyo, Japan

Speaker: Hugo De Oliveira

Jury members

  • André-Luc BEYLOT, Professor, Toulouse INP, Reviewer & Examiner
  • Kandaraj PIAMRAT, Associate Professor (Maître de Conférences), Nantes Université, Reviewer & Examiner
  • Nadjib AIT-SAADI, Professor, Université Paris-Saclay, Examiner
  • Fatiha ZAIDI, Professor, Université Paris-Saclay, Examiner
  • Steven MARTIN, Professor, Université Paris-Saclay, Examiner
  • Yusheng JI, Professor, National Institute of Informatics, Examiner
  • Kensuke FUKUDA, Professor, National Institute of Informatics, Examiner
  • Takashi KURIMOTO, Professor, National Institute of Informatics, Examiner

Abstract

5G services are classified into Ultra-Reliable Low-Latency Communications (URLLC), enhanced Mobile Broadband (eMBB), and massive Machine-Type Communications (mMTC). Beyond 5G (B5G) and 6G communications are expected to meet extreme demands jointly in terms of rate, delay, and reliability. Emerging applications such as Extended Reality (XR) or autonomous driving are expected to require terabit-per-second data rates, extreme reliability, and/or very low delay. Moreover, given the tremendous number of wireless devices, and especially IoT devices, meeting the requirements of future networks appears extremely challenging. Many recent studies have focused on high-frequency millimeter-wave (mmWave) bands because of their large bandwidth. However, mmWave links are highly sensitive to obstacles and suffer from high path loss, which restricts their applicability. To overcome these weaknesses, many works have proposed jointly using the conventional Sub-6GHz frequencies and mmWave. Most IoT applications, for example in industrial control, are characterized by the generation of small packets, known as Short-Packet Communications (SPC). For such traffic, Shannon’s capacity theorem becomes inapplicable because of the short size and nature of SPC packets. To better model the achievable rates, we adopt Finite Blocklength theory, which accounts for packet size and reliability constraints. This thesis investigates the problem of band association and radio resource allocation for IoT devices to maximize the system sum-rate under heterogeneous Quality of Service (QoS) demands. We first propose a unifying framework devoted to SPC that jointly optimizes the user partitioning and the radio resource scheduling within each band. By leveraging centralized Deep Reinforcement Learning (DRL), the proposed method can better tackle the challenges imposed by dynamically varying mobile environments.
Regarding the DRL-based user partitioning, we investigated three different types of actions to obtain high network performance and rapid convergence. Regarding the resource allocation, we designed two optimization methods: one that leverages Difference of Convex Programming (DCP), and a second that accelerates convergence. Numerical evaluations show that the proposed methods outperform conventional approaches. While our centralized framework efficiently optimizes users’ band associations, centralization entails significant communication costs and increasing complexity. Recently, Multi-Agent DRL (MADRL) approaches have attracted attention, as solving the problem at the edge avoids this communication burden. However, distributed devices must still be coordinated efficiently. To enhance coordination while avoiding heavy communication costs, Federated MADRL (F-MADRL) has been studied, in which each device trains its model locally while periodic model aggregation at a central node improves cooperation. Since each IoT device faces a unique mobile environment, personalized F-MADRL was introduced to improve local decision making. We propose two personalized F-MADRL methods that enable each user to adapt its model to its mobile environment, thereby improving the overall network performance and reducing signaling costs. Simulations show that our methods outperform benchmark MADRL schemes. Finally, we focus on denser and more dynamic network scenarios. We evaluate a third personalization strategy to improve robustness in heterogeneous environments. To reduce the complexity of the DCP-based resource allocation, we explore a low-complexity algorithm, as the mathematical optimization used until now is computationally intensive. Results show promising performance of our proposed solution in denser networks, outperforming conventional algorithms in terms of outage probability or sum-rate.
We also evaluate robustness under user mobility and blockage, showing the fast adaptivity of our personalized F-MADRL framework.
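
As an illustration of why Finite Blocklength theory matters for short packets, the sketch below compares the Shannon rate with the widely used normal approximation of the finite-blocklength rate for a real-valued-noise-free textbook setting (complex AWGN channel). It is only a minimal sketch under stated assumptions, not the rate model used in the thesis: the function names, the AWGN channel, and the example values of SNR, blocklength, and error probability are all chosen here for illustration.

```python
import math
from statistics import NormalDist


def shannon_rate(snr: float) -> float:
    """Shannon rate in bits per channel use for a linear-scale SNR."""
    return math.log2(1.0 + snr)


def fbl_rate(snr: float, blocklength: int, error_prob: float) -> float:
    """Normal approximation of the finite-blocklength rate (bits per channel use).

    Assumes a complex AWGN channel (an assumption of this sketch):
        R ~ C - sqrt(V / n) * Qinv(eps)
    where C is the Shannon rate, V the channel dispersion, n the
    blocklength in channel uses, and eps the target block error probability.
    """
    capacity = shannon_rate(snr)
    # Channel dispersion of the complex AWGN channel, in bits^2.
    dispersion = (1.0 - (1.0 + snr) ** -2) * math.log2(math.e) ** 2
    # Inverse Gaussian Q-function: Qinv(eps) = Phi^{-1}(1 - eps).
    q_inv = NormalDist().inv_cdf(1.0 - error_prob)
    rate = capacity - math.sqrt(dispersion / blocklength) * q_inv
    # The approximation can go negative for very short blocks; clip at zero.
    return max(rate, 0.0)


if __name__ == "__main__":
    snr = 10.0  # linear scale (10 dB), illustrative value
    for n in (100, 1000, 10000):
        print(n, round(fbl_rate(snr, n, 1e-5), 3), round(shannon_rate(snr), 3))
```

With a strict reliability target, the rate penalty term shrinks as the blocklength grows, so the gap to the Shannon rate is largest precisely for the short packets typical of IoT traffic.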

Publications

  • Journal article

    Hugo de Oliveira, Megumi Kaneko, Lila Boukhatem, Ellen Hidemi Fukuda. Deep Reinforcement Learning-Aided Optimization of Multi-Interface Allocation for Short-Packet Communications. IEEE Transactions on Cognitive Communications and Networking, 2023, 9 (3), pp. 738-753. ⟨10.1109/TCCN.2023.3252661⟩. ⟨hal-03944071⟩

    ROCS
