TY - GEN
T1 - Self-maintaining [networked] systems
T2 - 23rd ACM Workshop on Hot Topics in Networks, HOTNETS 2024
AU - Hong, Freddie
AU - Sarantopoulos, Iason
AU - Hogg, Elliott
AU - Richardson, David
AU - Zhang, Yizhong
AU - Williams, Hugh
AU - Sweeney, David
AU - Chatzieleftheriou, Andromachi
AU - Rowstron, Antony
PY - 2024/11/18
Y1 - 2024/11/18
N2 - The vision of self-maintaining systems is to have cloud hardware serviced and repaired automatically using robotics. We define a self-maintaining system as one where software can control robotics that automatically perform hardware maintenance tasks and repair operations. This reduces failure service windows and lowers the risk of repairs causing further cascading failures and outages. Self-maintaining systems are not purely reactive to failures, but also perform proactive maintenance before failures occur, which reduces future hardware failures. Operating an entire datacenter as a self-maintaining system is many years away, and we present four stages of automation, analogous to the levels used for autonomous vehicles, required to reach the full vision for datacenters. To experiment with and learn about self-maintaining systems, we have focused on datacenter networking. We have created basic robots that support common network maintenance tasks, such as reseating and cleaning optical transceivers and replacing optical fiber cables. The advantages of self-maintaining networks are lower costs and increased availability and reliability. Key is a cross-layering co-design approach: the core cloud services are co-designed with the robotic systems performing the repairs and maintenance. The services control the robots, which is analogous to how Software Defined Networking has evolved for broader network management.
AB - The vision of self-maintaining systems is to have cloud hardware serviced and repaired automatically using robotics. We define a self-maintaining system as one where software can control robotics that automatically perform hardware maintenance tasks and repair operations. This reduces failure service windows and lowers the risk of repairs causing further cascading failures and outages. Self-maintaining systems are not purely reactive to failures, but also perform proactive maintenance before failures occur, which reduces future hardware failures. Operating an entire datacenter as a self-maintaining system is many years away, and we present four stages of automation, analogous to the levels used for autonomous vehicles, required to reach the full vision for datacenters. To experiment with and learn about self-maintaining systems, we have focused on datacenter networking. We have created basic robots that support common network maintenance tasks, such as reseating and cleaning optical transceivers and replacing optical fiber cables. The advantages of self-maintaining networks are lower costs and increased availability and reliability. Key is a cross-layering co-design approach: the core cloud services are co-designed with the robotic systems performing the repairs and maintenance. The services control the robots, which is analogous to how Software Defined Networking has evolved for broader network management.
KW - Automation
KW - Networks
KW - Self-repair
U2 - 10.1145/3696348.3696872
DO - 10.1145/3696348.3696872
M3 - Conference contribution
AN - SCOPUS:85215663051
T3 - HOTNETS 2024 - Proceedings of the 2024 23rd ACM Workshop on Hot Topics in Networks
SP - 159
EP - 166
BT - HOTNETS 2024 - Proceedings of the 2024 23rd ACM Workshop on Hot Topics in Networks
PB - Association for Computing Machinery, Inc
CY - New York, U.S.
Y2 - 18 November 2024 through 19 November 2024
ER -