Machine Learning Training: Research Challenges and Opportunities for the Distributed System Community

Speaker
Giovanni Neglia - INRIA, Sophia Antipolis, France

Date
Jan 28, 2020 - Time: 11:00 - Sala Verde

In this talk, I will support the thesis that the Distributed System community is not meant simply to apply machine learning (ML) tools to solve specific problems, but can also contribute to designing faster and more efficient distributed ML systems, both for training and for inference. I will first introduce machine learning training and show that computational speedups directly translate into better ML models. I will then explain why design choices for ML systems are inevitably entangled with optimization and statistical considerations. Finally, I will provide two examples from my recent research activity: dynamic (TCP-like) adaptation of the number of ML workers, and topology design.

Contact Person: D. Carra 
Publication date
Jan 8, 2020

Department
Computer Science