Machine Learning Training: Research Challenges and Opportunities for the Distributed System Community

Speaker
Giovanni Neglia - INRIA, Sophia Antipolis, France

Date
28-Jan-2020 - Time: 11:00, Sala Verde

In this talk, I will support the thesis that the Distributed System community is not meant simply to apply machine learning (ML) tools to solve specific problems, but can also contribute to designing faster and more efficient distributed ML systems, for both training and inference. I will first introduce machine learning training and show that computational speedups directly translate into better ML models. I will then explain why design choices for ML systems are inevitably entangled with optimization and statistical considerations. Finally, I will provide two examples from my recent research activity: dynamic (TCP-like) adaptation of the number of ML workers, and topology design.

Contact Person: D. Carra

Publication date
8-Jan-2020

Department
Computer Science