Microsoft’s Tutel optimizes mixture of experts model training

Microsoft this week announced Tutel, a library to support the development of mixture of experts (MoE) models, a particular type of large-scale AI model. Lower “layers” of an MoE model extract features, and experts are called upon to evaluate those features. The experts can receive a mix of data, and when the model is in operation, only a few experts are active for any given input, so even a huge model needs only a small amount of processing power. Tutel has a “concise” interface intended to make it easy to integrate into other MoE solutions, Microsoft says. According to the company, Tutel is an efficient MoE implementation that delivers a significant gain over the fairseq framework.
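
To make the "only a few experts are active" idea concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration of the general MoE technique, not Tutel's API or Microsoft's implementation; the function names (`top_k_route`, `moe_forward`), the linear gating scheme, and the choice of k=2 are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are the softmax probabilities of the chosen experts,
    renormalized so that they sum to 1.
    """
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(features, gate_weights, experts, k=2):
    """Run one feature vector through a sparse MoE layer (hypothetical sketch).

    gate_weights[e] is an assumed linear gating vector for expert e;
    experts[e] is any callable mapping a feature vector to a float.
    Only the k selected experts are evaluated; the rest are skipped.
    """
    # One gate score per expert: dot product of its gating vector and the input.
    gate_scores = [sum(w * x for w, x in zip(row, features)) for row in gate_weights]
    routes = top_k_route(gate_scores, k)
    # Weighted sum of the selected experts' outputs.
    output = sum(weight * experts[i](features) for i, weight in routes)
    return output, [i for i, _ in routes]
```

With four experts and k=2, each input triggers only two expert calls, so compute per input stays roughly constant as more experts (and parameters) are added. Production libraries add pieces this sketch omits, such as batching, expert capacity limits, and communication between devices.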

