Production ML architectures (deployed at scale in production) are evolving at a rapid pace. We suggest there have been two generations so far: the first generation were very much fixed function pipelines with predetermined stages, the second generation was pluggable components with a bit more flexibility but still pretty constrained. If history is a guide (especially looking at the evolution of GPU APIs), the third generation is going to come from making the computational power accessible and flexible.
We share our experiences with Ray, a system that makes distributed computing accessible and flexible. We give a two slide introduction to Ray, and show how Rayâ€™s flexibility enables approaches like online reinforcement learning that are not easy to fit in to existing production ML architectures without some serious shoe-horning.
We then outline how different companies (such as Uber, Ant Financial, McKinsey) are using Ray in a way that extends beyond the constraints of existing second generation architectures.