In this session we will explore modern techniques and tooling that empower reusability in data and analytics solutions. Creating and leveraging reusable machine-learning code shares much with traditional software engineering but also differs in important respects.
We will discuss ways of developing, delivering, assembling, and deploying reusable components. We will compare multi-repos with mono-repos, libraries with micro-libraries, and components with templates and pipelines, and present tooling that fosters discoverability and collaboration. We will touch on code and data dependency resolution and injection, reusable data assets, data lakes, and feature stores. Additionally, we will discuss tooling and MLOps automation that enables rapid development and continuous integration/delivery. The discussion will frequently link back to functional and non-functional requirements such as modularity, composability, single source of truth, versioning, performance, isolation, and security.
This talk covers tools of choice, processes, and design patterns for building and sharing production-ready ML components at scale. It surfaces lessons learned and battle scars from trying to prevent reinvention of the wheel at one of the largest consultancies, with 2,000+ analytics practitioners.