Building MLOps pipeline for intensive ML and AI projects


Client Background

The Data Science team of a global company that helps customers, including global ones, improve their marketing activities. The team was created to research new opportunities in the Artificial Intelligence and Machine Learning (AI/ML) domain for marketing optimization.

Business Challenge

The Data Science team consists of senior specialists who solve ML/AI challenges experimentally and have achieved promising results. The business pushes the team to deliver “sellable” products rather than experiments only, but the team hits obstacles in every release cycle and fights the same problems over and over again.

The main challenges:

  • Manual operations are the primary way of doing everything, including data preparation, training, and deployment
  • Quality is a constant struggle because the QA process is manual and time-consuming
  • No realistic timelines, because no one can predict the next obstacle to overcome
  • Released versions are ephemeral because no clear artifacts of the process are defined
  • The process is not stable: repeating the same steps does not produce the same result

Value Delivered

Project status:

  • Proof-of-Concept (PoC) phase successfully delivered, demonstrating the potential of MLOps practices
  • Minimum Viable Product (MVP) delivered, automating the basic train – evaluate – deploy flow

The overall project timeline:

  1. PoC phase – 3 weeks
  2. MVP phase – 2 months

Value delivered:

  1. PoC phase includes:
    • analysis of existing processes and challenges
    • research and selection of the best-fitting tools, with cloud-agnostic options in mind
    • prototype of the MLOps pipeline to cover training, validation, and deployment for models
  2. MVP phase includes:
    • pipeline for model training, evaluation, and candidate registration
    • approval process for candidate models
    • pipeline for model deployment into production and exposure as HTTP API endpoint
    • training and education
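The heart of the MVP flow is an automated quality gate between training and registration: a candidate model is only registered for manual approval if it meets the quality bar. A minimal sketch of such a gate in plain Python (the metric name, threshold logic, and function name are illustrative assumptions, not the client's actual code):

```python
# Sketch of an automated evaluation gate: a candidate model is registered
# for manual approval only if it beats the current production baseline on
# the chosen metric. Metric name and min_gain are illustrative assumptions.

def should_register(candidate_metrics: dict, baseline_metrics: dict,
                    metric: str = "accuracy", min_gain: float = 0.0) -> bool:
    """Return True if the candidate improves on the production baseline."""
    return candidate_metrics[metric] >= baseline_metrics[metric] + min_gain

# Example: a candidate that improves accuracy passes the gate.
baseline = {"accuracy": 0.91}
candidate = {"accuracy": 0.93}
print(should_register(candidate, baseline))  # True
```

In the real pipeline this decision runs automatically after evaluation, so humans only review candidates that already cleared the bar.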

SUCCESS STORY IN DETAIL

Background

ML- and AI-based features have become “must-have” options for any industry-leading product. A common situation: a small team is created as an experimental one, quickly shows results that are sold to the customer, and is then expected to “grow” the functionality. That is where the problems begin if you do not have a proper Software Development Life Cycle (SDLC) that guarantees the repeatability and predictability of results.

So, the objective of the MLOps pipeline is to bring that repeatability and predictability to the SDLC, remove manual routine, and reduce costs through faster time-to-market.

The article Machine Learning operations maturity model shows the typical path a data science team passes on the way to a mature MLOps practice. We started at Level 0 and targeted Level 3 out of 4.

Solution

Since the objective was to streamline the team’s work, we proposed to focus on the primary flow common to any data science project: train – validate – deploy. The expectation was to build a template the team could re-use across multiple projects, with the ability to adapt it to a specific case if necessary. Another important concern was to make it as cloud-agnostic as possible.

The decision was made to use AWS SageMaker, since the AWS platform has been widely adopted across the client’s projects.

The template has been created to cover the following activities:

  • Git-based machine learning project with proper auto-triggers
  • Model training with proper training and test data sets management
  • Auto model evaluation to ensure proper quality
  • Candidate Model registration with manual Approval for promotion into Production
  • Auto-deployment to the Production environment with corresponding HTTP API endpoint exposure
  • Serverless inference execution to reduce costs during experiments and testing
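The registration-and-approval step above can be illustrated with a small in-memory sketch of a versioned model registry with a manual approval gate. This is a stand-in for the managed registry the pipeline actually uses; the class names, statuses, and S3 paths below are illustrative assumptions:

```python
from dataclasses import dataclass, field

# Sketch of a versioned model registry with a manual approval gate,
# mirroring the candidate-registration flow described above. Class names,
# status strings, and artifact URIs are illustrative assumptions.

@dataclass
class ModelVersion:
    version: int
    artifact_uri: str
    status: str = "PendingManualApproval"  # -> "Approved" or "Rejected"

@dataclass
class ModelRegistry:
    versions: list = field(default_factory=list)

    def register(self, artifact_uri: str) -> ModelVersion:
        """Register a candidate; versions are numbered sequentially."""
        mv = ModelVersion(version=len(self.versions) + 1,
                          artifact_uri=artifact_uri)
        self.versions.append(mv)
        return mv

    def approve(self, version: int) -> None:
        """Manual approval promotes a candidate toward production."""
        self.versions[version - 1].status = "Approved"

    def latest_approved(self):
        """Only approved versions are eligible for deployment."""
        approved = [v for v in self.versions if v.status == "Approved"]
        return approved[-1] if approved else None

registry = ModelRegistry()
registry.register("s3://models/candidate-v1.tar.gz")
registry.register("s3://models/candidate-v2.tar.gz")
registry.approve(1)  # only version 1 passes manual approval
print(registry.latest_approved().version)  # 1
```

The key design point is that deployment reads only from the approved set, so an unreviewed candidate can never reach production by accident.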

As a result, the team received a structured approach to model refinement with predictable timings for basic operations, including experiments and testing. The pipeline has well-defined artifacts with corresponding version control and release management. Any model version can be deployed at any moment, in any number of instances, so advanced experiments and testing (for instance, A/B testing) become possible.
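Because any version can be deployed side by side, incoming traffic can be split between variants for an A/B test. One common approach is deterministic hash-based routing, so each user consistently hits the same model; a minimal sketch (the 90/10 split and variant names are illustrative assumptions, not the client's setup):

```python
import hashlib

# Sketch of deterministic A/B routing between two deployed model versions:
# each user is assigned to a variant based on a hash of their id, so
# repeated requests from the same user hit the same model.
# The 90/10 split and the variant names are illustrative assumptions.

def pick_variant(user_id: str, split_b: float = 0.10) -> str:
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "model-b" if bucket < split_b else "model-a"

# The same user always gets the same variant across requests.
print(pick_variant("user-42") == pick_variant("user-42"))  # True
```

Sticky assignment like this keeps per-user metrics clean, which matters when comparing candidate and production models on live traffic.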


Lessons learned

  • MLOps pipelines are complex to build, but they pay back the invested effort
  • Popular cloud providers have out-of-the-box solutions that can be adapted to the specific needs of your project
  • MLOps as a culture helps stabilize and structure the work of a Data Science team and improve its performance