Machine Learning Model Testing and Validation for Production-Level Systems – Nov 2018


Leo Guelman
Chief Statistician and Director, Data Science
Royal Bank of Canada
Nov 19, 2018 5:30 – 6:30 pm
Auditorium 
88 Queens Quay West | Toronto, ON M5J 0B8 | Canada
Overview: Building reliable machine learning (ML) models for use in production-level systems presents specific risk factors not commonly addressed in the practitioners’ literature. As ML continues to play a central role in decision-making processes, it is critical to evaluate models under a range of conditions to identify potential defects. In this session, we introduce model testing and validation methods to assess the production-readiness of a model. In particular, we discuss concepts such as statistical model criticism, performance uncertainty, model staleness, calibration, algorithmic bias, and interpretability. The underlying methods to tackle these issues are essential for the long-term health of ML production systems and should be seamlessly integrated into data science pipelines.
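
The talk abstract does not include code; as a rough, self-contained illustration of two of the concepts it lists (calibration and performance uncertainty), the sketch below fits a classifier on synthetic data with scikit-learn, inspects its calibration curve, and bootstraps the held-out AUC to obtain an interval rather than a point estimate. The data, model, and metric choices are assumptions made for illustration, not material from the presentation.

```python
# Illustrative only: a minimal check of calibration and of performance
# uncertainty on held-out data, using synthetic data and scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss, roc_auc_score

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p_te = model.predict_proba(X_te)[:, 1]

# Calibration: compare predicted probabilities with observed frequencies per bin.
prob_true, prob_pred = calibration_curve(y_te, p_te, n_bins=10)
print("Brier score:", brier_score_loss(y_te, p_te))
for pt, pp in zip(prob_true, prob_pred):
    print(f"predicted {pp:.2f} -> observed {pt:.2f}")

# Performance uncertainty: bootstrap the test set to put an interval around AUC.
aucs = []
n = len(y_te)
for _ in range(1000):
    idx = rng.randint(0, n, n)
    if len(np.unique(y_te[idx])) < 2:  # skip resamples with a single class
        continue
    aucs.append(roc_auc_score(y_te[idx], p_te[idx]))
lo, hi = np.percentile(aucs, [2.5, 97.5])
print(f"AUC 95% bootstrap interval: [{lo:.3f}, {hi:.3f}]")
```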
References cited in the presentation:

1) Feature selection stability

“Variable Selection with Error Control: Another Look at Stability Selection”, Shah and Samworth, JRSS-B (2013)

2) Model decay

“The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction”, Breck et al., Google (2017)

3) Model comparison

“Time for a Change: a Tutorial for Comparing Multiple Classifiers Through Bayesian Analysis”, Benavoli et al., JMLR (2017)

4) Fair Machine Learning

“The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning”, Corbett-Davies and Goel, Stanford Working Paper (2018)

5) Hyper-parameter tuning (see the illustrative sketch after this list)

“Practical Bayesian Optimization of Machine Learning Algorithms”, Snoek et al., NIPS (2012)
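
Reference 5 concerns Bayesian optimization of hyper-parameters. As a rough sketch of the idea, and not the presentation’s own code, the example below uses scikit-optimize’s gp_minimize to tune the regularization strength of a logistic regression by minimizing negative cross-validated accuracy. The choice of library, parameter range, and model are assumptions made purely for illustration.

```python
# Illustrative only: Bayesian optimization of a single hyper-parameter
# with scikit-optimize (gp_minimize), on synthetic scikit-learn data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from skopt import gp_minimize
from skopt.space import Real

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

def objective(params):
    # Minimize negative mean CV accuracy for a given regularization strength C.
    (log_C,) = params
    model = LogisticRegression(C=10 ** log_C, max_iter=1000)
    return -cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

# Search log10(C) in [-3, 3] with a Gaussian-process surrogate.
result = gp_minimize(objective, [Real(-3.0, 3.0, name="log_C")],
                     n_calls=20, random_state=0)
print("best log10(C):", result.x[0], "CV accuracy:", -result.fun)
```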