Originally published June 13, 2019. We are taking a few weeks off. We’ll be back soon with new episodes.
Machine learning allows software to improve as that software consumes more data.
Machine learning is a tool that many software engineers want to be able to use. Because machine learning is so broadly applicable, software companies want to make the tools more accessible to developers across the organization.
There are many steps that an engineer must go through to use machine learning, and each additional step reduces the chances that the engineer will actually get their model into production.
An engineer who wants to build machine learning into their application needs access to data sets. They need to join those data sets and load them into a machine (or multiple machines) where their model can be trained. Once the model is trained, it needs to be tested on additional data to ensure quality. If the initial model quality is insufficient, the engineer might need to tweak the training parameters.
Once a model is accurate enough, the engineer needs to deploy it. After deployment, the model might need to be updated with new data. If the model is processing sensitive or financially relevant data, a provenance process might be necessary to provide an audit trail of the decisions the model has made.
Rob Story and Kelley Rivoire are engineers working on machine learning infrastructure at Stripe. After recognizing the difficulties that engineers faced in creating and deploying machine learning models, Stripe engineers built Railyard, an internal API for machine learning workloads.
Rob and Kelley join the show to discuss data engineering and machine learning at Stripe, and their work on Railyard.