Judging a Bug by Its Title
Most people know the age-old adage, “Don’t judge a book by its cover.” I can still see my grandmother wagging her finger at me when I was younger as she said it. But what if it’s not the book cover we’re judging, but the title? And what if it’s not a book we’re analyzing, but instead a security bug? The times have changed, and age-old adages don’t always translate well in the digital landscape. In this case, we’re using machine learning (ML) to identify and “judge” security bugs based solely on their titles. And, believe it or not, it works! (Sorry, Grandma!)
Mayana Pereira, Data Scientist at Microsoft, joins hosts Nic Fillingham and Natalia Godyla to dig into the work that is saving security experts’ time. Mayana explains how data science and security teams have come together to explore ways that ML can help software developers identify and classify security bugs more efficiently. Without machine learning, this task has traditionally produced false positives or led developers to overlook critical security vulnerabilities that had been misclassified.
In This Episode, You Will Learn:
• How data science and ML can improve security protocols and identify and classify bugs for software developers
• How to determine the appropriate amount of data needed to create an accurate ML training model
• The techniques used to classify bugs based simply on their title
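To make “classifying a bug from its title alone” concrete, here is a rough sketch: a tiny multinomial naive Bayes classifier over bag-of-words title tokens. Naive Bayes is a stand-in chosen for brevity; it is not necessarily the technique used in the Microsoft work or in the article referenced in these notes. The names `tokenize`, `TitleClassifier`, and `TRAIN`, and every example title, are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(title: str) -> list[str]:
    """Lowercase a bug title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


class TitleClassifier:
    """Hypothetical multinomial naive Bayes over bag-of-words bug titles."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace (add-alpha) smoothing strength
        self.class_counts: Counter[str] = Counter()  # number of titles per label
        self.token_counts: dict[str, Counter[str]] = defaultdict(Counter)
        self.vocab: set[str] = set()

    def fit(self, titles: list[str], labels: list[str]) -> TitleClassifier:
        for title, label in zip(titles, labels):
            tokens = tokenize(title)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_score(self, tokens: list[str], label: str) -> float:
        """Log prior plus smoothed log likelihood of the tokens under one label."""
        total_titles = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_titles)
        counts = self.token_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are skipped
                score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def predict(self, title: str) -> str:
        """Return the label with the highest score for a single title."""
        tokens = tokenize(title)
        return max(self.class_counts, key=lambda label: self.log_score(tokens, label))


# Hypothetical toy training data; real bug datasets are far larger and noisier.
TRAIN = [
    ("Buffer overflow in image parser", "security"),
    ("XSS in login page search box", "security"),
    ("SQL injection via report filter", "security"),
    ("Elevation of privilege in service installer", "security"),
    ("Typo in settings dialog label", "non-security"),
    ("Button misaligned on dark theme", "non-security"),
    ("Slow startup when loading large project", "non-security"),
    ("Crash when closing empty tab", "non-security"),
]

titles, labels = zip(*TRAIN)
clf = TitleClassifier().fit(list(titles), list(labels))

for new_title in ["Possible SQL injection in export filter",
                  "Settings dialog text misaligned"]:
    print(f"{new_title!r} -> {clf.predict(new_title)}")
```

On this toy data, the SQL injection title is labeled `security` and the dialog title `non-security`. In principle, the same class could be refit on severity labels for titles already judged security-related, which would mirror the two questions described in the episode (is it a security bug, and how severe is it). That two-stage setup is likewise an illustrative assumption.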
Some Questions We Ask:
• What questions need to be asked in order to obtain the right data to train a security model?
• How does Microsoft utilize the outputs of these data-driven security models?
• What is AI for Good and how is it using AI to foster positive change in protecting children, data and privacy online?
Resources:
• Microsoft Digital Defense Report
• Article: “Identifying Security Bug Reports Based Solely on Report Titles and Noisy Data”
• Microsoft Security Blog
(Full transcript can be found at https://aka.ms/SecurityUnlockedEp16)
Hello, and welcome to Security Unlocked, a new podcast from Microsoft where we unlock insights from the latest in news and research from across Microsoft Security engineering and operations teams. I'm Nic Fillingham-
And I'm Natalia Godyla. In each episode we'll discuss the latest stories from Microsoft Security and deep dive into the newest threat intel, research, and data science-
And profile some of the fascinating people working on artificial intelligence in Microsoft Security.
And now let's unlock the pod.
Hello, Nic. How's it going?
Hello, Natalia. Welcome back. Well, I guess welcome back to Boston to you. But welcome to Episode 16. I'm confused because I saw you in person last week for the first time. Well, technically it was the first time for you, 'cause you didn't remember our first time. It was the second time for me. But it was-
I feel like I just need to justify myself a little bit there. It was a 10-second exchange, so I feel like it's fair. I was new to Microsoft. There was a lot coming at me, so, uh-
Uh, I'm not very memorable, too, so that's the other part, which is fine. But yeah. You were here in Seattle. We both did COVID tests because we filmed... Can I say? You tell us. What did we do? Is it a secret? Is it announced? What's the deal?
All right. Well, it's sort of a secret, but everyone who's listening to our podcast gets to be in the know. So in March you and I will be launching a new series, a video series in which we talk to industry experts. But really, we're hanging out with the industry experts. So they get to tell us a ton of really cool things about SecOps and AppSec while we all play games together. So lots of puzzling. Really, we're just getting paid to do puzzles with people cooler than us.
Speaking of hanging out with cool people, on the podcast today we have Mayana Pereira, whose name you may have heard a few episodes ago, when Scott Christiansen was on talking about the work that he does. He had partnered with Mayana to build and launch a machine learning model that looked at the titles of bugs across Microsoft's various code repositories and used them to determine whether those bugs were actually security related and, if they were, what the correct severity rating should be.
So this episode we thought we'd experiment with the format. Instead of having two guests, with a deep dive up front and then a profile of someone in the back half, we thought we would have just one guest. We'd give them a little extra time, about 30 minutes, and allow them to really unpack the particular problem or challenge that they're working on. So, yeah. We hope you like this experiment.
And as always, we are open to feedback on the new format, so tweet us at @msftsecurity or send us an email at email@example.com. Let us know what you want to hear more of and whether you like hearing just one guest. We are super open. And with that, on with the pod?
On with the pod.
Welcome to the Security Unlocked podcast. Mayana Pereira, thanks for joining us.
Thank you for having me. I'm so happy to be here today, and I'm very excited to share some of the things that I have done in the intersection of ML and security.
Wonderful. Well, listeners of the podcast will have heard your name back in Episode 13 when we talked to Scott Christiansen, and he talked about a fascinating project that used machine learning to classify bugs based simply on their title, and we'll get to that in a minute. But could you please introduce yourself to our audience? Tell us about your title, but also, what does that look like in terms of day-to-day and the work that you do for Microsoft?
I'm a data scientist at Microsoft. I have been working at Microsoft for two and a half years now. Inside Microsoft, I've always worked with machine learning applied to security, trust, and safety, and I also do some work in the data privacy world. This area of ML applications to security has always been my passion, so before Microsoft I was also working with ML applied to cybersecurity, more in the malware world, but still security. And since I joined Microsoft, I've been working on data science projects that look like the project we're going to talk about today. Those are machine learning applications to interesting problems where we can increase either the trust and security of Microsoft products or the safety of the customer. You develop machine learning models with that in mind.
And my day-to-day work includes trying to understand which interesting problems exist across the company and talking to my amazing colleagues, such as Scott. I have been so blessed with an amazing team around me. And thinking about these problems, gathering data, and then getting, you know, heads down and training models, and testing new machine learning techniques that have never been used for specific applications, and trying to unde…