Aug 17, 2022
Randy Shoup on Evolving Architecture at eBay
This episode originally aired on Software Engineering Radio.
Randy Shoup is the VP of Engineering and Chief Architect at eBay. He was previously the VP of Engineering at WeWork and Stitch Fix, a Director of Engineering at Google Cloud, where he worked on App Engine, and a Chief Engineer and Distinguished Architect at eBay starting in 2004.
Topics covered:
* eBay’s origins as a single C++ class
* The five-year migration to Java services
* Sharing a database between the old and new systems
* Building a distributed tracing system
* Working with bare metal
* Why most companies should stick to cloud
* Why individual services should own their own data storage
* How scale has caused solutions to change
* Rejoining a former company
* The Accelerate Book
* Improving delivery time
Related Links:
@randyshoup
OpenTelemetry
LightStep
Honeycomb
Accelerate Book
The Memo
Value Stream Mapping
The Epic Story of Dropbox’s Exodus from the Amazon Cloud Empire
Transcript:
[00:00:00] Jeremy: Today, I'm talking to Randy Shoup, he's the VP of engineering and chief architect at eBay.
[00:00:05] Jeremy: He was previously the VP of engineering at WeWork and Stitch Fix, and he was also a chief engineer and distinguished architect at eBay back in 2004. Randy, welcome back to Software Engineering Radio. This will be your fifth appearance on the show. I'm pretty sure that's a record.
[00:00:22] Randy: Thanks, Jeremy, I'm really excited to come back. I always enjoy listening to, and then also contributing to, Software Engineering Radio.
[00:00:42] Jeremy: Back at QCon 2007, you spoke with Markus Völter, the founder of SE Radio, and you were talking about developing eBay's new search engine at the time. And kind of looking back, I wonder if you could talk a little bit about how eBay was structured back then, maybe organizationally, and then we can talk a little bit about the tech stack and that sort of thing.
[00:00:53] Randy: Oh, sure. Okay. Yeah. Um, so eBay started in 1995, I just want to, you know, orient everybody. Same as the web, same as Amazon, same as a bunch of stuff. So eBay was actually almost 10 years old when I joined, which seemed very old at the time. Um, so yeah, what was eBay's tech stack like then? So eBay has currently gone through five generations of its infrastructure.
It was transitioning between the second and the third when I joined in 2004. Um, so the first iteration was Pierre Omidyar, the founder, over the three-day Labor Day weekend in 1995, playing around with this new cool thing called the web. He wasn't intending to build a business. He just was playing around with auctions and wanted to put up a webpage.
So he had a Perl backend, and every item was a file, and it lived on this little 486 tower or whatever he had at the time. Um, so that wasn't scalable, and wasn't meant to be. The second generation of eBay's architecture was what we called V2, very creatively, uh, and that was a C++ monolith: an ISAPI DLL which, at its worst, grew to 3.4 million lines of code in that single DLL, and basically in a single class. Not just in a single repo or a single file, but in a single class.
So that was very unpleasant to work in, as you can imagine. Um, eBay had about a thousand engineers at the time, and they were, you know, as you can imagine, really stepping on each other's toes and not being able to make much forward progress. So starting in, I want to call it 2002, so two years before I joined, um, they were migrating to the creatively named V3, and the V3 architecture was Java, and,
you know, not microservices, we didn't even have that term, but mini applications. So I'm actually going to take a step back. V2 was a monolith, so all of eBay's code in that single DLL, and that was buying and selling and search and everything. And then we had two monster databases, a primary and a backup, big Oracle machines on hardware that was, you know, bigger than refrigerators, and that ran eBay for a bunch of years. Before we changed the upper part of the stack, we, um, chopped up that single monolithic database into a bunch of, um, domain-specific databases, or entity-specific databases, right?
So a set of databases around users, you know, sharded by the user ID (we could talk about all that if you want); items, again sharded by item ID; transactions, sharded by transaction ID... I think when I joined, there were several hundred instances of, uh, Oracle databases, um, you know, spread around, but still that monolithic front end.
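To make that sharding scheme concrete, here is a minimal sketch of how a request might be routed to the database instance that owns a given entity ID. It assumes a simple modulo mapping over a fixed list of shards; the `ShardRouter` class and its method names are hypothetical illustrations, not eBay's actual data-access layer, whose routing was surely more sophisticated.

```java
import java.util.List;

// Minimal sketch of entity-ID sharding, assuming a simple modulo scheme.
// ShardRouter and its methods are hypothetical, not eBay's real routing code.
public class ShardRouter {
    private final List<String> userDbUrls; // one connection string per user shard
    private final List<String> itemDbUrls; // one connection string per item shard

    public ShardRouter(List<String> userDbUrls, List<String> itemDbUrls) {
        this.userDbUrls = userDbUrls;
        this.itemDbUrls = itemDbUrls;
    }

    // Every user query must carry the user ID, because only the ID
    // determines which physical database owns the row.
    public String userDbFor(long userId) {
        int shard = (int) Math.floorMod(userId, (long) userDbUrls.size());
        return userDbUrls.get(shard);
    }

    public String itemDbFor(long itemId) {
        int shard = (int) Math.floorMod(itemId, (long) itemDbUrls.size());
        return itemDbUrls.get(shard);
    }
}
```

The key property is that the shard key (user ID, item ID, transaction ID) is part of every lookup, which is why each entity family in the story above is sharded by its own ID.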
And then in 2002, I wanna say, we started migrating into that V3 that I was describing, okay. So that was a rewrite in Java, again, mini applications. So you take the front end, and instead of having it be in one big unit, it was this, uh, EAR file, if people remember back to, you know, those days in Java, um, you know, 220 different ones of those.
So, like, here is one of them for the search pages: you know, one application would be the search application, and it would, you know, do all the search-related stuff, the handful of pages around search. Ditto for, you know, the buying area, ditto for the, you know, checkout area, ditto for the selling area...
220 of those. Um, and that was, again, vertically sliced domains. And then the relationship between those V3, uh, applications and the databases was a many-to-many thing. So, like, many of those applications interact with items, so they would interact with those item databases. Many of them would interact with users,
and so they would interact with the user databases, et cetera. Uh, happy to go into as much gory detail as you want about all that. But that's where we were: in the transition period between the V2 monolith and the V3 mini applications in, uh, 2004. I'm just going to pause there and, like, let me know where you want to take it.
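To illustrate that many-to-many relationship between applications and databases, here is a hedged sketch reusing the `ShardRouter` from above. `SearchApplication` and `CheckoutApplication` are hypothetical stand-ins for two of the roughly 220 mini applications; both reach into the item shards, and checkout also reaches into the user shards, which is the coupling Randy describes.

```java
// Hypothetical stand-ins for two of the ~220 V3 mini applications.
// Both depend on the item databases; checkout also depends on the user
// databases -- the many-to-many coupling described above.
class SearchApplication {
    private final ShardRouter router;

    SearchApplication(ShardRouter router) {
        this.router = router;
    }

    String findItemDb(long itemId) {
        // Search pages read item data straight from the owning item shard.
        return router.itemDbFor(itemId);
    }
}

class CheckoutApplication {
    private final ShardRouter router;

    CheckoutApplication(ShardRouter router) {
        this.router = router;
    }

    String findBuyerAndItemDbs(long userId, long itemId) {
        // Checkout touches both a user shard and an item shard, so this
        // one application depends on both database families at once.
        return router.userDbFor(userId) + " + " + router.itemDbFor(itemId);
    }
}
```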
[00:05:01] Jeremy: Yeah. So you were saying that it started as Perl, then it became C++, and that's kind of interesting that you said it was all in one class, right? So, wow, that's gotta be a gigantic...
[00:05:16] Randy: I mean, completely brutal. Yeah. 3.4 million lines of code. Yeah. We were hitting compiler limits on the number of methods per class.
[00:05:22] Jeremy: Oh my gosh.
[00:05:23] Randy: I'm, uh, scared that I happen to know that. At least at the time, uh, Microsoft allowed you 16K methods per class, and we were hitting that limit.
So, uh, not great.
[00:05:36] Jeremy: So it's just kind of interesting to think about how do you walk through that code, right? You have, I guess you just have this giant file.
[00:05:45] Randy: Yeah. I mean, there were, you know, different methods. Um, but yeah, it was a big man. I mean, it was a monolith, it was, uh, you know, it was a spaghetti mess. Um, and you know, as you can imagine, Amazon went through a really similar thing by the way. So this wasn't soup. I mean, it was bad, but like we weren't the only people that were making that, making that a mistake.
Um, and just like Amazon, where they did like one update a quarter (laughs), you know, at that period, like 2000, uh, we were doing something really similar, like very, very slow, um, you know, updates. And, uh, when we moved to V3, you know, the idea was to get to do changes much faster, and we were very proud of ourselves, starting in 2004, that we, uh, upgraded the whole site every two weeks.
And we didn't have to do the whole site, but, like, each of those individual applications that I was mentioning, right, those 220 applications, each of those would roll out on this biweekly cadence. Um, and they had interdependencies, and so we rolled them out in this dependency order. Anyway, lots of, lots of complexity associated w…