Nov 21, 2023
Mike Perham on Keeping it solo (RubyConf 2023)
Mike Perham is the creator of Sidekiq, a background job processor for Ruby. He's also the creator of Faktory a similar product for multiple language environments.
We talk about the RubyConf keynote and Ruby's limitations, supporting products as a solo developer, and some ideas for funding open source like a public utility.
Recorded at RubyConf 2023 in San Diego.
-- A few topics covered:
* Sidekiq (Ruby) vs Faktory (Polyglot)
* Why background job solutions are so common in Ruby
* Global Interpreter Lock (GIL)
* Ractors (Actor concurrency)
* Downsides of Multiprocess applications
* When to use other languages
* Getting people to pay for Sidekiq
* Keeping a solo business
* Being selective about customers
* Ways to keep support needs low
* Open source as a public utility Mike
* Mike's blog
* mastodon
* Sidekiq
* faktory
* From Employment to Independence Ruby
* Ractor
* The Practical Effects of the GVL on Scaling in Ruby Transcript
You can help correct transcripts on GitHub. Introduction
[00:00:00] Jeremy: I'm here at RubyConf San Diego with Mike Perham. He's the creator of Sidekiq and Faktory.
[00:00:07] Mike: Thank you, Jeremy, for having me here. It's a pleasure. Sidekiq
[00:00:11] Jeremy: So for people who aren't familiar with, I guess we'll start with Sidekiq because I think that's what you're most known for. If people don't know what it is, maybe you can give like a small little explanation.
[00:00:22] Mike: Ruby apps generally have two major pieces of infrastructure powering them. You've got your app server, which serves your webpages and the browser. And then you generally have something off on the side that... It processes, you know, data for a million different reasons, and that's generally called a background job framework, and that's what Sidekiq is.
[00:00:41] It, Rails is usually the thing that, that handles your web stuff, and then Sidekiq is the Sidekiq to Rails, so to speak.
[00:00:50] Jeremy: And so this would fit the same role as, I think in Python, there's celery. and then in the Ruby world, I guess there is, uh, Resque is another kind of job.
[00:01:02] Mike: Yeah, background job frameworks are quite prolific in Ruby. the Ruby community's kind of settled on that as the, the standard pattern for application development. So yeah, we've got, a half a dozen to a dozen different, different examples throughout history, but the major ones today are, Sidekiq, Resque, DelayedJob, GoodJob, and, and, and others down the line, yeah. Why background jobs are so common in Ruby
[00:01:25] Jeremy: I think working in other languages, you mentioned how in Ruby, there's this very clear, preference to use these job scheduling systems, these job queuing systems, and I'm not. I'm not sure if that's as true in, say, if somebody's working in Java, or C sharp, or whatnot. And I wonder if there's something specific about Ruby that makes people kind of gravitate towards this as the default thing they would use.
[00:01:52] Mike: That's a good question. What makes Ruby... The one that so needs a background job system. I think Ruby, has historically been very single threaded. And so, every Ruby process can only do so much work. And so Ruby oftentimes does, uh, spin up a lot of different processes, and so having processes that are more focused on one thing is, is, is more standard.
[00:02:24] So you'll have your application server processes, which focus on just serving HTTP responses. And then you have some other sort of focused process and that just became background job processes. but yeah, I haven't really thought of it all that much. But, uh, you know, something like Java, for instance, heavily multi threaded.
[00:02:45] And so, and extremely heavyweight in terms of memory and startup time. So it's much more frequent in Java that you just start up one process and that's it. Right, you just do everything in that one process. And so you may have dozens and dozens of threads, both serving HTTP and doing work on the side too. Um, whereas in Ruby that just kind of naturally, there was a natural split there. Global Interpreter Lock
[00:03:10] Jeremy: So that's actually a really good insight, because... in the keynote at RubyConf, Mats, the creator of Ruby, you know, he mentioned the, how the fact that there is this global, interpreter lock,
[00:03:23] or, or global VM lock in Ruby, and so you can't, really do multiple things in parallel and make use of all the different cores. And so it makes a lot of sense why you would say like, okay, I need to spin up separate processes so that I can actually take advantage of, of my, system.
[00:03:43] Mike: Right. Yeah. And the, um, the GVL. is the acronym we use in the Ruby community, or GIL. Uh, that global lock really kind of is a forcing function for much of the application architecture in Ruby. Ruby, uh, applications because it does limit how much processing a single Ruby process can do. So, uh, even though Sidekiq is heavily multi threaded, you can only have so many threads executing.
[00:04:14] Because they all have to share one core because of that global lock. So unfortunately, that's, that's been, um, one of the limiter, limiting factors to Sidekiq scalability is that, that lock and boy, I would pay a lot of money to just have that lock go away, but. You know, Python is going through a very long term experiment about trying to remove that lock and I'm very curious to see how well that goes because I would love to see Ruby do the same and we'll see what happens in the future, but, it's always frustrating when I come to another RubyConf and I hear another Matt's keynote where he's asked about the GIL and he continues to say, well, the GIL is going to be around, as long as I can tell.
[00:04:57] so it's a little bit frustrating, but. It's, it's just what you have to deal with. Ractors
[00:05:02] Jeremy: I'm not too familiar with them, but they, they did mention during the keynote I think there Ractors or something like that. There, there, there's some way of being able to get around the GIL but there are these constraints on them. And in the context of Sidekiq and, and maybe Ruby in general, how do you feel about those options or those solutions?
[00:05:22] Mike: Yeah, so, I think it was Ruby 3. 2 that introduced this concept of what they call a Ractor, which is like a thread, except it does not have the global lock. It can run independent to the global lock. The problem is, is because it doesn't use the global lock, it has pretty severe constraints on what it can do.
[00:05:47] And the, and more specifically, the data it can access. So, Ruby apps and Rails apps throughout history have traditionally accessed a lot of global data, a lot of class level data, and accessed all this data in a, in a read only fashion. so there's no race conditions because no one's changing any of it, but it's still, lots of threads all accessing the same variables.
[00:06:19] Well, Ractors can't do that at all. The only data Ractors can access is data that they own. And so that is completely foreign to Ruby application, traditional Ruby applications. So essentially, Ractors aren't compatible with the vast majority of existing Ruby code. So I, I, I toyed with the idea of prototyping Sidekiq and Ractors, and within about a minute or two, I just ran into these, these, uh...
[00:06:51] These very severe constraints, and so that's why you don't see a lot of people using Ractors, even still, even though they've been out for a year or two now, you just don't see a lot of people using them, because they're, they're really limited, limited in what they can do. But, on the other hand, they're unlimited in how well they can scale.
[00:07:12] So, we'll see, we'll see. Hopefully in the future, they'll make a lot of improvements and, uh, maybe they'll become more usable over time. Downsides of multiprocess (Memory usage)
[00:07:19] Jeremy: And with the existence of a job queue or job…