I’m leaving Redis for SolidQueue

(simplethread.com)

172 points | by amalinovic 5 hours ago

20 comments

  • ivolimmen 15 minutes ago
    Exactly what https://www.amazingcto.com/postgres-for-everything/ says: keep it simple and use PostgreSQL.
    • antisthenes 4 minutes ago
      Isn't Redis just a lot less relevant these days, now that enterprise NVMe storage is so ridiculously fast?

      How much latency could you really be saving versus introducing complexity?

      But I am not a storage/backend engineer, so maybe I don't understand the target use of Redis.

  • jacob-s-son 4 hours ago
    Every author of free software obviously has the right to full control over the scope of their project.

    That being said, I regret that we switched from good_job (https://github.com/bensheldon/good_job). The thing is, Basecamp is a MySQL shop, and their policy is not to accept RDBMS-engine-specific queries. You can see in their GitHub issues that they try to stick to "universal" SQL and are personally mostly concerned with how it performs in MySQL (https://github.com/rails/solid_queue/issues/567#issuecomment... , https://github.com/rails/solid_queue/issues/508#issuecomment...). They also still have no support for batch jobs: https://github.com/rails/solid_queue/pull/142 .

    • chasd00 28 minutes ago
      If you’re tied so tightly to MySQL that you’re labeled a “MySQL shop”, then it seems logical to use MySQL-specific features. I must be missing something.
    • downsplat 1 hour ago
      That sounds like the worst of all possible worlds! At $WORK we're also on MySQL, but I don't know what I would do without engine-specific queries. For one, on complex JOINs, MySQL sometimes gets the query plan spectacularly wrong, and even if it doesn't now, you can't be sure it won't in the future. So for many important queries I put the tables in the intended order and add a STRAIGHT_JOIN to future-proof them and sidestep query-planner complexity.
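
      A minimal sketch of the pattern, assuming the mysql2 gem and made-up table names:

          require "mysql2"

          client = Mysql2::Client.new(host: "localhost", database: "app")

          # STRAIGHT_JOIN makes MySQL join the tables in the order they appear
          # in FROM, instead of letting the planner reorder them.
          client.query(<<~SQL)
            SELECT STRAIGHT_JOIN orders.*
            FROM customers
            JOIN orders ON orders.customer_id = customers.id
            WHERE customers.region = 'EU'
          SQL
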
    • robertlagrant 45 minutes ago
      > their policy is not to accept RDMS engine specific queries

      Why? Is it so they can switch in future?

      • cl0ckt0wer 43 minutes ago
        Then they don't have to troubleshoot advanced queries.
    • brightball 1 hour ago
      Agreed. good_job is the ideal approach to a PG backed queue.
  • antirez 4 hours ago
    Every time some production environment can be simplified, it is good news in my opinion. The ideal situation with Rails would be if there is a simple way to switch back to Redis, so that you can start simple, and as soon as you hit some fundamental issue with using SolidQueue (mostly scalability, I guess, in environments where the queue is truly stressed -- and you don't want to have a Postgres scalability problem because of your queue), you have a simple upgrade path. But I bet a lot of Rails apps don't have high volumes, and having to maintain two systems can be just more complexity.
    • byroot 1 minute ago
      > The ideal situation with Rails would be if there is a simple way to switch back to Redis

      That's largely the case.

      Rails provides an abstracted API for jobs (Active Job). Of course, some applications do depend on features specific to one queue implementation, but in the general case you just need to update your config to switch over (and of course handle draining the old queue).
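
      For illustration, a sketch of such a switch, assuming an app moving off Sidekiq (the job and mailer names are made up; the config key is standard Active Job):

          # config/environments/production.rb
          config.active_job.queue_adapter = :solid_queue   # previously :sidekiq

          # Job classes are written against Active Job and don't change:
          class WelcomeEmailJob < ApplicationJob
            queue_as :default

            def perform(user_id)
              UserMailer.welcome(User.find(user_id)).deliver_now
            end
          end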

    • watercolorblind 2 hours ago
      The primary pain point I see here is when devs lean into transactions such that their job is only created together with everything else that happened.

      Losing that guarantee can make the eventual migration harder, even if that migration is to a different postgres instance than the primary db.
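
      A sketch of the guarantee in question, assuming the queue tables live in the same database as the app (model and job names are illustrative):

          ActiveRecord::Base.transaction do
            order = Order.create!(cart_params)
            # With a DB-backed queue in this same database, the enqueue is
            # just an INSERT, so it commits or rolls back with the order.
            ChargeCardJob.perform_later(order.id)
          end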

    • yawboakye 4 hours ago
      the problem i see here is that we end up treating the background job/task processor as part of the production system (e.g. the server that responds to requests, in the case of a web application) instead of a separate standalone thing. rails doesn’t make this distinction clear enough. it’s okay to back your tasks processor with a pg database (e.g. river[0]) but, as you indirectly pointed out, it shouldn’t be the same as the production database. this is why redis was preferred anyways: it was a lightweight database for the task processor to store state, etc. there are still great arguments in favor of this setup. from what i’ve seen so far, solidqueue doesn’t make this separation.

      [0]: https://riverqueue.com/

      • erispoe 46 minutes ago
        > it shouldn’t be the same as the production database

        Why is that?

        • zarzavat 20 minutes ago
          If you need to restore the production database do you also want to restore the task database?

          If your task is to send an email, do you want to send it again? Probably not.

          • stavros 11 minutes ago
            It's not like I'll get a choice between the task database going down and not going down. If my task database goes down, I'm either losing jobs or duplicating jobs, and I have to pick which one I want. Whether the downtime is at the same time as the production database or not is irrelevant.

            In fact, I'd rather it did happen at the same time as production, so I don't have to reconcile a bunch of data on top of the tasks.

      • andrewstuart 2 hours ago
        It’s not necessary to separate queue db from application db.
        • yawboakye 2 hours ago
          got it. is it necessary, then, to couple the queue db with the app db? if the answer is no, then we can’t make a necessity argument here, unfortunately.
          • nick__m 1 hour ago
            Frequently you have to couple the transactional state of the queue db and the app db; colocating them is the simplest way to achieve that without resorting to distributed transactions or patterns that involve orchestrated compensating actions.
  • victorbjorklund 4 hours ago
    For people who don't think it scales: a similar implementation in Elixir is Oban. Their benchmark shows a million jobs per minute on a single node (and I'm sure it could be pushed further with more optimization). I bet 99.99999% of apps have fewer than a million background jobs per minute.

    https://oban.pro/articles/one-million-jobs-a-minute-with-oba...

    • parthdesai 3 minutes ago
      Funny you mention Oban. We use it at work as well, and the first thing Oban tells you is to either use Redis as a notifier or resort to polling for jobs without notifications.

      https://hexdocs.pm/oban/scaling.html

    • formerly_proven 2 hours ago
      This benchmark is probably as far removed from how applications use task queues as it could possibly be. The headline is "1 million jobs per minute", which is true. However...

      - this is achieved by queuing batches of 5000 jobs, so on the queue side this is actually not 1 million TPS, but rather 200 TPS. I've never seen any significant batching of background job creation.

      - the dispatch is also batched to a few hundred TPS (5ms ... 2ms).

      - acknowledgements are also batched.

      So instead of the ~50-100k TPS that you would expect to get to 17k jobs/sec, this is probably performing just a few hundred transactions per second on the SQL side. Correspondingly, if you don't batch everything (job submission, acking; dispatch is reasonable), throughput likely drops to that level, which is much more in line with expectations.

      Semantically this benchmark is much closer to queuing and running 200 invocations of a "for i in range(5000)" loop in under a minute, which most would expect virtually any DB to handle (even SQLite).

      • cess11 7 minutes ago
        The 5k batching is done when inserting the jobs into the database. It's not like they exert some special control over the performance of the database engine, and this isn't what they're trying to measure in the article.

        They spend some time explaining how to tune the job runners to double the 17k jobs/s. The article is kind of old, Elixir 1.14 was a while ago, and it is basically a write-up on how they managed a bit of performance increase by using new features of this language version.

      • uep 2 hours ago
        This isn't my area, but wouldn't this still be quite effective if it automatically grouped and batched those jobs for you? At low throughput it doesn't need giant batches: it could just time out after a very short wait and submit smaller batches, while at high throughput the batches would be full. Either way, it seems like this would still serve the purpose, wouldn't it?
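
        A hypothetical micro-batcher along those lines (illustrative, not Oban's API; needs Ruby >= 3.2 for Queue#pop with a timeout):

            class MicroBatcher
              def initialize(max_size: 500, max_wait: 0.005, &flush)
                @queue = Thread::Queue.new
                Thread.new do
                  while (first = @queue.pop)           # block until work arrives
                    batch = [first]
                    # Keep collecting until the batch is full or the queue
                    # stays quiet for max_wait seconds (pop returns nil).
                    while batch.size < max_size && (item = @queue.pop(timeout: max_wait))
                      batch << item
                    end
                    flush.call(batch)                  # e.g. one multi-row INSERT
                  end
                end
              end

              def push(job) = @queue.push(job)
            end

            batcher = MicroBatcher.new { |jobs| puts "flushing #{jobs.size} jobs" }
            1000.times { |i| batcher.push(i) }
            sleep 0.05   # give the batcher a moment to flush before exit
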
  • speleding 1 hour ago
    We've been storing jobs in the DB long before SolidQueue appeared. One major advantage is that we can snapshot the state of the system (or one customer account) to our dev environment and get to see it exactly as it is in production.

    We still keep rate limiters in Redis, though; it would be pretty easy for some scanner to overload the DB if every rogue request required a round trip to the DB before being processed. Because we only store ephemeral data in Redis, it doesn't need backups.
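
    A minimal sketch of that kind of limiter with the redis gem (fixed window; key layout and limits are illustrative):

        require "redis"

        # Allow up to `limit` requests per `window` seconds per client.
        def allow_request?(redis, client_ip, limit: 100, window: 60)
          key = "rl:#{client_ip}:#{Time.now.to_i / window}"
          count = redis.incr(key)                  # in-memory round trip, no disk I/O
          redis.expire(key, window) if count == 1  # first hit in the window sets the TTL
          count <= limit
        end

        # e.g. in Rack middleware or a before_action:
        #   head :too_many_requests unless allow_request?(redis, request.remote_ip)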

  • KolmogorovComp 3 hours ago
    > Job latency under 1ms is critical to your business. This is a real and pressing concern for real-time bidding, high frequency trading (HFT), and other applications in the same ilk.

    From TFA. Are there really people using Rails for HFT?

  • rajaravivarma_r 3 hours ago
    The one use case where a DB-backed queue will fail for sure is when the payload is large. For example, if you queue a large JSON payload to be picked up and processed by a worker, the DB write overhead itself makes the background worker useless.

    I've benchmarked Redis (Sidekiq), Postgres (GoodJob) and SQLite (SolidQueue), and Redis beats everything else for the above use case.

    SolidQueue backed by SQLite may be fine when you're just passing around primary keys. I still wonder whether you can have a lot of workers polling from the same database and updating the queue with job status. I've done something similar in the past using SQLite for some personal work, and it's easy to hit the wall even with 10 or so workers.
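
    For context, the claim pattern DB-backed queues typically build on is FOR UPDATE SKIP LOCKED (SolidQueue relies on it where available; SQLite has no row locks, which is likely the wall here). A sketch in Postgres syntax with made-up table and column names:

        # Each worker atomically claims one unclaimed job; rows locked by
        # other workers are skipped rather than waited on.
        claimed = ActiveRecord::Base.connection.select_one(<<~SQL)
          UPDATE jobs SET claimed_at = now()
          WHERE id = (
            SELECT id FROM jobs
            WHERE claimed_at IS NULL
            ORDER BY priority, created_at
            FOR UPDATE SKIP LOCKED
            LIMIT 1
          )
          RETURNING id, arguments
        SQL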

    • Manfred 3 hours ago
      In my experience you want job parameters to be one, maybe two ids. Do you have a real world example where that is not the case?
      • embedding-shape 3 hours ago
        I'm guessing that with that you're adding indirection to what you're actually processing? So I guess the counter-case would be when you don't want or need that indirection.

        If I understand you correctly, instead of doing:

        - Create job with payload (maybe big) > Put in queue > Let worker take from queue > Done

        You're suggesting:

        - Create job with ID of payload (stored elsewhere) > Put in queue > Let worker take from queue, then resolve ID to the data needed for processing > Done

        Is that more or less what you mean? I can definitely see use cases for both; it heavily depends on the situation. More indirection isn't always better, nor are big payloads always OK.

        • azuanrb 2 hours ago
          Take webhooks, for example:

          - Persist payload in db > Queue with id > Process via worker.

          Pushing the payload directly to the queue can be tricky. Any queue system will usually have limits on payload size, for good reasons. Plus, if you've already committed it to the db, you can guarantee the data isn't lost and can be processed again however you want later. But if your queue is having issues, or the enqueue failed, you might lose it forever.
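
          A sketch of that flow, with hypothetical model and job names:

              class WebhooksController < ApplicationController
                def create
                  # Durable first: commit the raw payload before acking.
                  event = WebhookEvent.create!(payload: request.raw_post)
                  ProcessWebhookJob.perform_later(event.id)  # only the id crosses the queue
                  head :ok
                end
              end

              class ProcessWebhookJob < ApplicationJob
                def perform(event_id)
                  event = WebhookEvent.find(event_id)  # worker re-reads the full payload
                  # ... process, then mark the event handled or let retries kick in
                end
              end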

    • touisteur 2 hours ago
      Interesting, as a self-contained minimalistic setup.

      Shouldn't one be using a storage system such as S3/garage with ephemeral settings and/or clean-up triggers after job end? I get the appeal of using one system for everything, but won't you need a storage system anyway for other parts of your system?

      Have you written up your benchmarks anywhere, and where the cutoffs are (payload size / throughput / latency)?

    • michaelbuckbee 2 hours ago
      FWIW, Sidekiq docs strongly suggest only passing around primary keys or identifiers for jobs.
    • zihotki 3 hours ago
      Using Redis to store large queue payloads is usually a bad practice. Redis memory is finite.
      • dzonga 56 minutes ago
        this!! 100%.

        pass around IDs

    • ddorian43 2 hours ago
      > Redis beats everything else for the above usecase.

      Reminds me of antirez's blog post showing that when Redis is configured for full durability, it becomes as slow as, or slower than, PostgreSQL: http://oldblog.antirez.com/post/redis-persistence-demystifie...

      • epolanski 2 hours ago
        There have been 6 major Redis releases and countless improvements since then; I don't think we can say whether that still holds.

        Also, antirez has always been very opinionated about not comparing or benchmarking Redis against other DBs, and has been for a decade.

  • dependency_2x 4 hours ago
    Postgres will eat the world
    • this_user 15 minutes ago
      At least until people - in a couple of years - figure out that the "Postgres for everything" fad was just as much of a bad idea as "MongoDB for everything" and "Put Redis into everything".
    • loafoe 4 hours ago
      Postgres will eat the world indeed. I'm just waiting for the pg_kernel extension so I can finally uninstall Linux :)
    • dzonga 55 minutes ago
      However, MySQL is easier to deal with - I say this as a Postgres guy.

      MySQL: less maintenance + more performant.

    • cies 2 hours ago
      I use PGMQ and pg_cron now... Not looking back.

      The MySQL + Redis + AWS' elasti-cron (or whatever) was a ghetto compared to Postgres.

      • saberd 1 hour ago
        We use pgmq with the pgmq-go client, and it has clients in many different languages; it's amazing. The queues persist on disk, and queue visualizations can easily be built with Grafana or plain SQL queries. The fact that the queues live in the same database as all the other data is also a huge benefit, if the 5-15 ms time penalty is not an issue.
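
        For a flavor of the SQL-level API, a sketch using the pg gem (assumes the pgmq extension is installed; queue name and payload are illustrative):

            require "pg"

            conn = PG.connect(dbname: "app")
            conn.exec("SELECT pgmq.create('tasks')")   # one-time queue setup
            conn.exec_params("SELECT pgmq.send('tasks', $1::jsonb)", ['{"email_id": 42}'])

            # Read one message with a 30s visibility timeout, then delete it.
            if (msg = conn.exec("SELECT * FROM pgmq.read('tasks', 30, 1)").first)
              # ... process msg["message"] ...
              conn.exec_params("SELECT pgmq.delete('tasks', $1::bigint)", [msg["msg_id"]])
            end
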
    • pjmlp 4 hours ago
      RDBMS will eat the world.

      Turns out it is a matter of feature set.

      • yawboakye 4 hours ago
        schema migrations will save our careers! \o/
  • patwolf 1 hour ago
    I've been looking at DBOS for queuing and other scheduling tasks in a Node.js app. However, it only works with Postgres, which means I can't use it on web or mobile with SQLite. I like that SolidQueue works with multiple databases. Too bad it needs Rails.
  • downsplat 1 hour ago
    Not a Ruby shop here, so it's not directly comparable, but I'm very happy with beanstalkd as a minimalistic job queue. We're on MySQL for historical reasons, and it didn't support SKIP LOCKED at the time, so we had to add another tool.
  • ashniu123 4 hours ago
    For Node.js, my startup used to use [Graphile Worker](https://github.com/graphile/worker) which utilised the same "SKIP LOCKED" mechanism under the hood.

    We ran into some serious issues in high-throughput scenarios (~2k jobs/min normally, ~5k jobs/min during peak hours), switched to Redis+BullMQ, and haven't looked back since. Our bottleneck was Postgres performance.

    I wonder if SolidQueue runs into similar issues during high load, high throughput scenarios...

    • dns_snek 2 hours ago
      Facing issues with 83 jobs per second (5k/min) sounds like an extreme misconfiguration. That's not high throughput at all and it shouldn't create any appreciable load on any database.
      • cle 1 hour ago
        This comes up every time this conversation occurs.

        Yes, PG can theoretically handle just about anything with the right configuration, schema, architecture, etc.

        Finding that right configuration is not trivial. Even dedicated frameworks like Graphile struggle with it.

        My startup had the exact same struggles with PG and did the same migration to BullMQ, because we were sick of fiddling with it instead of solving business problems. We are very glad we migrated off of PG for our work queues.

  • jjgreen 4 hours ago
    Nice article, I'm just productionising a Rails 8 app and was wondering whether it was worth switching from SolidQueue (which has given me no stress in dev) to Redis ... maybe not.
    • michaelbuckbee 2 hours ago
      Unless you hit a performance wall with Postgres or absolutely need Batch capability you've probably got a very large runway with SolidQueue.
  • madethemcry 1 hour ago
    DHH also famously described why and how they left the cloud: https://world.hey.com/dhh/why-we-re-leaving-the-cloud-654b47...

    I'm not a DHH fanboy, but I really like his critical thinking about the status quo. I'm not able to leave the cloud - or, better phrased, it's too comfortable right now. But I really wanted to leave Redis behind, as it's mostly a hidden part of Rails: nothing I use directly, yet something I often have to pay for "in the cloud".

    I quickly hit an issue with the family of Solid features: the documentation doesn't really cover the case "inside your existing application" (at least when I looked into it shortly after Rails 8 was released). Being in the cloud (render.com, fly.io and friends), I had to create multiple DBs, one for each Solid feature (see the sketch at the end of this comment). That was not acceptable, as you usually pay per service/DB, not per usage - similar to how you have to pay for Redis.

    This was great motivation to research the cloud space once again, and then I found Railway. You pay per usage. So right now I have multiple DBs, one for each Solid feature, and on top of that multiple environments multiplying those DBs, and I pay cents for that part of the app while it's barely filled. Of course, in this setup I would also pay cents for Redis, but it's still good to see a less complex landscape in my deployment environment.

    Long story short: while trying to integrate SolidQueue myself, I found Railway. Deployments are fun again! Maybe that helps someone today as well.
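
    For the "inside your existing application" case, here's a sketch of pointing SolidQueue at the primary database instead of a dedicated one (assumes SolidQueue's connects_to option; database names are illustrative):

        # config/database.yml - reuse one physical database for app and queue
        production:
          primary:
            <<: *default
            database: app_production
          queue:
            <<: *default
            database: app_production   # same DB, separate connection
            migrations_paths: db/queue_migrate

        # config/environments/production.rb
        config.solid_queue.connects_to = { database: { writing: :queue } }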

  • reena_signalhq 4 hours ago
    Interesting migration story! I've been using Redis for background jobs for years and it's been solid, but the operational overhead is real.

    Curious about your experience with SolidQueue's reliability - have you run into any edge cases or issues with job retries/failures? Redis has been battle-tested for so long that switching always feels risky.

    Would love to hear more about your production experience after a few months!

    • withinboredom 3 hours ago
      Email is in my profile. I’m currently building something in this space and I’m looking for early adopters. Reach out, I’d love to show you what we have!
  • ckbkr10 3 hours ago
    Comparing Redis to SQL is kind of off-topic. Sure, you can replace the one with the other, but then we're talking about completely different concepts, aren't we?

    When all we're talking about is "good enough", the bar is set at a whole different level.

    • michaelbuckbee 2 hours ago
      I wrote this article about migrating from Redis to SQLite for a particular scenario and the tradeoffs involved.

      To be clear, I think the most important thing is understanding the performance characteristics of each technology enough that you can make good choices for your particular scenario.

      https://wafris.org/blog/rearchitecting-for-sqlite

    • zihotki 2 hours ago
      We're talking about business challenges/features that can be solved using either solution, and analyzing the pros/cons. It's not that Redis is bad, but sometimes it's an over-engineered and too costly solution.
    • croes 3 hours ago
      Maybe Redis is just overkill
      • touisteur 2 hours ago
        I wish you'd have expanded on that. I almost always learn about some interesting lower-level tech through people trying to avoid a full-featured heavy-for-their-use-case tool or system.
    • hahahahhaah 2 hours ago
      Well, they moved from one thing not designed for queues to another thing not designed for queues. Maybe use a queue!
  • EugeneOZ 2 hours ago
    Chapter "The True Cost of Redis" surprised me.

    > Deploy, version, patch, and monitor the server software

    And with PostgreSQL you don't need it?

    > Configure a persistence strategy. Do you choose RDB snapshots, AOF logs, or both?

    It's a one-time decision. You don't need to do it daily.

    > Sustain network connectivity, including firewall rules, between Rails and Redis

    And for a PostgreSQL DB you don't need it?

    > Authenticate your Redis clients

    And your PostgreSQL works without that?

    > Build and care for a high availability (HA) Redis cluster

    If you want a cluster of PostgreSQL databases, perhaps you will do that too.

    • downsplat 1 hour ago
      I guess the point is that you're already doing it for Postgres. You already need persistent storage for your app, and the same engine can handle your queuing needs.
      • heartbreak 1 hour ago
        Exactly, if you’re already doing it for Postgres and Postgres can do the job well enough to meet your requirements, you’re only adding more cost and complexity by deploying Redis too.
  • skywhopper 1 hour ago
    Redis is fundamentally the wrong storage system for a job queue when you have an RDBMS handy. This is not new information. You still might want to split the job queue onto its own DB server when things start getting busy, though.

    For caching, though, I wouldn’t drop Redis so fast. As an in-memory cache, the ops overhead of running Redis is a lot lower. You can even ignore HA for most use cases.
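
    On the Rails side that's a one-line config (standard Rails API; the URL env var is illustrative):

        # config/environments/production.rb - Redis as the Rails cache store
        config.cache_store = :redis_cache_store, { url: ENV["REDIS_URL"] }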

    Source: I helped design and run a multi-tiered Redis caching architecture for a Rails-based SaaS serving millions of daily users, coordinating shared data across hundreds of database clusters and thousands of app servers across a dozen AWS regions, with separate per-host, per-cluster, per-region, and global cache layers.

    We used Postgres for the job queues, though. Entirely separate from the primary app DBs.

  • steviee 3 hours ago
    Wearing my Ruby T-shirt (ok, RubyConf.TH, but you get the gist) while reading this makes me fully approve of and appreciate this post! It totally resonates with my current project setups and my attempts to get them as simple as possible.

    Especially when building new and unproven applications, I'm always looking for things that trade the time I need to set things up properly for the time I need to BUILD THE ACTUAL PRODUCT. Therefore I like the recent changes to the Ruby on Rails ecosystem very much.

    What we need is a larger user base setting everything up and discovering edge-cases and (!) writing about it (AND notifying the people around Rails). The more experience and knowledge there is, the better the tooling becomes. The happy path needs to become as broad as a road!

    Like Kamal: at first only used by 37signals, and now used by them and me. :D At least, of course.

    Kudos!

    Best, Steviee