Spring is in the air (and, I hope, not in your Gemfile). Nature awakens, as does the world of tech: CFPs are blooming, and conferences are being scheduled. We'll reap the benefits of these transformations later. For now, let's take a quick look back at the last winter days.
Videos
This is the video from the latest SF Ruby Meetup, in which we presented the brand new AnyCable Presence primitive along with its Hotwire component, the <turbo-cable-presence-source> element. We're close to the final release of the feature, with documentation and more to come!
Posts
This post gives an alternative meaning to the "AWS" acronym, going into detail about the pitfalls of using AWS. The part about API Gateway WebSockets limitations and the cost calculations is of particular interest to us:
If you have 50k clients you'll pay AWS 200€ every month just to keep them connected to a WebSocket 24/7. For comparison, a 60€ EC2 instance can handle more clients than that with no limitations.
News
Hatchbox upgraded AnyCable support
Now running AnyCable on Hatchbox is as simple as adding a line of configuration!
Releases
This release brings many improvements around authentication tokens. For example, you can now specify the token as a WebSocket sub-protocol (to avoid exposing it in query params and, potentially, server-side logs). Note: this feature requires the latest RC of AnyCable 1.6.
@anycable/turbo-stream · 0.8.0
This release introduces the experimental <turbo-cable-presence-source> element that brings AnyCable Presence features to Hotwire. It's still early days for this integration, so we'd love to hear your feedback! You can see it in action in this demo application.
Frame of curiosity: don’t slow your roll
Have you ever heard of the slow client problem? It happens when data is written to a socket faster than the client on the other end can read it. The causes range from poor network conditions to malicious actors. Yes, slow clients can be used to perform DDoS attacks, and it's hard to distinguish a well-behaved client on a low-bandwidth connection from a bad one that reads slowly on purpose.
Okay, how exactly can slow clients harm your system? This article on the La Nueva Escuela blog provides a good illustration (and explanation) of how slow clients affect systems with blocking and with non-blocking write operations.
Basically, there are two possible outcomes (if you don't guard yourself against the problem): either your whole application periodically stalls for a noticeable amount of time, or your memory usage grows significantly (example issue). The former happens when writes are blocking; the latter is the result of buffering data before performing the actual write to the socket.
For example, in Node.js, I/O writes are non-blocking, while in Go they're blocking (by default). In Ruby, I/O objects have both #write and #write_nonblock methods (though the latter may raise an exception if the underlying buffer is full).
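Here's a rough Ruby sketch of the difference (assuming `sock` is an already-connected TCPSocket; the helper names are ours):

```ruby
require "socket"

# `sock` is assumed to be an already-connected TCPSocket.

def blocking_send(sock, data)
  # IO#write blocks the calling thread until the kernel accepts all the bytes.
  # With a slow client on the other end, this call can stall for a long time.
  sock.write(data)
end

def nonblocking_send(sock, data)
  # IO#write_nonblock returns immediately; with `exception: false` it returns
  # :wait_writable instead of raising when the socket buffer is full.
  result = sock.write_nonblock(data, exception: false)

  if result == :wait_writable
    # The client isn't keeping up: buffer the data in the application,
    # drop the message, or disconnect; that decision is up to us.
    :slow_client
  else
    result # the number of bytes actually written (may be partial)
  end
end
```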
How does this affect real-time applications and WebSockets in particular? And how can we prevent the potential consequences of serving slow clients?
Recently, one of AnyCable's users, Caleb Thorsteinson, shared his investigation into increased broadcast latency in the presence of slow clients and a high broadcast rate. The AnyCable server is written in Go, so writing messages to clients is a blocking operation; however, there is a buffer and a write timeout in place: that's how we protect the pub/sub component of the server from becoming unresponsive when trying to serve slow connections. Still, under some circumstances, that wasn't enough.
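Building on the sketch above, the write-timeout idea looks roughly like this in Ruby (the real code is Go; the timeout value and helper name here are illustrative):

```ruby
require "socket"
require "io/wait"

WRITE_TIMEOUT = 5 # seconds; an illustrative value, not AnyCable's actual default

# Write with a deadline: if the socket stays unwritable past the timeout,
# we treat the client as slow and give up (the caller then disconnects it).
def write_with_deadline(sock, data)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + WRITE_TIMEOUT

  until data.empty?
    written = sock.write_nonblock(data, exception: false)

    if written == :wait_writable
      remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
      raise "slow client: write timed out" if remaining <= 0 || sock.wait_writable(remaining).nil?
    else
      data = data.byteslice(written..) # keep only the bytes that are still pending
    end
  end
end
```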
Caleb shared a minimal reproduction setup involving a few scripts to create subscribers and perform broadcasts at a given rate, plus a Toxiproxy configuration. Toxiproxy is an indispensable assistant in all kinds of non-ideal network test scenarios.
With the help of this load test (more precisely, its upgraded version), we were able to measure the effect of slow clients on WebSocket servers such as Action Cable (with Puma), AnyCable, and Centrifugo.
Action Cable uses I/O polling to detect when the socket is writable, and keeps a buffer of pending messages. There is neither a buffer size limit nor a timeout. Hence, when there is a slow client, the Action Cable server's memory usage grows as we continue broadcasting messages. In our tests, memory grew from 380 MiB to 450 MiB over 10 minutes with just a single slow client (at 20 messages per second). Without slow clients, the increase was just ~10 MiB.
In the case of AnyCable, we were able to reproduce the original issue: the server's pub/sub loop is blocked while writing to a slow client until the timeout triggers and we forcefully close the connection. In theory, that should have been enough; in practice, our default timeout value was too high (10 seconds), so some broadcasts could get stuck for up to 10 seconds. Not good. We also used a Go channel as an application-level buffer for outgoing messages; this is a typical approach for Go applications, but it doesn't take payload sizes into account. We fixed this by switching to a queue as the buffer and adding pending data size tracking: whenever a client exceeds the limit, we mark it as slow and disconnect it. Thus, writes are non-blocking now, and the potential memory increase per connection is capped. The queue idea was borrowed from Centrifugo, which we also tested and confirmed handles slow clients well.
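To make the idea concrete, here is a rough size-aware queue sketched in Ruby (the actual implementation lives in AnyCable's Go server, and the 1 MiB cap below is made up):

```ruby
# A per-connection outgoing queue that tracks the total pending payload size.
class OutgoingQueue
  def initialize(max_pending_bytes: 1024 * 1024) # made-up 1 MiB cap
    @messages = []
    @pending_bytes = 0
    @max_pending_bytes = max_pending_bytes
  end

  # Returns false when the cap would be exceeded, meaning the client
  # should be marked as slow and disconnected.
  def push(message)
    return false if @pending_bytes + message.bytesize > @max_pending_bytes

    @messages << message
    @pending_bytes += message.bytesize
    true
  end

  # Called by the connection's write loop to grab the next pending message.
  def shift
    message = @messages.shift
    @pending_bytes -= message.bytesize if message
    message
  end
end
```

Unlike a fixed-length channel, the cap here is expressed in bytes, so a handful of huge payloads can't inflate memory unnoticed.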
The change will ship in AnyCable v1.6 (and is already available in the latest release candidate).