Real-time magic, no elixirs: optimizing Sera with AnyCable
Realtime GPS data streamed by AnyCable Pro
Originally published on Martian Chronicles
In this project, we migrated one key element of the client’s system from Elixir to AnyCable, allowing them to run real-time features and ensure proper maintenance with no problems. Further still, we sped up the infrastructure based on AWS ECS, Fargate, and CloudFormation—a tech stack that is a little bit atypical for Evil Martians.
Sera Systems is a US-based provider of field service management software for plumbing, electrical, HVAC (heating, ventilation, air conditioning), and other home service businesses.
Its flagship product, a cloud-based software platform with accompanying mobile and web apps, enables home service companies to automate and speed up back office processes like booking, providing quotes, scheduling, invoicing, and sharing information between customers, technicians, and the office team.
Tracking the way for a tracker
One of the critical elements of their service ecosystem is a GPS tracker, which sends a worker’s GPS location data to the system and its admin dashboard so that users with admin roles can assign orders. Geodata is displayed on a map and processed for some position-related calculations.
Initially, this was a custom microservice built on legacy Elixir/Phoenix code that sent GPS data via WebSockets. This solution was tough to maintain in a Ruby on Rails environment: although Elixir looks Ruby-like, the service was an independent solution with many nuances. In addition, the team needed more features from the tracker. Therefore, they decided to migrate to a more up-to-date solution, and to our delight, they opted for an Evil Martians product: AnyCable. After all, cables work better and are more native to Ruby on Rails applications. Our team was to help Sera with migration and deployment.
Tech stack and AnyCable client
The platform consisted of a Rails monolith on the backend, web applications, and a mobile application based on Vue and Ionic. The flow of the tracking service was also quite simple: get the data, write it to the database, and send it to end users who are looking at the map.
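The three-step flow above (receive, store, broadcast) can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the LocationUpdate fields, the in-memory array standing in for the database, and the listener set standing in for dashboard users watching the map. It is not Sera's actual code.

```typescript
// Hypothetical shape of a location update; field names are assumptions.
interface LocationUpdate {
  workerId: string;
  lat: number;
  lng: number;
  recordedAt: number; // Unix epoch, milliseconds
}

type Listener = (update: LocationUpdate) => void;

class TrackingPipeline {
  private readonly stored: LocationUpdate[] = []; // stand-in for the database
  private readonly listeners = new Set<Listener>(); // stand-in for map viewers

  // Register a viewer; the returned function unsubscribes it.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Validate the coordinates, persist the update, then broadcast it.
  // Returns false (and does nothing) for out-of-range coordinates.
  ingest(update: LocationUpdate): boolean {
    const valid = Math.abs(update.lat) <= 90 && Math.abs(update.lng) <= 180;
    if (!valid) return false;
    this.stored.push(update);
    this.listeners.forEach((listener) => listener(update));
    return true;
  }

  // All stored updates for one worker, in arrival order.
  history(workerId: string): LocationUpdate[] {
    return this.stored.filter((u) => u.workerId === workerId);
  }
}
```

Storing before broadcasting means a viewer never sees a position that is missing from the history, which matters if position-related calculations read from the stored data.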
This allowed for a fairly fast, smooth migration to AnyCable. In the first stage, we slightly rewrote the current functionality and abstracted the real-time client directly. Then, we leveraged the server implementation and tested it using TestProf, a Martian open source project.
After that, we connected our anycable-client to the mobile application and set up feature toggling from GitLab: first, to roll out the new server gradually, and second, to be able to roll back if something goes wrong. Mobile applications are not web apps (they won’t be updated on their own), so it was necessary to flexibly configure everything. In the end, we added our client to the Vue-based web application.
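To illustrate the idea of a gradual rollout with a way back, here is a minimal sketch of a percentage-based toggle. The RolloutFlag shape, the hashing scheme, and the function names are hypothetical; the article does not describe how the real flags were structured or delivered to devices.

```typescript
type Transport = "legacy" | "anycable";

// Hypothetical flag payload fetched by the app; not the real flag format.
interface RolloutFlag {
  enabled: boolean; // kill switch: false sends every device to the legacy server
  percentage: number; // 0..100, share of devices routed to the new server
}

// Map a device ID to a stable bucket in 0..99 using a simple 32-bit string hash.
function bucketFor(deviceId: string): number {
  let hash = 0;
  for (let i = 0; i < deviceId.length; i++) {
    hash = (hash * 31 + deviceId.charCodeAt(i)) >>> 0;
  }
  return hash % 100;
}

// Decide which real-time server this device should talk to.
function chooseTransport(deviceId: string, flag: RolloutFlag): Transport {
  if (!flag.enabled) return "legacy";
  return bucketFor(deviceId) < flag.percentage ? "anycable" : "legacy";
}
```

Because the bucket depends only on the device ID, a device that was moved to the new server at 10% stays there when the percentage grows, and flipping enabled to false rolls everyone back without shipping a new app build.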
We used anycable-client primarily because it deals with the token expiration problem by renewing tokens automatically. The second reason was that we needed a flexible configuration of the reconnection logic. Due to the product specifics, a persistent connection is not required; it’s only necessary when a tech worker is moving to or from a client’s physical address. Keeping the connection open means draining the battery. Therefore, we configured “monitor” (a term from the library) to connect only as needed.
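The “connect only as needed” idea can be sketched independently of any library. The snippet below deliberately does not use the anycable-client API: the job states, the OnDemandConnection class, and the backoff parameters are all assumptions chosen for illustration.

```typescript
// Hypothetical technician states; not taken from the Sera app.
type JobState = "idle" | "en_route" | "on_site" | "returning";

// Keep the socket open only while the technician is travelling.
function shouldBeConnected(state: JobState): boolean {
  return state === "en_route" || state === "returning";
}

// Exponential backoff (no jitter) for reconnect attempts, capped at maxMs.
function reconnectDelay(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Opens or closes the connection only when the desired state actually changes.
class OnDemandConnection {
  private connected = false;
  private readonly open: () => void;
  private readonly close: () => void;

  constructor(open: () => void, close: () => void) {
    this.open = open;
    this.close = close;
  }

  update(state: JobState): void {
    const wanted = shouldBeConnected(state);
    if (wanted && !this.connected) {
      this.open();
      this.connected = true;
    } else if (!wanted && this.connected) {
      this.close();
      this.connected = false;
    }
  }

  isConnected(): boolean {
    return this.connected;
  }
}
```

In a real client, jitter would usually be added to the delay so that many devices do not reconnect at the same instant after an outage.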
We also set up client-side SDKs to support the new functionality.
Testing
Reworking the visual integration tests, which rely on screenshot comparison, was a challenge: it would be unrealistic to expect reference screenshots from Google Maps to remain identical weeks or even days later, since the POIs there are constantly changing. Accordingly, we prepared a Plan B in advance: to generate the reference screenshots (“standards”) within the test itself or to test not for similarities but, on the contrary, for visual changes.
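Testing for visual changes rather than for similarity could look roughly like the sketch below: take one screenshot before an action and one after, then assert that enough pixels differ. The RGBA buffer format, the function names, and the thresholds are assumptions; the article does not say which tooling performed the comparison.

```typescript
// Fraction of pixels that differ between two same-sized RGBA buffers.
// A pixel counts as changed if any channel differs by more than `tolerance`.
function changedPixelRatio(before: Uint8Array, after: Uint8Array, tolerance = 0): number {
  if (before.length !== after.length || before.length % 4 !== 0) {
    throw new Error("buffers must be RGBA and have the same size");
  }
  const pixels = before.length / 4;
  if (pixels === 0) return 0;
  let changed = 0;
  for (let p = 0; p < pixels; p++) {
    for (let c = 0; c < 4; c++) {
      const i = p * 4 + c;
      if (Math.abs(before[i] - after[i]) > tolerance) {
        changed += 1;
        break; // count each pixel at most once
      }
    }
  }
  return changed / pixels;
}

// True when at least `minRatio` of the pixels changed, e.g. after a marker moved.
function hasVisualChange(before: Uint8Array, after: Uint8Array, minRatio: number): boolean {
  return changedPixelRatio(before, after) >= minRatio;
}
```

Since both screenshots come from the same test run, changes in the map provider's POIs between runs no longer affect the result.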
We also improved the testing process using Evil Martians’ open source products: TestProf for performance analysis, and Fixturama to write elegant Ruby tests for cases requiring a lot of data (without cluttering them with unnecessary details). In addition, we opted for Dip, a tool from the Martian open source arsenal that simplifies working with Docker Compose. Dip turned out to be very convenient in this project: we could deploy the environment with one command, perform fixes, run tests, and tear down the entire environment as needed.
Real-time evolution
At first, we used AnyCable to support communication between different applications (aka microservices) for tracked workers on the map. The performance of the underlying queues was quite important due to the (potentially) large number of incoming messages from trackers. So while we had initially switched to AnyCable in order to solve a specific problem, after solving it, we found that the solution had opened up even more improvement opportunities in fields unrelated to the initial task.
We deeply investigated other parts of the application to find where we could move some asynchronous processes from different user tasks to the background. For example, our clients have to build a lot of reports here and there; beforehand, all these reports were being prepared synchronously. But after AnyCable proved its usefulness, we started migrating all report preparation to the background, improving the user experience and better distributing the heavy load. Here, AnyCable acts as a channel connecting background jobs to the UI via real-time broadcasting notifications related to the job state.
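As a rough picture of how a job can report its state to the UI, here is a minimal sketch. The JobEvent shape, the stream naming, and the runReportJob function are hypothetical, and the job body runs synchronously here only to keep the example short; a real report job would run in a background worker and broadcast through the cable.

```typescript
type JobStatus = "running" | "completed" | "failed";

// Hypothetical message pushed to the UI; field names are assumptions.
interface JobEvent {
  jobId: string;
  status: JobStatus;
  resultUrl?: string;
  error?: string;
}

// Stand-in for a real-time broadcast to a named stream.
type Broadcast = (stream: string, event: JobEvent) => void;

// Run a report builder and notify the requesting user about every state change.
function runReportJob(
  jobId: string,
  userId: string,
  build: () => string, // returns a URL of the finished report
  broadcast: Broadcast
): void {
  const stream = `reports:${userId}`;
  broadcast(stream, { jobId, status: "running" });
  try {
    const resultUrl = build();
    broadcast(stream, { jobId, status: "completed", resultUrl });
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    broadcast(stream, { jobId, status: "failed", error });
  }
}
```

The UI only needs to subscribe to its own stream and render the latest status, so the request that triggered the report can return immediately.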
Multi-tenancy
Around the time that our project with Sera had started, the task of wrapping calls of all RPC methods to implement sharding was simultaneously relevant for several of our projects. Back then, introducing multi-tenancy to a web application wasn’t easy—we needed to design or redesign a database schema, ensure all kinds of “requests” were bound to the right tenants, and so on.
However, an Action Cable update added command callbacks to Action Cable connections: this feature allows handling multi-tenancy without any hacks. But it was focused only on classic Rails items: components, controllers, and background jobs. To take care of the channels and concurrent clients like AnyCable and beyond, we added this feature to anycable-rails and eventually brought multi-tenancy to both AnyCable and the Sera app. Further, our Rails integration included a backport of command callbacks for older Rails versions.
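The pattern behind an “around” command callback is to set the tenant before a command runs and to restore the previous value afterwards, even if the command fails. The sketch below shows that pattern in TypeScript purely as an analogy: the real feature lives in Ruby (Action Cable and anycable-rails), and the Command shape and all function names here are hypothetical.

```typescript
// Hypothetical incoming command; the tenant would normally come from the connection.
interface Command {
  tenant: string;
  action: string;
}

let activeTenant: string | null = null;

// Tenant that data access should be scoped to right now, or null outside a command.
function currentTenantId(): string | null {
  return activeTenant;
}

// Run `fn` with the tenant set, restoring the previous value even on errors.
function withTenant<T>(tenant: string, fn: () => T): T {
  const previous = activeTenant;
  activeTenant = tenant;
  try {
    return fn();
  } finally {
    activeTenant = previous;
  }
}

// Every command handler is wrapped, so none of them can forget to set the tenant.
function handleCommand<T>(command: Command, handler: (command: Command) => T): T {
  return withTenant(command.tenant, () => handler(command));
}
```

A module-level variable is only safe here because each command runs synchronously to completion; concurrent or asynchronous handlers would need per-request context storage instead.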
Call for CallRail
We also worked on a key business feature: integration with the CallRail service for phone call processing. The flow was quite simple: we received notifications about call status via webhooks, periodically polled the API as a safety net to download any calls whose webhook events had been missed, put everything collected into the database, and forwarded the notification to the cable.
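Because the same call can arrive both through a webhook and through the periodic API poll, ingestion has to be idempotent. The following sketch shows one way to do that; the CallEvent fields are hypothetical and are not CallRail's actual payload, and the in-memory map stands in for the database.

```typescript
// Hypothetical, simplified call record; not the real CallRail schema.
interface CallEvent {
  callId: string;
  status: string;
}

class CallStore {
  private readonly calls = new Map<string, CallEvent>(); // stand-in for the database
  private readonly notify: (event: CallEvent) => void; // stand-in for the cable broadcast

  constructor(notify: (event: CallEvent) => void) {
    this.notify = notify;
  }

  // Store the event and notify only if it is new or changes the stored status.
  upsert(event: CallEvent): boolean {
    const existing = this.calls.get(event.callId);
    if (existing !== undefined && existing.status === event.status) return false;
    this.calls.set(event.callId, event);
    this.notify(event);
    return true;
  }

  // Replay the result of a periodic API poll; returns how many events were actually new.
  backfill(events: CallEvent[]): number {
    return events.filter((event) => this.upsert(event)).length;
  }
}
```

This sketch ignores ordering: a production version would also compare timestamps so that a late poll result cannot overwrite a newer status received via webhook.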
Infrastructure goals
Our other task from the Sera team was optimizing the project infrastructure. Although the load was not often at peak level, it was nonetheless pretty regular as all applications were connected somehow, and therefore, resources were still being wasted. Additionally, they wanted to rework their infrastructure to offload some performance, scalability, and infrastructure management tasks.
Many startups are keen to rely on something that is both widespread and familiar in terms of infrastructure and hosting (yes, Kubernetes is still not the standard for startups). Therefore, we dove deeply into AWS ECS, Fargate, CloudFormation, and all related proprietary services and solutions by AWS, since that was the infrastructure the customer preferred.
This wasn’t our first experience with solutions like ECS and CloudFormation, but because the client wanted the most AWS-native and straightforward configuration possible, we had the chance to make a lot of on-the-spot improvements in the AWS configuration area.
In addition, we helped the Sera team solve some urgent and critical problems with memory, deployments, and authorization. Finally, we folded all of our changes into the CloudFormation code that the client already had in place before our cooperation.
AnyCable deployment
Since we already had a wealth of experience in the AnyCable deployment area, bringing AnyCable into the project infrastructure was a piece of cake. This product is optimized for different environments, so it fits well even on AWS, which isn’t the easiest infrastructure on which to enable all of its features and benefits.
We worked with AnyCable-Go, which handles the WebSocket connections and talks to AnyCable-RPC (Ruby-based) via gRPC. AnyCable-Go has integrated connection balancing that splits the load across multiple AnyCable-RPC services, so there is no pressing need for a network mesh (which was ideal for Sera’s ECS case).
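To show why client-side balancing removes the need for an extra network layer, here is a generic round-robin picker. It is only an illustration of the general technique, not AnyCable-Go's actual implementation, and the endpoint addresses in the usage note are made up.

```typescript
// Cycles through a fixed list of RPC endpoints, one per call.
class RoundRobin {
  private next = 0;
  private readonly endpoints: string[];

  constructor(endpoints: string[]) {
    if (endpoints.length === 0) {
      throw new Error("at least one endpoint is required");
    }
    this.endpoints = endpoints.slice(); // copy so later caller mutations have no effect
  }

  // Return the next endpoint and advance the cursor, wrapping at the end.
  pick(): string {
    const endpoint = this.endpoints[this.next];
    this.next = (this.next + 1) % this.endpoints.length;
    return endpoint;
  }
}
```

For example, new RoundRobin(["rpc-1:50051", "rpc-2:50051"]) hands out rpc-1, rpc-2, rpc-1, and so on, so the caller itself spreads requests across the RPC services.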
The experience we went through deploying AnyCable also helped us later add configurations of two other microservices to the same complex environment: CallRail and Zapier.
Monitoring
One of the important tasks was deploying monitoring: the client had read (and liked) our article about monitoring processes organized with k6 and Yabeda, “Real-time stress: AnyCable, k6, WebSockets, and Yabeda”. They wanted the same approach, especially the graphs in Grafana. But since we didn’t have Kubernetes on the table this time around, we had to conjure our magic over CloudFormation, including Yabeda, a new logic design, and the implementation of the rest of the monitoring stack.
Monitoring design turned out to be a non-trivial task in this environment. We needed more advanced monitoring features to reflect the current situation, specifically for AnyCable. That’s why we had to “reinvent the wheel” to collect all metrics from our apps into the services that AWS supported at that moment (Prometheus, Grafana, and CloudWatch). In particular, we had to manually write the service discovery script and set up custom, self-managed Prometheus instances for sending metrics into AWS Prometheus.
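A service discovery script for Prometheus typically boils down to turning a list of running tasks into target groups in the file-based discovery format (a JSON array of objects with targets and labels). The sketch below shows only that transformation; the TaskInfo shape, the service names, and the ports are assumptions, and how the task list is obtained from ECS is left out entirely.

```typescript
// Hypothetical description of one ECS task, as a discovery script might collect it.
interface TaskInfo {
  service: string;
  privateIp: string;
  metricsPort: number;
  running: boolean;
}

// One entry of Prometheus file-based service discovery.
interface TargetGroup {
  targets: string[];
  labels: Record<string, string>;
}

// Group running tasks by service and emit one target group per service.
function toFileSd(tasks: TaskInfo[]): TargetGroup[] {
  const byService = new Map<string, string[]>();
  tasks
    .filter((task) => task.running)
    .forEach((task) => {
      const targets = byService.get(task.service) ?? [];
      targets.push(`${task.privateIp}:${task.metricsPort}`);
      byService.set(task.service, targets);
    });

  const groups: TargetGroup[] = [];
  byService.forEach((targets, service) => {
    groups.push({ targets: targets.sort(), labels: { service } });
  });
  return groups;
}
```

Writing JSON.stringify(toFileSd(tasks)) to a file that a self-managed Prometheus instance watches lets it pick up new tasks as they start and drop the ones that stop.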
Run the runners
One of the critical tasks was to speed up their CI/CD pipeline. For that, we started migrating testing tasks to self-hosted GitLab CI runners. Although more robust, the shared GitLab runners are extremely restricted in resources. So, we had to take a lot of measurements to understand our goals and the possible configurations and prepare a flow that would simplify the switch between GitLab’s shared runners and our own.
We also optimized the CI/CD flow and sped up the “push to deployment” testing process roughly threefold, significantly shortening the feedback loop for engineers. For instance, the “start to deploy” run dropped from 50 minutes to less than 22 minutes (about 2.3 times faster), and the merge request run, especially critical for developers, dropped from 30 minutes to just 7 minutes (about 4.3 times faster).
Help with log collection optimization
For better observability and debuggability, the Sera team pushed their production logs to the Amazon OpenSearch service. But, due to its complex configuration, they periodically ran out of space. We managed to improve the OpenSearch configuration, increase the storage volume, add more machines, refine the policies for storing and deleting old logs, and implement monitoring to track resource availability.
This project gave us even more experience migrating from legacy real-time solutions to AnyCable, with all the accompanying writing and implementation of new business logic. We had already migrated other projects from Action Cable and seen successful migrations from Sendbird and other solutions, and now we’ve replaced an Elixir service, too.
Last but not least, the most valuable skills we acquired were urgent problem solving and how to squeeze out maximum performance from the AWS infrastructure.