Have you ever struggled to implement horizontal scaling with a stateful service?
Kilian Ruess, Senior Solutions Architect at Amazon Web Services, interviewed Robert Graebert, CTO of Graebert GmbH. Watch the interview to learn how Graebert CAD Software scales its business CAD solution to match demand using Amazon Web Services (AWS) Auto Scaling and how it implements session draining based on Amazon Simple Notification Service (SNS) and AWS Lambda. You can read the transcript of the interview below:
Kilian Ruess, Senior Solutions Architect, AWS: Hello and welcome to This is My Architecture. I’m Kilian from AWS and I’m joined by Robert, CTO of Graebert. So Robert, thank you for joining us.
Robert Graebert, CTO, Graebert GmbH: Thank you.
Kilian Ruess: So what does Graebert actually do?
Robert Graebert: So Graebert is a provider of DWG-based CAD software. Our suite of products includes mobile, desktop, and also cloud solutions, and ARES Kudo is a browser-based CAD application that allows you to create, edit, and view your designs from a browser.
Kilian Ruess: So great, this is mainly focusing on the business user side?
Robert Graebert: Yeah sure, so the typical users are in manufacturing or in construction. Sometimes they’ll just use it for a short amount of time and sometimes they’ll use it for the full day.
Kilian Ruess: What are the challenges of CAD software on the cloud?
Robert Graebert: So when a user tries to work with a design and opens it up, we’ll fetch it, typically from a cloud storage provider, load it into memory, and then send to the client only what’s necessary for the visualization of and the interaction with that design. So we have two big needs: one is we need high-memory servers, but we also need really low latency, we need to have high interactivity with the user.
Kilian Ruess: So this means you have a lot of data residing on the instance per session?
Robert Graebert: Correct, so every user, every session is connected to a specific host, so we need to make sure that we always route all the requests there, and then as more sessions get created we’ll also scale out our service.
Kilian Ruess: Okay, let’s look at the architecture then. Can you give us an overview?
Robert Graebert: Sure, so we use Route 53 as sort of our entry point. We then go to a set of load balancers in an Auto Scaling group across multiple Availability Zones, and those will then talk to another Auto Scaling group of CAD hosts. These CAD hosts, as I already mentioned, are shared across users, and as the demand grows during the day we’ll scale out this Auto Scaling group based on a custom memory-based trigger.
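The memory-based scale-out described above can be illustrated with a minimal Python sketch. It is not Graebert's implementation: the `HostMemory` type, the 60% target utilization, and the 20-host ceiling are assumptions chosen for illustration. The sketch only ever scales out, because removing a host that still holds sessions needs the draining discussed later in the interview.

```python
import math
from dataclasses import dataclass


@dataclass
class HostMemory:
    """Memory snapshot for one CAD host (hypothetical shape)."""
    host_id: str
    used_gib: float
    total_gib: float


def fleet_memory_utilization(hosts):
    """Return the fraction of memory in use across the fleet (0.0 to 1.0)."""
    total = sum(h.total_gib for h in hosts)
    used = sum(h.used_gib for h in hosts)
    return used / total if total else 0.0


def desired_capacity(hosts, target_utilization=0.6, max_hosts=20):
    """Hosts needed to bring utilization back to the target.

    Never returns fewer hosts than are currently running.
    """
    current = len(hosts)
    utilization = fleet_memory_utilization(hosts)
    needed = math.ceil(current * utilization / target_utilization)
    return min(max(needed, current), max_hosts)


if __name__ == "__main__":
    fleet = [HostMemory("a", 48, 64), HostMemory("b", 56, 64)]
    print(desired_capacity(fleet))
```

In a real deployment the utilization figure would more likely be published as a custom metric that an alarm or scaling policy reacts to; the pure function here just makes the arithmetic visible.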
Kilian Ruess: So actually you can adapt your consumption perfectly to your needs.
Robert Graebert: Correct, and as we mentioned, those are business users, so we see a lot of usage during sort of nine-to-five days and much less during the night and on the weekend in each region.
Kilian Ruess: Okay perfect, what about availability? We have those long-running sessions, and this poses some problems, I would say. Can you walk us through an update here?
Robert Graebert: Yeah sure, so updating is a huge pain point now in the industry. Traditionally in CAD software, if you had to upgrade on the desktop, it was a very IT-heavy process and it would take a long time for the users to get the update. We think with cloud we can reenvision that, and so we are actually deploying updates nearly every two to three weeks, and we’re doing this with zero downtime, without disrupting any of the users. So let me show you how we do that. Let’s imagine we have version 1 deployed right now, so all the users are currently being served by the Auto Scaling group here. Now we want to deploy a version 2. We will use CloudFormation to update the launch configuration, and the Auto Scaling group will automatically launch new instances. We’ll register those instances with the load balancer, and the load balancer will now send new traffic over here.
So the question here would be: what do we do with the existing users? We don’t want to cut them off from the software, because that would be disruptive. We want to keep the servers running as long as we have users on them.
Kilian Ruess: So you need a draining feature here? Can’t you use the draining feature of our load balancer, ELB?
Robert Graebert: Yeah, so we investigated the load balancing options available on AWS, ELB and ALB, but in the end we decided to go with our own HAProxy solution. Two reasons: one, we have long-running sessions, so it’s different from a traditional web application or website, but we also needed to be able to drain by session rather than by connection, so we need to be able to still send new requests to the server for that specific user, although we do of course want to shift all the new traffic, the new sessions, to the new servers.
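The difference between draining by session and draining by connection can be sketched in a few lines of Python. This is a toy in-memory router, not HAProxy and not Graebert's configuration; the class names, the least-loaded selection rule, and the host names are assumptions made for illustration. The key behavior is that a draining host (weight 0) keeps receiving requests for sessions it already owns, while new sessions only land on hosts with a positive weight.

```python
from collections import Counter


class Backend:
    def __init__(self, name, weight=1):
        self.name = name
        self.weight = weight  # 0 means "draining": no new sessions


class SessionRouter:
    def __init__(self, backends):
        self.backends = {b.name: b for b in backends}
        self.sessions = {}  # session id -> backend name

    def drain(self, name):
        """Stop assigning new sessions to this backend."""
        self.backends[name].weight = 0

    def route(self, session_id):
        # Existing sessions always return to their host, even if it is draining.
        if session_id in self.sessions:
            return self.sessions[session_id]
        candidates = [b for b in self.backends.values() if b.weight > 0]
        if not candidates:
            raise RuntimeError("no backend accepts new sessions")
        # New sessions go to the least-loaded non-draining host (ties by name).
        load = Counter(self.sessions.values())
        chosen = min(candidates, key=lambda b: (load[b.name], b.name))
        self.sessions[session_id] = chosen.name
        return chosen.name

    def end_session(self, session_id):
        self.sessions.pop(session_id, None)

    def active_sessions(self, name):
        return sum(1 for owner in self.sessions.values() if owner == name)


if __name__ == "__main__":
    router = SessionRouter([Backend("v1-host"), Backend("v2-host")])
    print(router.route("alice"))   # first session lands on v1-host
    router.drain("v1-host")        # version 2 is rolled out
    print(router.route("alice"))   # still v1-host: her session is pinned
    print(router.route("bob"))     # v2-host: new sessions avoid v1-host
```

Once `active_sessions("v1-host")` reaches zero, the old host no longer serves anyone and can be removed without disrupting a user.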
Kilian Ruess: Okay, so how do you orchestrate this update then?
Robert Graebert: So now we’re in a state where the Auto Scaling group is triggering a notification saying, ‘hey, I want to terminate an instance’. We subscribe to that using our SNS topic, and from the SNS topic we launch our Lambda script. The Lambda script is very simple: it just calls SSM, which in turn runs a termination script on the host. The termination script really checks for one simple condition: is there still a session alive? If there’s no session alive, simple, we notify the Auto Scaling group that it is safe to terminate. If there are active sessions, we tell Auto Scaling to wait, and we also notify HAProxy to put the server at a weight of zero, meaning that we are no longer considered for new sessions, but it still knows we’re alive, we’re healthy, so we still get the requests for the old sessions.
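The decision the termination script makes can be sketched as a pure Python function. This is a hedged illustration, not the actual script: the `TerminationDecision` type and the `cad_hosts` and `host-1` names are hypothetical, and the interview does not say how the Auto Scaling group is told to wait (Auto Scaling lifecycle hooks are one mechanism that could fit, which is an assumption here). The `set weight <backend>/<server> <weight>` string follows the HAProxy Runtime API command of that name.

```python
from dataclasses import dataclass, field


@dataclass
class TerminationDecision:
    safe_to_terminate: bool
    # Commands to send to HAProxy; empty when nothing needs to change.
    haproxy_commands: list = field(default_factory=list)


def decide_termination(active_sessions, backend="cad_hosts", server="host-1"):
    """Decide what the termination script should report for one host.

    The backend and server names are placeholders, not real configuration.
    """
    if active_sessions < 0:
        raise ValueError("active_sessions must not be negative")
    if active_sessions == 0:
        # Nobody is connected: the host can go away immediately.
        return TerminationDecision(safe_to_terminate=True)
    # Users are still working: ask to wait and stop taking new sessions.
    return TerminationDecision(
        safe_to_terminate=False,
        haproxy_commands=[f"set weight {backend}/{server} 0"],
    )


def checks_until_terminated(session_counts):
    """Count how many checks run before the host may be terminated.

    Returns None if the observed session counts never reach zero.
    """
    for check_number, count in enumerate(session_counts, start=1):
        if decide_termination(count).safe_to_terminate:
            return check_number
    return None


if __name__ == "__main__":
    print(decide_termination(3))
    print(checks_until_terminated([3, 2, 0]))
```

Keeping the decision separate from the calls to AWS and HAProxy makes it easy to test; the surrounding script would only have to count sessions and act on the result.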
Kilian Ruess: Perfect, and then you’re through the update and everything is on the new version.
Robert Graebert: Yes.
Kilian Ruess: So a tricky problem but a really elegant solution. So what about your global customers? You have customers worldwide using this solution?
Robert Graebert: Yes, so we currently have this solution deployed in six AWS regions worldwide, mostly to solve latency issues. We really want to be close to the customer, and as demand grows in specific regions we want to be able to deploy in those regions. So we make use of CloudFormation, allowing us to replicate the same infrastructure, the same design, in all places, and to seamlessly roll out these versions across the different regions. We use CloudFormation to deploy into a region and then use Route 53 geo-based routing to route the user to the region that’s closest to them.
Kilian Ruess: Very nice. Thank you very much for sharing all that, and thank you for watching This is My Architecture.
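The idea behind geo-based routing can be sketched as a lookup that prefers the most specific match: country first, then continent, then a default. The mapping below is invented for illustration; the interview does not name the six regions, and the function is a conceptual model of the routing decision, not the Route 53 API.

```python
def pick_region(country, continent, by_country, by_continent, default_region):
    """Return the deployment region for a user, most specific match first."""
    if country in by_country:
        return by_country[country]
    if continent in by_continent:
        return by_continent[continent]
    return default_region


# Hypothetical mappings: the actual regions in use are not stated in the interview.
BY_COUNTRY = {"JP": "ap-northeast-1"}
BY_CONTINENT = {"EU": "eu-central-1", "NA": "us-east-1", "AS": "ap-southeast-1"}
DEFAULT_REGION = "us-east-1"


if __name__ == "__main__":
    for country, continent in [("DE", "EU"), ("JP", "AS"), ("BR", "SA")]:
        region = pick_region(country, continent, BY_COUNTRY, BY_CONTINENT, DEFAULT_REGION)
        print(country, "->", region)
```

Because every region runs the same CloudFormation-defined stack, adding a region in this model only means adding an entry to the mapping.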
Learn how Graebert’s Cloud-CAD Solution ARES Kudo compares to AutoCAD’s web app
Read a 46-page white paper comparing Graebert’s cloud-based CAD solution to AutoCAD’s web app. The paper is written by Ralph Grabowski, an independent industry expert in the field of computer-aided design with 30+ years of experience.