The “First” Cloud Migration — Items to Think Through

Cloud Migration is becoming ubiquitous nowadays. Every Enterprise is attempting to move its workloads to Cloud for reasons such as Fast Innovation, Digital Transformation (yeah, Covid-19 is highly responsible for this now), Scalability, On-Premise Data Centers getting closed (yeah, this one is also common nowadays 😅), Cost, and many more. Many of them have mastered the Art of Migration, but Enterprises doing it for the first time need to think through quite a few things.

There are multiple articles available that cover the Migration Strategies, including one published by AWS. Do take a look at it and other similar articles.

Going a bit off track: I generally ask one question of any Application Team planning to migrate a workload to Cloud. Is the Workload to be migrated Cloud Ready?

The answer to this question helps the team freeze the Migration Strategy to execute (remember the 6 R’s, sorry, it’s 7 R’s now !!!); I thought I would put it here to set some ground. Well, the intent of this question is: would your current Application Architecture help you reap the benefits offered by the Cloud? Does it meet all the standards required in terms of Security, Compliance, etc. for your application to be hosted in Public Cloud? Are the current Recovery Time Objective (RTO) and Recovery Point Objective (RPO) acceptable? If yes, lucky you !!! But if not, do you have enough time to Refactor or Re-Architect the application and then migrate? If not, then you may want to execute a pure Lift and Shift, or maybe a Lift and Shift with Tinker Migration Strategy, and then, once the workload is running stably in the Cloud, start making changes to leverage Cloud Native Services and Managed Services as applicable to reap the full benefits of Public Cloud.

Ok, now coming back to this blog: it covers some of the key checklist items for the actual Migration (Implementation) phase. Let’s take a look.

1) Designing Cloud Infrastructure

If there is a plan to use Golden Images (AWS AMIs, for example) baked by the Enterprise security team with all known security vulnerabilities patched, ensure that Applications are deployed on those AMIs only. Put enough checks in place in the Cloud so that any usage of a non-recommended image is flagged as Non-Compliant and communicated to the Application team. This can be done with services like AWS Config Rules if you are deploying in AWS public cloud.
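To make the Golden Image check concrete, here is a minimal sketch of the logic such a rule applies. It is plain Python over a made-up inventory, not the AWS Config API itself; the `Instance` class, the team names and every AMI or instance ID are hypothetical. In AWS, a Config rule (for example the managed rule for approved AMIs by ID) would do the evaluation for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical inventory record for one running server."""
    instance_id: str
    image_id: str
    owner_team: str


def find_noncompliant(instances, approved_image_ids):
    """Return the instances whose image is not on the approved list."""
    approved = set(approved_image_ids)
    return [inst for inst in instances if inst.image_id not in approved]


def group_by_team(instances):
    """Group instance IDs by owning team so each team gets one notification."""
    by_team = {}
    for inst in instances:
        by_team.setdefault(inst.owner_team, []).append(inst.instance_id)
    return by_team


# Made-up data for illustration only.
APPROVED = {"ami-golden-2024-01", "ami-golden-2024-02"}
FLEET = [
    Instance("i-001", "ami-golden-2024-01", "payments"),
    Instance("i-002", "ami-community-xyz", "payments"),
    Instance("i-003", "ami-golden-2024-02", "search"),
    Instance("i-004", "ami-old-unpatched", "search"),
]

if __name__ == "__main__":
    violations = find_noncompliant(FLEET, APPROVED)
    for team, ids in group_by_team(violations).items():
        print(f"NON_COMPLIANT team={team} instances={ids}")
```

Grouping by team is just one way to handle the "communicate to the Application team" part; a real setup would more likely push the findings to a ticketing or chat tool.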

2) Cloud Network Security — Deciding on the Ingress and Egress Traffic

Options: set up a new Virtual Firewall in the Cloud, OR route all Internet Egress Traffic from the Cloud out via On-Premise Firewalls and all Ingress Traffic into the Cloud via On-Premise Firewalls (assuming a trust relationship between On-Premise and Cloud Zones). It all depends on the latency and throughput requirements of the application. And yes, Dollar numbers also play a crucial role in the decision 😏. So make a choice accordingly.

There are many other items which come under the Security bucket; I will refrain from clubbing them into this blog. That can be a big, big topic of its own.

3) Identify Workload Dependencies

I know what you are thinking: what if the Dependent applications are available on Public endpoints? Hmmmm… If this is the case, and your security or compliance team is OK with it, then enjoy the day and have a 🍺.

4) Data Migration Strategy

It all boils down to what kind of data is to be migrated (Hot data, Warm data or Cold data) and how the application uses it. Is the data maintained in partitions, say Year/Month/Day/Hour wise? Is it non-editable once saved? Or is it maintained separately for each Tenant (assuming a Multi-Tenant Application), so that it can be moved to Cloud one Tenant at a time? There will be many other factors apart from the ones mentioned above which help you decide the actual Data Migration strategy.

Some of the standard patterns for migration are:

  1. Migrate the historical data to Cloud first. Then, on the final migration day, take a downtime on the application for some time (minutes to hours), copy the delta to Cloud, and you are done. Safest approach, but it involves downtime on the application.
  2. Use 3rd Party Sync services to get near real time sync from On-Premise to Cloud. For instance, Oracle GoldenGate for Database sync, or AWS Storage Gateway (File Gateway) to sync data between NFS and S3. Almost zero downtime for the On-Premise application, but it adds more cost and some overhead.
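For the first pattern, the "delta" is whatever changed On-Premise after the bulk copy. Below is a rough sketch of how one might work it out, assuming you can produce a manifest (key to checksum) of what was already migrated and of the current On-Premise state. The manifests, partition keys and checksums are invented for illustration; real tools track changes in their own way.

```python
def compute_delta(migrated, current):
    """Compare two manifests that map a record key to its checksum.

    Returns (to_copy, to_delete):
      to_copy   - keys that are new or changed since the bulk copy
      to_delete - keys that were removed On-Premise after the bulk copy
    """
    to_copy = sorted(
        key for key, checksum in current.items()
        if migrated.get(key) != checksum
    )
    to_delete = sorted(key for key in migrated if key not in current)
    return to_copy, to_delete


# Made-up manifests keyed by Year/Month/Day partition.
MIGRATED = {"2024/01/01": "a1", "2024/01/02": "b2", "2024/01/03": "c3"}
CURRENT = {"2024/01/01": "a1", "2024/01/02": "b9", "2024/01/04": "d4"}

if __name__ == "__main__":
    to_copy, to_delete = compute_delta(MIGRATED, CURRENT)
    print("copy during downtime:", to_copy)
    print("delete in cloud:", to_delete)
```

The smaller the delta, the shorter the downtime window, which is why date-partitioned or non-editable data suits this pattern so well.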

One also needs to consider the logistics part: a) How much data is to be migrated? b) How much time do you have for migrating the data? c) What channel is available for migrating the data, and does it provide enough bandwidth to push the data from On-Premise to Cloud in a timely manner? d) Would External storage devices be a better way to send the data to Cloud? Think of Snowball kind of devices.
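Questions a) to c) are really one piece of arithmetic, so here is a back-of-the-envelope sketch. The function names and the 70% link utilization default are my own assumptions; plug in whatever your network team tells you is realistic.

```python
def transfer_hours(data_tb, link_mbps, utilization=0.7):
    """Estimate the hours needed to push data_tb terabytes over a link.

    link_mbps   - nominal link speed in megabits per second
    utilization - assumed fraction of the link you can actually use
    Uses decimal units: 1 TB = 10**12 bytes = 8,000,000 megabits.
    """
    if data_tb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("need data_tb >= 0, link_mbps > 0, 0 < utilization <= 1")
    megabits = data_tb * 8_000_000
    seconds = megabits / (link_mbps * utilization)
    return seconds / 3600


def fits_window(data_tb, link_mbps, window_hours, utilization=0.7):
    """True if the estimated transfer time fits in the migration window."""
    return transfer_hours(data_tb, link_mbps, utilization) <= window_hours


if __name__ == "__main__":
    hours = transfer_hours(data_tb=50, link_mbps=1000)
    print(f"50 TB over 1 Gbps at 70% utilization: about {hours:.0f} hours")
    print("fits a 48 hour window:", fits_window(50, 1000, 48))
```

With these example numbers, 50 TB over a 1 Gbps link comes to roughly 159 hours, which is between six and seven days. If that does not fit your window, that is the signal to look at a bigger pipe or at shipping the data on a physical device.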

And the last one: are you planning a Fallback to On-Premise should anything go wrong after you move to Cloud? If yes, then you also need to think through how you would sync the data back from Cloud to On-Premise. I know… not easy, but if your application cannot take a downtime of hours, then this is the option.

Ok, one more last one: ensure that encryption at rest is enforced for data that requires it as per compliance. For example, use the KMS service to encrypt data at rest in S3, RDS instances and EBS volumes. (I am a die-hard fan of AWS, hence all my examples are from there 😏)

5) Traffic Dialing Strategy

A couple of patterns:

  1. 100% Dialing in one shot: Change the DNS entry and migrate all traffic to Cloud at once.
  2. Staggered Dialing: In this strategy, the Application is live in both the On-Premise and Cloud environments, and traffic is dialed between the two in a staggered way. Maybe the traffic of just one Tenant goes to Cloud while the rest is served by On-Premise. Or you may dial 5% of traffic to Cloud and 95% to On-Premise. Or there could be other conditions based on which you want to split the traffic.

Why would you go with #2? It all depends on how critical the application is and whether you can afford a downtime if something goes wrong. It’s always better to take a hit for a few users and take corrective action than to fail for all. This approach helps you observe the Cloud system, see how it behaves and understand its performance. Once satisfied, increase the traffic percentage. Warning !!! It also brings the added headache of keeping the data available in real time in both Cloud and On-Premise, as you may want to fall back to On-Premise if the Cloud system is not performing as expected or issues are being observed.
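Here is a small sketch of what Staggered Dialing could look like at the routing layer, assuming the split is made per user with a stable hash. The function names, the 100 bucket scheme and the tenant and user IDs are all hypothetical; in practice this decision often lives in a DNS weighted routing policy or a load balancer rather than in application code.

```python
import hashlib


def bucket(key, buckets=100):
    """Map a key to a stable bucket in [0, buckets) using SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(user_id, tenant_id, cloud_percent, cloud_tenants=frozenset()):
    """Return "cloud" or "on_prem" for one request.

    Tenants listed in cloud_tenants always go to Cloud. Everyone else is
    split by percentage, and a given user always lands in the same bucket,
    so raising cloud_percent only ever moves users towards Cloud.
    """
    if tenant_id in cloud_tenants:
        return "cloud"
    return "cloud" if bucket(user_id) < cloud_percent else "on_prem"


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(10_000)]
    for percent in (5, 25, 100):
        on_cloud = sum(route(u, "tenant-a", percent) == "cloud" for u in users)
        print(f"dial {percent}% -> {on_cloud} of {len(users)} users on cloud")
    # One pilot tenant fully on Cloud while the dial is still at 0%.
    print(route("user-1", "tenant-pilot", 0, cloud_tenants={"tenant-pilot"}))
```

Hashing instead of picking randomly per request keeps each user on one side, which makes the data sync headache mentioned above a little easier to reason about.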

6) DNS Cutoff

7) Observability

8) Automation is the Key

9) Cost Optimization

10) Application specific settings

Cloud Migration is not a Cake Walk. Every Migration is unique and comes with its own set of challenges. And once the migration is successful, it teaches you one more way of how not to fail when doing a Migration 😃

You may also be thinking: how about Cloud Governance, Account and Resource Tagging Strategy? Don’t they require planning too? Well, I deliberately skipped them as I plan to have a different blog covering only them.

So stay tuned !!!

#AWS #CloudArchitect #CloudMigration #Microservices #Mobility #IoT