Eric D. Schabell: Cloud Data - Understanding 3 common pitfalls

Wednesday, August 17, 2022



The daily hype is all around you.

From private to public cloud, multi-cloud, and even hybrid cloud, you're overrun with information telling you this is the path to your digital future. To complicate matters, while you are contemplating these choices you are still expected to keep up with your daily tasks of enhancing customer experiences and delivering applications in an agile way.

Wrapped up in all this delivery and architectural infrastructure, there's a multitude of decisions around data to be considered when engaging with any cloud experience. There are regulatory and compliance pressures that force you to evaluate how you collect, process, and store your observability data. Understanding the pitfalls around the collection, maintenance, and storage of your cloud data can mean the difference between failure and success within your cloud strategy.

This series is based on a talk given previously in Dublin, Ireland and was brainstormed with my good friend Roel Hodzelmans. The reactions from the audience inspired me to share the concepts in this series.

Introduction to cloud and data

This article introduces the cloud and data we are talking about by looking closely at the decisions being made (and their effects) when monitoring your applications. Based on anonymised real-world experiences, this story highlights three top issues you need to understand as you're transitioning your data needs into any cloud environment.

When we talk about cloud here, it's anything you are using in a public cloud. This can be as simple as a single cloud with just a single node, or as complex as multiple cloud providers with a vast array of nodes across many regions. The story here is not focused on your choice of public cloud usage, but on the abstract concepts of public cloud and how data impacts you directly.

Data in the cloud takes many forms, and we are talking about more than just the storage of data. It can be user data you collect on your customers, data manipulated by your applications, logs you want to collect, metrics, or even more complex observability data.

Any time you talk about the cloud and data, it's very important to understand that the cost you least expect is the bandwidth used to transport any data in and out of your cloud environment. 

As data can be in many forms, there are plenty of examples of use cases where organizations using public clouds have been confronted with devastating usage bills. Take for example this simple illustration where an online news organization would have been devastated if they had made the choice to run their multiple news sites across several countries and languages in the cloud.

Once upon a time, there was an extreme emergency in a large city that caused chaos to the point that the local government chose to lock down its citizens while law enforcement agencies attempted to get things under control. During this lockdown, which lasted over 24 hours, the citizens of this large city went online at home and crashed the local-language news sites, all of them.

Being very resourceful citizens, they knew that the neighboring country to the north had a small region on their border that shared the same language. They also knew that there were two news sites online in the neighboring country that might have updates on their situation, so they diverted all their attention to these two sites.

Now for a little background. The news organization that ran the two sites in the neighboring country also had four more news sites in a different language running in other regions of that northern country. The year before, they had been approached by multiple public cloud providers offering to host all of their news sites in the cloud, but their CIO was smart and realized that bandwidth was one of their most important billing items. The quotes given were too steep for them to run all of their news sites in the cloud. Instead, they had opted to keep using data centers with physical machines that they owned, located regionally near their various news sites.

On that fateful day, when all of the resourceful citizens locked down in their city decided to divert to those northern border news sites, the CIO saw a spike in traffic over a 24 hour period. This spike was so big that he quickly did some math on a notepad and realized that, if they had hosted those two foreign-language news sites in the cloud, his entire organization would have gone bankrupt.
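To get a feel for the kind of notepad math involved, here is a minimal sketch in Python. Every number in it is hypothetical and chosen only for illustration: the page view counts, the average page size, and the price per gigabyte are assumptions, not figures from the story or from any cloud provider's price list. The function name `egress_cost_usd` is also just a name made up for this example.

```python
def egress_cost_usd(page_views: int, avg_page_mb: float, price_per_gb_usd: float) -> float:
    """Rough estimate of outbound bandwidth cost for a number of page views.

    Assumes every page view transfers avg_page_mb megabytes out of the cloud
    and that bandwidth is billed at a flat price per gigabyte (1 GB = 1000 MB).
    """
    total_gb = page_views * avg_page_mb / 1000
    return total_gb * price_per_gb_usd


# Hypothetical figures, for illustration only.
PRICE_PER_GB_USD = 0.10   # assumed flat egress price
AVG_PAGE_MB = 2.0         # assumed average page weight

normal_day = egress_cost_usd(200_000, AVG_PAGE_MB, PRICE_PER_GB_USD)
spike_day = egress_cost_usd(50_000_000, AVG_PAGE_MB, PRICE_PER_GB_USD)

print(f"normal day: ${normal_day:,.2f}")
print(f"spike day:  ${spike_day:,.2f}")
print(f"spike is {spike_day / normal_day:,.0f}x a normal day")
```

The point of the sketch is not the absolute numbers but the shape of the problem: with usage-based billing, the bandwidth bill scales linearly with traffic, so an unplanned spike multiplies your costs by the same factor. Real pricing is usually tiered and varies by region, so treat this as a starting point for your own estimate.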

The reason for this story is to understand that data in the cloud means you have to think differently about how you architect your solutions. Not everything needs to run in the cloud and some of the costs might not work for your use case. If you are not aware of this and transition your solutions without planning, testing, and understanding where your architectural choices can lead... you might be in for some surprises.

The forgotten data

In this article I've introduced the foundational concepts of cloud and cloud data, showing that it's often about more than just data storage. The following article in this series will take a look at how observability in the cloud is often the forgotten data.