Do Monitoring Solutions Have to be Rearchitected to Monitor the Cloud?


Anything cloud is hot these days, and monitoring is no exception. Search for “cloud monitoring” on Google and you’ll see how many vendors have jumped on this bandwagon. Adding to the confusion is the use of two similar-sounding terms – “monitoring the cloud” vs. “monitoring in the cloud”! “Monitoring the cloud” refers to monitoring applications that are hosted in the cloud, while “monitoring in the cloud” refers to having the management solution itself hosted in the cloud and offered as a pay-per-use service.

Is either of these approaches fundamentally new? The short answer is NO! Many managed service providers have been offering multi-tenant, pay-per-use monitoring as a service for several years, and this is nothing but “monitoring in the cloud”.

Consider “monitoring the cloud” next. Is a fundamentally new approach needed for monitoring the cloud? Again, the answer is NO. Just consider the features that one cloud monitoring provider advertises:

EXTERNAL END-TO-END MONITORING:
• Monitoring Frequency – from 1 minute to 60 minutes
• Multiple Check Locations – America, Europe, Asia and Australia
• Monitors Websites, Email Servers, Firewalls, VoIP, Databases, Domain Name Servers, Routers, and Web Servers from the end-user perspective
• Supported Protocols – HTTP, HTTPS, FTP, SMTP, POP3, IMAP, SSH, PING, TCP, UDP, SIP, MySQL, DNS

SERVER AND NETWORK MONITORING:
• OS – CPU, RAM, Disk Usage, Processes, System Events, Installed Software

APPLICATION (TRANSACTION) MONITORING:
• Transaction Recorder – record user transactions and check the load time of each component of your web application

GENERAL:
• Instant Failure Alerts
• Scheduled maintenance – define downtime periods during maintenance windows
• Escalation – escalate continuing problems to different staff members
• Alerting periods – specify alerting periods per contact
• SLA Reporting – detailed reporting with SLA metrics (the uptime arithmetic behind this is sketched after this list)
• Public reports – show your uptime to your customers
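
As an aside, the SLA reporting and public uptime items above boil down to simple arithmetic over check results – nothing cloud-specific. A minimal sketch, with made-up check results:

```python
# Minimal sketch of the arithmetic behind SLA/uptime reporting:
# uptime percentage is simply successful checks over total checks.
# The results list below is made up for illustration.
checks = [True, True, True, False, True, True, True, True, True, True]

uptime_pct = 100.0 * sum(checks) / len(checks)
print(f"Uptime: {uptime_pct:.1f}%")   # -> Uptime: 90.0%

SLA_TARGET = 99.9                      # illustrative SLA target
print("SLA met" if uptime_pct >= SLA_TARGET else "SLA breached")
```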

How many of these are specific to the cloud? Very few!

To expand on this further, let’s look at different aspects of monitoring and see how these are impacted by the cloud:

The monitoring architecture: Conceivably, a business service could use applications hosted partly in one cloud (public) and partly in another (private), with stringent firewall rules prohibiting communication across these clouds except over standard ports, using standard protocols. Many of the large monitoring frameworks were designed to use SNMP or proprietary communication protocols and are ideal for monitoring networks governed by a single domain of control. As services start to span multiple clouds and multiple domains of control, these single-domain monitoring frameworks fail to function effectively. Hence, if you are using one of the large management frameworks that were architected decades ago, then yes – you need to consider a new approach for cloud monitoring!

In contrast, next-gen monitoring solutions like eG Enterprise have been designed from the ground up with multiple domains of control in mind. Administrators can choose between agent-based and agentless monitoring. All communications happen over the standard web protocols HTTP/HTTPS, and the agents do not listen on any ports. Not only is this architecture secure, it also lends itself well to operating across cloud providers. Hence, if you are using a next-gen monitoring solution like eG Enterprise, the advent of private and public clouds does not mandate an architectural change.
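
To make the outbound-only pattern concrete, here is a minimal sketch of an agent that pushes its metrics to a manager over HTTPS and never listens on a port – which is why it can operate across firewalled cloud boundaries that only allow standard web traffic. The manager URL and payload format are illustrative assumptions, not eG Enterprise’s actual protocol:

```python
# Sketch of an outbound-only monitoring agent: every connection is
# initiated by the agent over HTTPS; no inbound port is ever opened.
# The endpoint URL and payload shape are hypothetical.
import json
import time
import urllib.request

MANAGER_URL = "https://manager.example.com/metrics"  # hypothetical endpoint

def collect_metrics():
    """Gather a few host-level metrics (stubbed for this sketch)."""
    return {"host": "app-server-01", "cpu_pct": 42.0, "ts": time.time()}

def push_metrics():
    payload = json.dumps(collect_metrics()).encode("utf-8")
    req = urllib.request.Request(
        MANAGER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status

while True:
    try:
        push_metrics()       # agent -> manager; no inbound connection needed
    except OSError:
        pass                 # manager unreachable; retry on the next cycle
    time.sleep(60)           # push once a minute
```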

The metrics collected: Internal monitoring of applications (by deploying agents that integrate with the application) still has value, as it can provide additional details about malfunctioning applications (e.g., which SQL query is slow, or which Java method is taking time). The value of internal monitoring is not diminished just because you have moved the application to the cloud.

On the other hand, the advent of the cloud has increased the importance of external monitoring of applications. The best way to determine whether your application hosted in a cloud is working well is to periodically measure its availability and responsiveness. Tools and techniques for performing such external monitoring already exist (we have used external monitoring for years to check on the performance of hosted applications and servers) – you just need to make sure you deploy them effectively and pay attention to the results.
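
For illustration, here is a minimal sketch of such an external probe – a periodic availability and response-time check. The target URL and slowness threshold are placeholders:

```python
# Minimal sketch of an external availability/response-time probe:
# fetch the application's URL from outside the cloud and record
# whether it responded, and how fast. URL and threshold are made up.
import time
import urllib.request

URL = "https://app.example.com/health"   # hypothetical application endpoint
SLOW_THRESHOLD_S = 2.0                   # illustrative responsiveness target

def probe(url):
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            ok = 200 <= resp.status < 300
    except OSError:                      # DNS failure, refused, timeout, ...
        ok = False
    return ok, time.monotonic() - start

available, elapsed = probe(URL)
if not available:
    print("DOWN")
elif elapsed > SLOW_THRESHOLD_S:
    print(f"SLOW: {elapsed:.2f}s")
else:
    print(f"OK: {elapsed:.2f}s")
```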

Based on the above discussion, we can conclude that the cloud has not fundamentally changed the type of metrics that you need to collect. Of course, to know whether the cloud is working well and what portion of the resources you have paid for is actually being used, new metrics will have to be incorporated. Most cloud providers publish APIs from which these metrics can be obtained.
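
As one example of such a published API, here is a sketch that pulls CPU utilization for a cloud server from AWS CloudWatch using the boto3 library. The instance ID is a placeholder, and credentials are assumed to be configured in the environment:

```python
# Sketch of pulling cloud-level metrics from a provider's published API,
# using AWS CloudWatch via boto3 as one example.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
now = datetime.now(timezone.utc)

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=now - timedelta(hours=1),   # last hour of data
    EndTime=now,
    Period=300,                           # one datapoint per 5 minutes
    Statistics=["Average"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f"{point['Average']:.1f}%")
```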

Metric analysis and root-cause diagnosis: The analysis of metrics is no different in a cloud environment. The same cannot be said about root-cause diagnosis. Just as virtualization introduced additional complexity that had to be handled as part of the root-cause diagnosis process, the introduction of the cloud forces root-cause diagnosis to take into account the functioning of the cloud. For instance, if an application is exhibiting poor response times, is it because the application is malfunctioning, because the workload is unusually high, because the cloud provider is not providing the resources you requested for the application, or because you have not provisioned the cloud server correctly and are hence hitting a resource crunch? Root-cause diagnosis technology must evolve to handle this additional complexity. As with any new technology, the emphasis of most early cloud solutions is on the metrics and not on root-cause diagnosis capability.
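
To make this concrete, here is a toy sketch of the triage logic described above. The metric names and thresholds are invented for illustration; a real diagnosis engine would be far more nuanced:

```python
# Toy sketch of cloud-aware root-cause triage: given a slow response
# time, distinguish an application fault, an unusual workload spike,
# a provider under-delivering on requested resources, and an
# undersized (misprovisioned) cloud server. All names and thresholds
# below are invented for illustration.
def diagnose(m):
    if m["response_time_s"] <= 2.0:
        return "Response time within target; nothing to diagnose."
    if m["app_error_rate"] > 0.05:
        return "Application fault: elevated error rate."
    if m["requests_per_s"] > 2 * m["baseline_requests_per_s"]:
        return "Unusual workload: traffic well above baseline."
    if m["allocated_cpu_cores"] < m["requested_cpu_cores"]:
        return "Provider issue: fewer resources delivered than requested."
    if m["cpu_utilization_pct"] > 90.0:
        return "Provisioning issue: cloud server undersized (CPU crunch)."
    return "Cause unclear; deeper per-tier diagnosis needed."

print(diagnose({
    "response_time_s": 4.2,
    "app_error_rate": 0.01,
    "requests_per_s": 120.0,
    "baseline_requests_per_s": 100.0,
    "allocated_cpu_cores": 2,
    "requested_cpu_cores": 4,          # provider delivered less than asked
    "cpu_utilization_pct": 55.0,
}))
```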

To summarize, “monitoring in the cloud” is nothing more than a remote hosting model for the management system. “Monitoring the cloud” requires adaptations to monitoring solutions to accommodate the additional infrastructure tiers involved. The changes required are evolutionary, not radical.

4 thoughts on “Do Monitoring Solutions Have to be Rearchitected to Monitor the Cloud?”

  1. John Worthington March 24, 2010 / 12:31 pm

    The ability to monitor what is happening at every layer of every component in an end-to-end IT service infrastructure — across private and public clouds — and automatically isolate which layer of which component is the source of an anomaly (i.e., ‘root-cause’)

    goes to the purpose of the Event Management process as defined by the IT Infrastructure Library:

    “to detect Events, make sense of them, and determine the appropriate control action…”

    Whether a cloud provider views this as being in their interest is another story… it will be up to the customer to insist on the transparency that they require. This can go against the appeal of cloud computing in the first place (offloading complexity to the service provider)…

    But just because the Event is in the cloud doesn’t mean that it’s not wreaking havoc with your Business, and getting buried in yet more data and having to make sense of the madness yourself by talking to your trusted partners doesn’t sound like much fun to me.

    I say ‘Trust but Verify’ if you know what I mean….don’t confuse collecting data and presenting information via fancy ‘dashboards’ with really knowing what is happening. That requires real monitoring intelligence, which is what many products are lacking.

  2. John Cavanaugh March 26, 2010 / 2:28 pm

    Next Generation monitoring solutions like eG Enterprise build the baseline for a new combination of requirements that spans ITIL Service Management (Capacity and Availability) and ITIL Security Management.

    I see many vendors (and prospective users) treating cloud architecture much like an electric utility that must generate, transmit, and distribute electricity. The electric utility has an SLA for delivery but no knowledge of how the electricity is used (a freezer vs. life-support equipment).

    Cloud computing WILL require knowledge of how users expect to receive the cloud computing benefit in order to support the Confidentiality, Integrity, and Availability (CIA) security elements. An example is authorized users accessing and using their data from a cloud provider. If non-authorized users gain access, normal operational monitoring would only see an increase in volume, with no awareness of a confidentiality breach.

    Next Generation monitoring will require a platform that extends both Service and Security management, and leaders like eG will be able to meet this challenge.

  3. Parthi April 7, 2010 / 7:54 pm

    This blog is a must read!!! Thanks

  4. Rolf Frydenberg July 22, 2010 / 5:42 am

    I agree that there is little fundamentally new in monitoring Cloud Computing versus “traditional” in- or outsourced IT. Rather, an existing problem looms larger: That monitoring too often is in the hands of those being monitored! Monitoring – at least for SLA compliance reporting – should not be done by the “producer” of the service, but by the “consumer” or a trusted third party on his behalf.
