How to Resolve a Complex IT Problem in Just a Few Clicks with eG Enterprise



One of the most common, yet most difficult, tasks for any admin is troubleshooting end-user “slow-time” issues. Application, database, network and server unresponsiveness, or “slow-time”, hurts enterprise performance and end-user productivity ten times more often than downtime, and it can originate from just about anywhere within the enterprise.

Misconfiguration due to human error, missing drivers, intermittent memory faults, network IP cache errors, unbalanced workloads and constrained virtual resources can each be the root cause of slow-time, or merely a symptom of it. The key to resolving such issues is getting to the root cause quickly, before they spread to other systems and bring productivity to a standstill.

In the following walkthrough, I detail how eG Enterprise helps resolve such a scenario quickly and easily, before end users notice. This example focuses primarily on Citrix and VMware, but eG Enterprise can help IT departments maintain maximum productivity for millions of combinations of enterprise components.

eG Enterprise is a 100% web-based solution, making it possible for anyone in IT, from the CIO and IT managers to admins and helpdesk specialists, to proactively monitor their environment anytime, anywhere, on any device.

eG Alarms Window for Blog

When a “slow-time” error occurs, eG Enterprise automatically sends an alarm to the appropriate admin so they can take action immediately. The solution correlates and color-codes the minor, major and critical alerts and displays them using a layer model, with the most critical alert at the top.

eG Alarms Window Details for Blog

According to the alert, virtual CPU usage in the VMware ESX system console is high. The system console is the bootstrap operating system of ESX and should only be using about 2% of the CPU allocated to it. Scrolling over the description shows that usage has suddenly increased to 100%. Left alone, this would surely affect Citrix performance and generate a large number of support calls to IT from end users.

eG Detailed Diagnosis Window for Blog

Fortunately, eG Enterprise’s patented detailed diagnosis technology makes identifying the root cause a breeze. Scrolling to the right and clicking the magnifying glass icon reveals the root cause in the Detailed Diagnosis window. The window displays the top 10 processes using virtual CPU resources; on the right side of the window, those processes are listed as SAMBA backups.

eG Fix Feedback Window

The root cause is simply this: the VMware admin is performing a normal backup, but it is taking place before the end of the workday, potentially affecting Citrix users when they attempt to log on and access applications. The best solution is to contact the VMware admin, explain the situation and agree either to reschedule the backups or to adjust virtual resources.

In this case, the Citrix and ESX admins agree to reschedule the backups. The Citrix admin then uses the Fix Feedback feature within eG Enterprise to document the event and the agreed solution, and saves the record. Resolving the issue took just a few clicks and a quick phone call between admins.

eG Enterprise provides universal insight across platforms and domains, whether they exist in the cloud, the data center or in virtual space. This is why the Citrix admin had the visibility needed to identify the root cause as a virtual resource constraint within the VMs that support Citrix.

The following is a more detailed look at the Citrix admin’s view of the eG Enterprise Universal Insight dashboard, as well as the methodology and technology behind the solution.

eG Universal Insight Dashboard for Blog

The color codes of eG Enterprise are familiar: green is “Normal”, yellow is a “Minor” alert, orange is a “Major” alert and red is a “Critical” alert requiring immediate attention.
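The color scheme above, together with the most-critical-first ordering described for the alarm window, can be sketched in a few lines of Python. This is only an illustration of the idea, not eG Enterprise code: the `Severity` enum, the `COLOR` table, the `order_alarms` helper and the sample alarm list are all hypothetical names made up for this sketch.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale; a higher value means more urgent."""
    NORMAL = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


# Color coding as described in the post.
COLOR = {
    Severity.NORMAL: "green",
    Severity.MINOR: "yellow",
    Severity.MAJOR: "orange",
    Severity.CRITICAL: "red",
}


def order_alarms(alarms):
    """Return (component, severity) pairs with the most severe first."""
    return sorted(alarms, key=lambda alarm: alarm[1], reverse=True)


# Illustrative alarms loosely based on the walkthrough.
alarms = [
    ("web server", Severity.MINOR),
    ("ESX system console", Severity.CRITICAL),
    ("Citrix XenApp", Severity.MAJOR),
]

ordered = order_alarms(alarms)
for component, severity in ordered:
    print(f"{COLOR[severity]:>6}  {severity.name:<8}  {component}")
```

Using an integer-valued enum keeps the ordering and the color lookup in one place, so adding a severity level would only require touching the enum and the table.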

The dashboard provides universal insight for the 12 different components that comprise the two Citrix services being monitored. The Component Type panel lists the details for each of the 12 components.

Clicking on the listed services in the middle of the Infrastructure Health panel on the left will reveal which of the two services is generating alerts.

The Measure At-A-Glance panel at the bottom left lists the measurements and tests conducted for each of the 12 components that comprise the two Citrix services being monitored. Details include CPU utilization, free memory, active Citrix sessions, and more.

The bell icon on the top right of the window is a link to the alarm window details viewed previously.

eG Infomart Services Window for Blog

After the admin clicks on the Services panel, eG Enterprise presents an icon for each of the two Citrix services. Infomart appears to be the service experiencing major issues. Clicking on the Infomart icon opens the list of Web Transactions for the Infomart service.

eG Infomart Transactions Window for Blog

The list indicates that Application Access and User Logons are experiencing errors. The Citrix admin may click on the Topology tab or the transactions themselves for a service topology graph for the Infomart service.

eG Citrix Topology Window for Blog

Navigating the Infomart service topology from left to right, end users are connecting to Infomart through a network node, then a web server that is experiencing minor errors; the requests then reach a Citrix Zone Data Collector, which sends the request to one of the Citrix XenApp servers, which is experiencing major errors. The XenApp server then accesses the appropriate file, print or database server on the backend.

Based on the color codes, eG Enterprise is indicating that the primary focus should be on the Citrix XenApp servers. Clicking on them presents the Citrix admin with either a physical or a virtual topology for Citrix XenApp, depending on the supporting host.

eG Virtual Citrix XenApp Topology Window for Blog

The virtual topology indicates that the VMware ESX virtual machines hosting the Citrix Zone Data Collector, the web server and Citrix XenApp are experiencing critical errors, so that is where the focus needs to be. Clicking on that area of the virtual service topology reveals the elements that support those virtual machines.

eG Layer Model Window for Blog

This is the eG Enterprise layer model for the Citrix XenApp service topology. On the right are all of the elements that support the virtual machines, as well as the various tests that correspond to the selected layer. For example, the OS layer for the virtual hypervisor includes measurements and tests for the System Console, CPU, Disk Space and more. The information in the right panel changes according to which layer is selected.

eG Detailed Diagnosis Window for Blog

eG Enterprise has already identified that virtual CPU resources within the system console are constrained. Clicking the magnifying glass icon on the far right opens the same Detailed Diagnosis window previously accessed from the Alarms window, and the same SAMBA backup processes are visible. This confirms the previous diagnosis.

Regardless of the path an admin chooses, identifying the root cause of a complex IT problem takes just a few clicks with eG Enterprise.

eG Infrastructure Health Reporting Window for Blog

The last thing I will cover is some of the reporting benefits that eG Enterprise provides. By returning to the Universal Insight dashboard and selecting the Reporter tab, anyone in IT can pull performance reports for the infrastructure.

Reports are available based on Function, Component, Service, or Segment; two of the more important are Operational KPI and Capacity Planning. Easy access to comprehensive reporting makes it possible to maintain business continuity, predict peak needs and ensure future readiness for emerging technologies, while keeping costs down and increasing productivity.

The eG Enterprise methodology is simple, the technology is powerful and the universal insight is comprehensive.

For a free trial, to schedule a live demo or to obtain more information about eG Enterprise, send a request to info@eginnovations.com or go to our website at www.eginnovations.com.

Making IT Service Management Simple and Proactive


One of the big challenges that enterprises face with monitoring tools relates to “false alerts”. Administrators often get too many alerts when nothing is wrong. As a result, they often raise the thresholds so far that few alerts are generated at all. If this is done for all the metrics collected, it can defeat the purpose of having a monitoring system!

Setting thresholds for the metrics collected by the monitoring system is often a challenge. It is frequently done manually and requires a lot of domain knowledge and expertise. Often, the threshold settings also have to differ from one server to another (a bigger server can take more load than a smaller one).

Many new-age monitoring systems use analytical techniques that draw on historical performance data to intelligently set thresholds for metrics. As we discuss in a new whitepaper, even such approaches have challenges. The whitepaper, “Make IT Service Monitoring Simple and Proactive with Intelligent Thresholding and Alerting”, touches upon these and many other topics that have to be considered when determining how thresholds are computed for metrics and how alerting is done by the monitoring system.
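To make the idea of history-based thresholds concrete, here is a minimal sketch in Python. It assumes one simple statistical baseline (the mean of past samples plus `k` sample standard deviations); this is a generic technique chosen for illustration, not the method used by eG Enterprise or described in the whitepaper. The function names, the value of `k` and the sample data are all hypothetical.

```python
from statistics import mean, stdev


def baseline_threshold(history, k=3.0):
    """Upper threshold: mean of past samples plus k sample standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    return mean(history) + k * stdev(history)


def is_alert(value, history, k=3.0):
    """True when the new value exceeds the threshold learned from history."""
    return value > baseline_threshold(history, k)


# Hypothetical CPU utilization history (percent) for two servers.
small_server = [20.0, 22.0, 21.0, 19.0, 23.0]
big_server = [60.0, 62.0, 61.0, 59.0, 63.0]

# The same reading is abnormal for one server and normal for the other.
print(is_alert(30.0, small_server))  # True
print(is_alert(30.0, big_server))    # False
```

Because each server's threshold is derived from its own history, no per-server manual tuning is needed in this sketch. A fixed `k` is still a manual choice, and a plain mean ignores time-of-day patterns, which hints at why such approaches have challenges of their own.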

Related links:

All whitepapers from eG Innovations: http://www.eginnovations.com/whitepaper/whitepaper-request.htm

Managed Services 2.0 is All About Delivering Business Value


Far too often, the focus of monitoring tools and of Managed Service Providers (MSPs) has been on technical capabilities. For example, the number of metrics collected, the types of applications and devices monitored and the number of reports available have tended to dominate any MSP’s monitoring offering. Faced with increasing competition and intense price pressure, MSPs have to look at alternative sources for revenue growth. They will need to move up the service provider food chain and closer to their customers’ business in order to offer value-added services.

The greatest value and greatest revenue potential arise when the service provider can influence the key business services of an enterprise. Consider the figure below. It indicates that a minute of downtime for a business-critical infrastructure supporting ERP services costs around $13,000. Now consider an MSP that can reduce the average downtime of an incident from 60 minutes to 30 minutes. It will be helping the customer save $390,000 for a single incident. If the MSP’s proactive monitoring can avoid, say, three outages of one hour each in a year, that’s a saving of over $2.3 million. It is such savings that MSPs need to strive for.

The cost of a minute of downtime in an IT infrastructure depends on the business critical services it supports
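The savings figures quoted above can be checked with a short calculation. The sketch below takes only the $13,000-per-minute figure from the text; the constant and function names are made up for illustration.

```python
COST_PER_MINUTE = 13_000  # USD per minute of ERP downtime, as quoted in the text


def downtime_cost(minutes, cost_per_minute=COST_PER_MINUTE):
    """Cost in USD of an outage lasting the given number of minutes."""
    return minutes * cost_per_minute


# Cutting one incident from 60 minutes to 30 minutes.
saving_per_incident = downtime_cost(60) - downtime_cost(30)

# Avoiding three one-hour outages in a year.
saving_three_outages = downtime_cost(3 * 60)

print(saving_per_incident)   # 390000
print(saving_three_outages)  # 2340000
```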

In the next wave of managed services, Managed Services 2.0, MSPs need to look beyond CPU, memory and disk metrics and start to deliver demonstrable business value to their customers. It is such MSPs that will become trusted advisors to their customers and will survive the competitive market of today. To succeed, MSPs need to have the right monitoring tools, the right processes, and the right people. It is this combination that you should aim to develop for success in the MSP marketplace.

We have just published a new whitepaper that discusses how MSPs can move to delivering Managed Services 2.0 and how the eG Enterprise suite can help MSPs get there. If you are interested in a free copy of this whitepaper, click here.

Collaborative Management – An Approach to Achieving IT Service Management Excellence


For a long time, we at eG Innovations have promoted the concept of “Collaborative Management”, something that Gartner analysts have also mentioned in recent times. Many organizations have looked at ITIL as a way to achieve IT service management excellence. Most of these initiatives are driven top-down in the organization.

Often such ITIL initiatives take a long time to deliver or fail because there is no buy-in from within the organization. Success of these initiatives depends on how well the IT staff understands and appreciates what these initiatives seek to achieve, and how they can benefit the organization and the staff.

An alternative approach that we’ve seen work, and in a short timeframe, is a bottom-up approach. We call this the Collaborative Management approach because of the way in which it evolves and what it achieves. Typically, this approach starts with one group of IT experts seeking to understand how the other silos are performing, usually because this group is often blamed for service problems. For example, Citrix administrators often face complaints such as “Citrix is slow” or “Citrix is not working”. They look at all the Citrix application metrics and can’t find what’s wrong, yet user complaints persist. These admins now want to know how the other components involved in supporting their service are performing: is the network working properly, how are the profile servers doing, and what is going on with the database?

These admins will probably not have administrative access to the other components, however. Hence, the monitoring tool they look for should be able to work with “incomplete visibility” into the infrastructure and still be able to deduce where the bottlenecks are.

Once this group finds metrics about the performance of the other domains, they can provide the evidence to the other groups. Not only is this information useful in fixing problems, it also gets the other groups interested in looking at a common monitoring console for metrics. Each domain administrator now wants to look at the common monitoring console and make sure that his/her domain cannot be blamed for a problem. At the same time, they are interested in seeing how the other domains are performing. This bottom-up approach provides an evolutionary model for achieving IT service management. As more domains get to use the common monitoring solution, you start to get complete end-to-end visibility.

To support collaborative management, a monitoring solution must:

  • Be able to provide a service-oriented view of the infrastructure, so administrators can correlate service performance with that of their respective silos;
  • Be able to support personalized views, so administrators who only want to view the state of their silos can still do so;
  • Be able to collect metrics about different domains without needing complete visibility into those domains. As more access becomes available, the monitoring solution should be able to provide greater visibility into these domains.

The interesting part of a collaborative approach to service management is the way it evolves and how administrators think about it. Enterprise-wide buy-in is not required for its adoption. Collaborative management starts small and percolates into the organization as its value is demonstrated. This approach does not require all administrators to understand or believe in IT service management.

The figure below summarizes how administrators look at the collaborative management solution. The Citrix admin looks at it as a Citrix monitor, while the database admin looks at the tool as a database monitoring solution. Sometimes the converse is also true: the Citrix administrator uses the tool to monitor the performance of the network (something he/she had no visibility into earlier) and the database administrator uses the tool to monitor the web front-end’s performance!

Collaborative enterprise infrastructure management
How different administrators view a collaborative infrastructure management solution

In a whitepaper on this subject, we have provided a couple of real-world case studies highlighting how well this approach has worked in reality. Be sure to check this whitepaper here.