Monitoring as a Discipline and the Systems Administrator

Monitoring as a discipline is a practice designed to help IT professionals escape the short-term, reactive nature of administration often caused by insufficient monitoring, and become more proactive and strategic.

Gerardo Dada is Vice President of Product Marketing for SolarWinds.

Today’s rate of change in the data center is rapidly accelerating. From simply racking and stacking servers decades ago to the recent integration of new technologies like virtualization, hyperconvergence, containers and cloud computing, to name a few, traditional data center systems have undergone considerable evolution.

And with the new reality of hybrid IT, in which an organization’s IT department must manage a set of critical services on-premises that are connected with another set of services in the cloud, the systems administrator’s role has become that much more complex. More importantly, businesses today run primarily on software and applications, and the expectation that these will always work and work well (fast) has never been higher.

Thus, as systems complexity continues to grow alongside the expectation that an organization’s IT department should deliver a quality end-user experience 24/7 (meaning no glitches, outages, application performance problems, etc.), it’s important that IT professionals give monitoring the priority it deserves as a foundational IT process.

Making the Case for Monitoring as a Discipline

Traditionally, monitoring in the data center has been something of an afterthought. For most organizations, it’s “a necessary evil”: a resource that the IT department can leverage when there’s a problem that needs solving, and often a job that’s done with just a free tool, either open source or whatever the hardware vendor included.

The truth is, an IT department will always be stuck in reactive (troubleshooting) mode unless it has better visibility into the health and performance of its systems and a tool that can provide early warnings. By establishing monitoring as a core IT function (a.k.a. monitoring as a discipline), businesses can benefit from a more proactive, early-action IT management style, while also improving infrastructure performance, cost and security.

In the face of enterprise technology’s exponential rate of change, monitoring as a discipline is a concept that calls for monitoring to become the defined job of one or more IT administrators in every organization. The most important benefit of such a dedicated role is the ability to turn data points from various monitoring tools and utilities into actionable insights for the business by looking at all of them from a holistic vantage point, rather than at each one in isolation.

Of course, a dedicated monitoring role may not be feasible for organizations with budget and resource constraints, but the primary goal is to put a much larger emphasis on monitoring in daily IT operations, using a comprehensive (although not necessarily expensive) suite of tools.

Consider the host of data breaches that took place in 2015. Networks, systems and cloud providers alike were infiltrated, millions of individuals’ personal information was leaked or stolen and the monetary consequences totaled hundreds of millions of dollars. Many of these breaches could have been prevented with a holistic and dedicated approach to monitoring that included tracking network traffic, logs, software patches, configuration changes, credentials and which users attempted to access server data.

In addition, more strategic monitoring—meaning tracking only select metrics that provide actionable insights and align with business needs—will help systems administrators fine-tune infrastructure. As much as 50 percent of an IT department’s infrastructure spend can be wasted as a result of inaccurate capacity planning, overprovisioning, zombie resources and resource hogs.

This is a concern especially for systems administrators in hybrid environments, where careful attention should be paid to provisioning and workload allocation to realize maximum cost efficiency. For applications or workloads that may be hosted offsite, poor performance monitoring can also result in an inability to diagnose problems or latency issues.

By leveraging insights from proactive and targeted monitoring, like historical usage and performance metrics, systems administrators can better optimize resources, save their organizations money, and address performance issues before the end user even notices anything is wrong.
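As a rough illustration of how historical usage metrics can surface zombie and overprovisioned resources, here is a minimal Python sketch. The `ResourceUsage` type, the `classify` function and both thresholds (2 percent and 20 percent peak CPU) are hypothetical choices made for this example, not values from any particular monitoring product; a real review would also weigh memory, I/O and network activity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceUsage:
    name: str
    cpu_percent_samples: List[float]  # historical CPU utilization, 0-100


def classify(resource: ResourceUsage,
             zombie_below: float = 2.0,
             overprovisioned_below: float = 20.0) -> str:
    """Label a resource from its peak historical CPU utilization.

    Both thresholds are illustrative assumptions; tune them per workload.
    """
    if not resource.cpu_percent_samples:
        return "unknown"  # no history, so no judgment
    peak = max(resource.cpu_percent_samples)
    if peak < zombie_below:
        return "zombie"  # never did meaningful work in the window
    if peak < overprovisioned_below:
        return "overprovisioned"  # busy at times, but far below capacity
    return "ok"


if __name__ == "__main__":
    fleet = [
        ResourceUsage("build-agent-old", [0.4, 0.9, 0.3]),
        ResourceUsage("reporting-db", [6.0, 12.5, 9.8]),
        ResourceUsage("web-frontend", [35.0, 72.0, 58.0]),
    ]
    for resource in fleet:
        print(resource.name, classify(resource))
```

Peak rather than average utilization is used here so that a resource with occasional real bursts is not mistaken for an idle one.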

Getting Started

Of course, refining or redesigning the way a business approaches monitoring will take time, and not every organization will have the resources to dedicate a person solely to monitoring. But there are several ways systems administrators and all other IT professionals can bolster their skill set and integrate the principles of monitoring as a discipline into daily operations to increase efficiency and effectiveness in the data center.

  • Establish metrics that matter to your business. Monitoring can be very tactical. Many IT professionals rely on the data that monitoring tools generate by default, often hundreds of resource metrics of little value and a barrage of alerts. To create a more thoughtful monitoring strategy, IT departments should identify which metrics matter most to the business, such as overall system throughput, efficiency and health of crucial application components and services, and from there assign alerts.
  • Define alerts that are actionable and tied to the metrics you actually use. Many monitoring tools provide data at a very granular level. When IT professionals get tactical alerts every time a resource metric falls outside the acceptable range, most alerts end up being ignored. Alerts should be sent only when action is required, and each alert should provide the context needed to guide that action. Good alerts start with the user experience. For example, an alert should notify an admin when website response time degrades, not when one of the Web server CPUs crosses the 80 percent threshold. This approach will help systems administrators focus on what is important and avoid being bogged down in endless, often irrelevant metrics and alerts.
  • Ensure your organization leverages a monitoring tool that provides full-stack visibility. It’s no secret that IT has traditionally functioned in silos. For decades, IT professionals have managed servers, storage and other infrastructure elements separately. But today’s businesses run on software and applications, which draw on resources from the entire system (storage, server compute, databases and so on), all of which are increasingly interdependent. IT professionals need visibility into the entire application stack in order to identify the root cause of issues quickly and to proactively spot problems that could hurt the end-user experience and the business’s bottom line if left uncorrected.

Without a comprehensive monitoring tool, systems administrators are forced to go back and forth between multiple software tools (in the case of hybrid IT, tools for both physical hardware and cloud-based applications) to troubleshoot issues. The result is often finger-pointing and hours of downtime spent looking for the problem rather than fixing it or, better yet, preventing it. Organizations should look for and invest in a tool that consolidates and correlates data to deliver more breadth, depth and visibility across the data center.
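To make the alerting advice above concrete, the following Python sketch fires an alert only when user-facing response time breaches a target, and attaches resource metrics as context instead of alerting on them directly. The `Alert` type, the `evaluate` function and the 95th-percentile rule are assumptions made for illustration; the 2,000 ms default simply mirrors the two-second page-load figure this article cites.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Alert:
    summary: str
    context: Dict[str, float]  # supporting metrics to guide the response


def evaluate(response_times_ms: List[float],
             cpu_percent: float,
             target_ms: float = 2000.0) -> Optional[Alert]:
    """Return an Alert only when user-facing response time is too slow.

    CPU utilization never triggers an alert by itself; it is carried
    along as context so the responder knows where to look first.
    """
    if not response_times_ms:
        return None
    ordered = sorted(response_times_ms)
    # Nearest-rank 95th percentile of the observed response times.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    p95_ms = ordered[rank - 1]
    if p95_ms <= target_ms:
        return None  # users are fine, so nobody gets paged
    return Alert(
        summary="Website response time above target",
        context={
            "p95_ms": p95_ms,
            "target_ms": target_ms,
            "cpu_percent": cpu_percent,
        },
    )


if __name__ == "__main__":
    # High CPU but fast responses: no alert.
    print(evaluate([300.0, 400.0, 500.0], cpu_percent=95.0))
    # Slow responses: alert, with CPU included as context.
    print(evaluate([2500.0] * 20, cpu_percent=40.0))
```

In this sketch a server running at 95 percent CPU stays silent as long as responses remain fast, which is the behavior the bullet on actionable alerts argues for.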

  • Embrace performance as a requirement. In today’s business, uptime is not enough. End users’ performance expectations have increased dramatically, thanks largely to the speed at which most of today’s websites function. An application that takes seconds to respond is almost as bad as an application that is down. The acceptable page-load time for customer-facing applications is now under two seconds. Furthermore, there is an increasingly obvious link between performance and infrastructure cost, especially in virtualized and cloud environments. As a result, applications need to perform at their best. Understanding what drives performance and what impacts performance over time is another aspect of monitoring that IT departments must embrace.
  • Be proactive. The daily job of some IT teams feels like a game of whack-a-mole, moving from fire drill to fire drill, consuming all of the team’s time and energy. When IT adopts monitoring as a discipline, problems can be caught and solved when the first warning signs show up, preventing fire drills and avoiding business impact. Being proactive also means doing proper capacity planning, security assessments, software patching, compliance reporting, fine-tuning and other maintenance tasks that can be automated or simplified with the insights provided by a proper monitoring process. A proactive IT team suffers less downtime and spends more time on strategic initiatives that continuously improve the technology foundation on which the organization runs.
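One simple form of the capacity planning mentioned above is to fit a trend to historical usage and estimate when a resource will run out. The Python sketch below does this with an ordinary least-squares line over daily disk usage. The function name `days_until_full`, the one-sample-per-day assumption and the sample figures are all hypothetical, and a straight-line fit is only a first approximation for workloads that grow unevenly.

```python
from typing import List, Optional


def days_until_full(daily_used_gb: List[float],
                    capacity_gb: float) -> Optional[float]:
    """Estimate days until capacity is reached from a linear usage trend.

    Assumes one sample per day, oldest first. Returns None when there is
    too little history or usage is flat or shrinking.
    """
    n = len(daily_used_gb)
    if n < 2:
        return None
    x_mean = (n - 1) / 2
    y_mean = sum(daily_used_gb) / n
    # Least-squares slope: growth in GB per day.
    numerator = sum((x - x_mean) * (y - y_mean)
                    for x, y in enumerate(daily_used_gb))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    slope = numerator / denominator
    if slope <= 0:
        return None
    remaining_gb = capacity_gb - daily_used_gb[-1]
    return max(0.0, remaining_gb / slope)


if __name__ == "__main__":
    history = [100.0, 110.0, 120.0, 130.0]  # GB used on consecutive days
    print(days_until_full(history, capacity_gb=200.0))
```

An estimate like this can feed an early warning (for example, when fewer than 30 days remain) so that the team adds capacity on its own schedule rather than during an outage.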

In sum, monitoring as a discipline is a practice designed to help IT professionals escape the short-term, reactive nature of administration, often caused by insufficient monitoring, and become more proactive and strategic. From there, organizations can spend more time building the right monitoring system for their business that will intelligently alert administrators to problems. As the data center continues to integrate new technology and grow in complexity, and especially as hybrid IT increases, IT professionals should establish monitoring as a discipline, adopt best practices to improve systems awareness, tune performance and deliver the highest quality end-user experience possible.

Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.
