Good SLAs make good neighbors.
In selecting the right data center colocation, important considerations are the visibility and management tools available for the customer. Monitoring power consumption, having a DR-ready environment and truly partnering with the provider are vital consideration points in making the proper data center choice. This is the fourth article in a series on Colocation Selection and Best Practices. Below is an abridged version of Section 4 of the Data Center Knowledge Guide to Colocation Selection.
During the planning phases, things like contracts, expectations and management tools must be laid out to ensure that everyone is on the same page. When working with a colocation provider, there will be important planning points and ongoing considerations around a good data center rollout.
Working with a Service Level Agreement
When selecting the right colocation provider, having a good SLA and establishing clear lines of demarcation are crucial. Many times, an SLA can be developed based on the needs of the organization and what is being hosted within the data center infrastructure. This means identifying key workloads, applications, servers and more. From there, an organization can develop base service agreements for uptime, issue resolution, response time and more. Creating a good SLA document can take some time, but it’s important to do so carefully since this can govern the performance of your environment. Some very high uptime environments will build credits into their SLA. In these situations, for example, a colocation provider could issue credits if power is unavailable. Creating an SLA is a partnership between the data center provider and the customer. Expectations must be clearly laid out to ensure that all performance, recovery and other requirements are met. Surprises or unknowns in a highly utilized production environment can result in lost productivity, time and dollars.
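To make the uptime and credit discussion concrete, the short Python sketch below converts an uptime target into allowed downtime and looks up a service credit. The credit tiers, the maximum credit and the 30-day billing period are hypothetical values chosen for illustration; real figures come from the contract negotiated with the provider.

```python
MINUTES_PER_DAY = 24 * 60

# Hypothetical credit schedule: (minimum measured uptime %, credit as % of monthly fee).
# These numbers are illustrative only; the real schedule belongs in the SLA.
EXAMPLE_CREDIT_TIERS = [
    (99.99, 0.0),
    (99.9, 5.0),
    (99.0, 10.0),
]
EXAMPLE_MAX_CREDIT = 25.0  # assumed credit when uptime falls below every tier


def allowed_downtime_minutes(uptime_percent, period_days=30):
    """Minutes of downtime permitted in the period at the given uptime target."""
    total_minutes = period_days * MINUTES_PER_DAY
    return total_minutes * (100.0 - uptime_percent) / 100.0


def measured_uptime_percent(downtime_minutes, period_days=30):
    """Uptime percentage actually delivered, given observed downtime."""
    total_minutes = period_days * MINUTES_PER_DAY
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def service_credit_percent(uptime_percent,
                           tiers=EXAMPLE_CREDIT_TIERS,
                           max_credit=EXAMPLE_MAX_CREDIT):
    """Return the credit for the highest tier whose minimum uptime was met."""
    for minimum_uptime, credit in sorted(tiers, reverse=True):
        if uptime_percent >= minimum_uptime:
            return credit
    return max_credit


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows {minutes:.2f} minutes of downtime per 30 days")
    delivered = measured_uptime_percent(downtime_minutes=60)
    print(f"60 minutes down -> {delivered:.3f}% uptime, "
          f"{service_credit_percent(delivered)}% credit")
```

Working through numbers like these before signing shows how little downtime a high uptime target actually permits, and whether the proposed credits are meaningful relative to the business impact of an outage.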
Maintenance and Testing
Don’t forget, when you buy data center colocation you are buying a slice of critical infrastructure and ongoing maintenance. Without a robust maintenance program, technology will fail. Look for documented MOPs (method of procedure) and SOPs (standard operating procedure) that are used consistently and improved over time. Make sure your SLA does not exclude maintenance windows or emergency maintenance. Your colo provider should be able to show you their monthly, quarterly, and annual maintenance schedules for all critical elements of the mechanical and electrical systems including chillers, air handlers, generators, batteries, and UPSs. You should be able to observe and even participate in maintenance exercises. How are you notified about maintenance windows and procedures? Finally, ask the ultimate question, “Do you plan for and test a full utility outage?”
Systems need to be designed with sufficient redundancy to allow for proper maintenance. Colocation providers are reluctant to maintain systems if it could potentially cause an outage. The industry best practice is to be able to “fix one and break one, concurrent with a utility outage.”
Having a DR-ready Contract
For some organizations, moving to a colocation data center is the result of a disaster recovery plan. In these situations, it may very well be possible to integrate a DR contract into an SLA or to draft it as a standalone agreement. In this document, the organization and colocation provider establish which internal systems must be kept live and create a strategy to keep those systems running. When designing a contract around a disaster recovery initiative, consider some of the following points:
- Use your BIA. As mentioned earlier, a business impact analysis will outline the key components within a corporate environment which must be kept live or recovered quickly.
- Communicate clearly. Good communication between the colocation partner and the organization is vital in any DR plan. If a system or component that was deemed critical but never communicated to the provider goes down, it will become a serious problem.
- On-site and off-site supplies. If a disaster occurs, you need both on-site and off-site sources of key supplies. Are there onsite supplies of diesel fuel for generators and water for cooling systems? Are there established services in place for delivery of water and diesel fuel should onsite supplies be depleted? Does the colo provider conduct disaster recovery scenarios with key suppliers?
Using Management Tools
One of the most important management aspects within any environment is the ability to have clear visibility. This means using not only native tools, but ones provided by the data center partner. Working with management and monitoring tools for the workload is very important. Also important is to have a good view into the physical infrastructure of the data center environment. Data and reporting from these monitoring tools should be made available through a secured portal.
- Power monitoring. Always monitor the power consumption rates of your environment. The idea here is not only to know how much power is being used, but to make the environment more efficient.
- Cooling monitoring. Much like power, cooling needs to be watched closely. Cooling monitoring can be written into the SLA, or the organization can monitor it manually.
- Rack conditions and environmentals. Keeping track of the environment variables will help create a more efficient rack design. Some servers will generate more heat while others may need more power. By seeing what system is taking up which resources, administrators can better position their environment for optimal use.
- Uptime and status reports. Regularly check individual system uptime reports and keep an eye on the status of various systems.
- Logs. A log monitoring platform is always very important. One recommendation is to have a log aggregation tool which collects various server, system and security logs for analysis.
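As one illustration of the power monitoring point above, the sketch below summarizes per-rack power readings and flags racks whose peak draw approaches the contracted capacity. It assumes readings have already been exported from the provider's portal as kilowatt samples per rack; the rack names, the data layout and the 80 percent warning threshold are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RackPowerSummary:
    rack: str
    average_kw: float
    peak_kw: float
    peak_utilization: float  # peak draw divided by contracted capacity


def summarize_power(readings_kw, contracted_kw):
    """Build one summary per rack from lists of kW samples.

    readings_kw:   {rack name: [kW samples]}    (hypothetical export format)
    contracted_kw: {rack name: contracted kW}   (taken from the contract)
    """
    summaries = []
    for rack, samples in readings_kw.items():
        if not samples:
            continue  # no data for this rack in the reporting period
        peak = max(samples)
        summaries.append(
            RackPowerSummary(
                rack=rack,
                average_kw=mean(samples),
                peak_kw=peak,
                peak_utilization=peak / contracted_kw[rack],
            )
        )
    return summaries


def racks_near_limit(summaries, threshold=0.8):
    """Names of racks whose peak draw reached the warning threshold."""
    return [s.rack for s in summaries if s.peak_utilization >= threshold]


if __name__ == "__main__":
    readings = {"rack-a": [2.0, 3.0, 4.5], "rack-b": [1.0, 1.5]}
    contracted = {"rack-a": 5.0, "rack-b": 5.0}
    report = summarize_power(readings, contracted)
    for summary in report:
        print(summary)
    print("Near limit:", racks_near_limit(report))
```

The same pattern applies to cooling and other environmental readings: compare what is measured against what was agreed, and review the exceptions.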
Issue Resolution and Communication
A large part of having an effective environment will be the issue resolution practices and partner-to-customer communication. Although a lot of this can be outlined in the SLA, specific issue resolution matters need to be discussed.
When designing an issue resolution conversation, it’s important to identify core data center components and then communicate them to the colocation provider. For example, a hard drive failure within a particular system may take priority over another issue should the two occur simultaneously. In this scenario, the SLA and the BIA are used to create a clear plan for resolving issues quickly and in the right order. There will be instances when a specific problem takes precedence over others due to the nature of the occurrence. Without good communication, the colocation provider may not know which problem to fix first and may assign a resource to a less important ticket. Share your BIA findings and clearly communicate which data center components need resolution first.
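One way to picture this ordering is a small Python sketch that sorts simultaneous tickets by a BIA-derived rank. The component names and rank values are hypothetical; in practice they would come from the organization's own BIA and be shared with the provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    summary: str
    component: str     # system the issue affects
    opened_order: int  # sequence in which the ticket was opened


def resolution_order(tickets, bia_rank, default_rank=999):
    """Sort tickets by BIA rank (1 = most critical), then by time opened.

    Components missing from bia_rank fall to the back of the queue, which is
    exactly what happens when a critical system was never communicated.
    """
    return sorted(
        tickets,
        key=lambda t: (bia_rank.get(t.component, default_rank), t.opened_order),
    )


if __name__ == "__main__":
    # Hypothetical ranks taken from a BIA.
    bia_rank = {"billing-db": 1, "test-server": 3}
    open_tickets = [
        Ticket("Failed drive in test server", "test-server", 1),
        Ticket("Failed drive in billing database", "billing-db", 2),
        Ticket("Fan alarm on unlisted host", "unlisted-host", 3),
    ]
    for ticket in resolution_order(open_tickets, bia_rank):
        print(ticket.summary)
```

Although the test server ticket was opened first, the billing database is handled first because the BIA ranks it higher.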
The process of selecting the right colocation provider should include planning around contract creation and ensuring that the right management tools are in place. A colocation data center is an important extension of any organization and therefore must be properly managed. Good data center providers will oftentimes offer tools for direct visibility into an infrastructure. This means engineers will have a clear understanding of how their racks are being cooled, powered and monitored. These types of decisions make an infrastructure more efficient and much easier to manage in the long term.
To get more details on how to structure your colocation SLA and other colocation selection best practices download the complete Data Center Knowledge Guide to Colocation Selection.