Jim McGann is vice president of information management company Index Engines. Connect with him on LinkedIn.
Everyone’s talking about unstructured data lately – the cost, the risk, the massive growth – but little is being done to control it.
Analyst group IDC estimates unstructured data growth at 40 to 60 percent per year, a statistic that is not only startling but also underscores the need to start managing it today, or at least to have it on the schedule for 2014.
With budgets tightening, often to pay for storage costs, data center managers are struggling to find the highest-impact projects that will see an immediate ROI. While there’s no one project that will reclaim all of the unstructured data rotting away in the data center, there are 10 crucial data projects not to leave off the schedule in 2014.
1. Clean up abandoned data and reclaim capacity: When employees leave the organization, their files and email languish on networks and servers. With the owner no longer available to manage and maintain the content, it remains abandoned and clogs up corporate servers. Data centers must manage this abandoned data to avoid losing any valuable content and to reclaim capacity.
2. Migrate aged data to cheaper storage tiers: As data ages on the network it can become less valuable. Storing data that has not been accessed in three years or longer is a waste of budget. Migrate this data to less expensive storage platforms. Aged data can represent between 40 and 60 percent of current server capacity.
3. Implement accurate chargebacks based on metadata profiles and Active Directory ownership: Chargebacks will allow the data center to accurately recoup storage expenses and work with the departments to develop a more meaningful data policy, including purging what they no longer require.
4. Defensibly remediate legacy backup tapes and recoup offsite storage expenses: Old backup tapes that have piled up in offsite storage are a big line item on your annual budget. These tapes can be scanned, without the need for the original backup software, and a metadata index of the contents generated. Using the metadata profile, relevant content can be extracted and archived and the tapes can be defensibly remediated, reclaiming offsite storage expenses.
5. Purge redundant and outdated files and free up storage: Network servers can easily be comprised of 35 to 45 percent duplicate content. This content builds over time and results in wasted storage capacity. Once duplicates are identified, a policy can be implemented to purge what is no longer required, such as redundant files that have not been accessed in over three years or those owned by ex-employees.
6. Audit and remove personal multimedia content (e.g., music, video) from user shares: User shares become a repository not only for aged and abandoned files, but also for personal music, photo and video content that has no value to the business and in fact may be a liability. Once this data is classified, reports can be generated showing the top 50 owners of this content, total capacity and location. This information can be used to set and enforce quotas and work with the data owners to clean up the content and reclaim capacity.
7. Profile and move data to the cloud: Many data centers have cloud initiatives where aged and less useful business data is migrated to more cost-effective hosted storage. Finding the data and on-ramping it to the cloud, however, is a challenge if you lack an understanding of your data: who owns it, when it was last accessed, types of files, etc.
8. Archive sensitive content and support eDiscovery more cost-effectively: Legal and compliance requests for user files and email can be disruptive and time-consuming. Finding the relevant content and extracting it in a defensible manner is the key challenge. Streamlining access to critical data so you can respond to legal requests more quickly not only lessens their time burden but also saves you time and money during location efforts.
9. Audit and secure PII to control risk: Users don’t always abide by corporate data policies. Sharing sensitive information containing client social security and credit card numbers, such as tax forms, credit reports and applications, can easily happen. Find this information, audit email and servers, and take the appropriate action to ensure client data is secure. Some content may need to be relocated and moved to an archive, encrypted or even purged from the network. Managing PII ensures compliance with corporate policies and controls liability associated with sensitive data.
10. Manage and control liability hidden in PSTs: Email contains sensitive corporate data including communications of agreements, contracts, private business discussions and more. Many firms have email archives in place to monitor and protect this data; however, users can easily create their own mini-archive, or PST, of content that is not managed by corporate. PSTs have caused great pain when involved in litigation, as email that was thought to be no longer in existence suddenly appears in a hidden PST.
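As an illustration of how the duplicate content described in project 5 might be found, here is a minimal Python sketch. It is not the method of any particular product: it assumes duplicates are defined as byte-identical files, and the function names (file_digest, find_duplicates, wasted_bytes) are hypothetical. Files are first grouped by size so that only candidates with a matching size are hashed.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files under root by content hash (hypothetical helper).

    Returns {digest: [paths]} for every digest shared by two or more files.
    """
    # Cheap first pass: only files of equal size can be byte-identical.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue  # skip symlinks so one file is not counted twice
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable or vanished file: skip it

    # Second pass: hash only the size-matched candidates.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            try:
                by_hash[file_digest(path)].append(path)
            except OSError:
                continue
    return {d: sorted(p) for d, p in by_hash.items() if len(p) > 1}


def wasted_bytes(groups):
    """Capacity reclaimable if one copy of each duplicate group is kept."""
    return sum(
        os.path.getsize(paths[0]) * (len(paths) - 1)
        for paths in groups.values()
    )
```

A report built from find_duplicates would still need a policy decision on which copy to keep; the sketch only identifies candidates and estimates the capacity at stake.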
There are a number of ways companies can approach these projects, but to maximize impact in a smaller time frame, a number of file-level metadata tools, an approach sometimes referred to as unstructured data profiling, exist.
From file-level information, the date, owner, location, file type, number of copies and last-accessed time can be determined, which will help data center managers classify data and put disposition policies in place.
The benefits of managing unstructured data include reduced risk, capacity requirements and costs. With finances already tight and data growing rapidly, don’t leave these projects off the schedule in 2014.
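To make the idea of a file-level metadata profile concrete, here is a minimal Python sketch under stated assumptions. The names FileProfile, profile_tree, aged and capacity_by_owner are hypothetical. The numeric owner ID reported by the file system stands in for the Active Directory ownership a real tool would resolve, and last-accessed times are taken at face value even though some file systems are mounted so that they are not updated.

```python
import os
import time
from collections import defaultdict
from dataclasses import dataclass

SECONDS_PER_YEAR = 365 * 24 * 3600


@dataclass
class FileProfile:
    """Metadata recorded for one file (hypothetical structure)."""
    path: str
    owner_uid: int       # numeric owner ID; always 0 on Windows
    extension: str
    size_bytes: int
    modified: float      # seconds since the epoch
    last_accessed: float


def profile_tree(root):
    """Walk root and collect a FileProfile for every readable file."""
    profiles = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
            profiles.append(FileProfile(
                path=path,
                owner_uid=st.st_uid,
                extension=os.path.splitext(name)[1].lower(),
                size_bytes=st.st_size,
                modified=st.st_mtime,
                last_accessed=st.st_atime,
            ))
    return profiles


def aged(profiles, years=3, now=None):
    """Files not accessed in the last `years` years (tiering candidates)."""
    now = time.time() if now is None else now
    cutoff = now - years * SECONDS_PER_YEAR
    return [p for p in profiles if p.last_accessed < cutoff]


def capacity_by_owner(profiles):
    """Total bytes per owner ID, a possible basis for chargeback reports."""
    totals = defaultdict(int)
    for p in profiles:
        totals[p.owner_uid] += p.size_bytes
    return dict(totals)
```

The three-year default mirrors the threshold used in the project list above; filtering the same profiles by extension would give a starting point for the multimedia audit.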
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.