Jim McGann is vice president of information management company Index Engines. Connect with him on LinkedIn.
Data center migrations and consolidations are a common occurrence, especially as data center growth averages 40-60 percent per year, leaving data center managers with three choices: upgrade their capacity, migrate data to less expensive storage or consolidate environments.
As budgets prohibit most organizations from blindly upgrading their storage capacity, companies are turning to migrations and consolidations to control costs.
But when an organization migrates to a new storage platform or consolidates multiple environments, moving outdated, abandoned and aged data complicates and pollutes the process.
This valueless data can easily account for 30 to 50 percent of total capacity. New data profiling technology, however, enables data center managers to eliminate data with no business value before a consolidation or migration occurs.
Data profiling is the metadata analysis of unstructured user files. It works by building an efficient, cost-effective index of data storage (via NFS/CIFS/NDMP) and extracting key metadata from it. Last modified and last accessed dates, owner, location, size and even duplicate content can then be located with custom queries.
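To make the idea concrete, here is a minimal sketch of metadata extraction over a file tree, assuming the storage is already mounted as a local path (for example an NFS or CIFS share). It is not how any particular profiling product works; the names `FileProfile`, `profile_tree` and `find_duplicates` are hypothetical, and duplicates are detected here simply by comparing SHA-256 content hashes.

```python
import hashlib
import os
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FileProfile:
    """Metadata captured for one file (hypothetical record layout)."""
    path: str
    size: int          # bytes
    modified: float    # last-modified time, seconds since the epoch
    accessed: float    # last-accessed time, seconds since the epoch
    owner_uid: int     # numeric owner id as reported by the file system
    sha256: str        # content hash, used only to spot duplicates


def _hash_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def profile_tree(root):
    """Walk a mounted directory tree and collect one FileProfile per file."""
    profiles = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            profiles.append(
                FileProfile(path, st.st_size, st.st_mtime, st.st_atime,
                            st.st_uid, _hash_file(path))
            )
    return profiles


def find_duplicates(profiles):
    """Group paths that share identical content; return groups of 2 or more."""
    by_hash = defaultdict(list)
    for profile in profiles:
        by_hash[profile.sha256].append(profile.path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Calling `find_duplicates(profile_tree("/mnt/share"))` would list groups of identical files under that (assumed) mount point. A production index would also need to handle unreadable files and avoid changing access times while scanning.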
In addition, integrating data profiling software with Active Directory/LDAP allows reports and analysis to be summarized by groups and departments as well as active versus inactive (ex-employees) users.
This shows data at the file level that is no longer being used or belongs to ex-employees and enables data center managers to manage use by department and locate documents needed for regulatory or legal purposes.
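As an illustration of this kind of report, the sketch below flags files whose owner is no longer an active user or that have not been accessed for a long time. It assumes the list of active users has already been pulled from Active Directory/LDAP by some other means; the record fields, the function name `flag_candidates` and the three-year idle threshold are all hypothetical choices, not settings from any product.

```python
from datetime import datetime, timedelta


def flag_candidates(records, active_users, now, max_idle_days=3 * 365):
    """Return (path, reasons) for files that look abandoned or stale.

    records: iterable of dicts with 'path', 'owner' and 'last_accessed'
             (a datetime) keys -- a hypothetical layout.
    active_users: set of account names still active in the directory.
    """
    cutoff = now - timedelta(days=max_idle_days)
    flagged = []
    for rec in records:
        reasons = []
        if rec["owner"] not in active_users:
            reasons.append("owner inactive")
        if rec["last_accessed"] < cutoff:
            reasons.append("not accessed since cutoff")
        if reasons:
            flagged.append((rec["path"], reasons))
    return flagged


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    records = [
        {"path": "/share/hr/plan.doc", "owner": "alice",
         "last_accessed": datetime(2023, 11, 5)},
        {"path": "/share/eng/old.zip", "owner": "bob",
         "last_accessed": datetime(2015, 3, 1)},
    ]
    for path, reasons in flag_candidates(records, {"alice"}, now):
        print(path, reasons)
```

The flagged list is only a starting point for review; whether a file can actually be removed still depends on legal and compliance sign-off.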
This analysis software differs greatly from existing solutions that analyze access logs and network metadata. Data profiling goes deep within the files, down to a full-text profile if required, and delivers comprehensive access to file information. For file management, it is the only solution that provides this level of knowledge along with the analytical tools and disposition capability needed to migrate data efficiently.
Filters and Queries
Data profiling offers flexible filters, queries and dynamic summary reports that give corporate data centers the knowledge they need to make appropriate decisions.
For example, according to the Bureau of Labor Statistics, organizations are currently facing a 3.3 percent turnover rate. For a 5,000-employee organization, this represents 165 ex-employees annually. If each of these ex-employees generated a meager 5GB of unstructured content annually, this would represent about 825GB, or almost 1TB, of forgotten data every year.
Considering how the files one person creates are shared within the company and stored on other people’s desktops, mail attachments, archives and backups, this number quickly turns into 10TB of annual useless content cluttering the data center. Over 10 years this will explode to 100TB of abandoned data that will continue to grow annually. Data profiling can locate and remove this data prior to a migration or consolidation.
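The arithmetic behind this estimate can be checked with a short sketch. The headcount, turnover rate and 5GB per leaver come from the text; the copy factor of 10 is an assumption inferred from the jump from roughly 1TB to 10TB, and 1TB is taken as 1,000GB. The function name is hypothetical.

```python
def abandoned_data_estimate(employees, turnover_rate, gb_per_leaver,
                            copy_factor, years):
    """Estimate abandoned data from staff turnover (a rough sketch).

    copy_factor models shared copies on desktops, mail attachments,
    archives and backups; it is an assumption, not a measured value.
    """
    leavers = round(employees * turnover_rate)
    direct_gb = leavers * gb_per_leaver
    yearly_tb = direct_gb * copy_factor / 1000      # 1TB = 1,000GB here
    return {
        "leavers_per_year": leavers,
        "direct_gb_per_year": direct_gb,
        "tb_per_year_with_copies": yearly_tb,
        "tb_after_years": yearly_tb * years,
    }


if __name__ == "__main__":
    estimate = abandoned_data_estimate(
        employees=5_000, turnover_rate=0.033, gb_per_leaver=5,
        copy_factor=10, years=10,
    )
    for key, value in estimate.items():
        print(f"{key}: {value}")
```

Without rounding 825GB up to 1TB, the same assumptions give about 8.25TB a year and 82.5TB over 10 years, so the round figures of 10TB and 100TB sit at the upper end of the estimate.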
Analysis is flexible and will allow the user to understand the current state of data as well as how it changes. From finding and managing data that has outlived its business value to finding data that must be preserved in an archive, data profiling delivers the reports and disposition tools needed to get the job done.
With dynamic reports displaying your environment and narrowing down the analysis of the data set, managing the disposition of the content becomes straightforward. You can use the built-in tools to delete, migrate or archive the data, or export a CSV file and use your existing tools.
Deletion of content, while a sensitive subject, is performed in a defensible manner to protect the organization from penalties. Once you have refined a subset of content to be purged and have received sign-off from legal or compliance, it takes one click to delete the data. The software creates a log of this activity, including the person, time and specific files, and stores it in a database for future reference.
Migration options include moving content to a more appropriate platform, preserving it in an existing archive or pushing it out to a cloud repository. This allows data to be tiered based on value and access requirements and frees up expensive storage for more important content.
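A logged deletion of this kind could look roughly like the sketch below, which removes each approved file and records who approved it, when, and which file was removed. It uses SQLite purely for illustration; the table name `deletion_log`, its columns and the function `purge_with_audit` are hypothetical and do not describe any product's actual schema.

```python
import os
import sqlite3
from datetime import datetime, timezone


def purge_with_audit(paths, approved_by, conn):
    """Delete each file and write one audit row per deletion.

    paths: files already signed off for deletion.
    approved_by: name of the person performing the purge.
    conn: an open sqlite3 connection used as the audit database.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS deletion_log ("
        "path TEXT NOT NULL, deleted_by TEXT NOT NULL, "
        "deleted_at TEXT NOT NULL)"
    )
    deleted = []
    for path in paths:
        os.remove(path)  # raises if the file is missing, so nothing is logged
        conn.execute(
            "INSERT INTO deletion_log (path, deleted_by, deleted_at) "
            "VALUES (?, ?, ?)",
            (path, approved_by, datetime.now(timezone.utc).isoformat()),
        )
        deleted.append(path)
    conn.commit()
    return deleted
```

Because a row is written only after `os.remove` succeeds, the log reflects files that were actually deleted rather than files that were merely selected.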
Streamlined Migrations and Consolidations
As the cost-saving and risk-mitigating trend of migrating and consolidating data centers continues, the process must be streamlined to be truly effective. Moving data that has outlived its business value is a significant waste of effort and expense.
Typical enterprise servers can easily contain 22 percent abandoned data, 14 percent data that has aged and outlived its business value, 24 percent duplicate content, and 6 percent personal files such as vacation photos and music libraries. Together, these categories could mean that more than 50 percent of capacity is wasted.
A migration or consolidation in which you can cut the volume in half would free up tremendous resources and reduce expenses. These savings will continue to accrue annually.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.