Insights on Getting the Data Right!
Technology is always changing. As such, opportunities arise for companies to adopt new software solutions and technologies to provide better internal operations as well as to compete more effectively in their respective markets. One of the keys when moving to new and better technologies is a smooth data migration from an existing platform to a new one.
Myriads of firms have experienced frustrations and even failures migrating data from old systems to new. What follows is a short practical outline and set of straightforward recommendations to help make your data migration project a success.
First, it is important to note that a data source is not just a computerized repository of data. In most cases, it is an intelligently designed repository put together and managed by IT people, who are experts at creating and managing a data repository, working closely with their business counterparts who best know how the business intends to use that data. It is those same two groups of people who are the knowledge experts with regard to how and why the data was stored and related the way it was.
Define/Adopt a practical process:
The following sections describe a simple disciplined process that can be expanded, as needed, to address increasingly complex data migration projects.
Recommendation 1- PLAN
The first recommendation is to talk to the IT and business experts, ask plenty of questions, take copious notes, and make sure to play it back to them to validate that you heard what they have said.
Here are some do's regarding engaging the IT & Business SMEs:
Identify/find the recognized business data owners
- Get a demo of the existing system, document how the data is presented to the user.
- See how the users currently consume the data
Interview the Business SMEs – Here are some questions to ask:
- Ask how they know when they are looking at bad data.
- Ask how they validate Key metrics in the current system.
- Ask what calculations are currently performed that can be reproduced on the target system
- What are the problems the business has with the current data? Try to determine if you have clean data. The right information must be in the correct format and in the correct fields.
- What data tables are to be migrated?
- How do the different tables relate to one another?
- Do the DB tables stand on their own or are there reference tables required to fully understand the data?
- How much data history needs to be migrated?
- How are dates managed in the current system?
- What is the core time zone? Does the time zone have an impact on how the user consumes the data?
- Determine if there is a method to identify duplicate/redundant data
- Identify data requiring protection, document business critical information
- Understand if there’s regulated data that needs special security controls for compliance purposes
- Determine which data is no longer needed
Identify/find the recognized IT data owners
- Get a tour of the databases, tables, and fields. Obtain the definitional assets that IT maintains for the data you are to migrate (ER Diagrams, Update schedules, reference datasets, etc.)
Inventory the IT Data SMEs – Here are some questions to ask:
- Is there an ER diagram available, is it current?
- Ask what the volumes of records are in each table to be migrated
- Ask what approach will IT take to provide the source data? (CSV Exports, single or multiple files, history, etc.)
- Explore the need for data history
- Create a data dictionary
- List the tables needed by the new system
- List the data attributes needed by the new system (fields, types, lengths)
- Identify the primary and foreign keys that define the relationships between tables
- Identify the key owners of the data
- Document the approach and process you have decided on to perform the migration
- How did you decide what data to migrate
- How did you decide how the data would be stored
- What mechanisms did you put in place to migrate the data
- Document the source and target table and field definitions
Map the Data from old system to new
Data mapping is the process of matching fields from one database to another. The goal is to move the data to the new location and ensure that it is accurate and usable. Failure in this step likely means that data may become corrupted, lost, or misapplied as it moves to its destination.
Create a mapping workbook
Here are the high-level steps to creating a data map:
- You have already completed the first task having worked with the Business and IT SMEs to identify the data to be moved (tables, fields, formats). Often, data mapping efforts include data owners, detailed business use descriptions, how certain data is used in calculations, etc.
- The next step is to match source fields to destination fields.
- Create source to target definitions. Document the following for both source and target.
- Source Table
- Source Field
- Source field definition
- Source Table Keys
- Source field specific use
- Example data for that field
- Target Table
- Target Field
- Target field definition
- Target Table Keys
- Target field specific use
- Example data for that field (perhaps transformed)
- Lastly, If a data field needs to be adjusted (transformed) then a definition on what needs to be done is also recorded. Perhaps a date field is in the form of dd/mm/yy and needs to be changed to mm/dd/yyyy.
- Create source to target definitions. Document the following for both source and target.
NOTE: A data map is a living document that will likely require updates over time. There may be:
- New data sources identified
- Trusted data sources may change
- New requirements for the new system are identified.
Recommendation 2 - EXECUTE
If a data migration project is small enough, you could adopt a DIY approach creating migration scripts to ferry data from source to target. Unfortunately, these approaches do not scale well and are typically more difficult to debug when things go wrong.
Use a Tool
Professional data migration tools, both on-prem and cloud based, provide a level of reliability, visual work environment, debugging, logging, performance and scalability (cloud based may scale dynamically while on-prem solutions are restricted by the hardware they are run on).
Regardless of the approach taken there are some practical things to do.
A recent data migration project called for CSV table exports from a legacy database to be migrated to cloud-based SharePoint lists. The key to starting with a CSV export is to make sure the export is complete with regards to the number of records and fields within the records. Careful counts of records, fields, and row widths is important to ensure your migration project is on solid footing from the start.
Here are some practical ways to Interrogate/test the CSV export:
- Pull the data into an Excel workbook
- Filter the columns, check the values within the filters
- Check for duplicate records
- Check for missing data, especially for mandatory fields
- Check for odd situations like hieroglyphic characters, data that doesn't belong, data that doesn't make sense, E.G. text strings in a date field, text strings in a numeric field, etc.
Note: Often times your data migration efforts expose problems with the source data that the business and IT users are unaware of.
Create an issue log
- Work the log - Report the discrepancies back to the IT and Business SMEs
Run a test migration
- Run a migration for a subset of the overall data you will eventually migrate.
Recommendation 3 - TEST
Test the source to target migration
- Create a data comparison workbook
For one of our recent projects (Legacy source data migrated to SharePoint lists), we created a set of Excel-based PowerQuery Reports pulling data from the SharePoint lists and compared that data side-by-side with the CSV file exports from the Legacy DB tables
Here are some of the many tests we performed against the source and target datasets:
- Ad-hoc data checks, visually looking for odd data
- Manually checked Business supplied KPIs validating results meaningful to the business
- Performed record counts
- Validated Data Types and formats
- Performed data length tests
- Confirmed date fields (date formats) to determine if transformations were needed
- Detected duplicate data records
- Searched for Forbidden characters
- Checked for dropped records & fields (Record lengths)
- Checked for Invisible Characters (fields with unprintable characters)
- Compared field definitions for extents (Character limits in source and target fields)
- Composed unique keys from combinations of fields and both source and target and compared that the keys matched
- Searched for transposed data - First name, Last name vs. Last name, First name
- Performed aggregate totals tests (which might include financial amounts or other summary data)
- Checked for cases where no data was loaded.
- Checked for corrupted or incomplete extract files
- Missing data for periods of time
- Cases where we couldn't interpret the source data
- Searched for control characters
- Field by field lengths, definitions – attribute comparisons
Continue to use the Issue Log
- Work the log – Report and resolve data anomalies and determine where the fixes/changes are to be made
Recommendation 4 – CONFIRM
Share your data comparison workbook with the IT and Business SMEs
- Walk IT and the business teams through the migration process you followed. Walk through the artifacts (Data Dictionary, Data Mapping Document, Data Comparison Workbook, Issue Log)
- Work with IT to test the small data migration
- Work with the business teams to test the small data migration
Given your success migrating a small subset of data, testing the migration yourself and having that migration validated by the IT and business teams, you are now ready to iterate through the remaining table migrations repeating steps 2, 3, & 4
As you can see, "Recommendation 1 – Plan" is by far the most involved step in your data migration journey. In that step, you follow a structured and disciplined process to learn about the source data and system with an eye toward migrating that data in a quality fashion into the new repository having developed a solid understanding of the data and how it is used.
Recommendation 2 – Execute begins with ensuring that the source data is complete. That all the data intended to be migrated has been identified and made available. This step marries the source data with your data learnings and translates/migrates those learnings into a physical "Target" data repository.
Recommendation 3 – Test applies the proof steps carefully comparing source to target, ensuring the data made the journey intact and more importantly made it to the destination in a way that the business can have full confidence in the result. In this step you prove to yourself that you got it right by creating the "Data Comparison Workbook" comparing source to target employing a variety of tests.
Recommendation 4 – Confirm invites the IT and the Business SMEs to share the data migration assets that you have created so they can confirm what you have already proven to yourself. In this last step, transparency is the approach that best convinces the customer that you have been a good steward of their data -carefully migrating their precious data assets from legacy source to a new more functionally rich platform.
We hope this helps- good luck on your journey! If you have any additional questions or need help with a migration, contact us!