We certainly do not suffer from a lack of data these days. But all the tools in the world will not get that data talking until it is in a format you can use. Excel is the most common place to fix data because it has tools to help scrub up your data files. Here are three common data problems and ways to fix them in Excel.
1. Extracting data items from a text string
Say you have pulled all the data you need from your data extract, but the piece you really need is included in a larger text string. The functions LEFT, RIGHT and MID are perfect problem solvers here. Let’s also say that in addition to this combined customer account number, you also need the routing number (first nine digits), account type (next four digits) and account identifier (last seven digits). Here’s a possible solution using these functions.
Cell A1 contents 07100159912140101999
=LEFT(A1,9) = 071001599 On the left of A1, return nine characters.
=MID(A1,10,4) = 1214 In the middle of A1, starting with and including the 10th character, return four characters.
=RIGHT(A1,7) = 0101999 On the right of A1, return seven characters.
2. Different data items in one cell
When multiple data items appear in the same cell, such as last name, first name and phone number as in our example, you can use the Text to Columns tool to separate them. Unlike the previous example where you need to keep the original value whole and create three new fields, Text to Columns assumes that you do not need to keep the original combined field whole. You’ll find Text to Columns on the Data tab in the Data Tools group.
The data have to be separated either by a unique character (Delimited) or in the same character position (Fixed width).
An example of fixed width might be a VIN (Vehicle Identification Number). The first character tells you in what region your car was built; position 4-8 tells you the vehicle model, engine style and body style. This tool creates a new column for each new data. This can be a “gotcha.” If you have more data to the right of the one you are separating, make sure you create a new blank column to the right for each new piece of data.
3. Duplicate records
The Remove Duplicates tool, found on the Data tab, Data Tools group, is the easiest way to get rid of duplicate data. If each column in one record has to match each column in another record to qualify as a duplicate, it’s just two clicks away. Click the button and confirm by clicking OK. If your duplicate is based on only a few columns, then use the check boxes to indicate which columns you want Excel to examine.
Check out two more data fixes online, “Two Data Problems, Two Solutions,” at the “Office Tech Pro” blog.