Some Efficiency Techniques for Competent Data Cleansing
In this blog, we will give you some effective techniques so that you can know how to improve data quality and it helps you to build a strong data cleansing strategy which is very simple and easy to understand. These strategies really work.
1. Make a copy
You generally receive an electronic copy of all the collected paper-based data. If you have made an entry error in electronic copy, then you have to check back to the original copy of paper. With the help of data cleansing, you always able to undo any cleansing mistakes you have done. So, we advise you to create a duplicate worksheet for your data.
You need to call the original one ‘Raw Data’ and the “Cleaning in Progress” until the cleaning process finished. After that you can change the sheet tab name “Clean Data”
You need to make sure both worksheets have unique ID columns. This is the first most important step in data cleansing.
2. Clean your data in a separate worksheet
Always use ‘Find and Replace’ to clean an individual column of data sheet. ‘Find and Replace’ work only on selected column. It will be better if you copy that column into a spare worksheet and work on it there. When you have finished cleaning, you can copy back and replace the unclean column with clean column.
3. Report errors back to the original source
In the case of shared worksheet, always make sure that you report back to the original source if you found any error. By this way, when you get some more data from the same source you need to do less cleaning.
4. Use excel functions to do the hard work
It would be better if you do not clean data manually as most of the wrong entries come from manual entry. By using same method, you can get same problem.
In excel you can clean data efficiently as it has many functions for data cleaning. So, try to use them.
For example, if you need to remove duplicates or perform data de-duplication then you can use Excel’s “Remove Duplicates” function.
Here are some simple steps to use Remove duplicates function in Excel:
- First copy the selected text in spare worksheet twice.
- In the data tools groups, click Remove Duplicates
- You’ll get a warning where you need to choose 1 option out of 2 and you can go with “Continue with the current selection”
- You’ll then be asked for column or data headings, you can click on OK after checking inside tick-box
- Excel will now collapse the column into all its unique elements
You now have a list of all the unique elements in your data. It might be useful at this stage to see how many of each of the unique elements are present.
To do this we need to do a tiny little bit of programming. Don’t panic! It’s really quite simple:
1. If your variable is in column E and your list of unique elements is in column F, then in the cell to the right of your first unique element (cell G2), enter the following code:
2. The number of times that that unique element appears in your list will be counted by Excel and will appear in the cell
You’ll want Excel to do this for all your unique elements, and there’s a great short-cut way for this:
1. Select the same cell again (G2) and position your mouse to the lower right-hand corner of the cell until the cursor changes to a plus sign
2. Double-click the plus sign
You’re done! In column E you should have the original data, in column F a list of all the unique elements in your data and in column G will be the number of times each element appears in your data.
You can now use ‘Find and Replace’ to correct misspelled entries, including correcting entries with the wrong case, like ‘case’, ‘Case’ or ‘CASE’. Do this with all the misspelled entries in your unique elements list until all errors have a count of zero.
So how long did that take? A couple of minutes?
How long would it have taken you to go through each cell by eye to find all the errors and correct them individually? Hours?
Learn all these little tips and tricks in Excel and your data cleaning processes might not feel like sticking pins in your eyes!
5. And use excel formulae to do the even harder work
Excel has a few different formulae that can be used to detect and trim spaces and other unwanted characters, like:
The above tips will surely help you in cleansing your data accurately and quickly. Try it!
Add Your Prospective
Thanks for dropping by! We would love to get your input on this topic. However, please bear in mind that all comments on this blog are moderated. So, please do not use profane language or post meaningless remarks use a spammy keyword or a domain as your name, etc. We're looking forward to hear your viewpoint, ideas and thoughts.
Subscribe To Our Blog
Sign up to get updates from our blog delivered directly to your inbox.
Data Outsourcing India Blog
The DOI Blog consists of interesting stories and informed opinions about data, e-publishing, website design, web development, internet marketing, e-commerce, photo editing and other services. Also, we love sharing tips and ideas to improve our clients' business efficiency.