The Honest Marketer

​​​Data preparation tips


Paul's handy hints to prepare your data before analysis & insight

1 First things first  Before you start any data manipulation, always save an unaltered copy – possibly write protected (read only) – keeping it to hand. I also create a second worksheet within my working document, which means that I never alter the initial data but create a duplicate column/worksheet, altering that instead.

2 Fail to prepare, prepare to fail  Make sure you read all documentation accompanying the data to understand any stated limitations and scope, such as data rounding or data suppression (for data protection). It can mean some data tables, e.g. UCAS, Hesa free and open data, include totals which may not tally with the body of data they pertain to, which can be confusing to the untrained eye!

3 What's in a name?  When working with multiple documents and worksheets, naming conventions are critical to avoid confusion when referencing in reports. For example, you may append original document names with “_org”. This can be applied to columns too, i.e. “name_org”. Amended worksheets or columns can keep their original descriptions with additions to explain status or version control “School_Correctedv2” or “Faculty_Final”, etc.

4 Computer says 'no'  Raw data often contains common issues, typically the result of human input and/or multiple sources of origin. These may include: spelling mistakes; multiple spellings of the same reference; variations of the same name/description; inconsistent capitalisation; and extra whitespace. Always scan the data for 'typical' problems before you start any manipulation.

5 Choose your tools wisely  When it comes to organising your data, taking your copy and converting it to an Excel Table  makes sorting and filtering much easier. It enables banded rows, calculated columns and a totals row, rather than plain rows and columns.

6 Correcting data  Sorting and filtering will automatically identify some of the issues above. You can also correct these simply and efficiently using Excel functions to clean data and to manage things like extra spaces, text case, duplicates and formatting.

7 A helping hand  You can create and use 'helper' columns to store intermediate results, making an existing set of results or formula readily accessible if you have a dataset that requires repeat elements. If functions get complex, it may be useful to do any transformations in stages across more than one column - showing your 'workings out'. It’s always possible to 'hide' columns if you find you have multiple stages or want to simply remove the original from view (right click on worksheet column and select 'hide').