Business today is data-driven, and more and more traditional businesses are moving to technology-based management.
Data quality pertains to issues such as completeness, validity, consistency, timeliness and accuracy: the properties that make data appropriate for a specific use. The quality of data is often evaluated to determine usability and to establish the processes necessary for improving it. Data quality may be measured objectively or subjectively.
Cleansing can be an elaborate process, depending on the method chosen, and has to be planned carefully to achieve its objective: eliminating dirty data.
Our methods include:
- Automated data cleansing
- Manual data cleansing
- The combined cleansing process
Error Identification Process (Data Audit)
The first step is to identify and categorise the various errors. This is also called the data audit process. A data audit reveals the volume of each error type. A data audit process will provide:
- The error types that need cleansing. These are called critical error types.
- The error types that can safely be ignored as they are not business critical. These can be classified as non-critical error types.
- Data volume of each of the critical error types.
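As a rough illustration of what such an audit report might contain, the Python sketch below counts the records affected by each error type and splits the counts into critical and non-critical groups. The record fields, the check functions and the choice of which error types count as critical are all hypothetical assumptions made for the example, not part of any specific tool.

```python
from collections import Counter

# Hypothetical checks: each returns True when the record has that error.
CHECKS = {
    "missing_email": lambda r: not r.get("email"),
    "missing_phone": lambda r: not r.get("phone"),
    "untrimmed_name": lambda r: r.get("name", "") != r.get("name", "").strip(),
}

# Assumption: the data owner decides which error types are business critical.
CRITICAL = {"missing_email", "untrimmed_name"}


def audit(records):
    """Count how many records exhibit each error type, grouped by criticality."""
    counts = Counter()
    for record in records:
        for error_type, has_error in CHECKS.items():
            if has_error(record):
                counts[error_type] += 1
    critical = {k: v for k, v in counts.items() if k in CRITICAL}
    non_critical = {k: v for k, v in counts.items() if k not in CRITICAL}
    return {"critical": critical, "non_critical": non_critical}


# Made-up sample records, used only to exercise the sketch.
records = [
    {"name": "Ann Lee", "email": "ann@example.com", "phone": ""},
    {"name": " Bob Ray ", "email": "", "phone": "0400 000 000"},
    {"name": "Cy Wu", "email": "", "phone": ""},
]
report = audit(records)
```

Here `report["critical"]` holds the data volume per critical error type, which is the figure needed to plan the cleansing effort.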
Automated data cleansing
We apply a dedicated cleansing tool to each type of data. For example:
- Check and correct email addresses based on the RFC
- Check and correct addresses based on the AU Post spec
- Check and correct names based on customary usage
- Check and correct dates and currency values based on the country
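Two of the checks above can be sketched in Python as follows. This is only a minimal illustration: the email pattern is deliberately simplified and does not implement the full RFC address grammar, and the per-country date formats are an assumed mapping chosen for the example.

```python
import re
from datetime import datetime

# Deliberately simplified pattern; it does not cover the full RFC address grammar.
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

# Assumed input date formats per country code (hypothetical mapping).
DATE_FORMATS = {"AU": "%d/%m/%Y", "US": "%m/%d/%Y"}


def clean_email(value):
    """Trim whitespace and lower-case the domain; return None if still invalid."""
    candidate = value.strip()
    if not EMAIL_PATTERN.match(candidate):
        return None
    local, domain = candidate.rsplit("@", 1)
    return f"{local}@{domain.lower()}"


def clean_date(value, country):
    """Parse a date in the country's format and return it as ISO 8601, or None."""
    try:
        parsed = datetime.strptime(value.strip(), DATE_FORMATS[country])
    except (KeyError, ValueError):
        # Unknown country or a value that is not a valid date in that format.
        return None
    return parsed.date().isoformat()
```

A value that comes back as `None` could not be corrected by the rule and would need another rule or a manual check. For example, `clean_date("03/04/2021", "AU")` is read as 3 April, while the same string with `"US"` is read as 4 March.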
Manual data cleansing
Manual data cleansing is needed because not all errors can be cleansed automatically. For certain error types, no logical conclusion can be drawn and no rules can be formulated about the value that a particular field should take, so the only way to cleanse the data is to do it manually. A generalised process can be formulated for manual cleansing along the same lines as for the automated process.
The combined cleansing process
In practice, dirty data usually falls into both categories:
- Data error types that can be automatically cleansed without manual intervention, and
- Data error types that require manual intervention to be cleansed.
This dictates the use of a combined data cleansing process: the data errors are categorised into those that the automatic process can resolve and those that require manual correction, and each group is handled accordingly.
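One way to picture the combined process is as a routing step: each record goes through the automatic fixers, and anything they cannot resolve is placed in a queue for manual correction. The Python sketch below assumes hypothetical fixer functions that return a corrected value, or `None` when no rule applies; the field names and the four-digit postcode rule are illustrative only.

```python
def combined_cleanse(records, auto_fixers):
    """Apply automatic fixes where possible; queue the rest for manual review."""
    cleaned, manual_queue = [], []
    for record in records:
        fixed = dict(record)  # work on a copy so the input is left untouched
        unresolved = []
        for field, fixer in auto_fixers.items():
            result = fixer(fixed.get(field, ""))
            if result is None:
                unresolved.append(field)  # no rule could fix this field
            else:
                fixed[field] = result
        if unresolved:
            manual_queue.append({"record": fixed, "fields": unresolved})
        else:
            cleaned.append(fixed)
    return cleaned, manual_queue


# Hypothetical fixers: return the corrected value, or None if no rule applies.
def fix_name(value):
    collapsed = " ".join(value.split())  # trim and collapse repeated spaces
    return collapsed or None


def fix_postcode(value):
    digits = value.strip()
    # Assumption for the example: a valid postcode is exactly four digits.
    return digits if digits.isdigit() and len(digits) == 4 else None


records = [
    {"name": "  Ann   Lee ", "postcode": "2000"},
    {"name": "Bob Ray", "postcode": "20O0"},  # letter O instead of zero
]
cleaned, manual_queue = combined_cleanse(
    records, {"name": fix_name, "postcode": fix_postcode}
)
```

The first record is fully cleansed automatically, while the second lands in `manual_queue` with `postcode` flagged, because no rule can say with certainty what the intended value was.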