What is Dirty Data?

Malcolm Tatum

Dirty data is a term used to describe any type of electronic data that is outdated, incomplete, or otherwise inaccurate. Data of this type may result from errors in data entry, a failure to update records on a regular basis, or the entry of the same data more than once. At times, the incorrect data amounts to nothing more than punctuation errors in the text of electronic documents. In other instances, dirty data is intentionally misleading, as when accounting records are modified to present a particular image to investors and others.

Businesses sometimes manage the correction of inaccurate data by proofreading the data after it is entered and making necessary updates.

For the most part, the accumulation of dirty data in a database is unintentional. People entering new information may misspell words, leave out punctuation that is important to understanding the intent of the text, or fail to follow the required formatting conventions. In situations of this type, fixing the problem is a relatively simple process that requires nothing more than editing the incorrect text and saving the changes. Businesses sometimes manage this process by proofreading data after it is entered and making the necessary updates.
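Part of that proofreading can be automated. The following is a minimal sketch, not a definitive implementation, of how entry errors such as inconsistent formatting, missing fields, and the same record entered twice might be flagged. The function names `normalize` and `find_dirty`, the `name` and `email` fields, and the choice to treat the email address as the identifier of a record are all assumptions made for illustration.

```python
def normalize(record):
    """Trim whitespace and standardize case so equivalent entries compare equal."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }


def find_dirty(records):
    """Return two lists of record indexes: incomplete entries and duplicates."""
    seen = set()
    incomplete, duplicates = [], []
    for index, record in enumerate(records):
        clean = normalize(record)
        # A record with an empty field is incomplete.
        if not clean["name"] or not clean["email"]:
            incomplete.append(index)
            continue
        # Assumption: two records with the same email are the same entry.
        if clean["email"] in seen:
            duplicates.append(index)
        else:
            seen.add(clean["email"])
    return incomplete, duplicates


# Hypothetical sample data: a repeated entry and a record with no email.
records = [
    {"name": "Ada  Lovelace", "email": "ada@example.com"},
    {"name": "ada lovelace", "email": " ADA@example.com "},
    {"name": "Grace Hopper", "email": ""},
]
incomplete, duplicates = find_dirty(records)
print("Incomplete:", incomplete)
print("Duplicates:", duplicates)
```

A check like this only points out suspect records. A person still has to decide which copy of a duplicate to keep and what the missing values should be.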

Errors found in databases may be the result of human error in entering the data.

Dirty data may also occur due to a failure to update existing records when information changes. For example, if salespeople fail to update customer files when a customer's personnel change, those files are no longer accurate and are considered dirty. As with correcting spelling and punctuation errors, taking the time to remove outdated information and replace it with current data increases the overall usability of the database.
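One common way to find records that may be out of date is to store the date of the last update with each record and flag anything older than a chosen limit. The sketch below assumes a hypothetical `last_updated` field and a one-year threshold. Both are illustrative choices, and an old record is not necessarily wrong, only due for review.

```python
from datetime import date, timedelta


def stale_records(records, today, max_age_days=365):
    """Return the ids of records last updated more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return [r["id"] for r in records if r["last_updated"] < cutoff]


# Hypothetical customer files with the date each was last reviewed.
customers = [
    {"id": 1, "contact": "J. Smith", "last_updated": date(2021, 3, 1)},
    {"id": 2, "contact": "R. Jones", "last_updated": date(2024, 1, 15)},
]
flagged = stale_records(customers, today=date(2024, 6, 1))
print("Needs review:", flagged)
```

Passing `today` as a parameter, rather than reading the system clock inside the function, keeps the result repeatable when the check is tested.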

There are situations where the creation of dirty data is intentional. Companies may choose to omit certain information from a database in order to create a particular perception of their finances, such as highlighting the amount of revenue generated for a given period while choosing not to enter the amount of revenue actually collected for the same period. With this type of dirty data, the information that is presented is accurate as far as it goes, but is considered incomplete.

With some types of dirty data, a business may decide that corrections are not worth the time and effort. This is common when the incorrect data has no impact on the ability of the business to function properly and has little potential to cause serious problems. As a result, just about any entity that maintains a database probably has at least a little dirty data interspersed with information that is current and accurate.

Malcolm Tatum

After many years in the teleconferencing industry, Malcolm decided to embrace his passion for trivia, research, and writing by becoming a full-time freelance writer. Since then, he has contributed articles to a variety of print and online publications, including EasyTechJunkie, and his work has also appeared in poetry collections, devotional anthologies, and several newspapers. Malcolm’s other interests include collecting vinyl records, minor league baseball, and cycling.
