What Is Database Integrity?
Database integrity is the practice of ensuring that data stored in a database are accurate, consistent and free of errors. Even a slight spelling error or duplication of information can cause massive problems for a database, so database integrity is taken seriously. The three components of integrity are entity integrity, domain integrity and referential integrity. Along with these three, normalization and error checking are applied to further ensure integrity. Most of these processes are automated, but administrators often also check manually to ensure no errors occur in the database.
When an administrator wants to add, change or delete information in the database, the database integrity must first be checked. Adding or changing any information can be disastrous, because the data may have far-reaching effects on other tables and sections of the database. To reduce this risk, and to keep the information already in the database accurate, integrity is always checked before any changes are made.
Entity integrity is the first constraint for database integrity. It ensures that every row in a table can be uniquely identified, usually by a primary key that must be unique and must not be null. If two rows in the same table carry the same identifying value, this creates redundant information, which is confusing to the database.
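The primary-key behavior described above can be sketched with Python's built-in sqlite3 module; the table and column names here are illustrative, not from the article:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key is what entity integrity enforces: unique, non-null.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Alice')")

error = None
try:
    # A second row with the same primary key value is rejected outright.
    conn.execute("INSERT INTO employee VALUES (1, 'Bob')")
except sqlite3.IntegrityError as exc:
    error = exc
print("rejected:", error)
```

Note that it is the database engine, not application code, that refuses the duplicate key; that is the point of declaring the constraint in the table itself.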
Next comes domain integrity. This checks the type of data a table or column is defined to hold and ensures that new data conform to that data type. For example, if a column is specified to hold only dates and someone tries to enter a word, the domain integrity check will alert the administrator to a consistency error. The column is only meant to handle dates, so a word would disrupt normal processing and could cause errors in the future.
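SQLite has no strict DATE column type, so a minimal sketch of the date-versus-word check above can use a CHECK constraint that only accepts date-shaped text; the pattern and names are my own assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only values shaped like YYYY-MM-DD pass the domain check.
conn.execute("""
    CREATE TABLE event (
        event_date TEXT CHECK (
            event_date GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'
        )
    )
""")
conn.execute("INSERT INTO event VALUES ('2024-01-15')")  # a date: accepted

error = None
try:
    conn.execute("INSERT INTO event VALUES ('hello')")   # a word: rejected
except sqlite3.IntegrityError as exc:
    error = exc
print("rejected:", error)
```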
The third constraint in database integrity is referential integrity. The tables in a database are rarely alone; other tables often reference them and they reference other tables. If one table contains a reference to a table or row that is misspelled or does not exist, this causes processing errors. Referential integrity checks ensure that every reference points to data that actually exist.
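A foreign key is how such a reference is declared so the database can verify it. A minimal sqlite3 sketch (SQLite only enforces foreign keys once the PRAGMA is set; names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this for enforcement
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES department(dept_id)
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 10)")  # valid reference: accepted

error = None
try:
    conn.execute("INSERT INTO employee VALUES (2, 99)")  # no department 99: rejected
except sqlite3.IntegrityError as exc:
    error = exc
print("rejected:", error)
```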
Aside from these three integrity constraints, the rules of normalization and error checking are also applied for true database integrity. Normalization seeks out redundant information and unnecessarily long tables, eliminating the redundancy and splitting long tables into several shorter ones. Error checking analyzes the database for any other potential errors and ensures all data are valid.
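The splitting step that normalization performs can be sketched like this: instead of one long orders table that repeats the customer's name and address on every row, the customer details move into their own table and each order refers to them. Table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each customer fact is stored exactly once...
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
# ...and each order only carries a short reference to it.
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, item TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Jane Doe', '1 Main St')")
conn.execute("INSERT INTO orders VALUES (100, 1, 'lamp')")
conn.execute("INSERT INTO orders VALUES (101, 1, 'desk')")

# A join reassembles the original long view on demand.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM orders o JOIN customer c ON o.cust_id = c.cust_id
    ORDER BY o.order_id
""").fetchall()
print(rows)  # [('Jane Doe', 'lamp'), ('Jane Doe', 'desk')]
```

If the customer's address changes, only one row needs updating, which is exactly the redundancy problem normalization removes.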
The explanation is quite easy to understand. Please also mention the key constraints here, e.g., that a primary key should not be a null value. If it were, there would be an error, because the primary key must be unique and is also used to identify the tuple/row. Please note that I am a student; if I am wrong, please correct me. Thank you!
@NathanG - I don’t know everything about database software. However, the one thing that I practice is to simplify, simplify. I find that the fewer tables there are in the database, the easier it is to manage and reduce database errors. So that’s what I strive to do.
@MrMoody - I use Access a lot and I’ve found that the best way to enforce some of the data integrity rules is when you build the tables. For example, if I’m adding a field to a table, in Access I can add a check at design time that will verify that the field is of the correct format and range.
Let’s say I have a field labeled “Discount.” Let’s assume we are not allowed to give discounts greater than 60%. I can actually put this business rule on the discount field, so that if someone tries to insert a number greater than 60, Access will complain.
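Outside of Access, the same business rule can be declared as a CHECK constraint; here is a sketch in Python's sqlite3, with the table name invented and the 60% limit taken from the comment above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The 60% ceiling lives in the table definition itself.
conn.execute("""
    CREATE TABLE sale (
        sale_id INTEGER PRIMARY KEY,
        discount REAL CHECK (discount <= 60)
    )
""")
conn.execute("INSERT INTO sale VALUES (1, 25)")  # within the limit: accepted

error = None
try:
    conn.execute("INSERT INTO sale VALUES (2, 75)")  # over 60: rejected
except sqlite3.IntegrityError as exc:
    error = exc
print("rejected:", error)
```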
That’s one way to make sure you get the right data all the time. You don’t have to do it this way. We have developers who write code that will check the tables, but to me that’s just adding an extra step.
I’ve found that there are some practical steps you can take to reduce or eliminate some of the data integrity issues described in the article.
You can, for example, add an auto number field as a primary key. Every time a new row is added, the auto number is incremented by one, so no two rows can share the same key.
This does not, of course, prevent some of the other information in the other fields from getting duplicated. For example you could have duplicate customer name and street address.
These fields may be technically duplicated, even though the auto number is unique. What you can do then is put a unique index (a composite key) across both the customer name and street address, which tells the database that the combination of these fields has to be unique.
Then if you attempt to add a duplicate row the database will complain and stop you from doing so.
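The auto number plus composite uniqueness idea from the last few comments can be sketched in sqlite3, where AUTOINCREMENT plays the role of Access's AutoNumber; the names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- the auto number
        name TEXT,
        street_address TEXT,
        UNIQUE (name, street_address)  -- the combination must be unique
    )
""")
conn.execute(
    "INSERT INTO customer (name, street_address) VALUES ('Jane Doe', '1 Main St')"
)

error = None
try:
    # Same name and address again: the unique index stops the duplicate,
    # even though the auto number would have been different.
    conn.execute(
        "INSERT INTO customer (name, street_address) VALUES ('Jane Doe', '1 Main St')"
    )
except sqlite3.IntegrityError as exc:
    error = exc
print("rejected:", error)
```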