What is the Data Mining Process?
The data mining process is a method for uncovering statistically significant patterns in a large amount of data. It typically involves five main steps: preparation, data exploration, model building, deployment, and review. Each step in the process involves a different set of techniques, but most use some form of statistical analysis.
Before the data mining process can begin, the researchers typically set research objectives. This preparation step usually determines what types of data need to be studied, what data mining techniques should be used, and what form the results will take. This initial step in the process may be crucial to gathering useful information.
The next step in the data mining process is exploration. This step usually involves gathering the required data from an information warehouse or collection entity. Then, mining experts typically prepare the raw data sets for analysis, which usually means cleaning, organizing, and checking all of the data for errors.
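The cleaning and error-checking described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records, the field names (`customer`, `age`, `spend`), and the rule of dropping bad rows rather than repairing them are all invented for the example.

```python
# Hypothetical raw records; the field names and values are invented for illustration.
raw_records = [
    {"customer": " alice ", "age": "34", "spend": "120.50"},
    {"customer": "bob", "age": "", "spend": "80.00"},          # missing age
    {"customer": "alice", "age": "34", "spend": "120.50"},     # duplicate after trimming
    {"customer": "carol", "age": "forty", "spend": "95.25"},   # age is not a number
    {"customer": "dave", "age": "51", "spend": "60.00"},
]


def clean(records):
    """Trim text, convert numeric fields, and drop invalid or duplicate rows."""
    seen = set()
    cleaned = []
    for row in records:
        name = row["customer"].strip().lower()
        try:
            age = int(row["age"])
            spend = float(row["spend"])
        except ValueError:
            continue  # skip rows whose numeric fields are missing or malformed
        key = (name, age, spend)
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        cleaned.append({"customer": name, "age": age, "spend": spend})
    return cleaned


cleaned_records = clean(raw_records)
```

Here only the rows for alice and dave survive. A real project might prefer to repair or flag bad rows instead of discarding them, depending on the research objectives.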
This prepared data usually then enters the third step in the data mining process, model building. To accomplish this, researchers typically take small test samples of data and apply a variety of data mining techniques to them. The modeling step is often used to determine the best method of statistical analysis required to achieve the desired results.
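One way to picture the modeling step is to try two candidate methods on a small sample and keep whichever predicts held-out records better. The sketch below assumes made-up data (a noisy straight line) and two hypothetical candidates, a constant average and a least-squares line. It is not a prescribed procedure, only an example of comparing methods on a sample.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical prepared data: (x, y) pairs following y = 3x + 2 plus small noise.
full_data = [(x, 3 * x + 2 + random.uniform(-1, 1)) for x in range(1000)]

# Draw a small test sample instead of modeling the full data set.
sample = random.sample(full_data, 50)
train, holdout = sample[:35], sample[35:]


def fit_mean(rows):
    """Candidate 1: always predict the average y."""
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y


def fit_line(rows):
    """Candidate 2: least-squares straight line."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_abs_error(model, rows):
    """Average size of the prediction error on the given rows."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


errors = {
    "mean": mean_abs_error(fit_mean(train), holdout),
    "line": mean_abs_error(fit_line(train), holdout),
}
best_method = min(errors, key=errors.get)
```

Because the invented data follows a line, the line model wins here. With real data the comparison would usually cover more candidates and more than one error measure.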
There are four main techniques that can be applied in the data mining process. The first is classification, which arranges data into predefined groups or categories. In the second technique, called clustering, researchers allow the computer to organize the data into groups, as it chooses. A third data mining technique seeks associations between variables. The fourth typically looks for sequential patterns in the data that may be used to predict future trends.
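The four techniques can each be shown in miniature. The sketch below uses invented purchase amounts, shopping baskets, and page visits, along with deliberately simple stand-ins: a fixed threshold for classification, a two-group k-means for clustering, pair counting for associations, and transition counting for sequential patterns. Real data mining tools use far more capable versions of each.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data, invented for illustration.
amounts = [5, 7, 6, 48, 52, 50, 8, 49]
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"milk", "butter", "bread"},
]
visits = ["home", "search", "product", "cart",
          "home", "search", "product", "home", "search"]


# Classification: assign each value to a group defined in advance.
def classify(amount, threshold=20):
    return "large" if amount >= threshold else "small"


labels = [classify(a) for a in amounts]


# Clustering: let the algorithm find the groups (1-D k-means with k = 2).
def two_means(values, rounds=10):
    centers = [min(values), max(values)]  # simple starting guesses
    groups = [[], []]
    for _ in range(rounds):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


centers, clusters = two_means(amounts)

# Association: count how often pairs of items appear in the same basket.
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)
top_pair, top_pair_count = pair_counts.most_common(1)[0]

# Sequential patterns: count which page most often follows another.
transitions = Counter(zip(visits, visits[1:]))
top_transition, top_transition_count = transitions.most_common(1)[0]
```

Note the difference between the first two: `classify` needs the groups ("small" and "large") to be decided beforehand, while `two_means` is given only the number of groups and works out the boundaries itself.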
The fourth step in the data mining process is deployment. Here, the techniques chosen in the modeling step are applied to the larger data set, and the results are analyzed. The report that comes from this step usually shows the patterns found in the entire process, including any classifications, clusters, associations, or sequential patterns existing within the data set.
Review is often an important final step. This phase in the process usually involves repeating mining models with a new data set to make sure that the main set was representative of the entire population of data. The results cannot predict trends in the larger population if the data sample does not accurately represent it.
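A simple form of this check is to measure how often a pattern appears in the main set and in a fresh set, then see whether the two rates are close. The sketch below is an assumption-laden illustration: the data sets are invented, `results_agree` is a hypothetical helper, and the 0.05 tolerance is arbitrary. A real review would more likely use a formal statistical test.

```python
def pattern_rate(rows):
    """Share of records in which the pattern of interest appears (1 = present)."""
    return sum(rows) / len(rows)


def results_agree(main_rows, new_rows, tolerance=0.05):
    """Return True when the pattern shows up at a similar rate in both data sets.

    The tolerance is an assumed value chosen for illustration only.
    """
    return abs(pattern_rate(main_rows) - pattern_rate(new_rows)) <= tolerance


# Hypothetical data sets, invented for illustration.
main_set = [1] * 30 + [0] * 70        # pattern found in 30% of the main set
similar_set = [1] * 32 + [0] * 68     # 32% in one fresh data set
different_set = [1] * 55 + [0] * 45   # 55% in another fresh data set

agrees_with_similar = results_agree(main_set, similar_set)
agrees_with_different = results_agree(main_set, different_set)
```

The first comparison passes and the second fails, which would suggest that the main set was not representative of whatever population produced the second fresh set.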
Tremendous amounts of our personal data are in the hands of just a few companies: Facebook, Google, Yahoo, Microsoft, Blizzard Entertainment, and a few others.
Bottom line, you're giving up some privacy no matter what you do. But make sure you can live with the terms before you give away your personal information.
Data mining is an emerging field, and it's only going to get bigger in the coming years, because the stakes are so high.
We don't often realize how much power we give companies in return for "free services". If you read the fine print in the terms of service for your user agreement, you might find that they have a lot more right to snoop around in your data than you may want them to have.
There is so much data flowing around on anyone who uses the Internet, and that is gold to marketers. Even if it is "anonymous", and I have my doubts, it still reveals a wealth of information about buying habits of different groups, areas of interest, social relationships, and all kinds of other things you normally would not tell a stranger, and you certainly would not give it to a marketing company for free.