What are the Different Types of Data Mining Analysis?
Data mining analysis produces different results depending on the algorithm used to evaluate the data. Common types of data mining analysis include exploratory data analysis (EDA), descriptive modeling, predictive modeling and discovering patterns and rules. Each of these tools provides a different perspective on the collected information, so professionals can gain additional insight into a problem by choosing the tool that fits the question being asked.
Because each data mining analysis tool produces a different kind of outcome, a brief review of each is useful. Exploratory data analysis, or EDA, involves reviewing a dataset without a specific outcome in mind. The variables that define the data serve as the basis for visual representations that the researcher can inspect. As the number of variables increases, this analysis tool may become less effective for visualizing data.
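As a rough illustration of exploratory analysis, the sketch below summarizes each variable in a small dataset and lists the pairs of variables that could be plotted against each other. The dataset, the variable names and the helper functions are all invented for this example; they are not taken from any particular tool.

```python
from itertools import combinations
from statistics import mean, median, stdev

# Hypothetical toy dataset: each key is a variable, each list holds its values.
data = {
    "age": [23, 35, 41, 29, 52],
    "income": [31000, 48000, 61000, 39000, 75000],
    "visits": [4, 2, 7, 5, 1],
}


def summarize(columns):
    """Return basic summary statistics for every variable."""
    return {
        name: {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "median": median(values),
            "stdev": stdev(values),
        }
        for name, values in columns.items()
    }


def pairwise_views(columns):
    """List every pair of variables that one scatter plot could show."""
    return list(combinations(columns, 2))


summary = summarize(data)
pairs = pairwise_views(data)

for name, stats in summary.items():
    print(name, stats)
print(len(pairs), "pairwise plots needed")
```

Three variables need only three pairwise plots, but twenty variables would need 190, which is one reason visual exploration becomes harder as the number of variables grows.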
Descriptive modeling is a data mining analysis tool used to describe all of the data in a given dataset collectively. Specifically, this approach synthesizes the data to reveal trends, segments and clusters that are present in the information examined. Descriptive data mining analysis is commonly used in advertising. One example is market segmentation, in which marketers divide a large customer group into smaller segments that share homogeneous characteristics.
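Market segmentation of this kind can be sketched with a very small clustering routine. The example below uses plain k-means on a single variable; the spending figures, the starting centers and the function name `kmeans_1d` are assumptions made for illustration, and real segmentation work would normally use many variables and a statistics package.

```python
from statistics import mean


def kmeans_1d(values, centers, iterations=10):
    """Group numbers around the given starting centers (plain k-means)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: each value joins the cluster of its nearest center.
        for value in values:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [mean(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical annual spend per customer, in arbitrary currency units.
spend = [120, 150, 130, 900, 950, 1000, 4000, 4200]
centers, segments = kmeans_1d(spend, centers=[100, 1000, 5000])

for center, segment in zip(centers, segments):
    print(round(center), segment)
```

Each resulting segment contains customers with similar spending, which is the "homogeneous characteristics" idea in miniature.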
Predictive modeling is another data mining analysis tool. It involves developing a model based on existing data and then using that model to predict another variable that is relevant to the data reviewed. The term "predictive" indicates that this tool enables the user to predict some value based on what is known in the dataset. Predictive analysis may be used by marketers to determine what products customers are seeking. Based on current purchasing trends, marketers may be able to predict which new products will be popular in the future.
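One of the simplest predictive models is a straight line fitted to past observations by least squares. The sketch below assumes hypothetical monthly sales figures and an invented helper, `fit_line`; it is meant only to show the build-a-model-then-predict pattern, not a recommended forecasting method.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical existing data: units sold in months 1 through 5.
months = [1, 2, 3, 4, 5]
units = [10, 12, 14, 16, 18]

slope, intercept = fit_line(months, units)

# Use the fitted model to predict a value that is not in the dataset.
forecast = slope * 6 + intercept
print(f"forecast for month 6: {forecast:.1f} units")
```

The model is built only from the known months, and the prediction for month 6 follows from the fitted trend.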
Discovering patterns and rules differs from descriptive and predictive data mining tools. While descriptive and predictive tools employ model building as a foundation for analysis, discovering patterns and rules focuses on identifying patterns in the data. Marketers working for grocery stores, for example, often use this data mining analysis tool to determine purchase patterns. By determining which products customers consistently purchase in the same order, they can develop targeted promotions for those items.
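A common way to look for purchase patterns is to count how often items appear together in one order and how often one item accompanies another. The sketch below assumes a handful of made-up grocery baskets; the measures it computes are usually called support and confidence, and the function names are invented for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical grocery orders, one set of items per order.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
    {"milk", "bread"},
]


def pair_support(transactions):
    """Fraction of orders in which each pair of items appears together."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in counts.items()}


def confidence(transactions, antecedent, consequent):
    """Of the orders containing `antecedent`, the share with `consequent`."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for b in with_antecedent if consequent in b)
    return hits / len(with_antecedent)


support = pair_support(baskets)
print(support[("bread", "butter")])
print(confidence(baskets, "bread", "butter"))
print(confidence(baskets, "butter", "bread"))
```

A pair with high support and high confidence, such as bread and butter in this toy data, is the kind of pattern that might justify a targeted promotion.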
The article mentions places like grocery stores using data mining to determine shopping trends. Who gets hired to do these jobs, and what kind of training do they have?
Do department and grocery stores really hire statisticians to analyze the ways people shop and how to get them to buy more products? Are there special programs that can take all of the data and output the best arrangements for stores?
@kentuckycat - That is a good point that data mining undermines the use of statistics, but in some cases I think it could be justified.
Taking exploratory data analysis as an example, if you had a huge medical study where hundreds of variables were collected on several patients, it may not be immediately clear what the best predictors of success or failure of a drug are. By exploring the data, you can limit the data to a handful of variables and then do the proper statistics.
Using the same example for descriptive modeling: in a random sample of people, it may not be clear until you do some data mining how the groups should be separated.
Unless there was some justification for the procedures, statisticians would not have spent years developing the ideas.
Like the article mentions, data mining can be useful for some fields like marketing or advertising, but from a scientific standpoint, data mining is seen as something that should be avoided at all costs.
Just like the term suggests, data mining means trying to dig into statistics and get positive results. In reality, a good researcher will know before collecting any data what exactly needs to be collected and what statistical procedures will be used to analyze the data. Whether the results are good or bad, they must be accepted.
Simply taking previously collected data and trying a multitude of tests until a positive result is achieved undermines the purpose of statistics.
I work in a research laboratory, and I have seen a few reputations and even careers ruined by scientists performing data mining.