Is Data Mining the Same as Collection?
Data mining is a different type of task
Data mining is a term that is often used out of context. For many, “data mining” simply means acquiring and organizing data for cleaning, analysis, and presentation. I like to distinguish the acquisition of data from actual data mining, where more advanced methods are applied. These methods help find useful information in a large collection of data. The patterns you observe will shape how you formulate the problem and how you think about the final presentation.
The definitions of data mining that I like to use are:
- Jiawei Han – Extraction of interesting patterns or knowledge from huge amounts of data. It is potentially useful, previously unknown, and non-trivial.
- Sunita Sarawagi – Process of semi-automatically analyzing large databases to find patterns that are valid, useful, understandable, and novel.
- Vipin Kumar – Exploration and analysis by automatic means of large quantities of data in order to discover meaningful patterns.
Notice the emphasis on patterns. This is where data mining distinguishes itself from a simple query or acquisition. The patterns that are found become the basis for work in machine learning models, database management systems, data warehouses, big data analytics, data science, and business intelligence. Each of these fields takes a different view of data mining.
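To make the difference between a query and a pattern concrete, here is a minimal sketch in Python. The transactions, the item names, and the `frequent_pairs` helper are all made up for illustration; real pattern mining uses far more efficient algorithms than this brute-force pair count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions; the item names are invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]

# A query: we already know exactly what we are asking for.
with_bread = [t for t in transactions if "bread" in t]


def frequent_pairs(transactions, min_support):
    """Count every pair of items bought together and keep the pairs
    that appear in at least `min_support` transactions."""
    counts = Counter()
    for t in transactions:
        # Sorting gives each pair one canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# A (very small) mining step: we do not name the items in advance,
# we let the data tell us which combinations are common.
pairs = frequent_pairs(transactions, min_support=3)
print(len(with_bread), pairs)
```

The query only returns rows that match a condition I supplied. The pair count surfaces a relationship I never asked about by name, which is the kind of previously unknown pattern the definitions above describe.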
The viewpoint from which data mining is approached shapes the framework of the entire process. Each domain has a different methodology for how data should be collected, mined, and ultimately used.
Here, I will describe a few common domains:
1. Database View
A database view of data mining focuses on the processes and techniques that connect data warehouses to the discovery of patterns.
In database systems, people build data warehouses to integrate data from various transactional databases.
2. Machine Learning
The machine learning view is very different from the database view. Here, the focus is on taking data and processing it for model training and post-processing analysis. From this view, a data scientist is looking to make some type of statement about new data, based on previous data that was processed.
3. Business Intelligence
The business intelligence view is modeled so that decisions come from the mined data. This is where the initial data is refined and structured before mining occurs. Decisions are made directly from the mining process. The data collection and aggregation will drive the entire model.
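As a rough sketch of the machine learning view described above, the snippet below learns from previous data and then makes a statement about a new record. The customer features, the labels, and the nearest-centroid approach are my own assumptions chosen for brevity, not a recommended model.

```python
from statistics import mean

# Hypothetical labelled history: (monthly_visits, monthly_spend) -> label.
previous = [
    ((2, 20.0), "occasional"),
    ((3, 35.0), "occasional"),
    ((12, 180.0), "regular"),
    ((15, 220.0), "regular"),
]


def train(rows):
    """Return one centroid (the mean point) per label."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(col) for col in zip(*points))
        for label, points in by_label.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: dist2(centroids[label]))


centroids = train(previous)          # learn from previously processed data
label = predict(centroids, (10, 150.0))  # make a statement about new data
print(label)
```

In practice the features would be scaled and the model validated on held-out data; the point here is only the shape of the workflow, where training on old data comes before any claim about new data.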
Finally, we come to the four dimensions of data mining.
- Data to be mined, the input.
- Knowledge to be discovered, the output.
- Techniques to be utilized, which connect the input to the output.
- Applications adopted, where to use our findings.
When we consider these dimensions, we need to prepare for the actual process. This is drastically different from data collection and querying, where much of the work is figuring out how to pool the data into a usable state. Routine data collection does not require you to plan for the functionalities you want your data mining process to provide. Functionalities to consider when data mining are:
- Lower-level output such as patterns of data, similarity of data, or association of data.
- Decision-driven output such as classification, clustering, trend/deviation, prediction, and outlier analysis.
- Descriptive or predictive data mining.
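As a small illustration of two of the functionalities listed above, the sketch below produces a descriptive summary and then a decision-driven output in the form of outlier analysis. The readings, the function names, and the two-standard-deviation threshold are assumptions picked for the example, not a standard.

```python
from statistics import mean, stdev

# Hypothetical sensor readings; the values are invented for illustration.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 14.7, 10.0]


def describe(values):
    """Descriptive output: summarize what the data looks like."""
    return {"mean": mean(values), "stdev": stdev(values)}


def outliers(values, threshold=2.0):
    """Decision-driven output: flag values more than `threshold`
    sample standard deviations away from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]


summary = describe(readings)
flagged = outliers(readings)
print(summary, flagged)
```

A simple threshold like this is sensitive to the outliers it is trying to find, so a more robust measure may be a better fit for real data. Deciding which kind of output you need up front is what drives that choice of technique.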
Once the functionalities are established, what techniques will be used? Data cubing, machine learning, statistics, and pattern recognition are just a few in a long list of techniques. The applications extend into many territories. Retail and telecommunications come to mind immediately, but applications in other domains can be just as impactful. Consider a mining company applying data mining to soil composition data to create an early warning system for landslides. The possibilities are endless with a topic as complex and robust as data mining.