Data Mining for Hidden Treasures (Part 1)

Today, if we substitute the word “data” for the word “gold,” the statement still rings true.

Like a gold mine, a data mine contains valuable nuggets that need to be extracted from the dross that surrounds them. And techniques for excavating these treasures are constantly evolving.

What we now call data collection and database creation was made possible in the 1960s by computers the size of small buildings. During the 1970s and 1980s, database management systems progressed from hierarchical database systems to relational database systems.

With the ability to index databases, database technology advanced rapidly, and new theories and practices quickly spread around the world. Query languages, user interfaces, pre-fabricated forms and reports, transaction management, data recovery, and online transaction processing (OLTP) all came into play. And by the time the Internet emerged in the early 1990s, database technology was a booming industry.

Web-based systems thrived, and data and web mining became sophisticated disciplines. Relational technology made efficient storage, retrieval, and management of large amounts of data possible. And advanced data models (including extended-relational, object-oriented, object-relational, and deductive) enabled spatial, temporal, multimedia, active, scientific, knowledge, and office information databases to flourish.

In some ways, technology outpaced practical application, and in many cases “data rich, information poor” companies had no idea what to do with the reams of data they had collected. These massive repositories of dormant data became known as “data tombs.”

Data mining, a term often used interchangeably with Knowledge Discovery in Databases (KDD), is how smart marketers extract meaningful information from these tombs. To convert facts into knowledge, analysts look for patterns within the data, then identify and categorize them. Using this information, they create a predictive model that flags people who resemble current customers in key ways. This is a simplified explanation of what is actually a very complex process, but you get the gist.
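To make the "flag people who resemble current customers" idea a little more concrete, here is a minimal Python sketch. It is only an illustration, not how any particular marketing tool works: the feature names, the sample values, and the distance threshold are all invented, and a real model would at least scale the features and use far more data.

```python
from math import sqrt

# Hypothetical features per person: (age, orders per year, average order value).
# All values are invented for illustration.
customers = [
    (34, 12, 80.0),
    (41, 10, 95.0),
    (38, 14, 70.0),
]
prospects = {
    "prospect_a": (37, 11, 85.0),
    "prospect_b": (19, 1, 15.0),
}

def centroid(rows):
    """Average each feature across all rows to get a 'typical customer'."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))

def distance(a, b):
    """Euclidean distance between two feature tuples (features are unscaled here)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flag_lookalikes(customers, prospects, max_distance):
    """Return the names of prospects that sit close to the typical customer."""
    center = centroid(customers)
    return sorted(
        name
        for name, features in prospects.items()
        if distance(features, center) <= max_distance
    )

# max_distance is an assumed cutoff chosen for this toy data.
flagged = flag_lookalikes(customers, prospects, max_distance=15.0)
print(flagged)
```

With this toy data, only the prospect whose age, order frequency, and spend sit near the customer average is flagged. Real predictive models learn which features matter instead of treating them all equally.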

Several scientific organizations, most notably the Data Mining Group (DMG), have pooled resources in an effort to create a uniform way to describe and share data mining models using the Predictive Model Markup Language (PMML). IBM, Microsoft, SAP, Oracle, NCR, and most major computer and software companies are members of this group.

Advanced data mining can reveal insights about customers, former customers, prospects, and leads. When combined with purchasing patterns and behavior, the data can be used to drive sales, reduce churn, and support cross-sell and up-sell initiatives.

There truly is gold in them there hills, if you know where and how to look.

In part two of this article, we’ll explore the seven steps in KDD:

  1. Data Cleaning
  2. Data Integration
  3. Data Selection
  4. Data Transformation
  5. Data Mining
  6. Pattern Evaluation
  7. Knowledge Presentation
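As a rough preview, the seven steps listed above can be walked through on a tiny, made-up data set. The Python sketch below is only one possible reading of each step: the record layouts, field names, spend threshold, and minimum support count are all assumptions chosen for illustration, and real KDD projects use far richer techniques at every stage.

```python
from collections import Counter

# Hypothetical toy sources; every name and value here is invented.
crm = [
    {"id": 1, "region": "east", "age": 34},
    {"id": 2, "region": "west", "age": None},  # incomplete record
    {"id": 3, "region": "east", "age": 52},
    {"id": 4, "region": "east", "age": 29},
]
orders = [
    {"id": 1, "product": "widget", "amount": 120.0},
    {"id": 3, "product": "widget", "amount": 300.0},
    {"id": 4, "product": "gadget", "amount": 45.0},
]

# 1. Data cleaning: drop records with missing values.
clean = [r for r in crm if all(v is not None for v in r.values())]

# 2. Data integration: join the two sources on the customer id.
by_id = {r["id"]: r for r in clean}
merged = [{**by_id[o["id"]], **o} for o in orders if o["id"] in by_id]

# 3. Data selection: keep only the fields relevant to the question.
selected = [{k: r[k] for k in ("product", "amount")} for r in merged]

# 4. Data transformation: bucket the raw amount into a spend tier.
for r in selected:
    r["tier"] = "high" if r["amount"] >= 100 else "low"

# 5. Data mining: count how often each (product, tier) pair occurs.
patterns = Counter((r["product"], r["tier"]) for r in selected)

# 6. Pattern evaluation: keep pairs that meet a minimum support count.
MIN_SUPPORT = 2  # assumed cutoff for this toy data
interesting = {p: n for p, n in patterns.items() if n >= MIN_SUPPORT}

# 7. Knowledge presentation: report surviving patterns in plain language.
report = [
    f"{n} orders of '{product}' fell in the '{tier}' spend tier"
    for (product, tier), n in sorted(interesting.items())
]
print("\n".join(report))
```

Here the only pattern that survives evaluation is that both widget orders landed in the high spend tier. The one-off gadget order is filtered out as too rare to count as a pattern.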

See part 2 here

