PART II: DATA MINING FOR HIDDEN TREASURES—7 STEPS OF KNOWLEDGE DISCOVERY IN DATABASES

By Scott Levine, VP, Strategy—July 28, 2011

Every database marketing program begins with a question the marketer can usually answer already: How good is the data?

The answer is usually, “Not good,” because many companies overlook the essential first step of Knowledge Discovery in Databases (KDD):

Step 1 Data Cleansing

Also known as data hygiene, this process continually cleans and updates the data as part of the sales and billing process. Companies that overlook data cleansing, give it a low priority, or sweep it under the rug soon find themselves with dirty data on their hands. Organizations that keep their data clean have the best chance of mining it successfully, because they can check off Step 1 and head right to Step 2.
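Before moving on, here is a minimal sketch of what one cleansing pass might look like. The field names (`name`, `email`) and the simple email pattern are assumptions made for illustration; a real customer file will have its own fields and stricter rules.

```python
import re

# Deliberately loose email pattern (an assumption, not a full validator).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_record(record):
    """Return a tidied copy of one contact record."""
    cleaned = {}
    for key, value in record.items():
        # Collapse runs of whitespace and trim both ends.
        cleaned[key] = " ".join(str(value).split())
    cleaned["email"] = cleaned.get("email", "").lower()
    # Flag bad addresses instead of deleting them, so they can be reviewed.
    cleaned["email_valid"] = bool(EMAIL_RE.match(cleaned["email"]))
    return cleaned
```

Running a pass like this every time a record is touched in sales or billing is what keeps cleansing continual rather than a one-time project.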

Step 2 Data Integration

Sometimes it’s desirable to combine more than one set of data, such as customers and prospects, or leads that are in various stages of the demand waterfall. You may also want to aggregate prospects from more than one source, including both purchased and rented lists. Although there are several steps involved in data integration, the most important is de-duplicating the records, which can eliminate a tremendous amount of waste. To do it reliably, you must first establish rules that define which source is preferred when duplicates are found.
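A source-preference rule for duplicates can be sketched in a few lines. The source names, their ranking, and the choice of email as the matching key are all assumptions for illustration; your own rules may match on other fields.

```python
# Lower number = more trusted source. Names and ranks are hypothetical.
SOURCE_RANK = {"customer": 0, "purchased_list": 1, "rented_list": 2}


def deduplicate(records, key="email"):
    """Keep one record per key, preferring the most trusted source."""
    best = {}
    for rec in records:
        k = rec[key].strip().lower()
        # Unknown sources rank below every known one.
        rank = SOURCE_RANK.get(rec["source"], len(SOURCE_RANK))
        if k not in best or rank < best[k][0]:
            best[k] = (rank, rec)
    return [rec for _, rec in best.values()]
```

Here a record from the house customer file wins over the same contact found on a rented list, no matter which one was loaded first.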

Step 3 Data Selection

The data selection team needs to determine thresholds, limitations and other selection criteria. For example, if firmographic attributes are the most important criteria, then only the records that meet the minimum threshold for annual revenue would be selected. If psychographic data matter more, then records might be selected for specific interests such as camping, concerts or social causes.
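As a rough sketch, both kinds of selection criteria can be expressed as one filter. The field names `annual_revenue` and `interests` are hypothetical, and the thresholds are whatever the selection team decides.

```python
def select_records(records, min_revenue=None, interests=None):
    """Return records that pass the revenue and interest criteria."""
    wanted = set(interests or [])
    selected = []
    for rec in records:
        # Firmographic test: drop records below the revenue threshold.
        if min_revenue is not None and rec.get("annual_revenue", 0) < min_revenue:
            continue
        # Psychographic test: require at least one shared interest.
        if wanted and wanted.isdisjoint(rec.get("interests", [])):
            continue
        selected.append(rec)
    return selected
```

Calling `select_records(records, min_revenue=1_000_000)` applies only the firmographic rule, while passing `interests=["camping"]` applies only the psychographic one.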

Step 4 Data Transformation

Once the best data have been selected, they must be transformed into a uniform set and optimized for use in a marketing program or campaign. All the fields must be consolidated, merged and purged so that they will be easy to index and use for data mining. If you’re using personalization in your campaign (and you should), this step is essential to ensure accuracy.
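One small piece of this step, mapping differently named fields onto a single layout, might look like the sketch below. The alias table and field names are assumptions for illustration only.

```python
# Hypothetical aliases seen in different source files.
FIELD_ALIASES = {
    "fname": "first_name",
    "first": "first_name",
    "co": "company",
    "company_name": "company",
}


def transform(record):
    """Map a record onto uniform field names and tidy its values."""
    out = {}
    for key, value in record.items():
        name = FIELD_ALIASES.get(key.lower(), key.lower())
        out[name] = value.strip() if isinstance(value, str) else value
    # Consistent capitalization matters once the name is merged into copy.
    if isinstance(out.get("first_name"), str):
        out["first_name"] = out["first_name"].title()
    return out
```

A greeting that reads "Dear Pat" instead of "Dear   pat" is the kind of accuracy personalization depends on.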

Step 5 Data Mining

This process is exacting, but in a nutshell, it involves searching the various fields of the database for specific attributes. These are then used to identify trends that can be matched against the predictive models that represent the marketer’s ideal prospects. The process is complete when the mined data resemble the data models. The Predictive Model Markup Language (PMML), an XML-based standard developed by the Data Mining Group, lets predictive models be described uniformly and shared across vendors’ tools.
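To make the idea of matching records against an ideal prospect concrete, here is a deliberately simple sketch. It is a plain attribute-match score, not a real predictive model, and the profile fields and values are hypothetical.

```python
# A hypothetical "ideal prospect" profile.
IDEAL_PROFILE = {"industry": "software", "region": "northeast", "title": "director"}


def match_score(record, profile=IDEAL_PROFILE):
    """Return the fraction of profile attributes the record matches."""
    hits = sum(1 for field, want in profile.items() if record.get(field) == want)
    return hits / len(profile)


def rank_prospects(records, profile=IDEAL_PROFILE):
    """Sort records from best to worst match against the profile."""
    return sorted(records, key=lambda r: match_score(r, profile), reverse=True)
```

A production model would weight attributes and learn from response history, but the principle is the same: the closer a record sits to the profile, the higher it ranks.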

Step 6 Pattern Evaluation

The patterns that emerge during the data mining process must be evaluated to determine which are relevant to the model and which aren’t. If one of the new patterns contradicts the original persona, revisiting the model is a good idea. If the two are consistent, the model is validated. Pattern evaluation can also lead to the discovery of trends that might not have been apparent to the team that created the original model, and feeding that knowledge back into the model can improve the entire program.
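One simple way to check mined patterns against a persona is to compare the most common value of each field with what the persona expects. This is only a sketch under that assumption; the field names are hypothetical, and real evaluation would also consider how strong each pattern is.

```python
from collections import Counter


def evaluate_patterns(records, persona):
    """Compare each field's most common value with the persona's value."""
    findings = {}
    for field, expected in persona.items():
        counts = Counter(rec[field] for rec in records if field in rec)
        if not counts:
            continue  # No data for this field, so nothing to evaluate.
        observed = counts.most_common(1)[0][0]
        findings[field] = {"observed": observed, "consistent": observed == expected}
    return findings
```

Any field that comes back with `consistent` set to `False` is a cue to revisit the model rather than to discard the pattern.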

Step 7 Knowledge Presentation

The proof is in the pudding. Once the final data are selected, a report that explains why the chosen data are the best for the program is delivered. Everything that was learned during the data mining process—including trends, patterns, and anomalies—is included in the knowledge presentation to the user. The key is to present the findings in a clear, easy-to-digest format.

While this has been a brief and simplified description of data mining, the entire process, which involves a number of different algorithms, is actually quite complex. It draws on classification algorithms, which predict one or more discrete variables; regression algorithms, which predict one or more continuous variables; and segmentation, association, and sequence algorithms. When practiced correctly, database marketing, data mining, and predictive modeling can all improve ROI.
