How Data Mining can Help Advertising Management

 

 

 

 

 

 

 

 

 

by

Seokmin Hong

 

Doctoral Student

smhong@mail.utexas.edu

 

Department of Advertising

College of Communication

The University of Texas at Austin

Austin, Texas 78712

 

And

 

John D. Leckenby

 

Everett D. Collier Centennial Chair

in Communication

john.leckenby@mail.utexas.edu

 

Department of Advertising

College of Communication

The University of Texas at Austin

Austin, Texas 78712

 

 

 

 

Working Paper

 

(Do not cite without permission of the authors)

 

September 22, 2000

 

 

 

 

 

 

 

 

 

 

 

 


 

 

 

Abstract

 

Data Mining is a relatively new set of analytical techniques designed to provide management information for decision-making purposes.  The distinguishing characteristic of data mining is the application of techniques and rules designed to extract information from very large databases.  These databases are often developed in the course of ordinary business both online and offline.  This paper addresses the question of the relevance of data mining techniques to the field of advertising management.  First, the definition of data mining is explored followed by an exposition of the core concepts in data mining.  Second, an example of data mining developed by the first author is explained.  Finally, the applicability of data mining to advertising management problems is addressed.

 

 

 

 

 

 

 

 

 


 

 

 

 Introduction

In the current advertising and marketing business and in the press, data mining has been and continues to be the focus of a great deal of attention (Small 1997; Elder & Ridgeway 1999; Abbott, Matkovsky & Elder 1998; Condon 1997; Foley & Caldwell 1996; Miller 1997; Mitchell 1997; Novack 1996; O’Harrow 1998; Stone 1997). But care must be taken to separate fact from myth. Data mining is a useful tool, a new approach that combines discovery with analysis. It is not a newly discovered branch of mathematics embodied in software that will, when hooked up to a large database, inexplicably and inevitably reveal the business insights contained in the millions of records stored therein. Yet it is a tool that will become increasingly important for competitive businesses (IBM 1998, URL: http://www.ibm.com).

 The past two decades have seen a dramatic increase in the amount of information or data being stored in electronic format. It has been estimated that the amount of information in the world doubles every 20 months and that the size and number of databases are increasing at an even faster rate. The increasing use of electronic data gathering devices such as point-of-sale or remote sensing devices has contributed to this explosion of available data (Dilly 1995).

Data mining is one way to extract predictive information from these large databases. The focus of this paper is to show how data mining may become a major tool for the advertising industry. This discussion will address the following areas: what data mining is, what technologies data mining uses, what ways data mining can be used, some examples of successful data mining applications in businesses, what is holding back data mining in advertising, and, finally, ways to develop a successful data mining project.

 

Definitions of Data Mining

 

            Data mining is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations (White Paper 1997).

            The expression "Data Mining" sometimes refers to the whole process of knowledge discovery and sometimes to the methods that are used in the knowledge discovery phase. The SAS Institute has adopted a definition that is more oriented towards methods. According to the SAS Institute (URL: http://www.sas.com), data mining consists of

Methods used to explore and model relationships in large amounts of data.

Here are some more useful definitions for data mining:

Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.  (Gartner Group URL: http://www.gartner.com).

 

Data mining is the exploration and analysis, by automatic and semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules  (M.J.A. Berry, G. Linoff 1997).


Data mining is the use of advanced statistical tools to reach into a company’s existing databases to discover patterns and relationships that can be exploited in a business context. (Trajecta lexicon URL: http://www.trajecta.com/neural/lexicon.htm).


Data mining is a combination of powerful methods that help reduce costs and risks as well as increase revenues, by extracting information from the available data. (T. Fahmy URL: http://www.statserv.com/contributers.htm).

 

Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. (William J Frawley, Gregory Piatetsky-Shapiro and Christopher J Matheus 1991)

 

Why Do We Need Data Mining?

There are a number of factors that have combined to bring data mining to the attention of the business community. First, there is a general recognition that there is untapped value in large databases. Second, there is a consolidation of database records tending toward a single customer view. Third, there is huge storage of databases, including the concept of an information warehouse. Fourth, there is a reduction in the cost of data storage and processing providing for the ability to collect and accumulate detailed data. Fifth, there is intense competition for a customer's attention in an increasingly saturated marketplace. Finally, there is movement towards the de-massification of business practices (IBM 1998, URL: http://www.ibm.com).

De-massification is a term from Alvin Toffler (1970). During the industrial revolution economies of scale led businesses to mass manufacturing, mass marketing and mass advertising. The information revolution is providing the capability to custom manufacture, to market and advertise to small segments and ultimately to the individual. This is de-massification and it is a strong force in business today (IBM 1998, URL: http://www.ibm.com).

Data mining is evolving as an essential segment in the decision-support marketplace. Like data warehousing, data mining is experiencing rapid change and growth as major corporations around the world begin to implement new and enhanced systems for leveraging corporate data. The META Group, a Stamford, Connecticut research firm, reports the market for data mining software and services at $3.2 billion in 1996. The firm anticipates growth to $10 billion in 2000 (About Data Mining, 1997).

A recent Gartner Group Advanced Technology Research Note listed data mining and artificial intelligence at the top of the five key technology areas that "will clearly have a major impact across a wide range of industries within the next three to five years." Gartner also listed data mining as one of the top 10 new technologies in which companies will invest during the next five years (White Paper 1999).  

If data mining is understood and applied, it can generate new revenue, increase profits, and deliver significant cost savings. However, those "ifs" speak volumes about a set of technologies that is still in its infancy (Gerber 1998).

"Data mining is an immature, unstable market today. We have to perfect the technology and the understanding of it by business analysts. However, it is necessary for survival. If you don't use it to predict a trend before your competitors, you're dead," says Erick Brethenoux, Paris-based research director for advanced technologies at the Gartner Group (Gerber 1998).

“Indeed, there’s been a frenzy of interest in datamining, which has captured corporate America’s imagination—and budgets—with one simple value proposition: Get to know your customers better, and you’ll keep them longer. That can translate into bigger profits over longer periods” (McCarthy 1998).

 

IT executives will likely triple the size of their existing data stores in the next two years, predicts a recent study by the San Francisco-based Sentry Technology Group. Moreover, to make the most of all those terabytes of data, they will probably need to invest heavily in advanced OLAP (Online Analytical Processing) products and algorithm-intensive datamining technologies. So, before they gear up to embark on this latest technology adventure, they had better prepare themselves for the fact that datamining projects can consume millions of dollars and years of effort. But the alternative could be just as costly: upper management may then wonder why the competition always seems so much better at spotting sales trends, developing smarter marketing campaigns, and accurately predicting customer loyalty (McCarthy 1998).

Data mining is emerging as the major solution to solve a series of critical, fast growing business issues. In particular, with the effects of globalization and deregulation, major corporations are facing new pressure from increased competition on a worldwide basis. Effectively responding to this challenge requires faster marketing cycles to produce a greater number of campaigns to attract and retain the best customers. As a result, tools are required in the hands of marketers to help them stay on top of corporate data that provides the foundation of any good marketing program. These tools must go beyond simply reporting data records or historical sales performance - they must ultimately provide the business professional with deep insight into customer actions. This insight is best achieved with a combination of hindsight as well as foresight (About Data Mining 1997).

Data mining offers a rich capability for modeling historical performance and then using this model to predict likely future outcomes. This combination of hindsight and foresight is unique to data mining and provides business professionals with a new understanding of the factors that truly contribute to business success or failure (About Data Mining, 1997).

Data Mining Technologies

The primary technologies which comprise data mining are neural networks, decision trees, rule induction, genetic algorithms, and nearest neighbor.

 

Neural Networks

A neural network is patterned after the neurons in the human brain. It's made up of a web of electronic neurons that send signals to each other through thousands of connections, which are adjusted up or down as the machine learns particular applications. A neural network can be taught voice recognition, for example, simply by having a trainer speak words into the machine, reinforcing some connections and not others (Darling 1998).

Neural networks are an approach to computing that involves developing mathematical structures with the ability to learn. The methods are the result of academic investigations to model nervous system learning. Neural networks have the remarkable ability to derive meaning from complicated or imprecise data and can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyze. This expert can then be used to provide projections given new situations of interest and answer "what if" questions (Dilly 1995).

Neural networks have broad applicability to real world business problems and have already been successfully applied in many industries. Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs, including sales forecasting, industrial process control, customer research, data validation, risk management, and target marketing (Dilly 1995).

Neural networks use a set of processing elements (or nodes) analogous to neurons in the brain. These processing elements are interconnected in a network that can then identify patterns in data once it is exposed to the data, i.e. the network learns from experience just as people do. This distinguishes neural networks from traditional computing programs that simply follow instructions in a fixed sequential order. The picture to the right is a simple neural network (Dilly 1995).
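The idea of connections being adjusted as the network learns can be illustrated with a minimal sketch. It assumes the simplest possible case, a single processing element (a perceptron) rather than a full multi-layer network, and the training data and function names are hypothetical, chosen only for illustration.

```python
# Minimal perceptron: one processing element whose connection weights are
# nudged up or down whenever it misclassifies a training example.
# This is an illustrative sketch, not the network used in the paper.

def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """examples: list of (inputs, target) pairs, with target 0 or 1."""
    n_inputs = len(examples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else 0
            error = target - output  # +1, 0, or -1
            # Strengthen or weaken each connection in proportion to its input.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, inputs):
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


# Hypothetical toy data: the outcome is 1 only when both signals are present.
training_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(training_data)
predictions = [predict(weights, bias, inputs) for inputs, _ in training_data]
print(predictions)
```

A single element like this can only learn linearly separable patterns; practical networks chain many such elements in layers.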

 

Decision Trees

A decision tree is a tree-shaped structure that represents a set of decisions that can be used to predict which records in a new (unclassified) dataset will have a given outcome. Decision trees are a simple form of knowledge representation that classifies examples into a finite number of classes: the nodes are labeled with attribute names, the edges are labeled with possible values of that attribute, and the leaves are labeled with the different classes. Objects are classified by following a path down the tree, taking the edges that correspond to the values of the object's attributes (Dilly 1995).

The following is an example of objects that describe the weather at a given time. The objects contain information on the outlook, humidity, etc. Some objects are positive examples, denoted by P, and others are negative, denoted by N. Classification in this case is the construction of a tree structure, illustrated in the following diagram, which can be used to classify all the objects correctly (Dilly 1995).
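Classification by following a path down the tree can be sketched in a few lines. The tree below is an assumption made for illustration: it is the familiar textbook weather tree, with a hypothetical "windy" attribute alongside outlook and humidity, and is not necessarily the tree shown in the source's diagram.

```python
# Nodes name an attribute, edges carry that attribute's possible values,
# and leaves carry the class (P or N). The tree itself is a hypothetical
# example in the style of the textbook weather data.
WEATHER_TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "N", "normal": "P"},
        },
        "overcast": "P",
        "rain": {
            "attribute": "windy",
            "branches": {True: "N", False: "P"},
        },
    },
}


def classify(tree, obj):
    """Follow edges matching the object's attribute values until a leaf."""
    node = tree
    while isinstance(node, dict):
        value = obj[node["attribute"]]
        node = node["branches"][value]
    return node


print(classify(WEATHER_TREE, {"outlook": "sunny", "humidity": "high", "windy": False}))
print(classify(WEATHER_TREE, {"outlook": "rain", "humidity": "high", "windy": False}))
```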

There are two common types of decision tree methods: Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID).

Classification and Regression Tree (CART)

CART is a decision tree technique used for classification of a dataset. It provides a set of rules that can be applied to a new (unclassified) dataset to predict which records will have a given outcome. It segments a dataset by creating 2-way splits, and it requires less data preparation than CHAID (White Paper 1997).

The CART method derives its name from the tree-like structure that it builds as a model. These trees work much like the child's game of twenty questions, where one child asks questions to determine the kind of object another child has in mind. In this case, the tree asks questions about card members to determine which are likelier to attrite than others. Each question in the tree represents a choice between values of a particular variable that break a large group of card members into two smaller groups (Big Data – Better Returns Leveraging Your Hidden Data Assets to Improve, 27 Apr 1998).

For instance, there might be a choice between those members with more than $1,000 of monthly balance and those with less than that amount. Those with less than a thousand dollars could be transactors while those with more could be revolvers. As with the child's game, each question's answer provides a bit of information about card members and leads to other, more appropriate questions. For revolvers, the next-best question to determine whether someone will attrite might be whether their monthly balance has been decreasing, but for transactors, those who always pay off their monthly balance, the next-best question might be whether the monthly volume of transactions has been decreasing. Eventually, enough questions have been asked of the card members to place each in a group of people who will reliably do much the same as one another: stay or attrite (Big Data – Better Returns Leveraging Your Hidden Data Assets to Improve, 27 Apr 1998).

             Because CART uses tests on the database to find its questions, it can be robust in the face of outliers, by isolating them with a single question. And because it searches for the best variable to use in the groups created by answers to questions based on other variables, it automatically finds combinations of variables (Big Data – Better Returns Leveraging Your Hidden Data Assets to Improve, 27 Apr 1998).
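How a tree might pick one of its two-way questions can be sketched as follows. The sketch assumes the Gini impurity as the splitting criterion, which is the usual choice in CART implementations but is not stated in the source, and the card-member balances and outcomes are hypothetical.

```python
# Search for the single "balance <= threshold?" question that best separates
# members who attrite (1) from those who stay (0), using Gini impurity.
# Data and names are hypothetical; this is a sketch of one split, not a full tree.

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' question."""
    best = (None, gini(labels))
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


monthly_balance = [200, 450, 800, 950, 1200, 1500, 2300, 3100]
attrited = [0, 0, 0, 0, 1, 1, 0, 1]
threshold, impurity = best_binary_split(monthly_balance, attrited)
print(threshold, impurity)
```

A full tree would apply the same search again inside each of the two resulting groups, over every available variable, which is how combinations of variables emerge.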

 

Chi-Square Automatic Interaction Detection (CHAID)

CHAID is a decision tree technique used for classification of a dataset. It provides a set of rules that can be applied to a new (unclassified) dataset to predict which records will have a given outcome. It segments a dataset by using chi-square tests to create multi-way splits. CHAID preceded CART and requires more data preparation (White Paper 1997).
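The chi-square test behind a multi-way split can be sketched as a plain Pearson statistic on a table of category-by-outcome counts. This covers only that one ingredient: a real CHAID procedure also converts the statistic to a p-value and merges similar categories, which the sketch omits. The regional counts are hypothetical.

```python
# Pearson chi-square statistic for a contingency table: one row per category
# of a candidate predictor, one column per outcome. A larger value suggests
# the predictor's categories differ more in their outcome mix.

def chi_square_statistic(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: three regions, columns = (ordered, did not order).
orders_by_region = [[30, 70], [50, 50], [20, 80]]
chi2 = chi_square_statistic(orders_by_region)
print(chi2)
```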

 

Rule Induction

Rule induction is the extraction from data of useful if-then rules based on statistical significance. A data mining system has to infer a model from the database. That is, it may define classes such that the database contains one or more attributes that denote the class of a tuple (the predicted attributes), while the remaining attributes are the predicting attributes. A class can then be defined by conditions on the attributes. Once the classes are defined, the system should be able to infer the rules that govern classification; in other words, the system should find the description of each class (Dilly 1995).

Production rules have been widely used to represent knowledge in expert systems and they have the advantage of being easily interpreted by human experts because of their modularity i.e. a single rule can be understood in isolation and doesn't need reference to other rules (Dilly 1995).
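A minimal sketch of inferring such rules from tuples follows. It assumes single-condition rules and uses simple support and confidence thresholds as a stand-in for a formal significance test; the records, attribute names, and thresholds are hypothetical.

```python
from collections import Counter

# Induce "IF attribute = value THEN class" rules from records.
# Support = number of records matching the whole rule;
# confidence = share of records matching the condition that also match the class.

def induce_rules(records, class_attribute, min_support=2, min_confidence=0.8):
    condition_counts = Counter()
    rule_counts = Counter()
    for record in records:
        label = record[class_attribute]
        for attribute, value in record.items():
            if attribute == class_attribute:
                continue  # the predicted attribute is never a condition
            condition_counts[(attribute, value)] += 1
            rule_counts[(attribute, value, label)] += 1
    rules = []
    for (attribute, value, label), hits in rule_counts.items():
        confidence = hits / condition_counts[(attribute, value)]
        if hits >= min_support and confidence >= min_confidence:
            rules.append((attribute, value, label, hits, confidence))
    return sorted(rules)


# Hypothetical customer records; "responded" is the predicted attribute.
records = [
    {"age_band": "young", "owns_home": "no", "responded": "no"},
    {"age_band": "young", "owns_home": "yes", "responded": "no"},
    {"age_band": "older", "owns_home": "yes", "responded": "yes"},
    {"age_band": "older", "owns_home": "yes", "responded": "yes"},
    {"age_band": "older", "owns_home": "no", "responded": "no"},
    {"age_band": "young", "owns_home": "no", "responded": "no"},
]
rules = induce_rules(records, "responded")
for attribute, value, label, support, confidence in rules:
    print(f"IF {attribute} = {value} THEN responded = {label} "
          f"(support {support}, confidence {confidence:.2f})")
```

Each printed rule can be read in isolation, which is the modularity property noted above.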

 

Genetic Algorithms

Genetic algorithms are a relatively new software paradigm inspired by Darwin's theory of evolution. A population of individuals, each representing a possible solution to a problem, is initially created at random. Then pairs of individuals combine to produce offspring for the next generation. A mutation process is also used to randomly modify the genetic structure of some members of each new generation (Genetic Algorithms 1996).

Genetic algorithms also generate rules from data sets, but do not follow the exploration-oriented protocol of rule induction. Instead, they rely on the idea of "mutation" to make changes in patterns until a suitable form of pattern emerges via selective breeding. The genetic crossover operation is in fact very similar to the operation breeders use when they crossbreed plants and/or animals. The exchange of genetic material by chromosomes is also based on the same method (A Characterization of Data Mining Technologies and Processes, 2 May 1998). Genetic algorithms are not just for rule generation and may be applied to a variety of other tasks to which rules do not immediately apply, such as the discovery of patterns in text, planning and control, and system optimization, etc. (Holland 1995).
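The cycle of random initial population, selection, crossover, and mutation can be sketched on a toy problem. The fitness function here (count the 1 bits in a bit string) is a hypothetical stand-in for a real objective, and carrying the best individual forward unchanged (elitism) is an added design choice, not something the source describes.

```python
import random

# Toy genetic algorithm: each individual is a bit string, and fitness is the
# number of 1 bits. All parameters are illustrative assumptions.

def fitness(bits):
    return sum(bits)


def evolve(n_bits=20, population_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Initial population created at random.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(population_size)]
    initial_best = max(fitness(individual) for individual in population)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]  # selective breeding
        next_generation = [population[0][:]]          # elitism (assumption)
        while len(next_generation) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randint(1, n_bits - 1)          # crossover point
            child = mother[:cut] + father[cut:]
            # Mutation: occasionally flip a bit.
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            next_generation.append(child)
        population = next_generation
    return initial_best, max(population, key=fitness)


initial_best, champion = evolve()
print(initial_best, fitness(champion))
```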

 

Nearest Neighbor

Nearest neighbor is a technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k is greater than or equal to 1). Nearest Neighbor is sometimes called the k-nearest neighbor technique (White Paper 1997).
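A minimal sketch of the technique, assuming Euclidean distance as the similarity measure and a majority vote as the way of combining the neighbors' classes. The customer features and labels are hypothetical.

```python
import math
from collections import Counter

# k-nearest neighbor: label a new record by majority vote among the k most
# similar records in a historical dataset.

def knn_classify(history, query, k=3):
    """history: list of (feature_tuple, label); query: feature_tuple."""
    neighbors = sorted(history, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical history: (orders last year, months since last order) -> responded?
history = [
    ((1, 12), "no"), ((2, 10), "no"), ((1, 9), "no"),
    ((6, 1), "yes"), ((5, 2), "yes"), ((7, 3), "yes"),
]
print(knn_classify(history, (5, 3)))
print(knn_classify(history, (2, 11)))
```

In practice the features would usually be rescaled first so that no single attribute dominates the distance.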

 

Implications of Data Mining for Advertising Management

This discussion will focus on two areas of advertising management, Direct Response Advertising and Interactive Advertising, and on the related area of Customer Service.

Direct Response Advertising

The direct advertising and marketing industry is focused on choosing from millions of potential clients or prospects the people who are most likely to respond to an offer and then targeting a specific mailing or telesales campaign to them. By better targeting the candidates, not only does the response rate increase, but also the cost for each positive response goes down. A credit card company can leverage its vast warehouse of customer transaction data to identify customers most likely to be interested in a new credit product. Using a small test mailing, the attributes of customers with an affinity for the product can be identified. Recent projects have indicated more than a 20-fold decrease in costs for targeted mailing campaigns over conventional approaches (Hodel 1998).

 

Interactive Advertising

Interactive marketing and advertising are interested, among other things, in predicting what each individual accessing a Web site is most likely interested in seeing. A pharmaceutical company can analyze its recent sales force activity and their results to improve targeting of high-value physicians and determine which marketing activities will have the greatest impact in the next few months. The data needs to include competitor market activity as well as information about the local health care systems. The results can be distributed to the sales force via a wide-area network that enables the representatives to review the recommendations from the perspective of the key attributes in the decision process. The ongoing, dynamic analysis of the data warehouse allows best practices from throughout the organization to be applied in specific sales situations like selling the drugs on the World Wide Web (White Paper 1997).

              Because of the Web's tracking ability, banner advertising can be aimed precisely at the target audience or potential customers who are most likely to click or respond, using log file analysis. Specifically, data mining can help to discover a particular group of consumers with specific characteristics and behavior patterns. First, server log files can supply geographic location, prior purchases, browser type, or past banner clicks. From cookie files, an additional data component can be assembled and mined to discover the inclination for click-through and actual online sales. After this information is gathered, data mining can be used to infer customer composites based on lifestyle similarity and characteristics. The advertising and marketing campaign is then launched on the basis of this information and these data mining techniques. The more targeted the ads, the higher the response is likely to be.
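The first step of such an analysis, comparing click-through across groups of visitors, can be sketched as below. The record layout (region, browser, clicked) is a hypothetical simplification of what might be assembled from server logs and cookies, not an actual log format.

```python
from collections import defaultdict

# Click-through rate per segment from banner impression records.
# Each record is one banner impression; "clicked" is 1 or 0.

def click_through_by_segment(impressions, segment_key):
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for record in impressions:
        segment = record[segment_key]
        shown[segment] += 1
        clicked[segment] += record["clicked"]
    return {segment: clicked[segment] / shown[segment] for segment in shown}


# Hypothetical impression records.
log = [
    {"region": "TX", "browser": "Netscape", "clicked": 1},
    {"region": "TX", "browser": "IE", "clicked": 0},
    {"region": "TX", "browser": "IE", "clicked": 1},
    {"region": "CA", "browser": "IE", "clicked": 0},
    {"region": "CA", "browser": "Netscape", "clicked": 0},
    {"region": "CA", "browser": "IE", "clicked": 1},
    {"region": "CA", "browser": "IE", "clicked": 0},
]
rates = click_through_by_segment(log, "region")
best_segment = max(rates, key=rates.get)
print(rates, best_segment)
```

The same function can be pointed at any other field, such as "browser", to compare segments along a different characteristic.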

 

Customer Service

Whether they are shopping for toothpaste, a checking account or computers, consumers can choose from a range of products and services offering similar features for similar prices. Often, the only differentiator among suppliers is the quality of customer service they provide. Better understanding of customers' requirements through data mining holds the promise of helping companies to develop more attractive, focused service and support offerings (Data Mining for Competitive Advantage, 1997).

 

Advertising Management Case Study

The first author analyzed a data set gathered over several years by a Chicago mail-order company whose catalogs cover men’s, women’s, and children’s shoes and home furnishings. The company's objectives were 1) to reduce the cost of mailing catalogs and 2) to increase the hit rate of mailings. The default model for women’s catalogs is as follows: a total of 31,140 catalogs were sent out, and 6,545 recipients placed orders, so the default hit rate was 21% (6,545/31,140 = 21%). The default model for home furnishings catalogs is as follows: a total of 31,140 catalogs were sent out, and 5,240 recipients placed orders, so the default hit rate was 16.8% (5,240/31,140 = 16.8%). A model was selected to satisfy the two objectives noted above. Coincidence Matrix Methodology was used because it is suited to minimizing the possibility of missing potential customers and to minimizing the number of mailings needed to reach the majority of customers.

           Generally speaking, a coincidence matrix is a method for detecting two or more unique events that occur almost simultaneously. Coincidence counting involves two or more event counters exposed to the same source of events and shows whether a relationship exists among the events. In the tables below, the two events are the model's recommendation to send a catalog and the customer's actual order.

 

1) Women’s catalog Model (Neural Net Model, Coincidence Matrix)

The results showed that if a total of 1,715 copies were sent out to the exact target audience, the hit rate would be 29.33% (503/1,715). The hit rate thus increased by about 40% (21% vs. 29.33%).

Table 1: Women’s Catalog Model Prediction

                    Sending catalogs
                    No          Yes
Order (No)          1,129       1,212
Order (Yes)         156         503

Correct: 1,632 (54.4%)
Wrong: 1,368 (45.6%)
Total: 3,000
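The figures quoted for the women's catalog model can be recomputed directly from the four cells of Table 1. In this sketch the dictionary keys ("hold", "send", "order", "no_order") are hypothetical labels for the table's rows and columns.

```python
# Recompute mailings, hit rate, and accuracy from a coincidence matrix.
# matrix[actual][predicted]: actual = did the customer order,
# predicted = did the model recommend sending a catalog.

def summarize(matrix):
    sent = matrix["no_order"]["send"] + matrix["order"]["send"]
    hit_rate = matrix["order"]["send"] / sent
    total = sum(sum(row.values()) for row in matrix.values())
    correct = matrix["no_order"]["hold"] + matrix["order"]["send"]
    return sent, hit_rate, correct / total


table_1 = {
    "no_order": {"hold": 1129, "send": 1212},
    "order": {"hold": 156, "send": 503},
}
sent, hit_rate, accuracy = summarize(table_1)
lift = hit_rate / 0.21 - 1  # relative gain over the 21% default hit rate

print(sent, round(hit_rate * 100, 2), round(accuracy * 100, 1), round(lift * 100, 1))
```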

 

(Assumption: $5 per copy, $253 per year/per customer, 4 catalogs per season, $63 per season/ per customer, 100% markup.)

Comparing the default model with the current model gives an interesting result: total profit increased by 37.3% ((69,340 - 50,498) / 50,498).

Default model:

            COGS: (31,140 X 5) + (6,546 X (63/2)) = $361,900

            Revenue: 6,546 X $63 = $412,398

            Profit: $50,498

 

Current model:

            COGS: (17,374 X 5) + (4,959 X (63/2)) = $243,078

            Revenue: 4,959 X $63 = $312,417

            Profit: $69,340
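The profit comparison for the women's catalog can be recomputed from the formulas as written. In the sketch below, the function and parameter names are hypothetical; the divisor of 2 reflects the stated 100% markup, so the cost of goods per order is half the revenue per order.

```python
# Profit = revenue - (mailing cost + cost of goods sold), following the
# COGS and revenue formulas given for the women's catalog.

def campaign_profit(catalogs_mailed, orders, cost_per_catalog,
                    revenue_per_order, markup_divisor):
    mailing_cost = catalogs_mailed * cost_per_catalog
    goods_cost = orders * (revenue_per_order / markup_divisor)
    revenue = orders * revenue_per_order
    return revenue - (mailing_cost + goods_cost)


default_profit = campaign_profit(31_140, 6_546, 5, 63, 2)
model_profit = campaign_profit(17_374, 4_959, 5, 63, 2)
increase = (model_profit - default_profit) / default_profit

print(default_profit, model_profit, round(increase * 100, 1))
```

The recomputed profits agree with the reported figures to within a dollar or two, a difference that presumably comes from rounding in the original calculation.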

 

2) Home Furnishings Catalog Model (Neural Net Model, Coincidence Matrix)

The results in this case showed that if a total of 1,254 copies were sent out to the exact target audience, the hit rate would be 22.64% (284/1,254). The hit rate thus increased by about 35% (16.8% vs. 22.64%).

Table 2: Home Furnishings Catalog Model Prediction

                    Sending catalogs
                    No          Yes
Order (No)          1,508       970
Order (Yes)         238         284

Correct: 1,792 (59.73%)
Wrong: 1,208 (40.27%)
Total: 3,000

 

(Assumption: $5 per copy, $164 per year/per customer, 2 catalogs per season, $82 per season/ per customer, 80% markup)

Comparing the default model with the current model gives an interesting result: total profit increased by 56.36% ((33,192 - 21,228) / 21,228).

Default model:

            COGS: (31,140 X 5) + (5,240 X (82/1.7)) = $408,452

            Revenue: 5,240 X $82 = $429,680

            Profit: $21,228

Current model:

            COGS: (12,540 X 5) + (2,840 X (82/1.7)) = $199,688

            Revenue: 2,840 X $82 = $232,880

            Profit: $33,192

            Profit increase: 56.36%

 

 

Conclusion and Implications

Data mining is gaining wider acceptance in the general business community. To gain a competitive advantage in today’s markets, a business must use data mining to obtain the combination of hindsight and foresight it needs to make proactive, knowledge-driven decisions. Success can be measured in increased profits, new revenue, and significant cost savings. However, data mining, for all its strengths, has a downside: the business must have knowledgeable people on its staff, and a project could cost over a million dollars. The consequences of not doing data mining could be more disastrous still, with lost revenue, fewer customers, or a market not fully realized. There are many ways to do data mining through its various technologies. Now and in the future it will be coupled with other technologies such as OLAP (Online Analytical Processing) that will give businesses a one-two punch for market success. The myths of data mining are just that: myths. As long as one does not expect results overnight, data mining may provide results. Finally, the application of data mining will continue to find answers where answers have not been found before. This may be particularly true in the field of advertising, where data mining has not been adopted at the rate seen in some other areas of business.

 

 

 

References

 

Ashbrand, Deborah (1997), “Is data mining ready for the masses?” at URL:

http://www.datamation.com/PlugIn/workbench/datamine/stories/nuggets.htm

 

Berry, Michael J. A., and Gordon Linoff (1997), Data Mining Techniques for Marketing Sales and Customer Support, New York: Wiley.

 

Condon, Bernard (1997), "Prickly Does It." Forbes, 159 (May 5), p66-70.

 

Darling, Charles B (1998), “Data mining for the masses?” at URL:

http://www.datamation.com/PlugIn/workbench/datamine/stories/masses.htm

 

Dilly, Ruth (1995a), “Data Mining Techniques” at URL:

http://www-pcc.qub.ac.uk/tec/courses/datamining/stu_notes/dm_book_4.html

 

_________ (1995b), “Data Mining,” at URL:

http://www-pcc.qub.ac.uk/tec/courses/datamining/stu_notes/dm_book_2.html

 

Elder, John & Ridgeway, Greg (1999), “Combining Estimators to Improve Performance: Bundling, Bagging, Boosting, and Bayesian Model Averaging,” International Conference on Knowledge Discovery and Data Mining, San Diego, CA, August 15.

 

Foley, John & Bruce Caldwell (1996), "Dangerous Data," InformationWeek, (September 30), 14-16.

 

Gerber, Cheryl (1998), “Excavate your data,” at URL:

http://www.datamation.com/PlugIn/workbench/datamine/stories/excav.htm

 

Gilman, Michael (1998), “White Paper,” at URL:

http://www.data-mine.com/white.htm

 

Hodel, Alan (1998), “Data Mining: A New Weapon for Competitive Advantage,” at URL:

http://www.software.ibm.com/sq/issues/vol24/data.htm

 

McCarthy, Vance (1998), “Strike it rich!” at URL:

http://www.datamation.com/PlugIn/workbench/datamine/stories/rich.htm

 

Miller, Michael J (1997), “Looking Back,” PC Magazine, 16 (March 25), p108-38.

Mitchell, Tom M (1997), “Does Machine Learning Really Work?” AI Magazine, 18 (Fall), p11-20.

Novack, Janet (1996), “The Data Miners,” Forbes, 157 (3-February 12), 96-97.

O'Harrow, Robert, Jr (1998), “Prescription Sales, Privacy Fears: CVS, Giant Share Customer Records With Drug Marketing Firm,” The Washington Post, Feb. 15, p. A1+.

 

SAS Institute, Inc. (1997), "SAS Enterprise Miner Software," at URL: http://www.sas.com/software/components/miner.html

 

Small, Robert D (1997), “Debunking Data Mining Myths,” at URL:

http://www.twocrows.com/iwk9701.htm

 

Stone, M. David (1997), “Future Mass Storage,” PC Magazine, 16 (March 25), p182-84.

 

Swami, Arun (1995), “Data Mining with Silicon Graphics Technology,” at URL

http://www.sgi.com/Technology/data-mining.html

 

Toffler, Alvin (1970), Future Shock, New York: Random House.

 

A Characterization of Data Mining Technologies and Processes (1998) at URL:

http://www.datamining.com/datamine/dm-tech.htm

 

Big Data – Better Returns Leveraging Your Hidden Data Assets to Improve (1998) at URL: http://www.think.con/html/products/darwin/r_method.htm

 

Data Mining- An IBM Overview (1998) at URL: http://direct.boulder.ibm.com/bi/info/overview.htm

 

Genetic Algorithms (1998), at URL: http://www.ultragem.com/ga.htm

 

About Data Mining (1998), at URL: http://www.datamindcorp.com/dmabout.html

 

Data Mining: A Competitive Tool for Managing Churn (1998) at URL:

http://www.datamindcorp.com/paper_churn.html

 

Data Mining for Competitive Advantage (1997) at URL:

http://www.datamindcorp.com/paper_advantage.html

 

White Paper: An Introduction to Data Mining (1998) at URL:

http://www.pilotsw.com/dmpaper/dmindex.htm