How Data Mining Can Help Advertising Management
by
Seokmin Hong
Doctoral Student
smhong@mail.utexas.edu
Department of Advertising
College of Communication
The University of Texas at Austin
Austin, Texas 78712
And
John D. Leckenby
in Communication
john.leckenby@mail.utexas.edu
Department of Advertising
College of Communication
The University of Texas at Austin
Austin, Texas 78712
(Do not cite without permission of the authors)
September 22, 2000
Abstract
Data Mining is a relatively new set of
analytical techniques designed to provide management information for
decision-making purposes. The
distinguishing characteristic of data mining is the application of techniques
and rules designed to extract information from very large databases. These databases are often developed in the
course of ordinary business both online and offline. This paper addresses the question of the relevance of data mining
techniques to the field of advertising management. First, the definition of data mining is explored followed by an
exposition of the core concepts in data mining. Second, an example of data mining developed by the first author
is explained. Finally, the
applicability of data mining to advertising management problems is addressed.
Introduction
In the advertising and marketing business and in the press, data mining has been and continues to be the focus of a great deal of attention (Small 1997; Elder & Ridgeway 1999; Abbott, Matkovsky & Elder 1998; Condon 1997; Foley & Caldwell 1996; Miller 1997; Mitchell 1997; Novack 1996; O’Harrow 1998; Stone 1997). But care must be taken to separate fact from myth. Data mining is a useful tool, a new approach that combines discovery with analysis. It is not a newly discovered branch of mathematics embodied in software that will, when hooked up to a large database, inexplicably and inevitably reveal the business insights contained in the millions of records stored therein. Yet it is a tool that will become increasingly important for competitive businesses (IBM 1998, URL: http://www.ibm.com).
The past two decades have seen a dramatic increase in the amount of information or data being stored in electronic format. It has been estimated that the amount of information in the world doubles every 20 months and that the size and number of databases are increasing at an even faster rate. The growing use of electronic data gathering devices such as point-of-sale or remote sensing devices has contributed to this explosion of available data (Dilly 1995).
Data mining is one way to extract predictive information from these large databases. The focus of this paper is to show how data mining may become a major tool for the advertising industry. This discussion will address the following areas: what data mining is, what technologies data mining uses, what ways data mining can be used, some examples of successful data mining applications in businesses, what is holding back data mining in advertising, and, finally, ways to develop a successful data mining project.
Data mining is a powerful new technology with
great potential to help companies focus on the most important information in
their data warehouses. Data mining tools predict future trends and behaviors,
allowing businesses to make proactive, knowledge-driven decisions. The
automated, prospective analyses offered by data mining move beyond the analyses
of past events provided by retrospective tools typical of decision support
systems. Data mining tools can answer business questions that traditionally
were too time-consuming to resolve. They scour databases for hidden patterns,
finding predictive information that experts may miss because it lies outside
their expectations (White Paper 1997).
The expression "data mining" sometimes refers to the whole process of knowledge discovery and sometimes to the methods that are used in the knowledge discovery phase. The SAS Institute has adopted a definition that is more oriented toward methods. According to the SAS Institute (URL: http://www.sas.com), data mining consists of “methods used to explore and model relationships in large amounts of data.”
Here are some more useful definitions of data mining:
Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques (Gartner Group, URL: http://www.gartner.com).
Data mining is the exploration and analysis, by automatic and semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules (Berry & Linoff 1997).
Data mining is the use of advanced statistical tools to reach into a company’s existing databases to discover patterns and relationships that can be exploited in a business context (Trajecta lexicon, URL: http://www.trajecta.com/neural/lexicon.htm).
Data mining is a combination of powerful methods that help reduce costs and risks as well as increase revenues, by extracting information from the available data (T. Fahmy, URL: http://www.statserv.com/contributers.htm).
Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data (Frawley, Piatetsky-Shapiro & Matheus 1991).
A number of factors have combined to bring data mining to the attention of the business community. First, there is a general recognition that there is untapped value in large databases. Second, database records are being consolidated toward a single customer view. Third, very large stores of data are being built, including the concept of an information warehouse. Fourth, the cost of data storage and processing has fallen, making it possible to collect and accumulate detailed data. Fifth, there is intense competition for a customer's attention in an increasingly saturated marketplace. Finally, there is movement toward the de-massification of business practices (IBM 1998, URL: http://www.ibm.com).
De-massification is a term from Alvin Toffler
(1970). During the industrial revolution economies of scale led businesses to
mass manufacturing, mass marketing and mass advertising. The information
revolution is providing the capability to custom manufacture, to market and
advertise to small segments and ultimately to the individual. This is
de-massification and it is a strong force in business today (IBM 1998, URL:
http://www.ibm.com).
Data mining is evolving as an essential segment in the decision-support marketplace. Like data warehousing, data mining is experiencing rapid change and growth as major corporations around the world begin to implement new and enhanced systems for leveraging corporate data. The META Group, a Stamford, Connecticut research firm, reported the market for data mining software and services at $3.2 billion in 1996. The firm anticipated growth to $10 billion in 2000 (About Data Mining 1997).
A recent Gartner Group Advanced
Technology Research Note listed data mining and artificial intelligence at
the top of the five key technology areas that "will clearly have a major
impact across a wide range of industries within the next three to five
years." Gartner also listed data mining as one of the top 10 new
technologies in which companies will invest during the next five years (White
Paper 1999).
If data mining is understood and applied, it can generate new revenue, increase profits, and produce significant cost savings. However, those "ifs" speak volumes about a set of technologies that is still in its infancy (Gerber 1998).
"Data mining is an immature, unstable market
today. We have to perfect the technology and the understanding of it by
business analysts. However, it is necessary for survival. If you don't use it
to predict a trend before your competitors, you're dead," says Erick
Brethenoux, Paris-based research director for advanced technologies at the
Gartner Group (Gerber 1998).
“Indeed, there’s been a frenzy of interest in datamining, which has captured corporate America’s imagination—and budgets—with one simple value proposition: Get to know your customers better, and you’ll keep them longer. That can translate into bigger profits over longer periods” (McCarthy 1998).
IT executives will likely triple the size of their existing data stores in the next two years, predicts a recent study by the San Francisco-based Sentry Technology Group. Moreover, to make the most of all those terabytes of data, they will probably need to invest heavily in advanced OLAP (Online Analytical Processing) products and algorithm-intensive data mining technologies. So, before they embark on this latest technology adventure, they had better prepare themselves for the fact that data mining projects can consume millions of dollars and years of effort. But the alternative could be just as costly: upper management may wonder why the competition always seems so much better at spotting sales trends, developing smarter marketing campaigns, and accurately predicting customer loyalty (McCarthy 1998).
Data mining is emerging as a leading answer to a series of critical, fast-growing business issues. In particular, with the effects of globalization and deregulation, major corporations are facing new pressure from increased competition on a worldwide basis. Responding effectively to this challenge requires faster marketing cycles that produce a greater number of campaigns to attract and retain the best customers. As a result, marketers need tools that help them stay on top of the corporate data that provides the foundation of any good marketing program. These tools must go beyond simply reporting data records or historical sales performance; they must ultimately provide the business professional with deep insight into customer actions. This insight is best achieved with a combination of hindsight and foresight (About Data Mining 1997).
Data mining offers a rich capability for modeling historical performance and then using this model to predict likely future outcomes. This combination of hindsight and foresight is unique to data mining and provides business professionals with a new understanding of the factors that truly contribute to business success or failure (About Data Mining 1997).
Data Mining Technologies
The primary technologies that make up data mining are neural networks, decision trees, rule induction, genetic algorithms, and nearest neighbor.
Neural Networks
A neural network is
patterned after the neurons in the human brain. It's made up of a web of
electronic neurons that send signals to each other through thousands of
connections, which are adjusted up or down as the machine learns particular
applications. A neural network can be taught voice recognition, for example,
simply by having a trainer speak words into the machine, reinforcing some
connections and not others (Darling 1998).
Neural networks are an
approach to computing that involves developing mathematical structures with the
ability to learn. The methods are the result of academic investigations to
model nervous system learning. Neural networks have the remarkable ability to
derive meaning from complicated or imprecise data and can be used to extract
patterns and detect trends that are too complex to be noticed by either humans
or other computer techniques. A trained neural network can be thought of as an
"expert" in the category of information it has been given to analyze.
This expert can then be used to provide projections given new situations of
interest and answer "what if" questions (Dilly 1995).
Neural networks have broad applicability to real-world business problems and have already been successfully applied in many industries. Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs, including sales forecasting, industrial process control, customer research, data validation, risk management, and target marketing (Dilly 1995).
Neural networks use a set of
processing elements (or nodes) analogous to neurons in the brain. These
processing elements are interconnected in a network that can then identify patterns
in data once it is exposed to the data, i.e. the network learns from experience
just as people do. This distinguishes neural networks from traditional
computing programs that simply follow instructions in a fixed sequential order.
The picture to the right is a simple neural network (Dilly 1995).
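The idea that a network "learns from experience" by adjusting its connections can be illustrated with the smallest possible case, a single artificial neuron (a perceptron). The following is only a sketch under stated assumptions: the training data, the learning rate, and all function names are hypothetical and are not taken from Dilly (1995) or from any particular product.

```python
# Minimal sketch of one artificial neuron (a perceptron) learning from examples.
# All names, data, and parameter values are illustrative assumptions.

def step(x):
    """Fire (1) when the weighted input is positive, otherwise stay silent (0)."""
    return 1 if x > 0 else 0

def predict(weights, bias, inputs):
    """Output of the neuron for one set of input signals."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(examples, epochs=20, rate=1.0):
    """Adjust connection weights up or down whenever the neuron answers wrongly."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)  # +1, 0, or -1
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

# Hypothetical training data: respond only when both signals are present.
EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(EXAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in EXAMPLES]
```

After training, `predictions` reproduces the targets in `EXAMPLES`. Practical networks combine many such units in layers and use more elaborate training procedures, but the principle of reinforcing some connections and weakening others is the same.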
Decision Trees
A decision tree is a tree-shaped structure that represents a set of decisions that can be used to predict which records in a new (unclassified) dataset will have a given outcome. Decision trees are a simple form of knowledge representation that classify examples into a finite number of classes. The nodes are labeled with attribute names, the edges are labeled with possible values for the attribute, and the leaves are labeled with the classes. Objects are classified by following a path down the tree, taking the edges that correspond to the values of the object's attributes (Dilly 1995).
The following example uses objects that describe the weather at a given time. The objects contain information on the outlook, humidity, and so on. Some objects are positive examples, denoted by P, and others are negative, denoted by N. Classification in this case is the construction of a tree structure, illustrated in the diagram below, that can be used to classify all the objects correctly (Dilly 1995).
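Classifying an object by following a path down the tree can be sketched in a few lines. The tree below is a hypothetical stand-in for the weather diagram: the attribute "windy" and all attribute values are assumptions made for illustration, since only outlook and humidity are named in the text.

```python
# Hypothetical weather tree: inner nodes name an attribute, edges carry its
# possible values, and leaves carry the class (P = positive, N = negative).
TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "N", "normal": "P"},
        },
        "overcast": "P",
        "rain": {
            "attribute": "windy",
            "branches": {"true": "N", "false": "P"},
        },
    },
}

def classify(tree, obj):
    """Follow the edge matching the object's value at each node until a leaf."""
    node = tree
    while isinstance(node, dict):
        value = obj[node["attribute"]]
        node = node["branches"][value]
    return node

# A sunny, humid, calm day follows outlook -> sunny -> humidity -> high.
example_day = {"outlook": "sunny", "humidity": "high", "windy": "false"}
example_class = classify(TREE, example_day)
```

Here `example_class` is "N", because the path through the sunny branch ends at the high-humidity leaf.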
Two widely used decision tree methods are Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID).
Classification and Regression Tree (CART)
CART is a decision tree technique used for classification of a dataset. It provides a set of rules that can be applied to a new (unclassified) dataset to predict which records will have a given outcome. It segments a dataset by creating two-way splits, and it requires less data preparation than CHAID (White Paper 1997).
The CART method derives its name from the tree-like
structure that it builds as a model. These trees work much like the child's
game of twenty questions, where one child asks questions to determine the kind
of object another child has in mind. In this case, the tree asks questions
about card members to determine which are likelier to attrite than others. Each
question in the tree represents a choice between values of a particular
variable that break a large group of card members into two smaller groups (Big
Data – Better Returns Leveraging Your Hidden Data Assets to Improve, 27 Apr
1998).
For instance, there might be
a choice between those members with more than $1,000 of monthly balance and
those with less than that amount. Those with less than a thousand dollars could
be transactors while those with more could be revolvers. As with the child's
game, each question's answer provides a bit of information about card members
and leads to other, more appropriate questions. For revolvers, the next-best
question to determine whether someone will attrite might be whether their
monthly balance has been decreasing, but for transactors, those who always pay
off their monthly balance, the next-best question might be whether the monthly
volume of transactions has been decreasing. Eventually, enough questions have
been asked of the card members to place each in a group of people who will
reliably do much the same as one another: stay or attrite (Big Data – Better Returns
Leveraging Your Hidden Data Assets to Improve, 27 Apr 1998).
Because CART uses tests on the database to find its questions, it can be robust in the face of outliers, by isolating them with a single question. And because it searches for the best variable to use in the groups created by answers to questions based on other variables, it automatically finds combinations of variables (Big Data – Better Returns Leveraging Your Hidden Data Assets to Improve, 27 Apr 1998).
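How a tree "searches for the best variable" and question can be sketched for a single numeric variable. The example below scores every candidate two-way split by Gini impurity, a criterion commonly associated with CART, and keeps the purest one. The balances and attrition flags are invented for illustration, and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a group of 0/1 labels (0.0 means the group is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_binary_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    best_threshold, best_score = None, float("inf")
    candidates = sorted(set(values))
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2  # midpoint between neighboring values
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical card members: monthly balance in dollars and whether they attrited (1) or stayed (0).
balances = [200, 400, 600, 800, 1200, 1500, 1800, 2500]
attrited = [0, 0, 0, 0, 1, 1, 0, 1]

threshold, impurity = best_binary_split(balances, attrited)
```

With these invented figures the best question turns out to be "is the monthly balance more than $1,000?", which isolates a group in which nobody attrited. A full tree would repeat the same search inside each resulting group and across all available variables.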
Chi-Square Automatic Interaction Detection (CHAID)
CHAID is a decision tree technique used for classification of a dataset. It provides a set of rules that can be applied to a new (unclassified) dataset to predict which records will have a given outcome. It segments a dataset by using chi-square tests to create multi-way splits. CHAID preceded CART and requires more data preparation (White Paper 1997).
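The chi-square test behind a multi-way split can be sketched as follows. The function computes the Pearson chi-square statistic for a table of counts; a larger value indicates that the candidate predictor separates the outcome more strongly. The counts are hypothetical, and the sketch leaves out other parts of the full method, such as merging similar categories and converting the statistic to an adjusted significance level.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical counts. Rows are categories of a candidate predictor;
# columns are (ordered, did not order).
by_region = [[30, 70], [50, 50], [70, 30]]  # three regions: a three-way split
by_gender = [[75, 75], [75, 75]]            # unrelated to ordering in this data

region_statistic = chi_square(by_region)
gender_statistic = chi_square(by_gender)
```

In this invented data, region yields a statistic of 32 and gender yields 0, so a chi-square based tree would split on region first, creating one branch per region.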
Rule Induction
Rule induction is the extraction from data of useful if-then rules based on statistical significance. A data mining system has to infer a model from the database. That is, it may define classes such that the database contains one or more attributes that denote the class of a tuple (the predicted attributes), while the remaining attributes are the predicting attributes. A class can then be defined by conditions on the attributes. Once the classes are defined, the system should be able to infer the rules that govern classification; in other words, it should find the description of each class (Dilly 1995).
Production rules have been
widely used to represent knowledge in expert systems and they have the
advantage of being easily interpreted by human experts because of their
modularity i.e. a single rule can be understood in isolation and doesn't need
reference to other rules (Dilly 1995).
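A very simple form of rule induction can be sketched as follows. For each value of each predicting attribute, the code finds the most common class of the predicted attribute and keeps the rule only if it holds often enough. The records, attribute names, and the 0.8 confidence threshold are assumptions for illustration; the threshold is a plain stand-in for a proper test of statistical significance.

```python
from collections import Counter, defaultdict

def induce_rules(records, predicting, predicted, min_confidence=0.8):
    """Return (attribute, value, class, confidence) rules meeting the threshold."""
    rules = []
    for attribute in predicting:
        counts = defaultdict(Counter)
        for record in records:
            counts[record[attribute]][record[predicted]] += 1
        for value, class_counts in counts.items():
            label, hits = class_counts.most_common(1)[0]
            confidence = hits / sum(class_counts.values())
            if confidence >= min_confidence:
                rules.append((attribute, value, label, confidence))
    return rules

# Hypothetical customer records; "responds" is the predicted attribute.
RECORDS = [
    {"region": "urban", "prior_buyer": "yes", "responds": "yes"},
    {"region": "urban", "prior_buyer": "yes", "responds": "yes"},
    {"region": "rural", "prior_buyer": "yes", "responds": "yes"},
    {"region": "urban", "prior_buyer": "no", "responds": "no"},
    {"region": "rural", "prior_buyer": "no", "responds": "no"},
    {"region": "rural", "prior_buyer": "no", "responds": "yes"},
]

rules = induce_rules(RECORDS, ["region", "prior_buyer"], "responds")
```

The only rule that survives here reads "if prior_buyer is yes, then responds is yes". It can be understood without reference to any other rule, which is the modularity described above.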
Genetic Algorithms
Genetic algorithms are a relatively new software paradigm inspired by Darwin's theory of evolution. A population of individuals, each representing a possible solution to a problem, is initially created at random. Then pairs of individuals combine to produce offspring for the next generation. A mutation process is also used to randomly modify the genetic structure of some members of each new generation (Genetic Algorithms 1996).
Genetic algorithms also generate rules from data
sets, but do not follow the exploration-oriented protocol of rule induction.
Instead, they rely on the idea of "mutation" to make changes in
patterns until a suitable form of pattern emerges via selective breeding. The
genetic crossover operation is in fact very similar to the operation breeders
use when they crossbreed plants and/or animals. The exchange of genetic
material by chromosomes is also based on the same method (A Characterization of
Data Mining Technologies and Processes, 2 May 1998). Genetic algorithms are not
just for rule generation and may be applied to a variety of other tasks to
which rules do not immediately apply, such as the discovery of patterns in
text, planning and control, and system optimization, etc. (Holland 1995).
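The cycle of random creation, selective breeding, crossover, and mutation can be sketched on a toy problem. In the example below each individual is a string of bits and fitness is simply the number of 1 bits; this problem, the parameter values, and the function name are all assumptions chosen for brevity rather than anything described in the sources cited above.

```python
import random

def evolve(length=20, population_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Evolve bit strings toward all ones; return (best individual, initial best fitness)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    fitness = sum              # fitness = number of 1 bits in the individual
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    initial_best = max(fitness(individual) for individual in population)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]  # selective breeding: fitter half
        children = [parents[0][:]]                    # keep the current best unchanged
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)            # single-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    return max(population, key=fitness), initial_best

best, initial_best = evolve()
```

Because the best individual of each generation is carried forward unchanged, the fitness of `best` can never fall below `initial_best`. In a data mining setting the bit string would instead encode a candidate rule or model, and fitness would measure how well it describes the data.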
Nearest Neighbor
Nearest neighbor is a technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k is greater than or equal to 1). It is sometimes called the k-nearest neighbor technique (White Paper 1997).
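A minimal sketch of the technique follows, assuming that "most similar" means closest in straight-line (Euclidean) distance and that the "combination of the classes" is a majority vote. The historical records and their two numeric features are hypothetical.

```python
import math
from collections import Counter

def knn_classify(history, query, k=3):
    """Classify query by majority vote among the k closest historical records."""
    nearest = sorted(history, key=lambda record: math.dist(record[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical history: (catalogs received, orders placed) and whether the
# customer responded to the latest mailing.
HISTORY = [
    ((1, 1), "no"), ((2, 1), "no"), ((1, 2), "no"),
    ((8, 9), "yes"), ((9, 8), "yes"), ((9, 9), "yes"),
]

likely_responder = knn_classify(HISTORY, (8, 8))
unlikely_responder = knn_classify(HISTORY, (2, 2))
```

In practice the features would usually be rescaled first so that no single variable dominates the distance, and k would be chosen by testing on held-out records.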
Implications of Data Mining for Advertising Management
Direct Response Advertising
The direct advertising and
marketing industry is focused on choosing from millions of potential clients or
prospects the people who are most likely to respond to an offer and then
targeting a specific mailing or telesales campaign to them. By better targeting
the candidates, not only does the response rate increase, but also the cost for
each positive response goes down. A credit card company can leverage its vast
warehouse of customer transaction data to identify customers most likely to be
interested in a new credit product. Using a small test mailing, the attributes
of customers with an affinity for the product can be identified. Recent
projects have indicated more than a 20-fold decrease in costs for targeted
mailing campaigns over conventional approaches (Hodel 1998).
Interactive Advertising
Interactive marketing and
advertising are interested, among other things, in predicting what each
individual accessing a Web site is most likely interested in seeing. A
pharmaceutical company can analyze its recent sales force activity and their
results to improve targeting of high-value physicians and determine which
marketing activities will have the greatest impact in the next few months. The
data needs to include competitor market activity as well as information about
the local health care systems. The results can be distributed to the sales
force via a wide-area network that enables the representatives to review the
recommendations from the perspective of the key attributes in the decision
process. The ongoing, dynamic analysis of the data warehouse allows best
practices from throughout the organization to be applied in specific sales
situations like selling the drugs on the World Wide Web (White Paper 1997).
Because of the Web's tracking ability, banner advertising can be aimed precisely at the target audience or potential customers who are most likely to click or respond, using log file analysis. Specifically, data mining can help discover particular groups of consumers with specific characteristics and behavior patterns. First, server log files can provide geographic location, prior purchases, browser type, and past banner clicks. Cookie files supply an additional data component that can be assembled and mined to discover the inclination for click-through and actual online sales. After this information is gathered, data mining can be used to infer customer composites based on lifestyle similarity and other characteristics. The advertising and marketing campaign is then launched on the basis of this information and the data mining results. The more targeted the ads, the higher the response is likely to be.
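The first step of such a log file analysis can be sketched as follows. The log records, field layout, and banner names are hypothetical simplifications of what a real server log contains; the code only tallies click-through rates per visitor segment and picks the banner that performed best for each segment.

```python
from collections import defaultdict

# Hypothetical, simplified log records: (visitor id, browser, banner shown, clicked?)
LOG = [
    ("v1", "Netscape", "shoes", True),
    ("v2", "Netscape", "shoes", False),
    ("v3", "IE", "shoes", False),
    ("v4", "IE", "shoes", False),
    ("v5", "Netscape", "furniture", False),
    ("v6", "IE", "furniture", True),
    ("v7", "IE", "furniture", True),
    ("v8", "Netscape", "furniture", False),
]

def click_through_rates(log):
    """Click-through rate for every (browser, banner) combination in the log."""
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for _visitor, browser, banner, clicked in log:
        shown[(browser, banner)] += 1
        clicks[(browser, banner)] += int(clicked)
    return {segment: clicks[segment] / shown[segment] for segment in shown}

def best_banner_per_segment(rates):
    """For each browser segment, the banner with the highest click-through rate."""
    best = {}
    for (browser, banner), rate in rates.items():
        if browser not in best or rate > best[browser][1]:
            best[browser] = (banner, rate)
    return best

rates = click_through_rates(LOG)
targeting = best_banner_per_segment(rates)
```

A real analysis would segment on many more attributes (location, prior purchases, cookie history) and would use one of the modeling techniques described earlier rather than a simple tally.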
Customer Service
Whether they are shopping
for toothpaste, a checking account or computers, consumers can choose from a
range of products and services offering similar features for similar prices.
Often, the only differentiator among suppliers is the quality of customer service they provide. Better understanding of customers' requirements through data mining holds the promise of helping companies to develop more attractive, focused service and support offerings (Data Mining for Competitive Advantage 1997).
Advertising Management Case Study
The first author analyzed a data set gathered over several years by a Chicago mail-order company whose catalogs cover men’s, women’s, and children’s shoes and home furnishings. The company's objectives were 1) to reduce the cost of mailing catalogs and 2) to increase the hit rate of mailings. The default model for women’s catalogs is as follows: a total of 31,140 catalogs were sent out, and 6,545 of these resulted in orders, so the default hit rate was 21% (6,545/31,140 = 21%). The default model for home furnishing catalogs is as follows: a total of 31,140 catalogs were sent out, and 5,240 of these resulted in orders, so the default hit rate for home furnishings was 16.8% (5,240/31,140 = 16.8%). A model was selected to satisfy the two objectives noted above. The coincidence matrix methodology was used because it is suited to minimizing the possibility of missing potential customers and to minimizing the number of mailings needed to reach the majority of customers.
Generally speaking, a coincidence matrix is a method for detecting two or more distinct events that occur almost simultaneously. Coincidence counting involves two or more event counters exposed to the same source of events and shows whether a relationship exists among the events.
1) Women’s catalog model (Neural Net Model, Coincidence Matrix)
The results showed that if a total of 1,715 copies were sent out to the exact target audience, the hit rate would be 29.33%. The hit rate therefore increased by about 40% (21% vs. 29.33%).
|             | No   | Yes  |
| Order (No)  | 1129 | 1212 |
| Order (Yes) | 156  | 503  |
Correct: 1632 (54.4%)
Wrong: 1368 (45.6%)
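The percentages reported for the women's catalog model can be recomputed directly from the four cells of the matrix. The sketch below assumes that the rows are actual orders and the columns are the model's prediction of whether to mail; this reading is not stated explicitly, but it is the one that reproduces the 1,715 copies and the 29.33% hit rate. The variable names are illustrative.

```python
# Cells of the women's catalog matrix, keyed as (actual order, predicted order).
# Assumption: rows are actual orders, columns are the model's predictions.
matrix = {
    ("no", "no"): 1129, ("no", "yes"): 1212,
    ("yes", "no"): 156, ("yes", "yes"): 503,
}

total = sum(matrix.values())                             # records scored
correct = matrix[("no", "no")] + matrix[("yes", "yes")]  # predictions that agree
mailed = matrix[("no", "yes")] + matrix[("yes", "yes")]  # records predicted "Yes"

accuracy = correct / total
hit_rate = matrix[("yes", "yes")] / mailed  # share of targeted mailings that order
baseline = 6545 / 31140                     # default model: mail to everyone
improvement = hit_rate / baseline - 1       # relative gain over the default
```

Mailing only to the 1,715 records predicted "Yes" gives a hit rate of 503/1,715, which is roughly 40% higher than the default hit rate even though overall accuracy is only 54.4%.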
(Assumption: $5 per copy, $253 per year/per customer, 4 catalogs per season, $63 per season/per customer, 100% markup.)
When the default model and the current model were compared, total profit increased by 37.3% ((69,340 - 50,498) / 50,498).
Default model:
COGS: (31,140 X 5) + (6,546 X (63/2)) = $361,900
Revenue: 6,546 X $63 = $412,398
Profit: $50,498
Current model:
COGS: (17,374 X 5) + (4,959 X (63/2)) = $243,078
Revenue: 4,959 X $63 = $312,417
Profit: $69,340
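The profit comparison for the women's catalog follows one formula applied twice, which can be written out as a short sketch. The function and parameter names are hypothetical; the inputs are the figures listed above, with the cost of goods taken as revenue per order divided by 2 to reflect the 100% markup.

```python
def profit(catalogs_mailed, orders, cost_per_catalog, revenue_per_order, markup_divisor):
    """Revenue minus cost of goods sold (mailing cost plus cost of the goods ordered)."""
    mailing_cost = catalogs_mailed * cost_per_catalog
    goods_cost = orders * revenue_per_order / markup_divisor
    revenue = orders * revenue_per_order
    return revenue - (mailing_cost + goods_cost)

# Inputs as listed for the women's catalog (100% markup, so cost = price / 2).
default_profit = profit(31140, 6546, 5, 63, 2)
current_profit = profit(17374, 4959, 5, 63, 2)
increase = (current_profit - default_profit) / default_profit
```

Without intermediate rounding, the two profits come to $50,499 and $69,338.50, within a couple of dollars of the rounded figures above, and the increase is 37.3%.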
2) Home furnishings catalog model (Neural Net Model, Coincidence Matrix)
The results in this case showed that if a total of 1,254 copies were sent out to the exact target audience, the hit rate would be 22.64%. The hit rate therefore increased by about 35% (16.8% vs. 22.64%).
|             | No   | Yes  |
| Order (No)  | 1508 | 970  |
| Order (Yes) | 238  | 284  |
Correct: 1,792 (59.73%)
Wrong: 1,208 (40.27%)
Total: 3,000
(Assumption: $5 per copy, $164 per year/per customer, 2 catalogs per season, $82 per season/per customer, 80% markup)
When the default model and the current model were compared, total profit increased by 56.36% ((33,192 - 21,228) / 21,228).
Default model:
COGS: (31,140 X 5) + (5,240 X (82/1.7)) = $408,452
Revenue: 5,240 X $82 = $429,680
Profit: $21,228
Current model:
COGS: (12,540 X 5) + (2,840 X (82/1.7)) = $199,688
Revenue: 2,840 X $82 = $232,880
Profit: $33,192
Profit increase: 56.36%.
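The 56.36% figure can be checked by working through the arithmetic, taking the COGS and revenue figures listed above for the home furnishings catalog at face value. Profit is revenue minus COGS for each model, and the increase is measured relative to the default model's profit; the symbol names are introduced here only for the illustration.

```latex
\begin{align*}
\text{Profit}_{\text{default}} &= 429{,}680 - 408{,}452 = 21{,}228 \\
\text{Profit}_{\text{current}} &= 232{,}880 - 199{,}688 = 33{,}192 \\
\text{Increase} &= \frac{33{,}192 - 21{,}228}{21{,}228}
                 = \frac{11{,}964}{21{,}228} \approx 0.5636 = 56.36\%
\end{align*}
```

The targeted mailing earns more profit on fewer orders because the savings in mailing cost outweigh the revenue given up.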
Conclusion
Data mining is gaining wider acceptance in the general business community. To gain a competitive advantage in today’s markets, a business needs the combination of hindsight and foresight that data mining provides in order to make proactive, knowledge-driven decisions. Success can be measured in increased profits, new revenue, and significant cost savings. For all its strengths, however, data mining has a downside: the business must have knowledgeable people on its staff, and a project could cost over a million dollars. The consequences of not doing data mining could be more disastrous still, with lost revenue, fewer customers, or a market not fully realized. There are many ways to do data mining through its various technologies. Now and in the future it will be coupled with other technologies such as OLAP (Online Analytical Processing), giving businesses a one-two punch for market success. The myths of data mining are just that: myths. As long as one does not expect results overnight, data mining may provide results. Finally, the application of data mining will continue to find answers where answers have not been found before. This may be particularly true in the field of advertising, where data mining has not been adopted at the rate seen in some other areas of business.
References
Ashbrand, Deborah (1997), “Is data mining ready for the masses?” at URL: http://www.datamation.com/PlugIn/workbench/datamine/stories/nuggets.htm
Berry, Michael J. A., and Gordon Linoff (1997), Data Mining Techniques for Marketing, Sales, and Customer Support, New York: Wiley.
Condon, Bernard (1997), “Prickly Does It,” Forbes, 159 (May 5), 66-70.
Darling, Charles B. (1998), “Data mining for the masses?” at URL: http://www.datamation.com/PlugIn/workbench/datamine/stories/masses.htm
Dilly, Ruth (1995a), “Data Mining Techniques,” at URL: http://www-pcc.qub.ac.uk/tec/courses/datamining/stu_notes/dm_book_4.html
_________ (1995b), “Data Mining,” at URL: http://www-pcc.qub.ac.uk/tec/courses/datamining/stu_notes/dm_book_2.html
Elder, John & Greg Ridgeway (1999), “Combining Estimators to Improve Performance: Bundling, Bagging, Boosting, and Bayesian Model Averaging,” International Conference on Knowledge Discovery and Data Mining, San Diego, CA, August 15.
Foley, John & Bruce Caldwell (1996), “Dangerous Data,” InformationWeek (September 30), 14-16.
Gerber, Cheryl (1998), “Excavate your data,” at URL: http://www.datamation.com/PlugIn/workbench/datamine/stories/excav.htm
Gilman, Michael (1998), “White Paper,” at URL: http://www.data-mine.com/white.htm
Hodel, Alan (1998), “Data Mining: A New Weapon for Competitive Advantage,” at URL: http://www.software.ibm.com/sq/issues/vol24/data.htm
McCarthy, Vance (1998), “Strike it rich!” at URL: http://www.datamation.com/PlugIn/workbench/datamine/stories/rich.htm
Miller, Michael J. (1997), “Looking Back,” PC Magazine, 16 (March 25), 108-38.
Mitchell, Tom M. (1997), “Does Machine Learning Really Work?” AI Magazine, 18 (Fall), 11-20.
Novack, Janet (1996), “The Data Miners,” Forbes, 157 (3-February 12), 96-97.
O'Harrow, Robert, Jr. (1998), “Prescription Sales, Privacy Fears: CVS, Giant Share Customer Records With Drug Marketing Firm,” The Washington Post, Feb. 15, p. A1+.
SAS Institute, Inc. (1997), “SAS Enterprise Miner Software,” at URL: http://www.sas.com/software/components/miner.html
http://www.twocrows.com/iwk9701.htm
Stone, M. David (1997), “Future Mass Storage,” PC Magazine, 16 (March 25), 182-84.
http://www.sgi.com/Technology/data-mining.html
Toffler, Alvin (1970), Future Shock, New York: Random House.
A Characterization of Data Mining Technologies and Processes (1998) at URL: http://www.datamining.com/datamine/dm-tech.htm
Big Data – Better Returns Leveraging Your Hidden Data Assets to Improve (1998) at URL: http://www.think.con/html/products/darwin/r_method.htm
Data Mining - An IBM Overview (1998) at URL: http://direct.boulder.ibm.com/bi/info/overview.htm
Genetic Algorithms (1998) at URL: http://www.ultragem.com/ga.htm
About Data Mining (1998) at URL: http://www.datamindcorp.com/dmabout.html
Data Mining: A Competitive Tool for Managing Churn (1998) at URL: http://www.datamindcorp.com/paper_churn.html
Data Mining for Competitive Advantage (1997) at URL: http://www.datamindcorp.com/paper_advantage.html
White Paper: An Introduction to Data Mining (1998) at URL: http://www.pilotsw.com/dmpaper/dmindex.htm