<html>
<head>
<title>cafe-predicting project</title>
<link rel="stylesheet" type="text/css" href="css/layout.css" />
<link rel="stylesheet" type="text/css" href="//code.jquery.com/ui/1.11.4/themes/smoothness/jquery-ui.css" />
<script src="jquery/jquery-1.11.2.min.js"></script>
<script src="jquery/jquery-ui.min.js"></script>
<script>
// Load the shared header, menu, and footer fragments once the DOM is ready.
$(function(){
    $("#header").load("header.html");
    $("#menu").load("menu.html");
    $("#footer").load("footer.html");
});
</script>
</head>
<body>
<div id="header"></div>
<div id="menu"></div>
<div id="main">
<div class="content">
<h2>Cafe Data Project</h2>
<h3>Abstract</h3>
<p>The steady growth of digital storage capacity and the connection of an increasing variety of devices to the
internet allow for the collection of data sets so large and complex that traditional methods of data processing become
impractical. While such data sets are generally challenging to analyze, their scope and comprehensiveness
provide many opportunities to identify subtle trends in areas such as business, crime, and information. For this reason,
many scientists and analysts employ machine learning and data mining techniques, which automate big data
analysis and make identifying trends and relationships in large data sets much more efficient.
</p>
<p>
We present a web application that predicts customer activity at a local café based on factors such as
time of day, external weather conditions, and advertising decisions. Predictions are made using models generated by
machine learning algorithms, including regression and decision trees, applied to multiple large data sets provided by Dr.
Julie Whitney of Lexmark International, Inc. and collected from a Lexmark campus café. The application's predictions
achieve an average accuracy of at least 80%, as measured using the Mean Absolute Scaled Error. The resulting
application is available <a href="https://cafe-predicting.shinyapps.io/Dashboard/">here</a>.
</p>
<br />
<h3>Project Description</h3>
<p><u>Objective</u></p>
<p>
The purpose of this project is to provide a web-based application that allows the user, an owner or manager of a café
or restaurant, to estimate staffing and supply needs from predicted customer activity over a user-modifiable
time scale and set of input parameters. This allows for more efficient use of physical resources and personnel,
potentially reducing overhead costs and increasing net profits.
</p>
<p><u>Background</u></p>
<p>
In February 2016, Dr. Julie Whitney, senior technical staff member at Lexmark International, Inc., presented our team
with a large data set (described below) collected from a café serving Lexmark employees and visitors to the Lexmark
campus. We were tasked with using machine learning techniques to analyze the data and obtain models for
predicting future customer activity. After initial exploration, we divided the data into two sets, one for training our
models and one for testing them. We applied regression and decision tree algorithms to the training set to obtain
predictive models, which we refined until their predictions achieved 80% accuracy when compared to the actual
results in the testing set. The models were then integrated into a web application
that allows the user to choose between models and adjust input parameters according to their needs.
</p>
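<p>
As a rough illustration of the evaluation described above, the sketch below splits a series of customer counts into
training and testing portions and scores a set of predictions with the Mean Absolute Scaled Error. The chronological
split, the function names, and the sample numbers are assumptions made for illustration; they are not taken from the
project's code.
</p>
<pre>
```javascript
// Hypothetical sketch: train/test split and Mean Absolute Scaled Error (MASE).
// All names and numbers are illustrative assumptions, not the project's code.

// Assumed chronological split: earliest values train, the rest test.
function splitChronologically(series, trainFraction) {
  var cut = Math.floor(series.length * trainFraction);
  return { train: series.slice(0, cut), test: series.slice(cut) };
}

// Mean absolute difference between two equal-length, non-empty arrays.
function meanAbsoluteError(actual, predicted) {
  if (actual.length !== predicted.length || actual.length === 0) {
    throw new Error("arrays must be non-empty and of equal length");
  }
  var total = 0;
  actual.forEach(function (value, i) {
    total += Math.abs(value - predicted[i]);
  });
  return total / actual.length;
}

// MASE: test-set error divided by the training-set error of the naive
// forecast that predicts each value from the one immediately before it.
function mase(train, testActual, testPredicted) {
  var naiveError = meanAbsoluteError(train.slice(1), train.slice(0, -1));
  if (naiveError === 0) {
    throw new Error("naive forecast error is zero; MASE is undefined");
  }
  return meanAbsoluteError(testActual, testPredicted) / naiveError;
}

// Made-up hourly customer counts and made-up model predictions.
var parts = splitChronologically([10, 12, 14, 16, 18, 20, 22, 24], 0.75);
var score = mase(parts.train, parts.test, [21, 26]); // 0.75
```
</pre>
<p>
A MASE below 1 means the predictions were, on average, closer to the actual values than the naive previous-value
forecast was on the training data.
</p>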
<p><u>The Data Set</u></p>
<p>
Included in the data set provided by Dr. Whitney are the following essential data points:
</p>
<ul style="color: #000">
<li>Date and time of purchase (a string in format MM/DD/YYYY hh:mm)</li>
<li>Item(s) purchased (given by integer item IDs)</li>
<li>Perceived customer age group: unknown, child, young adult, adult, or senior (represented by integers from 0 to 4, respectively)</li>
<li>Perceived customer sex: unknown, male, or female (represented by integers from 0 to 2, respectively)</li>
<li>Time spent in the vicinity of an advertising screen (an integer in milliseconds)</li>
<li>Time spent looking at the advertising screen (an integer in milliseconds)</li>
<li>Item being advertised at time of purchase (an integer item ID)</li>
<li>External temperature (an integer value representing degrees Fahrenheit)</li>
<li>External humidity (an integer between 0 and 100 representing relative humidity percentage)</li>
<li>External precipitation state (one of the following strings: “Clear”, “Clouds”, “Mist”, “Rain”, “Snow”)</li>
</ul>
<p>
(Note: Some of the supplied data points have been fabricated to protect customer identities.) Our team examined the relationships between these data points in order to identify meaningful trends.
</p>
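<p>
To make the encodings listed above concrete, the sketch below decodes one record into readable values. The property
names, the sample record, and the reading of hh:mm as a 24-hour local time are assumptions made for illustration; only
the integer codes, units, and timestamp layout come from the list above.
</p>
<pre>
```javascript
// Hypothetical sketch: decoding one purchase record.
// Property names and the sample record are illustrative assumptions.

// Integer codes from the data set description, in index order.
var AGE_GROUPS = ["unknown", "child", "young adult", "adult", "senior"];
var SEXES = ["unknown", "male", "female"];

// Parse "MM/DD/YYYY hh:mm" (assumed 24-hour, local time) into a Date.
function parseTimestamp(text) {
  var match = /^(\d{1,2})\/(\d{1,2})\/(\d{4}) (\d{1,2}):(\d{2})$/.exec(text);
  if (match === null) {
    throw new Error("unexpected timestamp format: " + text);
  }
  return new Date(
    Number(match[3]), Number(match[1]) - 1, Number(match[2]),
    Number(match[4]), Number(match[5])
  );
}

// Replace integer codes with labels and convert milliseconds to seconds.
function decodeRecord(raw) {
  return {
    purchasedAt: parseTimestamp(raw.timestamp),
    itemIds: raw.itemIds,
    ageGroup: AGE_GROUPS[raw.ageGroup],
    sex: SEXES[raw.sex],
    nearScreenSeconds: raw.nearScreenMs / 1000,
    viewingSeconds: raw.viewingMs / 1000,
    advertisedItemId: raw.advertisedItemId,
    temperatureF: raw.temperatureF,
    humidityPercent: raw.humidityPercent,
    weather: raw.weather
  };
}

// A made-up record, not taken from the actual data set.
var example = decodeRecord({
  timestamp: "01/15/2016 14:05",
  itemIds: [101, 205],
  ageGroup: 3,
  sex: 2,
  nearScreenMs: 4500,
  viewingMs: 1200,
  advertisedItemId: 205,
  temperatureF: 58,
  humidityPercent: 40,
  weather: "Clouds"
});
```
</pre>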
<br />
</div>
</div>
<div id="footer"></div>
</body>
</html>