Datathon Challenge: How to Boost Sales for a Global Retailer

Every time you take advantage of a discount, join a store’s loyalty program, or simply buy something online — whether it’s a bag of groceries, a toddler sun hat or a shaving kit — you’re telling a retailer something about your needs, preferences and buying habits. Billions of pieces of information are continually being generated. How do companies make sense of it all to gain a competitive edge? That, of course, is where data analytics comes in.

The global big data analytics market for retail was estimated at $3.4 billion in 2018 and is projected to reach nearly $11 billion by 2024, according to a report in Business Wire. The report noted that data analytics is being used at every stage of the retail process to understand customer behavior, predict demand and optimize pricing.

Recently, a group of Wharton and University of Pennsylvania students were invited to try their hand at a real-world data analytics challenge. They were given customer transaction data from an international consumer brand and tasked with finding innovative ways to help the company improve its gross margins. The virtual event, a Datathon run by Wharton Customer Analytics, was sponsored by Baring Private Equity Asia.

The retailer, whose identity was not revealed, is part of Baring PE’s portfolio. Baring vice president Karim Awad described it as a U.K.-based international women’s wear and kids’ wear brand with both a physical store presence and an ecommerce function. The data given to the students revealed that the company also sold products such as cooking and dining items, luggage and home furnishings.

“Using the data, how can we contribute value in the form of improving revenues?” Awad asked the students. He suggested they might take approaches such as optimizing prices for particular categories of goods; designing and applying discounts; encouraging and improving cross-selling (for example, promoting accessories with clothing); or re-engaging customers who have stopped buying from the company.

Awad said that although he had outlined those sample objectives, he wanted to keep the challenge broad. His aim was to give the students a sense of “a real-world scenario” as an actual data analyst, which would typically involve “having a dataset put before you and a blank sheet of paper.” He encouraged the students to use their creativity while designing solutions that would be relatively easy for a company’s managers to understand and implement.

The data shared with the students was two years’ worth of point-of-sale information, in the form of about 20 million rows of transactions and 50 variables. It included both retail and ecommerce activity from the U.K. and Japan.

Serving as one of the judges was Wharton marketing professor Raghuram Iyengar, faculty director of Wharton Customer Analytics. He commented in a separate interview about the increasing importance of data analytics in retail, especially given the challenges the industry has faced during the COVID-19 pandemic. “I think it is more important than ever for retailers to use careful analysis of data to understand who their valuable customers are,” he said.

One major improvement that data analytics can bring to retailers, Iyengar noted, is to help them identify customers across sales channels to better understand shopping behavior. If a customer comes into a store, for example, were they motivated to do so by recent in-store promotions, an ad on their mobile, an online video, or something else? However, many retailers still manage their channels in separate silos. According to Iyengar, a top priority should be breaking down those silos to enable data-gathering across them. The companies would then be on track to achieve “a holistic view of what customers are doing.”

What Do They Buy and When Do They Buy It?

Fifteen teams of three to five students each competed in the Datathon. The participants were Wharton MBAs and undergraduates as well as students from other Penn programs such as engineering and information technology. Teams analyzed the dataset using the programming languages and tools (e.g., Python, R) of their choice, and created statistical models that helped solve the business challenge. The judges’ panel was composed of Wharton Customer Analytics leadership. The first-place team was awarded $1,500 and the second-place team, $500.

Winning first place was a team whose presentation was titled, “Modeling Consumer Retail Preferences for an International Consumer Brand.” The team came up with a variation on RFM (recency, frequency, and monetary value), an established method for determining customer value based on how recently they bought something, how often they buy, and how much they spend. Team member Hoyt Gong asserted that the traditional RFM approach didn’t work well for this brand. “RFM scores should normally reflect customer value…. However, when we tried to find a correlation between these RFM scores and the transaction history of our customers, we only saw a weak correlation. Similarly, we saw little to no correlation between that same RFM score and our client’s gross margins.”
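For readers unfamiliar with RFM, the three raw measures can be derived from a transaction log along the following lines. This is only a minimal sketch: the function name `rfm_table`, the tuple layout, and the sample rows are hypothetical, and it does not reproduce the team’s actual scoring or the Datathon dataset.

```python
from datetime import date


def rfm_table(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount).
    recency_days: days between the customer's latest purchase and as_of.
    frequency:    number of transactions.
    monetary:     total amount spent.
    """
    latest = {}
    for customer_id, purchase_date, amount in transactions:
        last, count, total = latest.get(customer_id, (None, 0, 0.0))
        if last is None or purchase_date > last:
            last = purchase_date
        latest[customer_id] = (last, count + 1, total + amount)
    return {
        cid: ((as_of - last).days, count, total)
        for cid, (last, count, total) in latest.items()
    }


# Hypothetical sample rows, for illustration only.
sample = [
    ("A", date(2020, 1, 5), 40.0),
    ("A", date(2020, 3, 1), 60.0),
    ("B", date(2019, 11, 20), 25.0),
]
rfm = rfm_table(sample, as_of=date(2020, 3, 31))
print(rfm)
```

In practice, each raw measure is usually binned into a score (for example, quintiles) before the three are combined; that step is omitted here.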

They decided to pull apart recency and frequency measurements from monetary value, stating that their new model would tell the business which customers, based on their recency and frequency, will make the greatest number of future transactions. The model would also yield customer lifetime value (CLV), determined by multiplying a customer’s expected future transactions by their average purchase size in dollars. Once the firm adopts this improved way to calculate CLV, they said, it will know which customers to focus on, and can better optimize its decision-making.
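The CLV arithmetic described above, expected future transactions multiplied by average purchase size, can be sketched as follows. The function names and the forecast figures are hypothetical; in the team’s approach the expected transaction counts would come from their recency-and-frequency model, which is not reproduced here.

```python
def customer_lifetime_value(expected_future_transactions, average_purchase_value):
    """CLV as described in the article: expected transactions x average spend."""
    return expected_future_transactions * average_purchase_value


def rank_by_clv(customers):
    """Sort customers from highest to lowest CLV.

    customers: {customer_id: (expected_future_transactions, average_purchase_value)}
    """
    scored = {
        cid: customer_lifetime_value(n, avg) for cid, (n, avg) in customers.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical forecasts, for illustration only.
forecasts = {"A": (4.0, 50.0), "B": (10.0, 15.0), "C": (1.5, 80.0)}
ranking = rank_by_clv(forecasts)
print(ranking)
```

Note that customer B, who is expected to buy most often, does not rank first: a lower average purchase size offsets the higher transaction count.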

The team also used a technique known as network modeling or network analysis to reveal that the firm’s new and repeat customers have different purchasing behaviors: specifically, that they buy items in batches differently. They pointed out, for example, that bundles of Disney-themed items dominated the most frequent purchases for repeat customers, but not new customers. The team noted that the company could plug all their customers into this model to determine the most appealing products to promote to them as add-ons to their order.
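One simple way to approximate this kind of analysis is a co-purchase network, in which items are nodes and an edge’s weight counts the baskets containing both items. The sketch below is a simplified stand-in, not the team’s model; the function names and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_purchase_edges(baskets):
    """Count how many baskets contain each unordered pair of items."""
    edges = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            edges[pair] += 1
    return edges


def top_add_ons(edges, item, n=3):
    """Return the n items most often bought together with `item`."""
    partners = Counter()
    for (a, b), weight in edges.items():
        if a == item:
            partners[b] += weight
        elif b == item:
            partners[a] += weight
    return partners.most_common(n)


# Hypothetical baskets, for illustration only.
baskets = [
    ["dress", "tote bag"],
    ["dress", "tote bag", "scarf"],
    ["dress", "scarf"],
    ["backpack", "sun hat"],
    ["dress", "tote bag"],
]
edges = co_purchase_edges(baskets)
print(top_add_ons(edges, "dress", n=2))
```

Building one such network for new customers and another for repeat customers would allow the two groups’ bundles to be compared.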

The second-place team’s presentation was “Understanding the Customer.” Team member Gantavya Pahwa explained, “When we got access to the data, we very quickly realized that we should have our primary focus on the customer…. From this, we realized that there were a few key questions that we should ask.”

Pahwa and his teammates created a “when, what, why and who” question framework. First, looking at “when and how much do they buy,” they built a probabilistic model of customer transaction streams. Second, examining what customers buy, they sought to identify the optimal product mix for maximizing revenue. Asking, “Why do they buy?” they examined responsiveness to discounts and pricing. And finally, they looked at who the brand’s customers are based on age, gender, and country data.

Interestingly, the team found that customers with a high rate of what marketers call “churn” (meaning they stopped buying during a certain timeframe) were contributing a large chunk — about 74% — of the company’s revenue. Since these churned customers provide significant value, re-engaging them presents a lucrative opportunity, they said. Team member Hsien Tham commented, “The company’s strategy will be the most successful if they … lower that churn probability.” The team suggested designing loyalty rewards, membership programs, exclusive events or other types of offers for this promising customer segment.
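A revenue share of this kind can be estimated with a simple inactivity rule, as sketched below. The rule itself is an assumption made for illustration: the team worked with churn probabilities from a probabilistic model, and the 180-day threshold, the function name, and the sample rows here are all hypothetical.

```python
from datetime import date, timedelta


def churned_revenue_share(transactions, as_of, inactivity_days=180):
    """Fraction of total revenue from customers with no recent purchase.

    A customer counts as churned (an assumed rule) when their latest
    purchase is more than `inactivity_days` before `as_of`.
    transactions: iterable of (customer_id, purchase_date, amount).
    """
    last_purchase = {}
    revenue = {}
    for customer_id, purchase_date, amount in transactions:
        revenue[customer_id] = revenue.get(customer_id, 0.0) + amount
        if customer_id not in last_purchase or purchase_date > last_purchase[customer_id]:
            last_purchase[customer_id] = purchase_date

    cutoff = as_of - timedelta(days=inactivity_days)
    churned = {cid for cid, last in last_purchase.items() if last < cutoff}

    total = sum(revenue.values())
    if total == 0:
        return 0.0
    return sum(revenue[cid] for cid in churned) / total


# Hypothetical sample rows, for illustration only.
sample = [
    ("A", date(2019, 5, 10), 200.0),
    ("A", date(2019, 6, 1), 100.0),
    ("B", date(2020, 3, 1), 100.0),
]
share = churned_revenue_share(sample, as_of=date(2020, 3, 31))
print(share)
```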

Another insight the team offered was that this particular brand’s most valuable customers are the more price-conscious ones. The firm should conduct more sales and offer more discounts, they said, and focus on lower-end products. Using data analytics they also managed to identify the product mix that these valuable customers tended to buy (which consisted of outerwear, separates and dresses).

Navigating a Sea of Data to Create Value

The judges also selected a third-place team, whose presentation was titled “Revenue and Retention Segmentation Models.” This team created a spending model and a retention model to analyze customer behavior. They then combined the top contributing features of the two models to arrive at a list of value-creating factors that the company should focus on: the ecommerce channel; high-frequency purchases; the accessories category (such as luggage, bags, and backpacks); and the dresses category.

Among the recommendations the team came up with was that the company should strengthen its online platform to create a richer omnichannel experience. Team member Namrita Narula commented, “We saw [in the data] that e-commerce is indicative of sales and retention, and therefore it is important — especially in these unprecedented times — that the consumer brand has items fully stocked and their full range of products available online.” The team also advised investing in a mobile app since, according to Narula, the profitability of one-click sales through in-app purchases is increasing.

Secondly, the team suggested linking the bags/backpacks category with dresses, since both categories are good sellers. For example, the company could increase tote bag sales through a fashion campaign focused on sustainability. In general, the bundling of products could enhance customer brand recall.

The team also presented a seasonality-related finding. While conducting their analyses, they had noticed that the firm’s winter sales were nearly six times its summer sales. They speculated that the company’s summer product stock was perhaps not as appealing as its winter stock, and suggested that summer lines might be an area for expansion, with micro-influencers engaged to boost sales.

Of the Datathon overall, Wharton’s Iyengar said that the most important skill these students can learn — and that the Datathon helps them practice — is weaving a coherent narrative around their analyses. “It’s not enough to just show the numbers; you have to be good at talking about the numbers, and then explaining the ‘so what.’”

In fact, he said, the baseline skills of running analyses or working with the latest machine learning model are not as prized as they once were, because they’re now easier to acquire. “You can actually get that on Coursera or other platforms,” Iyengar said. Instead, a truly valuable candidate is one who can perform the data analysis and then stand up in front of a management team and persuasively explain the business significance of their findings. “And I think those people are more scarce,” he said.
