When Wharton professor Marshall Fisher and colleague Vishal Gaur did a controlled pricing experiment in 18 stores belonging to the Zany Brainy retail toy chain, they came away with a surprising result.


The experiment was designed to measure how demand for three separate products – a family game center, a Phonics traveler and a headset walkie-talkie – varied with price, and to determine the price at which profit is maximized.


Demand for two of the products was downward sloping in price (they sold more units when the price was lower). Surprisingly, however, demand for the third product actually rose with price over part of the tested range, for reasons described later in this article.


But Fisher, professor of information and operations management, and Gaur, operations management professor at NYU’s Stern School of Business, also used the experiment to demonstrate another point: in-store testing can be a powerful tool for retailers trying to set optimal prices, as long as the testing is done right.


Generally, when retailers test, says Fisher, “they aren’t systematic enough to ensure accurate results. One retailer I worked with found that half the time the products it tested sold better at the higher price, an illogical result that led the company to conclude that the test was invalid.”


Moreover, when Fisher, who is also co-director of Wharton’s Fishman-Davidson Center for Service and Operations Management, asked 32 retailers whether they conducted price testing, 90% of them said ‘yes.’ “We then asked them to rate the effectiveness of their testing on a 10-point scale, with 10 being the highest. The median score was about six – not a stellar performance,” he says.


The unreliability of such testing led Fisher and Gaur to design their Zany Brainy experiment to control for differences between stores and thus produce results that were representative of the entire chain. Their methodology is described in a paper titled “In-Store Experiments to Determine the Impact of Price on Sales.”


Precautionary Measures

In their experiment with Zany Brainy – based in King of Prussia, Pa., and owned by F.A.O. – Fisher and Gaur took a number of steps to make sure that the stores used in the test were directly comparable. For example, the stores selected were similar in age and in size (as measured by total dollar sales), and they were relatively isolated geographically from other stores in the chain (“to reduce the risk that a customer will visit two stores with different prices”). Only a small subset of stores was used so that the experiment would be cost-effective and relatively easy to execute.


The experiment ran for six weeks – “long enough to provide a sufficient sample of data without creating any seasonal variations,” the authors write. Precautions were taken to ensure the “purity” of the experiment. For example, the labels did not show the original list price, so customers would not perceive that a product had been marked up or marked down. Store managers were not informed about the experiment, to make sure they didn’t treat the test products differently from other products. Finally, the experimental stores kept enough inventory to avoid running out of the targeted merchandise.


Fisher and Gaur tested each of the three products at three price points, with each price point used in six stores, for a total of 18 stores (about one-third of the chain’s 53 stores). The family game center was priced at a low of $19.99, a medium of $24.99 and a high of $29.99. The Phonics traveler was priced at $24.99, $29.99 and $34.99, and the headset walkie-talkie at $14.99, $19.99 and $24.99.


The difference in prices for each item was “considered sufficiently large to cause an observable change in demand,” the authors write. In addition, the products were “not carried by the competition, so that the chances of comparison-shopping” were reduced. Moreover, each item was “unique to avoid comparison with other brands in the same category.”


Walkie-talkies and Wine

In analyzing the results of their experiment, Fisher and Gaur found that sales of the family game center and the Phonics traveler were downward sloping in price – i.e., more units sold at lower prices than at higher ones. The walkie-talkie, however, showed a different pattern. It sold 74 units at the middle price point ($19.99), 47 units at the low price ($14.99) and 36 units at the high price ($24.99).
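The walkie-talkie figures can be checked with a few lines of Python. This sketch uses only the unit sales reported above and treats the three price groups as directly comparable, which is what the matched-store design aims for. It computes revenue (price × units) rather than profit, because the article gives no unit costs; the helper names are illustrative, not taken from the paper.

```python
# Headset walkie-talkie unit sales at each tested price, as reported in the article.
observed = {14.99: 47, 19.99: 74, 24.99: 36}


def revenue_by_price(sales):
    """Return {price: price * units}, rounded to cents (revenue, not profit)."""
    return {p: round(p * q, 2) for p, q in sales.items()}


def is_downward_sloping(sales):
    """True if unit sales never increase as the price increases."""
    units = [sales[p] for p in sorted(sales)]
    return all(later <= earlier for earlier, later in zip(units, units[1:]))


revenue = revenue_by_price(observed)
best_price = max(revenue, key=revenue.get)
print(revenue)                        # {14.99: 704.53, 19.99: 1479.26, 24.99: 899.64}
print(best_price)                     # 19.99
print(is_downward_sloping(observed))  # False
```

The monotonicity check is the quick test that flags this product: units rise from $14.99 to $19.99 before falling at $24.99, so a single downward-sloping demand curve does not describe it.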


To understand this behavior, Fisher and Gaur spoke with merchandise managers at Zany Brainy and several other retailers, and came away with two main explanations. First, because the headset walkie-talkie is a complex electronic item, consumers find it difficult to judge its quality, so they rely on price as an indicator. Wine, the researchers note, is another example of a product where consumers – most of whom are not wine connoisseurs – often use price to judge quality.


The family game center, by contrast, is easily understood by the customer, “so that price need not be used as an indicator of quality.” As for the Phonics traveler, it is a branded item made by a recognized manufacturer, so consumers don’t rely on price to judge its quality. “You might buy a $10 walkie-talkie from Sony because you know and trust the brand,” says Fisher. “But a consumer might worry that a $10 walkie-talkie with an unknown brand would be a piece of junk.” (The walkie-talkie at Zany Brainy is not branded.)


In their paper, Fisher and Gaur also cite outside research suggesting that consumers “uniformly perceive a stronger association between price and quality for durable products” – such as microwaves and televisions – than for non-durable products, such as paper towels, orange juice and detergents. This is most likely because consumers buy durable goods less often and find such complex products harder to evaluate.


Second, $19.99 is a more popular price point for gift purchases than $14.99, the authors note. Because consumers may see the headset walkie-talkie as a gift item, unit sales at $19.99 exceeded those at $14.99.


The researchers also fit demand curves for the two products with downward-sloping demand – the family game center and the Phonics traveler – to estimate the price elasticity of demand and to identify the price that maximizes profit. They found that the optimal prices for the Phonics traveler and the family game center were $35.65 and $22.58, respectively (compared with existing prices of $29.99 and $24.99). “The increase in expected gross profit from moving to the optimal price is 3.8% for the Phonics traveler and 0.9% for the family game center,” the paper notes.
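The article does not report the demand model, the unit costs or the sales data behind these estimates, so the following is only a sketch of the general technique under stated assumptions, not Fisher and Gaur’s method. It assumes a constant-elasticity demand curve, units = A · price^e, fits it by least squares on the logarithms, and then applies the standard result that profit (p − c) · A · p^e is maximized at p* = c · e / (1 + e) when demand is elastic (e < −1). The function names, the unit cost of $10 and the three data points are all hypothetical; the data are synthetic and lie exactly on units = 40,000 · price^−2.

```python
import math


def fit_constant_elasticity(prices, units):
    """Fit ln(units) = ln(A) + e * ln(price) by ordinary least squares.

    Returns (A, e), where e is the constant price elasticity of demand.
    Assumption: a constant-elasticity model; the paper's form may differ.
    """
    if len(prices) != len(units) or len(prices) < 2:
        raise ValueError("need at least two matching (price, units) pairs")
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in units]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    elasticity = sxy / sxx                       # slope of the log-log line
    scale = math.exp(y_bar - elasticity * x_bar)  # A = exp(intercept)
    return scale, elasticity


def optimal_price(unit_cost, elasticity):
    """Profit-maximizing price under constant-elasticity demand.

    Setting d/dp [(p - c) * A * p**e] = 0 gives p* = c * e / (1 + e),
    a maximum only when demand is elastic (e < -1).
    """
    if elasticity >= -1:
        raise ValueError("no finite optimum unless elasticity < -1")
    return unit_cost * elasticity / (1 + elasticity)


# Synthetic data on units = 40000 * price**-2 (NOT the study's data).
prices = [10.0, 20.0, 40.0]
units = [400.0, 100.0, 25.0]
scale, elasticity = fit_constant_elasticity(prices, units)
hypothetical_unit_cost = 10.0  # assumption: unit costs are not reported
print(round(elasticity, 6), round(optimal_price(hypothetical_unit_cost, elasticity), 2))  # -2.0 20.0
```

A constant-elasticity model is used here because it gives a closed-form optimum; a linear demand curve is an equally common choice and would produce a different optimal price from the same data. Neither model can describe the walkie-talkie’s pattern, in which sales rose and then fell as the price went up.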


History Lessons

In explaining the background to the Zany Brainy experiment, Fisher notes that people tend to separate pricing into markdown pricing and regular pricing. Much of the world’s economy is transacted at markdown prices; indeed, department stores may sell more than half their volume at markdown prices. “Prices are very fluid, and often change multiple times as stores try to [sell out] their inventory,” Fisher says.


Regular pricing generally applies to the consumer packaged goods segment (food products, soda, paper towels, etc.). Research in this area tends to center on pricing strategies as well as quantification: estimating what demand would be at different price levels and then choosing an optimal price. To do this, companies look at historical data showing how prices have varied over past selling periods, controlling for other factors that drive sales, such as seasonality.


In the Zany Brainy experiment, however, Fisher and Gaur took a different approach to testing. “If your goal is to measure price elasticity – how consumers respond to differences in price – the advantage of our test is you can control everything, compared with using whatever natural variations there were in history,” Fisher says.


On the other hand, the advantage of relying on history is that it doesn’t require the effort involved in consciously running a test, he adds. “Retailers are so action oriented that they often have a bias against testing. It’s not going to help you make this quarter’s numbers; it takes time and energy; and you know that if you are testing three different prices, two of the prices will be wrong. That will hurt your numbers a little. So there is a cost to testing.”


Fisher recommends a blended approach, and has started an analytic software company called 4R Systems to do this. “We are working with one retailer to help it optimize its exit from inventory at the end of product life. Price is our main weapon. Our approach is to use history to get the best estimates we can, but if the history is inadequate or if it is a very high-stakes decision, we will use selective experimentation to buttress historical data as needed.”


Fisher argues, however, that pure price testing, or in-store testing, has obvious advantages, as long as the stores chosen for the test meet strict criteria like those used in the Zany Brainy experiment. “Our methodology is useful not just for finding consumer reactions to different price points but also to test the effects of different types of assortments and store-push levers such as ‘item of the week promotion,’ large shelf space display and salesperson push,” he says.


In-store experiments are also valuable scientific tools for studying the impact of store environmental variables – such as music, lighting, employee behavior and store design – on purchasing decisions.


“By doing these types of experiments on price,” says Fisher, “you are learning about consumer behavior. In essence you are learning for the future.”
