Dabble in beer and you know that Oregon hops occupy a special place in the resurgence of craft brewing. Keenies (and those who listened to my podcast on the history of hops) also know that long before craft beer became a necessary response to big industrial brewers, the biggest industrial brewer of all[1] was getting all of its American hops from the Pacific Northwest.
Dabble in statistics and you know about Student’s t-test. Keenies may even know that Student’s real name was William Sealy Gosset, and that he worked for that biggest of all industrial brewers.
Put the two together and you find out how hops made it necessary for Gosset to invent the t-test and why Guinness made him hide his identity.
One way in which Guinness brought science to bear on beer was to rely on objective measurements, rather than the experience and accumulated subjective wisdom of brewers, to assess the quality of its raw materials. Thomas Case, Guinness’s first scientific brewer, believed that the quality of Guinness depended on the soft resins in the hops used to flavour the brew. Case measured small samples of hops in order to estimate the average soft-resin content of the whole batch, the population from which the samples were drawn. A colleague measured a similar small number of samples from the same batch. The averages of the two sets of measurements differed; was that a real difference or not? Were the samples of hops the same, at least with regard to their soft resins? Case could not be sure and talked about “the weak link between examination or analysis and the brewing value”.
Nevertheless, based on the research by Case and his colleagues, and the fact that American hops were both cheaper and contained more soft resins, Guinness entered into a deal with Emil Horst to source Oregon hops.
At the time statistics had no approach to deciding whether such small samples represented identical populations. Enter Gosset, a recent graduate in chemistry and mathematics from the University of Oxford and one of a succession of bright young people hired by Guinness as experimental, scientific brewers. Gosset started working for Guinness in 1899 and his mathematical abilities led to him doing most of the analyses for all the various experiments the scientific brewers were undertaking, on barley growth, fertilisers, soil and climate, all of which, ultimately, might influence the quality of the malt and thus the quality of Guinness itself. All these experiments suffered the same drawback as Case’s hops measurements: small sample sizes.
Gosset’s genius was to determine how the size of a sample introduces a predictable level of uncertainty into measurements of the population average. The smaller the sample, the greater the chance of a difference between the sample average and the population average. Knowing that, brewers could decide in advance how many samples they needed to measure in order to be reasonably certain that two populations were the same.[2] That enabled them to make sure that their hops, their barley and their malt were of a suitable quality and uniformity to make a standard pint of Guinness.
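To make the small-sample effect concrete, here is a minimal simulation sketch. It is an illustration rather than Gosset’s own derivation: the population (normal, mean 0, standard deviation 1), the sample sizes, the cutoff of 1.96 and the helper names `t_statistic` and `share_beyond` are all assumptions chosen for the example. It computes the one-sample t statistic, in which the spread is estimated from the sample itself, and counts how often that statistic lands far from zero.

```python
import math
import random


def t_statistic(sample, population_mean):
    """Student's t for one sample: (mean - mu) / (s / sqrt(n))."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample variance, estimated from the sample itself (n - 1 divisor).
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - population_mean) / math.sqrt(variance / n)


def share_beyond(cutoff, sample_size, trials=5000, seed=1):
    """Fraction of simulated samples whose |t| exceeds `cutoff`."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        # Hypothetical population: normal with mean 0 and standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        if abs(t_statistic(sample, 0.0)) > cutoff:
            extreme += 1
    return extreme / trials


# If the standard deviation were known exactly, about 5% of results
# would fall beyond 1.96. With an estimated one, small samples stray more.
shares = {n: share_beyond(1.96, n) for n in (4, 10, 30)}
for n, share in shares.items():
    print(f"n={n:2d}: |t| > 1.96 in {share:.1%} of trials")
```

With only four measurements the statistic strays beyond the cutoff far more often than the normal curve would suggest, and the excess shrinks as the sample grows. That predictable relationship is what lets an experimenter choose a sample size in advance.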
Important, not significant
Gosset was more than a mathematician, though. He also applied hard-headed economic logic, something for which Guinness was also famous. The reason for small samples was the time and cost of each one. The fewer samples needed to make a decision, the more efficient the experiment. To that Gosset added another point that many of the people who use Student’s t-test today forget: that the value of the odds – the probability that two samples are not the same, for example – “depends on the importance of the issues at stake”. This is the crucial difference between “statistical significance” and what Gosset thought of as economically or scientifically “important”. For Gosset, the level of significance depended on the opportunity cost of treating a result as true, plus the opportunity cost of conducting the experiment. The higher those costs, the more certainty was required.[3]
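One way to see why the stakes should set the required certainty is a simple break-even calculation. The sketch below is a textbook expected-value rule, not a formula taken from Gosset, and it leaves out the cost of running the experiment. The function name `required_certainty` and the money figures are hypothetical.

```python
def required_certainty(gain_if_right, loss_if_wrong):
    """Break-even probability at which acting on a result starts to pay.

    Acting pays when p * gain_if_right >= (1 - p) * loss_if_wrong,
    which rearranges to p >= loss_if_wrong / (gain_if_right + loss_if_wrong).
    """
    if gain_if_right <= 0 or loss_if_wrong < 0:
        raise ValueError("gain must be positive and loss non-negative")
    return loss_if_wrong / (gain_if_right + loss_if_wrong)


# Hypothetical stakes in arbitrary money units; not figures from Guinness.
for loss in (100, 300, 900):
    p = required_certainty(gain_if_right=100, loss_if_wrong=loss)
    print(f"loss if wrong = {loss}: act once the odds of being right reach {p:.0%}")
```

As the cost of a wrong decision climbs from 100 to 900 against a fixed gain of 100, the break-even certainty rises from 50% to 90%. No single threshold suits every decision.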
In 1909, for example, Guinness bought about 2770 tons of hops, which represented nearly 10% of the cost of production. Gosset had calculated that a 1% difference in the amount of soft resins in the hops increased their value to the brewery by almost 11%. Although his result was not statistically significant, it was important. His estimate of the value of the level of soft resins allowed him to reject about one third of the standard hops that had previously been acceptable to the brewery, greatly increasing the bottom line and proving the value of his statistical methods.
So, why Student’s t-test, rather than Gosset’s t-test? An earlier publication by one of the scientific brewers had revealed some trade secrets, to which the company responded by banning all publications by its employees. Gosset pleaded to be allowed to share his results, and the company finally agreed that its staff could publish as long as any connection with Guinness remained hidden, so that other brewers would not realise that these statistical methods were the foundations of the superior consistency of Guinness. Gosset chose the pseudonym “Student”[4] and, although it was common knowledge by the 1930s that Gosset was Student, the secret was not officially revealed until after his death in 1937.
- Joan Fisher Box (1987). Guinness, Gosset, Fisher, and Small Samples. Statistical Science. 2. 45–52. http://www.jstor.org/stable/2245613
- Stephen T. Ziliak (2008). Retrospectives: Guinnessometrics: The Economic Foundation of “Student’s” t. Journal of Economic Perspectives. 22. 199–216. DOI: 10.1257/jep.22.4.199
- Stephen T. Ziliak (2011). W.S. Gosset and Some Neglected Concepts in Experimental Statistics: Guinnessometrics II. Journal of Wine Economics. 6. 252–277. DOI: 10.1017/S1931436100001632
- Photo by User Wujaszek on pl.wikipedia – scanned from Gosset’s obituary in Annals of Eugenics, Public Domain
- [1] A million and a half barrels a year at the start of the 20th century. Funny thing; you don’t see many artisans trying to brew a tastier Guinness. Maybe there’s something to all this scientific quality control after all.
- [2] Strictly speaking, they want to know whether the two populations are different, and that’s not quite the same thing, but we’ll let that slide.
- [3] It was R.A. Fisher who started the nonsense of an absolute threshold for considering a probability “significant”. Gosset would have no truck with such an arbitrary limit.
- [4] Others chose “Sophister” and “Mathetes” and remain essentially unknown.