Galton Institute Newsletter, March 2003

Who Is Sir Francis Galton?

Gary E Pittman

XII Conclusion

“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.”

H G Wells

It is now more than a year since the previous article in this series. I would like to draw together the themes of those articles and point the reader to material that deals in greater detail with Francis Galton’s scientific insights.

Clearly Francis Galton is most remembered for initiating the study of eugenics. Equally clearly, this contribution to twentieth century thinking has become deeply misunderstood by many people, not because of anything Galton said or did but because of actions of others, many of which he would have thoroughly condemned. But the focus of my articles has been Galton’s contribution to statistical thinking, a legacy that survives intact if largely unrecognised.

There are two lessons that I would like to draw from my articles. The first is the most obvious – that Galton’s creative mind was responsible for many of the ideas that his successors developed into coherent statistical theory. The other, not much less obvious, is that statistical concepts are still not well appreciated by the majority of the population – including senior politicians and leaders of industry. The consequence is much erroneous decision-making.

I have presented in previous articles examples of the efficacy of statistical thinking in the world of industry where it has been fairly evaluated and pursued with energy and dedication by management. These were just the tip of the iceberg.

It may just be possible that, if we used statistical thinking correctly in every area of life where we have appropriate data, the positive results would soon accrue so profoundly that very few could object to its use.

But how can we get acceptance of this proven, powerful methodology? Well, I don’t think any amount of marketing and sales can accomplish what is needed. People are reluctant to change if there is any risk involved until and unless change is forced upon them. So what can be done?

The following is a quotation from a native of Kenya, who was involved in his country’s efforts to save the native wildlife from extinction:

So what is needed is to begin statistical thinking early; in the elementary grades, perhaps even in kindergarten. Children will enjoy learning about distributions, stable and unstable processes, common and assignable causes, and children will respond to the connection between statistical methods and conservation of the earth’s resources.

In the book “The Bell Curve” there is an appendix which addresses the issue of teaching children statistical thinking. And Francis Galton himself gave us many ideas which we can use with children – the Quincunx would certainly have a role to play.

Once children grow up with a good understanding of statistical thinking and become managers and directors of many endeavours, they will not have the concerns and anxieties that the present leaders have with regard to the acceptance of statistical thinking.

At this time we would probably have general agreement that children should be taught that telling the truth is the right thing to do. Statistical thinking is a methodology which enables us to know quantitatively what the truth is and provides guidance on the type of action to be taken in order to have continual improvement over time.

So how do we get the present generation of school teachers, administrators, taxpayers, politicians and others to support such a thing? Frankly, I do not know – I solicit the support of readers of these articles in generating some forces in the right direction.

What is needed is perhaps a modest beginning, a small start and success in the sense that the children enjoy learning about this new world of ideas.

There can be no doubt about the importance that Francis Galton placed upon education. Remember that he funded the first department of statistics in the University of London. He also bequeathed £45,000 to the University of London for continued support and research in the areas of genetics, biometrics and statistics.

But although Francis Galton could operate on a grand scale, he was also practical and down to earth. So it should come as no surprise to discover that he devised some extremely simple yet quite effective ways to teach the basic principles of statistics.

Galton delivered the prestigious Herbert Spencer Lecture at the University of Oxford on 5th June 1907. In this lecture he said “Most persons of ordinary education seem to know nothing about [statistical principles], not even understanding their technical terms, much less appreciating the cogency of their results”. In my own experience, nearly 100 years later, little has changed. Galton proposed a remedy consisting of five lessons each of one hour which “would be sufficient to introduce the learner into a new world of ideas, extraordinarily wide in their application”.

Galton went on to give a brief description of each of the five lessons, many of which would be excellent for children today – see his Essays in Eugenics published by the Eugenics Education Society in 1909.

A wonderful opportunity that goes along with teaching statistical thinking is the possibility of introducing children to Sir Francis Galton. As discussed earlier, unlike that of many heroes of the past, Sir Francis’ influence is present today, and growing in importance.

I should stress that I am not advocating that every child be equipped to become a classical statistician, any more than learning the multiplication tables leads inevitably to becoming a mathematician. My thinking is simply that every child should be able to understand and use the simple concepts that characterise a distribution by completion of secondary education.

By the completion of an undergraduate degree in any field, students should be able to use simple statistical methods to determine the mean, median and standard deviation of a set of data and for any given process be able to determine what is happening, why it is happening and what actions might bring about improvement.
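To show how little machinery these three summary figures require, here is a minimal sketch using Python's standard statistics module. The measurements are hypothetical values chosen for illustration and do not come from any of the earlier articles.

```python
import statistics

# Hypothetical measurements (illustrative values only, not from the article).
measurements = [9.8, 10.1, 10.0, 9.9, 10.3, 10.0, 9.7, 10.2]

mean = statistics.mean(measurements)      # arithmetic average
median = statistics.median(measurements)  # middle value of the sorted data
std_dev = statistics.stdev(measurements)  # sample standard deviation (n - 1 divisor)

print(f"mean = {mean:.3f}, median = {median:.3f}, standard deviation = {std_dev:.3f}")
```

The sample standard deviation is used here on the assumption that the data are a sample from a larger process; statistics.pstdev would be the choice if they were the whole population.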

Distributions are very important because they give you the whole picture, not just a carefully selected example. Instances of the latter are frequently encountered in advertisements for products as diverse as slimming aids and investment services.

With a population educated in statistical thinking we will be able with a very high success rate to answer the following questions with regard to processes that are personal, regional, national and global:

  1. What is the current status of the process?

  2. Is the process stable or unstable?

  3. Is change desirable?

  4. What is the best way to accomplish change?

In the examples described in earlier articles, a great deal of money, time and material was saved. But this is just the tip of the iceberg. If these results could be replicated to include a majority of all processes in existence, the results would be astounding.

And society needs such results; not just to save money and increase profits, but to stretch the earth’s resources and to use some of the savings to learn how to alleviate human suffering.

In summary, education of the population with regard to statistical thinking is vital because it can lead to a better world, using methods based on natural laws and having as its strongest characteristic the continual pursuit of truth.

Finally, I present a note to demonstrate the sort of lesson that could make statistical thinking interesting while offering real insight into the underlying concepts.

Common and Assignable Causes

One of the initially puzzling aspects of statistics is the observation that so many continuous variables follow the distinctive shape of the normal distribution. The explanation for this can be seen if we assume the measured property to be the result of adding up or averaging the values of some very much less significant factors, probably not themselves easily measured and each one following not a normal distribution but taking one of a small number of equally likely discrete values. There may be hundreds, even thousands or tens of thousands of these factors.

Francis Galton called these factors “small” causes; the developer of the control chart, Walter Shewhart, called them “chance” causes. And W Edwards Deming called them common causes. Although we can sometimes identify these individual causes, more often we cannot. When we do, we can say, after the fact, that the cause is one we would expect to see in the process being examined, that there is nothing unexpected or unusual about it. Egon Pearson, a co-developer of the control chart in England, has stated “But there will usually be very many causes which cannot be separated out because they lie outside the existing boundaries of knowledge”.

On the other hand, large forces that cause a distribution to deviate from normality have been called “assignable causes”. They are forces that are foreign to a process, and may appear only once or repeatedly until the cause is “assigned” an identity and removed from the process. Such causes are, when identified, clearly seen to be abnormal or unexpected for a given process.

The control chart below in Fig. 1 will help us to understand more about common causes and assignable causes. Let’s say that the characteristic being observed is a critical mechanical dimension. When all of the sample data points are between the control limits, the process is said to be “in statistical control”, or in other words, in control of the common causes which are a normal or expected part of the process. But when sample number 9 is plotted, we see a value beyond the upper control limit, meaning the process has shifted. The engineer responsible for this product is able to ascertain that a setting on a machine has been inadvertently changed. The engineer resets the control, and the assignable cause is removed from the process. When the incorrect setting is found, it is easy to assign the problem to it, hence “assignable cause”.
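The logic of the chart just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the readings are hypothetical, the limits are set at three standard deviations either side of the mean of the first eight samples, and the function names control_limits and out_of_control are my own. Charts used in practice usually estimate the spread from subgroup ranges or moving ranges rather than from a plain standard deviation.

```python
import statistics

def control_limits(baseline, sigmas=3.0):
    """Return (lower, centre, upper) limits computed from baseline data."""
    centre = statistics.mean(baseline)
    spread = statistics.stdev(baseline)
    return centre - sigmas * spread, centre, centre + sigmas * spread

def out_of_control(samples, lower, upper):
    """Return the 1-based sample numbers that fall outside the limits."""
    return [i for i, x in enumerate(samples, start=1) if x < lower or x > upper]

# Hypothetical dimension readings in millimetres; sample 9 mimics the shift
# caused by the inadvertently changed machine setting.
samples = [25.02, 24.98, 25.01, 24.99, 25.00, 25.03, 24.97, 25.00, 25.21, 25.01]

# Assumption: the first eight samples represent the process in statistical control.
baseline = samples[:8]
lower, centre, upper = control_limits(baseline)
flagged = out_of_control(samples, lower, upper)

print(f"limits: {lower:.3f} to {upper:.3f}, centre {centre:.3f}")
print("samples beyond the limits:", flagged)
```

The sketch only signals that something unusual has happened at a given sample; finding the setting that was changed remains the engineer's job.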

A question for the reader: has the root cause of the problem been discovered and eliminated?

A simple experiment, which anyone can perform, will help to illustrate how “small causes” can result in a process output which tends toward a normal distribution. Let us pretend that a single die (such as used in a board game) represents a single small cause, or small force. The influence of this force in the process can vary from a value of 1 to 6.

Table 1: Average Values of Two Dice Thrown Together

                          Second Die
First Die     1      2      3      4      5      6
    1         1      1.5    2      2.5    3      3.5
    2         1.5    2      2.5    3      3.5    4
    3         2      2.5    3      3.5    4      4.5
    4         2.5    3      3.5    4      4.5    5
    5         3      3.5    4      4.5    5      5.5
    6         3.5    4      4.5    5      5.5    6

Let us start with just two such ordinary dice. Let us construct a table such as the one above which has all the possible combinations of the two dice. What we want is to obtain the average value of each possible pair. Two ones would have an average value of one; a six and a four, an average value of five.

Now let us construct a frequency distribution of the averages just obtained. We see that the most likely value is 3.5 and when our distribution is complete we can already see the primitive beginnings of a normal distribution – Figure (b).
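For readers who would rather let a computer do the counting, the following short Python sketch enumerates all 36 equally likely pairs of faces and tallies their averages. The variable names are my own, and the row of asterisks is only a rough text substitute for a drawn histogram.

```python
from collections import Counter
from itertools import product

faces = range(1, 7)

# Average of each of the 36 equally likely (first die, second die) pairs.
averages = [(a + b) / 2 for a, b in product(faces, repeat=2)]

# Count how many pairs produce each average.
frequency = Counter(averages)

# Print one row per possible average, with one asterisk per pair.
for value in sorted(frequency):
    print(f"{value:>4}  {'*' * frequency[value]}")
```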

Now let us look at frequency distributions using the results from three, five and eight dice (Figures (c) to (e)). What a difference in the shape by adding six more dice! Now imagine if we used 100 dice; or 1000 dice. Keep in mind that real processes have large numbers of causes.
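The same counting can be extended to any number of dice. The sketch below is one possible way to do it, assuming fair six-sided dice: it builds the exact count of every possible total one die at a time, then converts totals to averages. The function name average_distribution and the choice to report the share of throws within 0.5 of the central value 3.5 are my own illustrative choices.

```python
from collections import Counter

def average_distribution(n_dice):
    """Exact counts of each possible average when n_dice fair dice are thrown."""
    totals = Counter({0: 1})  # one way to score zero with no dice
    for _ in range(n_dice):
        updated = Counter()
        # Add one more die: each existing total can grow by 1 to 6.
        for total, ways in totals.items():
            for face in range(1, 7):
                updated[total + face] += ways
        totals = updated
    return {total / n_dice: ways for total, ways in sorted(totals.items())}

for n in (2, 3, 5, 8):
    dist = average_distribution(n)
    outcomes = 6 ** n  # number of equally likely throws
    near_centre = sum(w for avg, w in dist.items() if abs(avg - 3.5) <= 0.5)
    print(f"{n} dice: {len(dist)} possible averages, "
          f"{near_centre / outcomes:.1%} of throws within 0.5 of 3.5")
```

Running it with larger values of n, such as 100, gives a feel for how the averages crowd ever more tightly around 3.5 as the number of small causes grows.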

Because there are so many factors – “small causes” – that determine the outcome of the process, many or most of which remain indeterminate, one should look with caution on the pronouncements of “experts” who claim to be able to determine success or failure based on only a few characteristics. Such as: “Successful people all have these seven characteristics”. You can probably find some unsuccessful people with the same ones. Or: “We track 20 crucial indices of every company we invest in”. Here one would like to ask for the distribution of performances of all the companies involved.