| Galton Institute Home Page | September 2001 Newsletter Contents | Newsletter Index |
Prediction
Francis Galton, a hundred years ago, knew that a stable process was predictable. That is, if no foreign causes impact a stable process, then it will continue to behave in the future the same as it has behaved in the past.

The control chart from which Figure 1 is extracted is a great example of predictability. The characteristic for this chart is “Orders Entered each Month”.
An annual staff meeting was held to produce the forecast for the next fiscal year. One of the items to be forecast was Orders Entered for 1989-1990. I brought to the meeting a control chart for the fiscal year just concluded. It was decided to show an increase of about 25% for the next year. I pointed out that unless something significant was done to change the system that produced orders, the amount of orders entered would remain nearly the same as in the preceding year. I asked what specific measures would be taken that would differ from the ways we had been doing business. The answers were vague: “Well, we’ll just have to work harder this next year” and “I’m sure we can figure out something.”
You can see from the chart that the next year was almost identical to the previous one, a difference of only 2.5%. If the responsible managers had understood the basic principles of statistical thinking, they would perhaps have made a genuine effort to find some new, innovative ways to make a significant change in our sales and marketing efforts. As you can see, nothing was done, and my prediction of little change was right on target.
The Dow-Jones Average

A while ago, I developed a control chart for the Dow-Jones Average (Figure 2). Usually, when one devises a control chart, the characteristic of interest is one whose average value is expected to be constant over time when the related process is stabilised.
The characteristic usually selected for the D-J Average is its average value. But, because the D-J Average is constantly changing, usually increasing in the long term, a conventional control chart will be constantly going out of control (exceeding the control limits) even though the cause is known.
So I chose to use a characteristic which allows for the changing value of the Average, namely, the percent increase or decrease from one week to the next, and I chose to compute this change using the values for the Monday close. The characteristic is computed in the following way: the previous week’s value is subtracted from the current week’s value, and the result is then divided by the previous week’s value and expressed as a percentage. I was very interested to see whether the weekly values would be nearly normally distributed about some average, and the early results give a very positive indication. The average of the weekly percent changes is 0.6 percent per week. Now we can predict what the D-J Average is going to do over a period of weeks (as long as no unexpected causes occur).
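The weekly percent change described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample closes are hypothetical, and the projection assumes that the average weekly change compounds from week to week, which the article does not state.

```python
def weekly_percent_changes(closes):
    """Percent change from each Monday close to the next one."""
    changes = []
    for previous, current in zip(closes, closes[1:]):
        # (current - previous) / previous, expressed as a percentage
        changes.append(100.0 * (current - previous) / previous)
    return changes


def project(start_value, average_percent, weeks):
    """Project a value forward, assuming the average weekly change compounds."""
    return start_value * (1.0 + average_percent / 100.0) ** weeks


# Hypothetical Monday closes, for illustration only (not the article's data).
closes = [100.0, 110.0, 99.0]
print(weekly_percent_changes(closes))

# Projection over four weeks at the 0.6 percent average quoted in the text.
print(project(1000.0, 0.6, 4))
```

Such a projection holds only while the process stays stable, which is the same condition the text attaches to the prediction.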
Of course, stock brokers and stock market analysts have developed all sorts of charts and charting techniques, but these are nearly always based on empirical data, with no valid theory supporting them.
As we can see from the chart, it does go out of control for the entry of November 27, when the D-J Average dropped 9.6% from the previous week. But until then the chart had been showing good stability, and there were no serious problems with the economy; quite the contrary.
We know that the drop was brought on by ill-advised trading and panic on the part of brokers and fund managers. Therefore, the small causes determining the intrinsic performance of the market were unchanged and a quick recovery was predicted by the control chart. As the chart shows, this almost immediate rebound did occur.
Now, let’s look at what I call a “public control chart”. This is the way I think some types of information should be presented to the public in newspapers and magazines.
Control charts, properly executed, do not exaggerate, nor do they soften the facts; they just tell the unvarnished truth about the characteristic being charted.
All this will be quite evident after we have examined the next chart.
Texas Unemployment

Figure 3 is taken from the Dallas Morning News, February 2, 1990. The accompanying headline says, “Texas jobless rate falls sharply”. The underline is mine, to point out how the media uses provocative words to add interest to a statement.
And you might say, “Well, the graph does show a sharp fall.”
But, if you look closely, you will see that the vertical scale does not start at zero, but rather at 4.0. This distorts the relative difference between charted values. For example, the true relative difference between the values for December and January is about 1/5 of the December value, yet the graph makes the difference seem to be more than 1/2. So “sharply” is perhaps an appropriate word for a change of 1/2, but perhaps not for a change of 1/5.
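The effect of a truncated axis can be checked with simple arithmetic. In the sketch below the December and January rates are hypothetical values chosen only to be consistent with the fractions quoted above; they are not read from Figure 3. The only figure taken from the text is the axis starting point of 4.0.

```python
def true_relative_drop(before, after):
    """Drop as a fraction of the earlier value (axis starting at zero)."""
    return (before - after) / before


def apparent_relative_drop(before, after, axis_start):
    """Drop as a fraction of the plotted height above a truncated axis."""
    return (before - after) / (before - axis_start)


# Hypothetical rates, for illustration only.
december = 6.5
january = 5.2
axis_start = 4.0  # the vertical scale in Figure 3 starts here

print(true_relative_drop(december, january))
print(apparent_relative_drop(december, january, axis_start))
```

With these assumed values the true drop is about one fifth, while the plotted height falls by slightly more than one half.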
This kind of graphical distortion is quite common. A flagrant example is the type of bar graph often shown for the Dow-Jones average on the daily or network news broadcasts.

Let’s take the Morning News data (from the U.S. Bureau of Labor Statistics), and construct a control chart (Figure 4 - ignore the second February point for the moment). We see that although there is some variability over the fifteen months shown, there are no points above or below the control limits. This indicates a stable process, the output of which is the unemployment rate.
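For readers who want to try this on their own data, here is a minimal sketch of how control limits might be computed. The article does not say which type of control chart was used; this sketch assumes an individuals (XmR) chart, whose limits are conventionally set at the mean plus or minus 2.66 times the average moving range. The function names and the sample rates are hypothetical.

```python
def xmr_limits(values):
    """Return (center, lower limit, upper limit) for an individuals chart."""
    center = sum(values) / len(values)
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_moving_range = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the conventional XmR constant (3 divided by d2 = 1.128).
    half_width = 2.66 * mean_moving_range
    return center, center - half_width, center + half_width


def out_of_control(values):
    """Indexes of points that fall outside the control limits."""
    center, lower, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical monthly rates, for illustration only (not the published data).
rates = [6.0, 7.0, 6.0, 7.0]
print(xmr_limits(rates))
print(out_of_control(rates))
```

An empty list from `out_of_control` corresponds to the situation described above, in which no points lie above or below the control limits.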
Let me say, and you’ll have to take my word for this, that if a process is stable, we don’t expect to see two points in a row as close to the control limit as the value for January. Since the data show a stable process up to the present, and we have no reason to suspect that the process is going out of control, we can predict that the value for February will be higher than for January (see the final, predicted point in Figure 4).

In Figure 5 we see the actual data as published in the Dallas Morning News for Saturday 10 March 1990, accompanied by the provocative headline “Texas jobless rate surges”.
The previous comments about the distortion in the graphics apply here also. But what does the control chart show? Well, just as predicted, the rate is back up. But note that the value for February is still below the average for the 16 months charted.
Hardly a “surge” in the statistical sense! And the statement of Nic Santangelo, who works in the labour statistics office in Dallas, reported in the Dallas Morning News article, that “We’ve got slow, steady growth” has no basis in fact. He is also reported as making the astounding statement that “Our employment rate has been in the 6 to 6.5 range all along.”
A look at the data reveals that the range is actually from 5.2 to 7.5, a much wider spread than the one claimed. An important fact is that the average value for the rate is unchanging at 6.6, and until some significant change occurs to the system, it will remain at 6.6.
So all of the exaggerated words and graphics are quite misleading, and could cause the unwary to make poor or even disastrous decisions as a result of trusting the media presentations.
When I first generated the information above, several years ago, I had an interview with a reporter from the business section of the Dallas Morning News. I showed her what you have just seen, and asked her about the possibility of using control charts to present data to the public in a way that conveyed the truth rather than wild, unjustified claims.
Her response was that the public liked wild actions, and would be bored by comments, however truthful, that “nothing is really happening of significance”.
It is my hope that the reader will see the power of statistical methods from the examples above. If so, you will share with me the wonder of a society in which truth replaces ignorance, using the tools and thinking of statistical methods.
If the 3% utilisation in industry mentioned above is true, and I believe it is optimistic, from my own experience, then the utilisation elsewhere in society must be orders of magnitude less. What a wonderful opportunity presented to us by Francis Galton, and his ongoing legacy.
For it is not really a lack of data (or information) that thwarts mankind, although the computer moguls would have you think just the opposite; the problem is the lack of proper insight into what the data mean. And only statistical methods can provide these insights.