I was discussing hypothesis testing with Damoon several days ago, and I realized it is not clear how one can judge a theory using random observations. I came up with a simple example and thought it would be better to share it with you.
It is counter-intuitive how we can extract evidence from data when somebody brings it to us. There are many paradigms, and I do not want to go into the details and make long and boring statements. I suggest the classic example: tossing a coin.
Let's ask a simple question: what sort of data is evidence against the fairness (or unfairness) of a coin? By fair I mean a 1/2 chance of getting a Head and a 1/2 chance of getting a Tail.
Suppose somebody tosses a coin 4 times and gets: Tail Tail Tail Tail.
Such a coin looks suspicious, right? It seems we tend to believe the coin produces more Tails than Heads.
It is widely accepted that we vote against a theory (a theory is sometimes called an assumption, sometimes a hypothesis) that produces suspicious results; by result I mean data.
To get a better understanding of a suspicious result, let's compute the probability that a fair coin gives 4 Tails in 4 tosses.
(1/2)^4 = 0.0625 ~ 0.06
Usually the threshold that separates a merely suspicious result from evidence is 0.05 (sometimes 0.01, if a scientist is conservative): a result counts as evidence against the theory only if its probability under the theory is below the threshold. This quantity is related to the p-value and to statistical hypothesis testing. Interpreting a p-value is difficult; if you are interested, see this paper.
Therefore, a scientist who tosses a coin 4 times and gets 4 Tails lives with the fair coin hypothesis, since (1/2)^4 is above 0.05.
Now suppose we toss the coin 5 times and get Tail Tail Tail Tail Tail. What would the scientist decide then? Such data can be produced under the fair coin hypothesis with probability
(1/2)^5 ~ 0.03 < 0.05. So a scientist will believe that the coin is unfair!
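The two computations above can be checked with a few lines of Python. This is only a sketch of the simple rule used in this post (compare the probability of all Tails under the fair coin hypothesis with 0.05); the names prob_all_tails, rejects_fair_coin and ALPHA are mine, chosen for illustration.

```python
from fractions import Fraction

# The 0.05 threshold discussed above (an assumption of this sketch;
# a conservative scientist might use 0.01 instead).
ALPHA = Fraction(5, 100)


def prob_all_tails(n_tosses):
    """Probability that a fair coin gives Tails on every one of n_tosses tosses."""
    return Fraction(1, 2) ** n_tosses


def rejects_fair_coin(n_tosses, alpha=ALPHA):
    """True if seeing n_tosses Tails in a row falls below the threshold."""
    return prob_all_tails(n_tosses) < alpha


for n in (4, 5):
    p = prob_all_tails(n)
    print(n, "tosses:", float(p), "reject fair coin?", rejects_fair_coin(n))
```

With 4 tosses the probability is 0.0625 and the fair coin survives; with 5 tosses it is 0.03125 and the fair coin is rejected.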
I suggest you toss a coin 5 times. I bet you get at least one Head and at least one Tail; try it if you do not believe me.
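If you do not have a coin at hand, here is a small sketch of the bet in Python, reading it as "at least one Head and at least one Tail in 5 tosses". The function names and the fixed seed are my own choices for illustration. A fair coin loses the bet only with 5 Heads or 5 Tails, so the chance of winning is 1 - 2 * (1/2)^5.

```python
import random


def prob_mixed(n_tosses):
    """Exact probability that a fair coin shows both a Head and a Tail."""
    # Only two outcomes are not mixed: all Heads and all Tails.
    return 1 - 2 * 0.5 ** n_tosses


def simulate_mixed(n_tosses, n_experiments, seed=0):
    """Fraction of simulated experiments with at least one Head and one Tail."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    mixed = 0
    for _ in range(n_experiments):
        tosses = [rng.choice("HT") for _ in range(n_tosses)]
        if "H" in tosses and "T" in tosses:
            mixed += 1
    return mixed / n_experiments


print(prob_mixed(5))
print(simulate_mixed(5, 100_000))
```

The exact value for 5 tosses is 0.9375, and the simulated fraction should land close to it, so the bet is a fairly safe one.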