"Little experience is sufficient to show that the traditional machinery of statistical processes is wholly unsuited to the needs of practical research. Not only does it take a cannon to shoot a sparrow, but it misses the sparrow! The elaborate mechanism built on the theory of infinitely large samples is not accurate enough for simple laboratory data. Only by systematically tackling small sample problems on their metrics does it seem possible to apply accurate tests to practical data." (Ronald A Fisher, "Statistical Methods for Research Workers", 1925)
"The postulate of randomness thus resolves itself into the question, ‘of what population is this a random sample?’ which must frequently be asked by every practical statistician." (Ronald A Fisher, "On the Mathematical Foundation of Theoretical Statistics", Philosophical Transactions of the Royal Society of London Vol. A222, 1922)
"Null hypotheses of no difference are usually known to be false before the data are collected [...] when they are, their rejection or acceptance simply reflects the size of the sample and the power of the test, and is not a contribution to science." (I Richard Savage, "Nonparametric Statistics", Journal of the American Statistical Association 52, 1957)
"Assumptions that we make, such as those concerning the form of the population sampled, are always untrue." (David R Cox, "Some problems connected with statistical inference", Annals of Mathematical Statistics 29, 1958)
"[...] a priori reasons for believing that the null hypothesis is generally false anyway. One of the common experiences of research workers is the very high frequency with which significant results are obtained with large samples." (David Bakan, "The test of significance in psychological research", Psychological Bulletin 66, 1966)
"People have erroneous intuitions about the laws of chance. In particular, they regard a sample randomly drawn from a population as highly representative, that is, similar to the population in all essential characteristics. The prevalence of the belief and its unfortunate consequences for psychological research are illustrated by the responses of professional psychologists to a questionnaire concerning research decisions." (Amos Tversky & Daniel Kahneman, "Belief in the law of small numbers", Psychological Bulletin 76(2), 1971)
"[...] too many users of the analysis of variance seem to regard the reaching of a mediocre level of significance as more important than any descriptive specification of the underlying averages Our thesis is that people have strong intuitions about random sampling; that these intuitions are wrong in fundamental respects; that these intuitions are shared by naive subjects and by trained scientists; and that they are applied with unfortunate consequences in the course of scientific inquiry. We submit that people view a sample randomly drawn from a population as highly representative, that is, similar to the population in all essential characteristics. Consequently, they expect any two samples drawn from a particular population to be more similar to one another and to the population than sampling theory predicts, at least for small samples." (Amos Tversky & Daniel Kahneman, "Belief in the law of small numbers", Psychological Bulletin 76(2), 1971)
"It would help if the standard statistical programs did not generate t statistics in such profusion. The programs might be written to ask, 'Do you really have a probability sample?', 'By what standard would you judge a fitted coefficient large or small?' Or perhaps they could merely say, printed in bold capitals beside each equation, 'So What Else Is New?'" (Donald M McCloskey, "The Loss Function Has Been Mislaid: The Rhetoric of Significance Tests", American Economic Review Vol. 75, 1985)
"Since a point hypothesis is not to be expected in practice to be exactly true, but only approximate, a proper test of significance should almost always show significance for large enough samples. So the whole game of testing point hypotheses, power analysis notwithstanding, is but a mathematical game without empirical importance." (Louis Guttman, "The illogic of statistical inference for cumulative science", Applied Stochastic Models and Data Analysis, 1985)
"A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world[...]. If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it." (Jacob Cohen, "Things I have learned (so far)", American Psychologist 45, 1990)
"Unfortunately, when applied in a cook-book fashion, such significance tests do not extract the maximum amount of information available from the data. Worse still, misleading conclusions can be drawn. There are at least three problems: (1) a conclusion that there is a significant difference can often be reached merely by collecting enough samples; (2) a statistically significant result is not necessarily practically significant; and (3) reports of the presence or absence of significant differences for multiple tests are not comparable unless identical sample sizes are used." (Graham B McBride et al, "What do significance tests really tell us about the environment?", Environmental Management 17, 1993)
"Statistical hypothesis testing is commonly used inappropriately to analyze data, determine causality, and make decisions about significance in ecological risk assessment,[...] It discourages good toxicity testing and field studies, it provides less protection to ecosystems or their components that are difficult to sample or replicate, and it provides less protection when more treatments or responses are used. It provides a poor basis for decision-making because it does not generate a conclusion of no effect, it does not indicate the nature or magnitude of effects, it does address effects at untested exposure levels, and it confounds effects and uncertainty[...]. Risk assessors should focus on analyzing the relationship between exposure and effects[...]." (Glenn W Suter, "Abuse of hypothesis testing statistics in ecological risk assessment", Human and Ecological Risk Assessment 2, 1996)
"The standard error of most statistics is proportional to 1 over the square root of the sample size. God did this, and there is nothing we can do to change it." (Howard Wainer, "Improving Tabular Displays, With NAEP Tables as Examples and Inspirations", Journal of Educational and Behavioral Statistics Vol 22 (1), 1997)
"It is not always convenient to remember that the right model for a population can fit a sample of data worse than a wrong model - even a wrong model with fewer parameters. We cannot rely on statistical diagnostics to save us, especially with small samples. We must think about what our models mean, regardless of fit, or we will promulgate nonsense." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)
"It’s a commonplace among statisticians that a chi-squared test (and, really, any p-value) can be viewed as a crude measure of sample size: When sample size is small, it’s very difficult to get a rejection (that is, a p-value below 0.05), whereas when sample size is huge, just about anything will bag you a rejection. With large n, a smaller signal can be found amid the noise. In general: small n, unlikely to get small p-values. Large n, likely to find something. Huge n, almost certain to find lots of small p-values." (Andrew Gelman, "The sample size is huge, so a p-value of 0.007 is not that impressive", 2009)
"Why are you testing your data for normality? For large sample sizes the normality tests often give a meaningful answer to a meaningless question (for small samples they give a meaningless answer to a meaningful question)." (Greg Snow, "R-Help", 2014)
"The Dirty Data Theorem states that 'real world' data tends to come from bizarre and unspecifiable distributions of highly correlated variables and have unequal sample sizes, missing data points, non-independent observations, and an indeterminate number of inaccurately recorded values." (Anon, Statistically Speaking)
"The old rule of trusting the Central Limit Theorem if the sample size is larger than 30 is just that–old. Bootstrap and permutation testing let us more easily do inferences for a wider variety of statistics." (Tim Hesterberg)
"While the main emphasis in the development of power analysis has been to provide methods for assessing and increasing power, it should also be noted that it is possible to have too much power. If your sample is too large, nearly any difference, no matter how small or meaningless from a practical standpoint, will be ‘statistically significant’." (Clay Helberg)