25 November 2024

🦋Science: On Samples (Quotes)

"Little experience is sufficient to show that the traditional machinery of statistical processes is wholly unsuited to the needs of practical research. Not only does it take a cannon to shoot a sparrow, but it misses the sparrow! The elaborate mechanism built on the theory of infinitely large samples is not accurate enough for simple laboratory data. Only by systematically tackling small sample problems on their metrics does it seem possible to apply accurate tests to practical data." (Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"The postulate of randomness thus resolves itself into the question, ‘of what population is this a random sample?’ which must frequently be asked by every practical statistician." (Ronald  A Fisher, "On the Mathematical Foundation of Theoretical Statistics", Philosophical Transactions of the Royal Society of London Vol. A222, 1922) 

"Null hypotheses of no difference are usually known to be false before the data are collected [...] when they are, their rejection or acceptance simply reflects the size of the sample and the power of the test, and is not a contribution to science." (I Richard Savage, "Nonparametric Statistics", Journal of the American Statistical Association 52, 1957)

"Assumptions that we make, such as those concerning the form of the population sampled, are always untrue." (David R Cox, "Some problems connected with statistical inference", Annals of Mathematical Statistics 29, 1958)

"[...] a priori reasons for believing that the null hypothesis is generally false anyway. One of the common experiences of research workers is the very high frequency with which significant results are obtained with large samples." (David Bakan, "The test of significance in psychological research", Psychological Bulletin 66, 1966)

"People have erroneous intuitions about the laws of chance. In particular, they regard a sample randomly drawn from a population as highly representative, that is, similar to the population in all essential characteristics. The prevalence of the belief and its unfortunate consequences for psychological research are illustrated by the responses of professional psychologists to a questionnaire concerning research decisions." (Amos Tversky & Daniel Kahneman, "Belief in the law of small numbers", Psychological Bulletin 76(2), 1971)

"[...] too many users of the analysis of variance seem to regard the reaching of a mediocre level of significance as more important than any descriptive specification of the underlying averages Our thesis is that people have strong intuitions about random sampling; that these intuitions are wrong in fundamental respects; that these intuitions are shared by naive subjects and by trained scientists; and that they are applied with unfortunate consequences in the course of scientific inquiry. We submit that people view a sample randomly drawn from a population as highly representative, that is, similar to the population in all essential characteristics. Consequently, they expect any two samples drawn from a particular population to be more similar to one another and to the population than sampling theory predicts, at least for small samples." (Amos Tversky & Daniel Kahneman, "Belief in the law of small numbers", Psychological Bulletin 76(2), 1971) 

"It would help if the standard statistical programs did not generate t statistics in such profusion. The programs might be written to ask, 'Do you really have a probability sample?', 'By what standard would you judge a fitted coefficient large or small?' Or perhaps they could merely say, printed in bold capitals beside each equation, 'So What Else Is New?'" (Donald M McCloskey, "The Loss Function Has Been Mislaid: The Rhetoric of Significance Tests", American Economic Review Vol. 75, 1985)

"Since a point hypothesis is not to be expected in practice to be exactly true, but only approximate, a proper test of significance should almost always show significance for large enough samples. So the whole game of testing point hypotheses, power analysis notwithstanding, is but a mathematical game without empirical importance." (Louis Guttman, "The illogic of statistical inference for cumulative science", Applied Stochastic Models and Data Analysis, 1985)

"A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world[...]. If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it." (Jacob Cohen, "Things I have learned (so far)", American Psychologist 45, 1990)

"Unfortunately, when applied in a cook-book fashion, such significance tests do not extract the maximum amount of information available from the data. Worse still, misleading conclusions can be drawn. There are at least three problems: (1) a conclusion that there is a significant difference can often be reached merely by collecting enough samples; (2) a statistically significant result is not necessarily practically significant; and (3) reports of the presence or absence of significant differences for multiple tests are not comparable unless identical sample sizes are used." (Graham B McBride et al, "What do significance tests really tell us about the environment?", Environmental Management 17, 1993)

"Statistical hypothesis testing is commonly used inappropriately to analyze data, determine causality, and make decisions about significance in ecological risk assessment,[...] It discourages good toxicity testing and field studies, it provides less protection to ecosystems or their components that are difficult to sample or replicate, and it provides less protection when more treatments or responses are used. It provides a poor basis for decision-making because it does not generate a conclusion of no effect, it does not indicate the nature or magnitude of effects, it does address effects at untested exposure levels, and it confounds effects and uncertainty[...]. Risk assessors should focus on analyzing the relationship between exposure and effects[...]."  (Glenn W Suter, "Abuse of hypothesis testing statistics in ecological risk assessment", Human and Ecological Risk Assessment 2, 1996)

"The standard error of most statistics is proportional to 1 over the square root of the sample size. God did this, and there is nothing we can do to change it." (Howard Wainer, "Improving Tabular Displays, With NAEP Tables as Examples and Inspirations", Journal of Educational and Behavioral Statistics Vol 22 (1), 1997)

"It is not always convenient to remember that the right model for a population can fit a sample of data worse than a wrong model - even a wrong model with fewer parameters. We cannot rely on statistical diagnostics to save us, especially with small samples. We must think about what our models mean, regardless of fit, or we will promulgate nonsense." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"It’s a commonplace among statisticians that a chi-squared test (and, really, any p-value) can be viewed as a crude measure of sample size: When sample size is small, it’s very difficult to get a rejection (that is, a p-value below 0.05), whereas when sample size is huge, just about anything will bag you a rejection. With large n, a smaller signal can be found amid the noise. In general: small n, unlikely to get small p-values. Large n, likely to find something. Huge n, almost certain to find lots of small p-values." (Andrew Gelman, "The sample size is huge, so a p-value of 0.007 is not that impressive", 2009)

"Why are you testing your data for normality? For large sample sizes the normality tests often give a meaningful answer to a meaningless question (for small samples they give a meaningless answer to a meaningful question)." (Greg Snow, "R-Help", 2014)

"The Dirty Data Theorem states that 'real world' data tends to come from bizarre and unspecifiable distributions of highly correlated variables and have unequal sample sizes, missing data points, non-independent observations, and an indeterminate number of inaccurately recorded values." (Anon, Statistically Speaking)

"The old rule of trusting the Central Limit Theorem if the sample size is larger than 30 is just that–old. Bootstrap and permutation testing let us more easily do inferences for a wider variety of statistics." (Tim Hesterberg)

"While the main emphasis in the development of power analysis has been to provide methods for assessing and increasing power, it should also be noted that it is possible to have too much power. If your sample is too large, nearly any difference, no matter how small or meaningless from a practical standpoint, will be ‘statistically significant’." (Clay Helberg) 

24 November 2024

🦋Science: On Confidence Interval (Just the Quotes)

"Confidence intervals give a feeling of the uncertainty of experimental evidence, and (very important) give it in the same units [...] as the original observations." (Mary G Natrella, "The relation between confidence intervals and tests of significance", American Statistician 14, 1960)

"One reason for preferring to present a confidence interval statement (where possible) is that the confidence interval, by its width, tells more about the reliance that can be placed on the results of the experiment than does a YES-NO test of significance." (Mary G Natrella, "The relation between confidence intervals and tests of significance", American Statistician 14, 1960)

"Evaluation of the statistical reliability of a set of results is not mere calculation of standard errors and confidence limits. The statistician must go far beyond the statistical methods in textbooks. He must evaluate uncertainty in terms of possible uses of the data. Some of this writing is not statistical but draws on assistance from the expert in the subject-matter." (W Edwards Deming, "Principles of Professional Statistical Practice", Annals of Mathematical Statistics, 36(6), 1965)

"Significance levels are usually computed and reported, but power and confidence limits are not. Perhaps they should be." (Amos Tversky & Daniel Kahneman, "Belief in the law of small numbers", Psychological Bulletin 76(2), 1971)

"It is usually wise to give a confidence interval for the parameter in which you are interested." (David S Moore & George P McCabe, "Introduction to the Practice of Statistics", 1989) 

"I do not think that significance testing should be completely abandoned [...] and I don’t expect that it will be. But I urge researchers to provide estimates, with confidence intervals: scientific advance requires parameters with known reliability estimates. Classical confidence intervals are formally equivalent to a significance test, but they convey more information." (Nigel G Yoccoz, "Use, Overuse, and Misuse of Significance Tests in Evolutionary Biology and Ecology", Bulletin of the Ecological Society of America Vol. 72 (2), 1991)

"Whereas hypothesis testing emphasizes a very narrow question (‘Do the population means fail to conform to a specific pattern?’), the use of confidence intervals emphasizes a much broader question (‘What are the population means?’). Knowing what the means are, of course, implies knowing whether they fail to conform to a specific pattern, although the reverse is not true. In this sense, use of confidence intervals subsumes the process of hypothesis testing." (Geoffrey R Loftus, "On the tyranny of hypothesis testing in the social sciences", Contemporary Psychology 36, 1991)

"Probabilistic inference is the classical paradigm for data analysis in science and technology. It rests on a foundation of randomness; variation in data is ascribed to a random process in which nature generates data according to a probability distribution. This leads to a codification of uncertainly by confidence intervals and hypothesis tests." (William S Cleveland, "Visualizing Data", 1993)

"[...] they [confidence limits] are rarely to be found in the literature. I suspect that the main reason they are not reported is that they are so embarrassingly large!" (Jacob Cohen, "The earth is round (p<.05)", American Psychologist 49, 1994)

"I contend that the general acceptance of statistical hypothesis testing is one of the most unfortunate aspects of 20th century applied science. Tests for the identity of population distributions, for equality of treatment means, for presence of interactions, for the nullity of a correlation coefficient, and so on, have been responsible for much bad science, much lazy science, and much silly science. A good scientist can manage with, and will not be misled by, parameter estimates and their associated standard errors or confidence limits." (Marks Nester, "A Myopic View and History of Hypothesis Testing", 1996)

"We should push for de-emphasizing some topics, such as statistical significance tests - an unfortunate carry-over from the traditional elementary statistics course. We would suggest a greater focus on confidence intervals - these achieve the aim of formal hypothesis testing, often provide additional useful information, and are not as easily misinterpreted." (Gerry Hahn et al, "The Impact of Six Sigma Improvement: A Glimpse Into the Future of Statistics", The American Statistician, 1999)

"Distinguish among confidence, prediction, and tolerance intervals. Confidence intervals are statements about population means or other parameters. Prediction intervals address future (single or multiple) observations. Tolerance intervals describe the location of a specific proportion of a population, with specified confidence." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"Precision does not vary linearly with increasing sample size. As is well known, the width of a confidence interval is a function of the square root of the number of observations. But it is more complicate than that. The basic elements determining a confidence interval are the sample size, an estimate of variability, and a pivotal variable associated with the estimate of variability." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"The important thing is to understand that frequentist and Bayesian methods are answering different questions. To combine prior beliefs with data in a principled way, use Bayesian inference. To construct procedures with guaranteed long run performance, such as confidence intervals, use frequentist methods. Generally, Bayesian methods run into problems when the parameter space is high dimensional." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004) 

"There is a tendency to use hypothesis testing methods even when they are not appropriate. Often, estimation and confidence intervals are better tools. Use hypothesis testing only when you want to test a well-defined hypothesis." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Estimating the missing values in a dataset solves one problem - imputing reasonable values that have well-defined statistical properties. It fails to solve another, however - drawing inferences about parameters in a model fit to the estimated data. Treating imputed values as if they were known (like the rest of the observed data) causes confidence intervals to be too narrow and tends to bias other estimates that depend on the variability of the imputed values (such as correlations)." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Scholars feel the need to present tables of model parameters in academic articles (perhaps just as evidence that they ran the analysis they claimed to have run), but these tables are rarely interpreted other than for their sign and statistical significance. Most of the numbers in these tables are never even discussed in the text. From the perspective of the applied data analyst, R packages without procedures to compute quantities of scientific interest are woefully incomplete. A better approach focuses on quantities of direct scientific interest rather than uninterpretable model parameters. [...] For each quantity of interest, the user needs some summary that includes a point estimate and a measure of uncertainty such as a standard error, confidence interval, or a distribution. The methods of calculating these differ greatly across theories of inference and methods of analysis. However, from the user’s perspective, the result is almost always the same: the point estimate and uncertainty of some quantity of interest." (Kousuke Imai et al, "Toward a Common Framework for Statistical Analysis and Development", Journal of Computational and Graphical Statistics vol. 17, 2008)

"Given the important role that correlation plays in structural equation modeling, we need to understand the factors that affect establishing relationships among multivariable data points. The key factors are the level of measurement, restriction of range in data values (variability, skewness, kurtosis), missing data, nonlinearity, outliers, correction for attenuation, and issues related to sampling variation, confidence intervals, effect size, significance, sample size, and power." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"A complete data analysis will involve the following steps: (i) Finding a good model to fit the signal based on the data. (ii) Finding a good model to fit the noise, based on the residuals from the model. (iii) Adjusting variances, test statistics, confidence intervals, and predictions, based on the model for the noise.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"For a confidence interval, the central limit theorem plays a role in the reliability of the interval because the sample mean is often approximately normal even when the underlying data is not. A prediction interval has no such protection. The shape of the interval reflects the shape of the underlying distribution. It is more important to examine carefully the normality assumption by checking the residuals […].(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"More useful than a statement that an experiment’s results were statistically insignificant is a confidence interval giving plausible sizes for the effect. Even if the confidence interval includes zero, its width tells you a lot: a narrow interval covering zero tells you that the effect is most likely small (which may be all you need to know, if a small effect is not practically useful), while a wide interval clearly shows that the measurement was not precise enough to draw conclusions." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"Overlapping confidence intervals do not mean two values are not significantly different. Checking confidence intervals or standard errors will mislead. It’s always best to use the appropriate hypothesis test instead. Your eyeball is not a well-defined statistical procedure." (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"There is exactly one situation when visually checking confidence intervals works, and it is when comparing the confidence interval against a fixed value, rather than another confidence interval. If you want to know whether a number is plausibly zero, you may check to see whether its confidence interval overlaps with zero. There are, of course, formal statistical procedures that generate confidence intervals that can be compared by eye and that even correct for multiple comparisons automatically. Unfortunately, these procedures work only in certain circumstances;" (Alex Reinhart, "Statistics Done Wrong: The Woefully Complete Guide", 2015)

"Samples give us estimates of something, and they will almost always deviate from the true number by some amount, large or small, and that is the margin of error. […] The margin of error does not address underlying flaws in the research, only the degree of error in the sampling procedure. But ignoring those deeper possible flaws for the moment, there is another measurement or statistic that accompanies any rigorously defined sample: the confidence interval." (Daniel J Levitin, "Weaponized Lies", 2017)

"The margin of error is how accurate the results are, and the confidence interval is how confident you are that your estimate falls within the margin of error." (Daniel J Levitin, "Weaponized Lies", 2017)

"Titles should clearly specify the content of the table or the graphic. What is being presented? Means and standard deviations? Confidence intervals? Percentages? Trends over time? Furthermore, consider the context, such as when and where the data were gathered, as well as the name of the dataset if using secondary data (although the dataset may also be identified in a source note)." (John Hoffmann, "Principles of Data Management and Presentation", 2017) 

"[...] a hypothesis test tells us whether the observed data are consistent with the null hypothesis, and a confidence interval tells us which hypotheses are consistent with the data." (William C Blackwelder)

23 November 2024

🦋Science: On Consistency (Quotes)

"A model, like a novel, may resonate with nature, but it is not a ‘real’ thing. Like a novel, a model may be convincing - it may ‘ring true’ if it is consistent with our experience of the natural world. But just as we may wonder how much the characters in a novel are drawn from real life and how much is artifice, we might ask the same of a model: How much is based on observation and measurement of accessible phenomena, how much is convenience? Fundamentally, the reason for modeling is a lack of full access, either in time or space, to the phenomena of interest." (Kenneth Belitz, Science, Vol. 263, 1944)

"Hypothesis is a tool which can cause trouble if not used properly. We must be ready to abandon out hypothesis as soon as it is shown to be inconsistent with the facts." (William I B Beveridge, "The Art of Scientific Investigation", 1950) 

"A good estimator will be unbiased and will converge more and more closely (in the long run) on the true value as the sample size increases. Such estimators are known as consistent. But consistency is not all we can ask of an estimator. In estimating the central tendency of a distribution, we are not confined to using the arithmetic mean; we might just as well use the median. Given a choice of possible estimators, all consistent in the sense just defined, we can see whether there is anything which recommends the choice of one rather than another. The thing which at once suggests itself is the sampling variance of the different estimators, since an estimator with a small sampling variance will be less likely to differ from the true value by a large amount than an estimator whose sampling variance is large." (Michael J Moroney, "Facts from Figures", 1951)

"Consistency and completeness can also be characterized in terms of models: a theory T is consistent if and only if it has at least one model; it is complete if and only if every sentence of T which is satisfied in one model is also satisfied in any other model of T. Two theories T1 and T2 are said to be compatible if they have a common consistent extension; this is equivalent to saying that the union of T1 and T2 is consistent." (Alfred Tarski et al, "Undecidable Theories", 1953)

"[I]n probability theory we are faced with situations in which our intuition or some physical experiments we have carried out suggest certain results. Intuition and experience lead us to an assignment of probabilities to events. As far as the mathematics is concerned, any assignment of probabilities will do, subject to the rules of mathematical consistency." (Robert Ash, "Basic probability theory", 1970)

"Information that is only partially structured (and therefore contains some 'noise' is fuzzy, inconsistent, and indistinct. Such imperfect information may be regarded as having merit only if it represents an intermediate step in structuring the information into a final meaningful form. If the partially Structured information remains in fuzzy form, it will create a state of dissatisfaction in the mind of the originator and certainly in the mind of the recipient. The natural desire is to continue structuring until clarity, simplicity, precision, and definitiveness are obtained." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"A single observation that is inconsistent with some generalization points to the falsehood of the generalization, and thereby 'points to itself'." (Ian Hacking, "The Emergence Of Probability", 1975)

"In any particular application, graphical or other informal analysis may show that consistency or inconsistency with H0 is so clear cut that explicit calculation of p is unnecessary." (David R Cox, "The role of significance tests", Scandanavian Journal of Statistics 4, 1977)

"A mathematical model is any complete and consistent set of mathematical equations which are designed to correspond to some other entity, its prototype. The prototype may be a physical, biological, social, psychological or conceptual entity, perhaps even another mathematical model." (Rutherford Aris, "Mathematical Modelling", 1978)

"When evaluating a model, at least two broad standards are relevant. One is whether the model is consistent with the data. The other is whether the model is consistent with the ‘real world.’" (Kenneth Bollen, "Structural Equations with Latent Variable", 1989)

"The term chaos is used in a specific sense where it is an inherently random pattern of behaviour generated by fixed inputs into deterministic (that is fixed) rules (relationships). The rules take the form of non-linear feedback loops. Although the specific path followed by the behaviour so generated is random and hence unpredictable in the long-term, it always has an underlying pattern to it, a 'hidden' pattern, a global pattern or rhythm. That pattern is self-similarity, that is a constant degree of variation, consistent variability, regular irregularity, or more precisely, a constant fractal dimension. Chaos is therefore order (a pattern) within disorder (random behaviour)." (Ralph D Stacey, "The Chaos Frontier: Creative Strategic Control for Business", 1991)

"When looking at the end result of any statistical analysis, one must be very cautious not to over interpret the data. Care must be taken to know the size of the sample, and to be certain the method for gathering information is consistent with other samples gathered. […] No one should ever base conclusions without knowing the size of the sample and how random a sample it was. But all too often such data is not mentioned when the statistics are given - perhaps it is overlooked or even intentionally omitted." (Theoni Pappas, "More Joy of Mathematics: Exploring mathematical insights & concepts", 1991)

"Data are generally collected as a basis for action. However, unless potential signals are separated from probable noise, the actions taken may be totally inconsistent with the data. Thus, the proper use of data requires that you have simple and effective methods of analysis which will properly separate potential signals from probable noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Every messy data is messy in its own way - it’s easy to define the characteristics of a clean dataset (rows are observations, columns are variables, columns contain values of consistent types). If you start to look at real life data you’ll see every way you can imagine data being messy (and many that you can’t)!" (Hadley Wickham, "R-help mailing list", 2008)

"It is the consistency of the information that matters for a good story, not its completeness. Indeed, you will often find that knowing little makes it easier to fit everything you know into a coherent pattern." (Daniel Kahneman, "Thinking, Fast and Slow", 2011) 

"Having NUMBERSENSE means: (•) Not taking published data at face value; (•) Knowing which questions to ask; (•) Having a nose for doctored statistics. [...] NUMBERSENSE is that bit of skepticism, urge to probe, and desire to verify. It’s having the truffle hog’s nose to hunt the delicacies. Developing NUMBERSENSE takes training and patience. It is essential to know a few basic statistical concepts. Understanding the nature of means, medians, and percentile ranks is important. Breaking down ratios into components facilitates clear thinking. Ratios can also be interpreted as weighted averages, with those weights arranged by rules of inclusion and exclusion. Missing data must be carefully vetted, especially when they are substituted with statistical estimates. Blatant fraud, while difficult to detect, is often exposed by inconsistency." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"Accuracy and coherence are related concepts pertaining to data quality. Accuracy refers to the comprehensiveness or extent of missing data, performance of error edits, and other quality assurance strategies. Coherence is the degree to which data - item value and meaning are consistent over time and are comparable to similar variables from other routinely used data sources." (Aileen Rothbard, "Quality Issues in the Use of Administrative Data Records", 2015)

"The dialectical interplay of experiment and theory is a key driving force of modern science. Experimental data do only have meaning in the light of a particular model or at least a theoretical background. Reversely theoretical considerations may be logically consistent as well as intellectually elegant: Without experimental evidence they are a mere exercise of thought no matter how difficult they are. Data analysis is a connector between experiment and theory: Its techniques advise possibilities of model extraction as well as model testing with experimental data." (Achim Zielesny, "From Curve Fitting to Machine Learning" 2nd Ed., 2016)

"A good estimator has to be more than just consistent. It also should be one whose variance is less than that of any other estimator. This property is called minimum variance. This means that if we run the experiment several times, the 'answers' we get will be closer to one another than 'answers' based on some other estimator." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Estimators are functions of the observed values that can be used to estimate specific parameters. Good estimators are those that are consistent and have minimum variance. These properties are guaranteed if the estimator maximizes the likelihood of the observations." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"There are other problems with Big Data. In any large data set, there are bound to be inconsistencies, misclassifications, missing data - in other words, errors, blunders, and possibly lies. These problems with individual items occur in any data set, but they are often hidden in a large mass of numbers even when these numbers are generated out of computer interactions." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"[...] a hypothesis test tells us whether the observed data are consistent with the null hypothesis, and a confidence interval tells us which hypotheses are consistent with the data." (William C Blackwelder)

22 November 2024

🦋Science: On Randomization (Quotes)

"It appears to be a quite general principle that, whenever there is a randomized way of doing something, then there is a nonrandomized way that delivers better performance but requires more thought." (Edwin T Jaynes, "Probability Theory: The Logic of Science", 1979)

"Managers construct, rearrange, single out, and demolish many objective features of their surroundings. When people act they unrandomize variables, insert vestiges of orderliness, and literally create their own constraints." (Karl E Weick, "Social Psychology of Organizing", 1979)

"When the statistician looks at the outside world, he cannot, for example, rely on finding errors that are independently and identically distributed in approximately normal distributions. In particular, most economic and business data are collected serially and can be expected, therefore, to be heavily serially dependent. So is much of the data collected from the automatic instruments which are becoming so common in laboratories these days. Analysis of such data, using procedures such as standard regression analysis which assume independence, can lead to gross error. Furthermore, the possibility of contamination of the error distribution by outliers is always present and has recently received much attention. More generally, real data sets, especially if they are long, usually show inhomogeneity in the mean, the variance, or both, and it is not always possible to randomize." (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)

"Randomization is usually a cheap and harmless way of improving the effectiveness of experimentation with very little extra effort." (Robert Hooke, "How to Tell the Liars from the Statisticians", 1983)

"When nearest neighbor effects exist, the randomized complete block analysis [can be] so poor as to deserver to be called catastrophic. It [can not] even be considered a serious form of analysis. It is extremely important to make this clear to the vast number of researchers who have near religious faith in the randomized complete block design." (Walt Stroup & D Mulitze, "Nearest Neighbor Adjusted Best Linear Unbiased Prediction", The American Statistician 45, 1991)

"Randomization puts systematic sources of variability into the error term." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"The correlational technique known as multiple regression is used frequently in medical and social science research. This technique essentially correlates many independent (or predictor) variables simultaneously with a given dependent variable (outcome or output). It asks, 'Net of the effects of all the other variables, what is the effect of variable A on the dependent variable?' Despite its popularity, the technique is inherently weak and often yields misleading results. The problem is due to self-selection. If we don’t assign cases to a particular treatment, the cases may differ in any number of ways that could be causing them to differ along some dimension related to the dependent variable. We can know that the answer given by a multiple regression analysis is wrong because randomized control experiments, frequently referred to as the gold standard of research techniques, may give answers that are quite different from those obtained by multiple regression analysis." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"Expert knowledge is a term covering various types of knowledge that can help define or disambiguate causal relations between two or more variables. Depending on the context, expert knowledge might refer to knowledge from randomized controlled trials, laws of physics, a broad scope of experiences in a given area, and more." (Aleksander Molak, "Causal Inference and Discovery in Python", 2023)

"The causal interpretation of linear regression only holds when there are no spurious relationships in your data. This is the case in two scenarios: when you control for a set of all necessary variables (sometimes this set can be empty) or when your data comes from a properly designed randomized experiment." (Aleksander Molak, "Causal Inference and Discovery in Python", 2023)

"The first level of creativity [for evaluating causal models] is to use the refutation tests [...] The second level of creativity is available when you have access to historical data coming from randomized experiments. You can compare your observational model with the experimental results and try to adjust your model accordingly. The third level of creativity is to evaluate your modeling approach on simulated data with known outcomes. [...] The fourth level of creativity is sensitivity analysis." (Aleksander Molak, "Causal Inference and Discovery in Python", 2023)

20 November 2024

🦋Science: On Risk (Quotes)

"A deterministic system is one in which the parts interact in a perfectly predictable way. There is never any room for doubt: given a last state of the system and the programme of information by defining its dynamic network, it is always possible to predict, without any risk of error, its succeeding state. A probabilistic system, on the other hand, is one about which no precisely detailed prediction can be given. The system may be studied intently, and it may become more and more possible to say what it is likely to do in any given circumstances. But the system simply is not predetermined, and a prediction affecting it can never escape from the logical limitations of the probabilities in which terms alone its behaviour can be described." (Stafford Beer, "Cybernetics and Management", 1959)

"It is easy to obtain confirmations, or verifications, for nearly every theory - if we look for confirmations. Confirmations should count only if they are the result of risky predictions. […] A theory which is not refutable by any conceivable event is non-scientific. Irrefutability is not a virtue of a theory (as people often think) but a vice. Every genuine test of a theory is an attempt to falsify it, or refute it." (Karl R Popper, "Conjectures and Refutations: The Growth of Scientific Knowledge", 1963)

"Statistical hypothesis testing is commonly used inappropriately to analyze data, determine causality, and make decisions about significance in ecological risk assessment,[...] It discourages good toxicity testing and field studies, it provides less protection to ecosystems or their components that are difficult to sample or replicate, and it provides less protection when more treatments or responses are used. It provides a poor basis for decision-making because it does not generate a conclusion of no effect, it does not indicate the nature or magnitude of effects, it does address effects at untested exposure levels, and it confounds effects and uncertainty[...]. Risk assessors should focus on analyzing the relationship between exposure and effects[...]."  (Glenn W Suter, "Abuse of hypothesis testing statistics in ecological risk assessment", Human and Ecological Risk Assessment 2, 1996)

"Until we can distinguish between an event that is truly random and an event that is the result of cause and effect, we will never know whether what we see is what we'll get, nor how we got what we got. When we take a risk, we are betting on an outcome that will result from a decision we have made, though we do not know for certain what the outcome will be. The essence of risk management lies in maximizing the areas where we have some control over the outcome while minimizing the areas where we have absolutely no control over the outcome and the linkage between effect and cause is hidden from us." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Overcoming innumeracy is like completing a three-step program to statistical literacy. The first step is to defeat the illusion of certainty. The second step is to learn about the actual risks of relevant events and actions. The third step is to communicate the risks in an understandable way and to draw inferences without falling prey to clouded thinking. The general point is this: Innumeracy does not simply reside in our minds but in the representations of risk that we choose." (Gerd Gigerenzer, "Calculated Risks: How to know when numbers deceive you", 2002)

"The goal of random sampling is to produce a sample that is likely to be representative of the population. Although random sampling does not guarantee that the sample will be representative, it does allow us to assess the risk of an unrepresentative sample. It is the ability to quantify this risk that will enable us to generalize with confidence from a random sample to the corresponding population." (Roxy Peck et al, "Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"Decision trees are an important tool for decision making and risk analysis, and are usually represented in the form of a graph or list of rules. One of the most important features of decision trees is the ease of their application. Being visual in nature, they are readily comprehensible and applicable. Even if users are not familiar with the way that a decision tree is constructed, they can still successfully implement it. Most often decision trees are used to predict future scenarios, based on previous experience, and to support rational decision making." (Jelena Djuris et al, "Neural computing in pharmaceutical products and process development", Computer-Aided Applications in Pharmaceutical Technology, 2013)

"Without context, data is useless, and any visualization you create with it will also be useless. Using data without knowing anything about it, other than the values themselves, is like hearing an abridged quote secondhand and then citing it as a main discussion point in an essay. It might be okay, but you risk finding out later that the speaker meant the opposite of what you thought." (Nathan Yau, "Data Points: Visualization That Means Something", 2013)

"The more complex the system, the more variable (risky) the outcomes. The profound implications of this essential feature of reality still elude us in all the practical disciplines. Sometimes variance averages out, but more often fat-tail events beget more fat-tail events because of interdependencies. If there are multiple projects running, outlier (fat-tail) events may also be positively correlated - one IT project falling behind will stretch resources and increase the likelihood that others will be compromised." (Paul Gibbons, "The Science of Successful Organizational Change",  2015)

"Roughly stated, the No Free Lunch theorem states that in the lack of prior knowledge (i.e. inductive bias) on average all predictive algorithms that search for the minimum classification error (or extremum over any risk metric) have identical performance according to any measure." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"Premature enumeration is an equal-opportunity blunder: the most numerate among us may be just as much at risk as those who find their heads spinning at the first mention of a fraction. Indeed, if you’re confident with numbers you may be more prone than most to slicing and dicing, correlating and regressing, normalizing and rebasing, effortlessly manipulating the numbers on the spreadsheet or in the statistical package - without ever realizing that you don’t fully understand what these abstract quantities refer to. Arguably this temptation lay at the root of the last financial crisis: the sophistication of mathematical risk models obscured the question of how, exactly, risks were being measured, and whether those measurements were something you’d really want to bet your global banking system on." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Behavioral finance so far makes conclusions from statics not dynamics, hence misses the picture. It applies trade-offs out of context and develops the consensus that people irrationally overestimate tail risk (hence need to be 'nudged' into taking more of these exposures). But the catastrophic event is an absorbing barrier. No risky exposure can be analyzed in isolation: risks accumulate. If we ride a motorcycle, smoke, fly our own propeller plane, and join the mafia, these risks add up to a near-certain premature death. Tail risks are not a renewable resource." (Nassim N Taleb, "Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications" 2nd Ed., 2022)

"Any time you run regression analysis on arbitrary real-world observational data, there’s a significant risk that there’s hidden confounding in your dataset and so causal conclusions from such analysis are likely to be (causally) biased." (Aleksander Molak, "Causal Inference and Discovery in Python", 2023)

"[Making reasoned macro calls] starts with having the best and longest-time-series data you can find. You may have to take some risks in terms of the quality of data sources, but it amazes me how people are often more willing to act based on little or no data than to use data that is a challenge to assemble." (Robert J Shiller)

19 November 2024

🦋Science: On Evidence (Quotes)

"Rule 1. Original data should be presented in a way that will preserve the evidence in the original data for all the predictions assumed to be useful." (Walter A Shewhart, "Economic Control of Quality of Manufactured Product", 1931)

"Rule 2. Any summary of a distribution of numbers in terms of symmetric functions should not give an objective degree of belief in any one of the inferences or predictions to be made therefrom that would cause human action significantly different from what this action would be if the original distributions had been taken as evidence." (Walter A Shewhart, "Economic Control of Quality of Manufactured Product", 1931)

"[...] there is evidence that significance tests have been a genuine block to achieving [...] knowledge." (Denton E Morrison & Ramon E Henkel, "Significance tests reconsidered", The American Sociologist 4, 1969)

"Confidence intervals give a feeling of the uncertainty of experimental evidence, and (very important) give it in the same units [...] as the original observations." (Mary G Natrella, "The relation between confidence intervals and tests of significance", American Statistician 14, 1960)

"The null-hypothesis significance test treats ‘acceptance’ or ‘rejection’ of a hypothesis as though these were decisions one makes. But a hypothesis is not something, like a piece of pie offered for dessert, which can be accepted or rejected by a voluntary physical action. Acceptance or rejection of a hypothesis is a cognitive process, a degree of believing or disbelieving which, if rational, is not a matter of choice but determined solely by how likely it is, given the evidence, that the hypothesis is true." (William W Rozeboom, "The fallacy of the null–hypothesis significance test", Psychological Bulletin 57, 1960)

"Scientific discovery, or the formulation of scientific theory, starts in with the unvarnished and unembroidered evidence of the senses. It starts with simple observation - simple, unbiased, unprejudiced, naive, or innocent observation - and out of this sensory evidence, embodied in the form of simple propositions or declarations of fact, generalizations will grow up and take shape, almost as if some process of crystallization or condensation were taking place. Out of a disorderly array of facts, an orderly theory, an orderly general statement, will somehow emerge." (Sir Peter B Medawar, "Is the Scientific Paper Fraudulent?", The Saturday Review, 1964)

"We have overwhelming evidence that available information plus analysis does not lead to knowledge. The management science team can properly analyse a situation and present recommendations to the manager, but no change occurs. The situation is so familiar to those of us who try to practice management science that I hardly need to describe the cases." (C West Churchman, "Managerial acceptance of scientific recommendations", California Management Review Vol 7, 1964)

"Science consists simply of the formulation and testing of hypotheses based on observational evidence; experiments are important where applicable, but their function is merely to simplify observation by imposing controlled conditions." (Henry L Batten, "Evolution of the Earth", 1971)

"Statistics is a body of methods and theory applied to numerical evidence in making decisions in the face of uncertainty." (Lawrence Lapin, "Statistics for Modern Business Decisions", 1973)

"The language of association and prediction is probably most often used because the evidence seems insufficient to justify a direct causal statement. A better practice is to state the causal hypothesis and then to present the evidence along with an assessment with respect to the causal hypothesis - instead of letting the quality of the data determine the language of the explanation." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"All interpretations made by a scientist are hypotheses, and all hypotheses are tentative. They must forever be tested and they must be revised if found to be unsatisfactory. Hence, a change of mind in a scientist, and particularly in a great scientist, is not only not a sign of weakness but rather evidence for continuing attention to the respective problem and an ability to test the hypothesis again and again." (Ernst Mayr, "The Growth of Biological Thought: Diversity, Evolution and Inheritance", 1982)

"It has been widely felt, probably for thirty years and more, that significance tests are overemphasized and often misused and that more emphasis should be put on estimation and prediction. While such a shift of emphasis does seem to be occurring, for example in medical statistics, the continued very extensive use of significance tests is on the one hand alarming and on the other evidence that they are aimed, even if imperfectly, at some widely felt need." (David R Cox, "Some general aspects of the theory of statistics", International Statistical Review 54, 1986)

"Like a detective, a data analyst will experience many dead ends, retrace his steps, and explore many alternatives before settling on a single description of the evidence in front of him." (David Lubinsky & Daryl Pregibon , "Data analysis as search", Journal of Econometrics Vol. 38 (1–2), 1988)

"Subjective probability, also known as Bayesian statistics, pushes Bayes' theorem further by applying it to statements of the type described as 'unscientific' in the frequency definition. The probability of a theory (e.g. that it will rain tomorrow or that parity is not violated) is considered to be a subjective 'degree of belief - it can perhaps be measured by seeing what odds the person concerned will offer as a bet. Subsequent experimental evidence then modifies the initial degree of belief, making it stronger or weaker according to whether the results agree or disagree with the predictions of the theory in question." (Roger J Barlow, "Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"Probability theory is an ideal tool for formalizing uncertainty in situations where class frequencies are known or where evidence is based on outcomes of a sufficiently long series of independent random experiments. Possibility theory, on the other hand, is ideal for formalizing incomplete information expressed in terms of fuzzy propositions." (George Klir, "Fuzzy sets and fuzzy logic", 1995)

"[…] the simplest hypothesis proposed as an explanation of phenomena is more likely to be the true one than is any other available hypothesis, that its predictions are more likely to be true than those of any other available hypothesis, and that it is an ultimate a priori epistemic principle that simplicity is evidence for truth." (Richard Swinburne, "Simplicity as Evidence for Truth", 1997)

"When significance tests are used and a null hypothesis is not rejected, a major problem often arises - namely, the result may be interpreted, without a logical basis, as providing evidence for the null hypothesis." (David F Parkhurst, "Statistical Significance Tests: Equivalence and Reverse Tests Should Reduce Misinterpretation", BioScience Vol. 51 (12), 2001)

"One cautious approach is represented by Bernoulli’s more conservative outlook. If there are very strong reasons for believing that an observation has suffered an accident that made the value in the data-file thoroughly untrustworthy, then reject it; in the absence of clear evidence that an observation, identified by formal rule as an outlier, is unacceptable then retain it unless there is lack of trust that the laboratory obtaining it is conscientiously operated by able persons who have [...] taken every care.'" (David Finney, "Calibration Guidelines Challenge Outlier Practices", The American Statistician Vol 60 (4), 2006)

"Scholars feel the need to present tables of model parameters in academic articles (perhaps just as evidence that they ran the analysis they claimed to have run), but these tables are rarely interpreted other than for their sign and statistical significance. Most of the numbers in these tables are never even discussed in the text. From the perspective of the applied data analyst, R packages without procedures to compute quantities of scientific interest are woefully incomplete. A better approach focuses on quantities of direct scientific interest rather than uninterpretable model parameters. [...] For each quantity of interest, the user needs some summary that includes a point estimate and a measure of uncertainty such as a standard error, confidence interval, or a distribution. The methods of calculating these differ greatly across theories of inference and methods of analysis. However, from the user’s perspective, the result is almost always the same: the point estimate and uncertainty of some quantity of interest." (Kousuke Imai et al, "Toward a Common Framework for Statistical Analysis and Development", Journal of Computational and Graphical Statistics vol. 17, 2008)

"Data analysis is careful thinking about evidence." (Michael Milton, "Head First Data Analysis", 2009)

"Data clusters are everywhere, even in random data. Someone who looks for an explanation will inevitably find one, but a theory that fits a data cluster is not persuasive evidence. The found explanation needs to make sense and it needs to be tested with uncontaminated data." (Gary Smith, "Standard Deviations", 2014)

"In general, when building statistical models, we must not forget that the aim is to understand something about the real world. Or predict, choose an action, make a decision, summarize evidence, and so on, but always about the real world, not an abstract mathematical world: our models are not the reality - a point well made by George Box in his oft-cited remark that "all models are wrong, but some are useful". (David Hand, "Wonderful examples, but let's not close our eyes", Statistical Science 29, 2014)

"The dialectical interplay of experiment and theory is a key driving force of modern science. Experimental data do only have meaning in the light of a particular model or at least a theoretical background. Reversely theoretical considerations may be logically consistent as well as intellectually elegant: Without experimental evidence they are a mere exercise of thought no matter how difficult they are. Data analysis is a connector between experiment and theory: Its techniques advise possibilities of model extraction as well as model testing with experimental data." (Achim Zielesny, "From Curve Fitting to Machine Learning" 2nd Ed., 2016)

"In terms of characteristics, a data scientist has an inquisitive mind and is prepared to explore and ask questions, examine assumptions and analyse processes, test hypotheses and try out solutions and, based on evidence, communicate informed conclusions, recommendations and caveats to stakeholders and decision makers." (Jesús Rogel-Salazar, "Data Science and Analytics with Python", 2017)

"With the growing availability of massive data sets and user-friendly analysis software, it might be thought that there is less need for training in statistical methods. This would be naïve in the extreme. Far from freeing us from the need for statistical skills, bigger data and the rise in the number and complexity of scientific studies makes it even more difficult to draw appropriate conclusions. More data means that we need to be even more aware of what the evidence is actually worth." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"The general principles of starting with a well-defined question, engaging in careful observation, and then formulating hypotheses and assessing the strength of evidence for and against them became known as the scientific method." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"Absence of evidence is not evidence of absence." (Martin Rees)

"The deepest sin of the human mind is to believe things without evidence." (Thomas H Huxley)

18 November 2024

🦋Science: On Truth in Models (Quotes)

"A model, like a novel, may resonate with nature, but it is not a ‘real’ thing. Like a novel, a model may be convincing - it may ‘ring true’ if it is consistent with our experience of the natural world. But just as we may wonder how much the characters in a novel are drawn from real life and how much is artifice, we might ask the same of a model: How much is based on observation and measurement of accessible phenomena, how much is convenience? Fundamentally, the reason for modeling is a lack of full access, either in time or space, to the phenomena of interest." (Kenneth Belitz, Science, Vol. 263, 1944)

"Exact truth of a null hypothesis is very unlikely except in a genuine uniformity trial." (David R Cox, "Some problems connected with statistical inference", Annals of Mathematical Statistics 29, 1958)

"[…] no models are [true] = not even the Newtonian laws. When you construct a model you leave out all the details which you, with the knowledge at your disposal, consider inessential. […] Models should not be true, but it is important that they are applicable, and whether they are applicable for any given purpose must of course be investigated. This also means that a model is never accepted finally, only on trial." (Georg Rasch, "Probabilistic Models for Some Intelligence and Attainment Tests", 1960)

"The validation of a model is not that it is 'true' but that it generates good testable hypotheses relevant to important problems." (Richard Levins, "The Strategy of Model Building in Population Biology", 1966)

"A theory has only the alternative of being right or wrong. A model has a third possibility: it may be right, but irrelevant." (Manfred Eigen, 1973)

"Models, of course, are never true, but fortunately it is only necessary that they be useful. For this it is usually needful only that they not be grossly wrong. I think rather simple modifications of our present models will prove adequate to take account of most realities of the outside world. The difficulties of computation which would have been a barrier in the past need not deter us now." (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)

"The purpose of an experiment is to answer questions. The truth of this seems so obvious, that it would not be worth emphasizing were it not for the fact that the results of many experiments are interpreted and presented with little or no reference to the questions that were asked in the first place."  (Thomas M Little, "Interpretation and presentation of results", Hortscience 16, 1981)

"The fact that [the model] is an approximation does not necessarily detract from its usefulness because models are approximations. All models are wrong, but some are useful." (George Box, 1987)

"A null hypothesis that yields under two different treatments have identical expectations is scarcely very plausible, and its rejection by a significance test is more dependent upon the size of an experiment than upon its untruth." (David J Finney, "Was this in your statistics textbook?", Experimental Agriculture 24, 1988)

"The motivation for any action on outliers must be to improve interpretation of data without ignoring unwelcome truth. To remove bad and untrustworthy data is a laudable ambition, but naive and untested rules may bring harm rather than benefit." (David Finney, "Calibration Guidelines Challenge Outlier Practices", The American Statistician Vol 60 (4), 2006) 

"You might say that there’s no reason to bother with model checking since all models are false anyway. I do believe that all models are false, but for me the purpose of model checking is not to accept or reject a model, but to reveal aspects of the data that are not captured by the fitted model." (Andrew Gelman, "Some thoughts on the sociology of statistics", 2007)

"If students have students have no experience with hands-on [telescope] observing, they may take all data as ‘truth’ without having an understanding of how the data are obtained and what could potentially go wrong in that process, so I think it becomes crucially important to give a glimpse of what’s happening behind the scenes at telescopes, so they can be appropriately skeptical users of data in the future." (Colette Salyk, Sky & Telescope, 2022)

"On a final note, we would like to stress the importance of design, which often does not receive the attention it deserves. Sometimes, the large number of modeling options for spatial analysis may raise the false impression that design does not matter, and that a sophisticated analysis takes care of everything. Nothing could be further from the truth." (Hans-Peter Piepho et al, "Two-dimensional P-spline smoothing for spatial analysis of plant breeding trials", “Biometrical Journal”, 2022)

17 November 2024

🦋Science: On Confidence Intervals (Quotes)

"Confidence intervals give a feeling of the uncertainty of experimental evidence, and (very important) give it in the same units [...] as the original observations." (Mary G Natrella, "The relation between confidence intervals and tests of significance", American Statistician 14, 1960)

"I do not think that significance testing should be completely abandoned [...] and I don’t expect that it will be. But I urge researchers to provide estimates, with confidence intervals: scientific advance requires parameters with known reliability estimates. Classical confidence intervals are formally equivalent to a significance test, but they convey more information." (Nigel G Yoccoz, "Use, Overuse, and Misuse of Significance Tests in Evolutionary Biology and Ecology", Bulletin of the Ecological Society of America Vol. 72 (2), 1991)

"Whereas hypothesis testing emphasizes a very narrow question (‘Do the population means fail to conform to a specific pattern?’), the use of confidence intervals emphasizes a much broader question (‘What are the population means?’). Knowing what the means are, of course, implies knowing whether they fail to conform to a specific pattern, although the reverse is not true. In this sense, use of confidence intervals subsumes the process of hypothesis testing." (Geoffrey R Loftus, "On the tyranny of hypothesis testing in the social sciences", Contemporary Psychology 36, 1991)

"Probabilistic inference is the classical paradigm for data analysis in science and technology. It rests on a foundation of randomness; variation in data is ascribed to a random process in which nature generates data according to a probability distribution. This leads to a codification of uncertainly by confidence intervals and hypothesis tests." (William S Cleveland, "Visualizing Data", 1993)

"We should push for de-emphasizing some topics, such as statistical significance tests - an unfortunate carry-over from the traditional elementary statistics course. We would suggest a greater focus on confidence intervals - these achieve the aim of formal hypothesis testing, often provide additional useful information, and are not as easily misinterpreted." (Gerry Hahn et al, "The Impact of Six Sigma Improvement: A Glimpse Into the Future of Statistics", The American Statistician, 1999)

"Precision does not vary linearly with increasing sample size. As is well known, the width of a confidence interval is a function of the square root of the number of observations. But it is more complicate than that. The basic elements determining a confidence interval are the sample size, an estimate of variability, and a pivotal variable associated with the estimate of variability." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"The important thing is to understand that frequentist and Bayesian methods are answering different questions. To combine prior beliefs with data in a principled way, use Bayesian inference. To construct procedures with guaranteed long run performance, such as confidence intervals, use frequentist methods. Generally, Bayesian methods run into problems when the parameter space is high dimensional." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004) 

"There is a tendency to use hypothesis testing methods even when they are not appropriate. Often, estimation and confidence intervals are better tools. Use hypothesis testing only when you want to test a well-defined hypothesis." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Scholars feel the need to present tables of model parameters in academic articles (perhaps just as evidence that they ran the analysis they claimed to have run), but these tables are rarely interpreted other than for their sign and statistical significance. Most of the numbers in these tables are never even discussed in the text. From the perspective of the applied data analyst, R packages without procedures to compute quantities of scientific interest are woefully incomplete. A better approach focuses on quantities of direct scientific interest rather than uninterpretable model parameters. [...] For each quantity of interest, the user needs some summary that includes a point estimate and a measure of uncertainty such as a standard error, confidence interval, or a distribution. The methods of calculating these differ greatly across theories of inference and methods of analysis. However, from the user’s perspective, the result is almost always the same: the point estimate and uncertainty of some quantity of interest." (Kousuke Imai et al, "Toward a Common Framework for Statistical Analysis and Development", Journal of Computational and Graphical Statistics vol. 17, 2008)

"Given the important role that correlation plays in structural equation modeling, we need to understand the factors that affect establishing relationships among multivariable data points. The key factors are the level of measurement, restriction of range in data values (variability, skewness, kurtosis), missing data, nonlinearity, outliers, correction for attenuation, and issues related to sampling variation, confidence intervals, effect size, significance, sample size, and power." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"A complete data analysis will involve the following steps: (i) Finding a good model to fit the signal based on the data. (ii) Finding a good model to fit the noise, based on the residuals from the model. (iii) Adjusting variances, test statistics, confidence intervals, and predictions, based on the model for the noise.(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"For a confidence interval, the central limit theorem plays a role in the reliability of the interval because the sample mean is often approximately normal even when the underlying data is not. A prediction interval has no such protection. The shape of the interval reflects the shape of the underlying distribution. It is more important to examine carefully the normality assumption by checking the residuals […].(DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Samples give us estimates of something, and they will almost always deviate from the true number by some amount, large or small, and that is the margin of error. […] The margin of error does not address underlying flaws in the research, only the degree of error in the sampling procedure. But ignoring those deeper possible flaws for the moment, there is another measurement or statistic that accompanies any rigorously defined sample: the confidence interval." (Daniel J Levitin, "Weaponized Lies", 2017)

"The margin of error is how accurate the results are, and the confidence interval is how confident you are that your estimate falls within the margin of error." (Daniel J Levitin, "Weaponized Lies", 2017)

"[...] a hypothesis test tells us whether the observed data are consistent with the null hypothesis, and a confidence interval tells us which hypotheses are consistent with the data." (William C Blackwelder)
