01 August 2021

Science: On Data Collection (Quotes)

 "There are two aspects of statistics that are continually mixed, the method and the science. Statistics are used as a method, whenever we measure something, for example, the size of a district, the number of inhabitants of a country, the quantity or price of certain commodities, etc. […] There is, moreover, a science of statistics. It consists of knowing how to gather numbers, combine them and calculate them, in the best way to lead to certain results. But this is, strictly speaking, a branch of mathematics." (Alphonse P de Candolle, "Considerations on Crime Statistics", 1833)

"Just as data gathered by an incompetent observer are worthless - or by a biased observer, unless the bias can be measured and eliminated from the result - so also conclusions obtained from even the best data by one unacquainted with the principles of statistics must be of doubtful value." (William F White, "A Scrap-Book of Elementary Mathematics: Notes, Recreations, Essays", 1908)

"[...] scientists are not a select few intelligent enough to think in terms of 'broad sweeping theoretical laws and principles'. Instead, scientists are people specifically trained to build models that incorporate theoretical assumptions and empirical evidence. Working with models is essential to the performance of their daily work; it allows them to construct arguments and to collect data." (Peter Imhof, Science Vol. 287, 1935–1936)

"Statistics is a scientific discipline concerned with collection, analysis, and interpretation of data obtained from observation or experiment. The subject has a coherent structure based on the theory of Probability and includes many different procedures which contribute to research and development throughout the whole of Science and Technology." (Egon Pearson, 1936)

"Scientific data are not taken for museum purposes; they are taken as a basis for doing something. If nothing is to be done with the data, then there is no use in collecting any. The ultimate purpose of taking data is to provide a basis for action or a recommendation for action. The step intermediate between the collection of data and the action is prediction." (William E Deming, "On a Classification of the Problems of Statistical Inference", Journal of the American Statistical Association Vol. 37 (218), 1942)

"Data should be collected with a clear purpose in mind. Not only a clear purpose, but a clear idea as to the precise way in which they will be analysed so as to yield the desired information." (Michael J Moroney, "Facts from Figures", 1951)

"The technical analysis of any large collection of data is a task for a highly trained and expensive man who knows the mathematical theory of statistics inside and out. Otherwise the outcome is likely to be a collection of drawings - quartered pies, cute little battleships, and tapering rows of sturdy soldiers in diversified uniforms - interesting enough in the colored Sunday supplement, but hardly the sort of thing from which to draw reliable inferences." (Eric T Bell, "Mathematics: Queen and Servant of Science", 1951)

"Null hypotheses of no difference are usually known to be false before the data are collected [...] when they are, their rejection or acceptance simply reflects the size of the sample and the power of the test, and is not a contribution to science." (I Richard Savage, "Nonparametric statistics", Journal of the American Statistical Association 52, 1957)

"Philosophers of science have repeatedly demonstrated that more than one theoretical construction can always be placed upon a given collection of data." (Thomas Kuhn, "The Structure of Scientific Revolutions", 1962) 

"It has been said that data collection is like garbage collection: before you collect it you should have in mind what you are going to do with it." (Russell Fox et al, "The Science of Science", 1964)

"Measurement has too often been the leitmotif of many investigations rather than the experimental examination of hypotheses. Mounds of data are collected, which are statistically decorous and methodologically unimpeachable, but conclusions are often trivial and rarely useful in decision making. This results from an overly rigorous control of an insignificant variable and a widespread deficiency in the framing of pertinent questions. Investigators seem to have settled for what is measurable instead of measuring what they would really like to know." (Edmund D Pellegrino, "Patient Care: Mystical Research or Researchable Mystique", Clinical Research, 1964)

"If we gather more and more data and establish more and more associations, however, we will not finally find that we know something. We will simply end up having more and more data and larger sets of correlations." (Kenneth N Waltz, "Theory of International Politics Source: Theory of International Politics", 1979)

"The systematic collection of data about people has affected not only the ways in which we conceive of a society, but also the ways in which we describe our neighbour. It has profoundly transformed what we choose to do, who we try to be, and what we think of ourselves." (Ian Hacking, "The Taming of Chance", 1990)

When looking at the end result of any statistical analysis, one must be very cautious not to over interpret the data. Care must be taken to know the size of the sample, and to be certain the method for gathering information is consistent with other samples gathered. […] No one should ever base conclusions without knowing the size of the sample and how random a sample it was. But all too often such data is not mentioned when the statistics are given - perhaps it is overlooked or even intentionally omitted." (Theoni Pappas, "More Joy of Mathematics: Exploring mathematical insights & concepts", 1991)

"We do not realize how deeply our starting assumptions affect the way we go about looking for and interpreting the data we collect." (Roger A Lewin, "Kanzi: The Ape at the Brink of the Human Mind", 1994)

"The science of statistics may be described as exploring, analyzing and summarizing data; designing or choosing appropriate ways of collecting data and extracting information from them; and communicating that information. Statistics also involves constructing and testing models for describing chance phenomena. These models can be used as a basis for making inferences and drawing conclusions and, finally, perhaps for making decisions." (Fergus Daly et al, "Elements of Statistics", 1995)

"Data are collected as a basis for action. Yet before anyone can use data as a basis for action the data have to be interpreted. The proper interpretation of data will require that the data be presented in context, and that the analysis technique used will filter out the noise."  (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Data are generally collected as a basis for action. However, unless potential signals are separated from probable noise, the actions taken may be totally inconsistent with the data. Thus, the proper use of data requires that you have simple and effective methods of analysis which will properly separate potential signals from probable noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Just as dynamics arise from feedback, so too all learning depends on feedback. We make decisions that alter the real world; we gather information feedback about the real world, and using the new information we revise our understanding of the world and the decisions we make to bring our perception of the state of the system closer to our goals." (John D Sterman, "Business dynamics: Systems thinking and modeling for a complex world", 2000) 

"Without meaningful data there can be no meaningful analysis. The interpretation of any data set must be based upon the context of those data. Unfortunately, much of the data reported to executives today are aggregated and summed over so many different operating units and processes that they cannot be said to have any context except a historical one - they were all collected during the same time period. While this may be rational with monetary figures, it can be devastating to other types of data." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Data is a fact of life. As time goes by, we collect more and more data, making our original reason for collecting the data harder to accomplish. We don't collect data just to waste time or keep busy; we collect data so that we can gain knowledge, which can be used to improve the efficiency of our organization, improve profit margins, and on and on. The problem is that as we collect more data, it becomes harder for us to use the data to derive this knowledge. We are being suffocated by this raw data, yet we need to find a way to use it." (Seth Paul et al. "Preparing and Mining Data with Microsoft SQL Server 2000 and Analysis", 2002)

"Statistics depend on collecting information. If questions go unasked, or if they are asked in ways that limit responses, or if measures count some cases but exclude others, information goes ungathered, and missing numbers result. Nevertheless, choices regarding which data to collect and how to go about collecting the information are inevitable." (Joel Best, "More Damned Lies and Statistics: How numbers confuse public issues", 2004)

"Put simply, statistics is a range of procedures for gathering, organizing, analyzing and presenting quantitative data. […] Essentially […], statistics is a scientific approach to analyzing numerical data in order to enable us to maximize our interpretation, understanding and use. This means that statistics helps us turn data into information; that is, data that have been interpreted, understood and are useful to the recipient. Put formally, for your project, statistics is the systematic collection and analysis of numerical data, in order to investigate or discover relationships among phenomena so as to explain, predict and control their occurrence." (Reva B Brown & Mark Saunders, "Dealing with Statistics: What You Need to Know", 2008)

"Statistics is the art of learning from data. It is concerned with the collection of data, their subsequent description, and their analysis, which often leads to the drawing of conclusions." (Sheldon M Ross, "Introductory Statistics" 3rd Ed., 2009)

"Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions." (Ron Larson & Betsy Farber, "Elementary Statistics: Picturing the World" 5th Ed., 2011)

"The discrepancy between our mental models and the real world may be a major problem of our times; especially in view of the difficulty of collecting, analyzing, and making sense of the unbelievable amount of data to which we have access today." (Ugo Bardi, "The Limits to Growth Revisited", 2011)

"In order to be effective a descriptive statistic has to make sense - it has to distill some essential characteristic of the data into a value that is both appropriate and understandable. […] the justification for computing any given statistic must come from the nature of the data themselves - it cannot come from the arithmetic, nor can it come from the statistic. If the data are a meaningless collection of values, then the summary statistics will also be meaningless - no arithmetic operation can magically create meaning out of nonsense. Therefore, the meaning of any statistic has to come from the context for the data, while the appropriateness of any statistic will depend upon the use we intend to make of that statistic." (Donald J Wheeler, "Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"Each systems archetype embodies a particular theory about dynamic behavior that can serve as a starting point for selecting and formulating raw data into a coherent set of interrelationships. Once those relationships are made explicit and precise, the 'theory' of the archetype can then further guide us in our data-gathering process to test the causal relationships through direct observation, data analysis, or group deliberation." (Daniel H Kim, "Systems Archetypes as Dynamic Theories", The Systems Thinker Vol. 24 (1), 2013)

"Statistics is an integral part of the quantitative approach to knowledge. The field of statistics is concerned with the scientific study of collecting, organizing, analyzing, and drawing conclusions from data." (Kandethody M Ramachandran & Chris P Tsokos, "Mathematical Statistics with Applications in R" 2nd Ed., 2015)

"The term data, unlike the related terms facts and evidence, does not connote truth. Data is descriptive, but data can be erroneous. We tend to distinguish data from information. Data is a primitive or atomic state (as in ‘raw data’). It becomes information only when it is presented in context, in a way that informs. This progression from data to information is not the only direction in which the relationship flows, however; information can also be broken down into pieces, stripped of context, and stored as data. This is the case with most of the data that’s stored in computer systems. Data that’s collected and stored directly by machines, such as sensors, becomes information only when it’s reconnected to its context."  (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Big data is, in a nutshell, large amounts of data that can be gathered up and analyzed to determine whether any patterns emerge and to make better decisions." (Daniel Covington, Analytics: Data Science, Data Analysis and Predictive Analytics for Business, 2016)

"Statistics can be defined as a collection of techniques used when planning a data collection, and when subsequently analyzing and presenting data." (Birger S Madsen, "Statistics for Non-Statisticians", 2016)

"Statistics is the science of collecting, organizing, and interpreting numerical facts, which we call data. […] Statistics is the science of learning from data." (Moore McCabe & Alwan Craig, "The Practice of Statistics for Business and Economics" 4th Ed., 2016)

"Collecting data through sampling therefore becomes a never-ending battle to avoid sources of bias. [...] While trying to obtain a random sample, researchers sometimes make errors in judgment about whether every person or thing is equally likely to be sampled." (Daniel J Levitin, "Weaponized Lies", 2017)

"Just because there’s a number on it, it doesn’t mean that the number was arrived at properly. […] There are a host of errors and biases that can enter into the collection process, and these can lead millions of people to draw the wrong conclusions. Although most of us won’t ever participate in the collection process, thinking about it, critically, is easy to learn and within the reach of all of us." (Daniel J Levitin, "Weaponized Lies", 2017)

"Measurements must be standardized. There must be clear, replicable, and precise procedures for collecting data so that each person who collects it does it in the same way." (Daniel J Levitin, "Weaponized Lies", 2017)

"To be any good, a sample has to be representative. A sample is representative if every person or thing in the group you’re studying has an equally likely chance of being chosen. If not, your sample is biased. […] The job of the statistician is to formulate an inventory of all those things that matter in order to obtain a representative sample. Researchers have to avoid the tendency to capture variables that are easy to identify or collect data on - sometimes the things that matter are not obvious or are difficult to measure." (Daniel J Levitin, "Weaponized Lies", 2017)

[Murphy’s Laws of Analysis:] "(1) In any collection of data, the figures that are obviously correct contain errors. (2) It is customary for a decimal to be misplaced. (3) An error that can creep into a calculation, will. Also, it will always be in the direction that will cause the most damage to the calculation." (G C Deakly)

"[…] numerous samples collected without a clear idea of what is to be done with the data are commonly less useful than a moderate number of samples collected in accordance with a specific design." (William C Krumbein)

No comments:

Post a Comment

Related Posts Plugin for WordPress, Blogger...