31 July 2021

Science: On Variables (Quotes)

"Every scientific problem can be stated most clearly if it is thought of as a search for the nature of the relation between two defi nitely stated variables. Very often a scientific problem is felt and stated in other terms, but it cannot be so clearly stated in any way as when it is thought of as a function by which one variable is shown to be dependent upon or related to some other variable." (Louis L Thurstone, "The Fundamentals of Statistics", 1925)

"There is a science of simple things, an art of complicated ones. Science is feasible when the variables are few and can be enumerated; when their combinations are distinct and clear. We are tending toward the condition of science and aspiring to it. The artist works out his own formulas; the interest of science lies in the art of making science." (Paul Valéry, "Moralités", 1932)

"Maximal knowledge of a total system does not necessarily include total knowledge of all its parts, not even when these are fully separated from each other and at the moment are not influencing each other at all. Thus it may be that some part of what one knows may pertain to relations […] between the two subsystems (we shall limit ourselves to two), as follows: if a particular measurement on the first system yields this result, then for a particular measurement on the second the valid expectation statistics are such and such; but if the measurement in question on the first system should have that result, then some other expectation holds for that one the second. […] In this way, any measurement process at all or, what amounts to the same, any variable at all of the second system can be tied to the not-yet-known value of any variable at all of the first, and of course vice versa also." (Erwin Schrödinger, "The Present Situation in Quantum Mechanics", 1935)

"The general method involved may be very simply stated. In cases where the equilibrium values of our variables can be regarded as the solutions of an extremum (maximum or minimum) problem, it is often possible regardless of the number of variables involved to determine unambiguously the qualitative behavior of our solution values in respect to changes of parameters." (Paul Samuelson, "Foundations of Economic Analysis", 1947)

"[Disorganized complexity] is a problem in which the number of variables is very large, and one in which each of the many variables has a behavior which is individually erratic, or perhaps totally unknown. However, in spite of this helter-skelter, or unknown, behavior of all the individual variables, the system as a whole possesses certain orderly and analyzable average properties. [...] [Organized complexity is] not problems of disorganized complexity, to which statistical methods hold the key. They are all problems which involve dealing simultaneously with a sizable number of factors which are interrelated into an organic whole. They are all, in the language here proposed, problems of organized complexity." (Warren Weaver, "Science and Complexity", American Scientist Vol. 36, 1948)

"Dynamic theory [...] shows how certain changes in the variables can be explained on the basis of [...] structural characteristics of the system. [...] The economy, of course, does not necessarily find an equilibrium position." (Wassily Leontief, "Studies in the Structure of the American Economy", 1953)

"The primary purpose of a graph is to show diagrammatically how the values of one of two linked variables change with those of the other. One of the most useful applications of the graph occurs in connection with the representation of statistical data." (John F Kenney & E S Keeping, "Mathematics of Statistics" Vol. I 3rd Ed., 1954)

"The well-known virtue of the experimental method is that it brings situational variables under tight control. It thus permits rigorous tests of hypotheses and confidential statements about causation. The correlational method, for its part, can study what man has not learned to control. Nature has been experimenting since the beginning of time, with a boldness and complexity far beyond the resources of science. The correlator’s mission is to observe and organize the data of nature’s experiments." (Lee J Cronbach, "The Two Disciplines of Scientific Psychology", The American Psychologist Vol. 12, 1957)

"A satisfactory prediction of the sequential properties of learning data from a single experiment is by no means a final test of a model. Numerous other criteria - and some more demanding - can be specified. For example, a model with specific numerical parameter values should be invariant to changes in independent variables that explicitly enter in the model." (Robert R Bush & Frederick Mosteller,"A Comparison of Eight Models?", Studies in Mathematical Learning Theory, 1959)

"A satisfactory prediction of the sequential properties of learning data from a single experiment is by no means a final test of a model. Numerous other criteria - and some more demanding - can be specified. For example, a model with specific numerical parameter values should be invariant to changes in independent variables that explicitly enter in the model." (Robert R Bush & Frederick Mosteller,"A Comparison of Eight Models?", Studies in Mathematical Learning Theory, 1959)

"Science manipulates things and gives up living in them. It makes its own limited models of things; operating upon these indices or variables to effect whatever transformations are permitted by their definition, it comes face to face with the real world only at rare intervals. Science is and always will be that admirably active, ingenious, and bold way of thinking whose fundamental bias is to treat everything as though it were an object-in-general - as though it meant nothing to us and yet was predestined for our own use." (Maurice Merleau-Ponty, "L'Œil et l'Esprit", 1960)

"[A] sequence is random if it has every property that is shared by all infinite sequences of independent samples of random variables from the uniform distribution." (Joel N Franklin, 1962)

"The most valuable use of such [mathematical] models usually lies less in turning out the answer in an uncertain world than in shedding light on how much difference an alteration in the assumptions and/or variables used would make in the answer yielded by the models." (Edward G. Bennion, "New Decision-Making Tools for Managers", 1963)

"Measurement has too often been the leitmotif of many investigations rather than the experimental examination of hypotheses. Mounds of data are collected, which are statistically decorous and methodologically unimpeachable, but conclusions are often trivial and rarely useful in decision making. This results from an overly rigorous control of an insignificant variable and a widespread deficiency in the framing of pertinent questions. Investigators seem to have settled for what is measurable instead of measuring what they would really like to know." (Edmund D Pellegrino, "Patient Care: Mystical Research or Researchable Mystique", Clinical Research, 1964)

"Most of our beliefs about complex organizations follow from one or the other of two distinct strategies. The closed-system strategy seeks certainty by incorporating only those variables positively associated with goal achievement and subjecting them to a monolithic control network. The open-system strategy shifts attention from goal achievement to survival and incorporates uncertainty by recognizing organizational interdependence with environment. A newer tradition enables us to conceive of the organization as an open system, indeterminate and faced with uncertainty, but subject to criteria of rationality and hence needing certainty." (James D Thompson, "Organizations in Action", 1967)

"The less we understand a phenomenon, the more variables we require to explain it." (Russell L Ackoff, "Management Science", 1967)

"To model the dynamic behavior of a system, four hierarchies of structure should be recognized: closed boundary around the system; feedback loops as the basic structural elements within the boundary; level variables representing accumulations within the feedback loops; rate variables representing activity within the feedback loops." (Jay W Forrester, "Urban Dynamics", 1969)

"However, and conversely, our models fall far short of representing the world fully. That is why we make mistakes and why we are regularly surprised. In our heads, we can keep track of only a few variables at one time. We often draw illogical conclusions from accurate assumptions, or logical conclusions from inaccurate assumptions. Most of us, for instance, are surprised by the amount of growth an exponential process can generate. Few of us can intuit how to damp oscillations in a complex system." (Donella H Meadows, "Limits to Growth", 1972)

"Changes of variables can be helpful for iterative and parametric solutions even if they do not linearize the problem. For example, a change of variables may change the 'shape' of J(x) into a more suitable form. Unfortunately there seems to be no· general way to choose the 'right' change of variables. Success depends on the particular problem and the engineer's insight. However, the possibility of a change of variables should always be considered."(Fred C Scweppe, "Uncertain dynamic systems", 1973)

"A mature science, with respect to the matter of errors in variables, is not one that measures its variables without error, for this is impossible. It is, rather, a science which properly manages its errors, controlling their magnitudes and correctly calculating their implications for substantive conclusions." (Otis D Duncan, "Introduction to Structural Equation Models", 1975)

"A system may be specified in either of two ways. In the first, which we shall call a state description, sets of abstract inputs, outputs and states are given, together with the action of the inputs on the states and the assignments of outputs to states. In the second, which we shall call a coordinate description, certain input, output and state variables are given, together with a system of dynamical equations describing the relations among the variables as functions of time. Modern mathematical system theory is formulated in terms of state descriptions, whereas the classical formulation is typically a coordinate description, for example a system of differential equations." (E S Bainbridge, "The Fundamental Duality of System Theory", 1975)

"The following four propositions, which appear to the author to be incapable of formal proof, are presented as Fundamental Postulates upon which the entire superstructure of General Systemantics [...] is based [...] (1) Everything is a system. (2) Everything is part of a larger system. (3) The universe is infinitely systematizable, both upward (larger systems) and downward (smaller systems) (4) All systems are infinitely complex. (The illusion of simplicity comes from focusing attention on one or a few variables.)" (John Gall, "Systemantics", 1975)

"[…] statistics - whatever their mathematical sophistication and elegance - cannot make bad variables into good ones." (H T Reynolds, "Analysis of Nominal Data", 1977)

"Managers construct, rearrange, single out, and demolish many objective features of their surroundings. When people act they unrandomize variables, insert vestiges of orderliness, and literally create their own constraints." (Karl E Weick, "Social Psychology of Organizing", 1979)

"The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.(Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The formal structure of a decision problem in any area can be put into four parts: (1) the choice of an objective function denning the relative desirability of different outcomes; (2) specification of the policy alternatives which are available to the agent, or decisionmaker, (3) specification of the model, that is, empirical relations that link the objective function, or the variables that enter into it, with the policy alternatives and possibly other variables; and (4) computational methods for choosing among the policy alternatives that one which performs best as measured by the objective function." (Kenneth Arrow, "The Economics of Information", 1984)

"A mechanistic model has the following advantages: 1. It contributes to our scientific understanding of the phenomenon under study. 2. It usually provides a better basis for extrapolation (at least to conditions worthy of further experimental investigation if not through the entire range of all input variables). 3. It tends to be parsimonious (i. e, frugal) in the use of parameters and to provide better estimates of the response." (George E P Box, "Empirical Model-Building and Response Surfaces", 1987)

"Symmetries abound in nature, in technology, and - especially - in the simplified mathematical models we study so assiduously. Symmetries complicate things and simplify them. They complicate them by introducing exceptional types of behavior, increasing the number of variables involved, and making vanish things that usually do not vanish. They simplify them by introducing exceptional types of behavior, increasing the number of variables involved, and making vanish things that usually do not vanish. They violate all the hypotheses of our favorite theorems, yet lead to natural generalizations of those theorems. It is now standard to study the 'generic' behavior of dynamical systems. Symmetry is not generic. The answer is to work within the world of symmetric systems and to examine a suitably restricted idea of genericity." (Ian Stewart, "Bifurcation with symmetry", 1988)

"A system of variables is 'interrelated' if an action that affects or meant to affect one part of the system will also affect other parts of it. Interrelatedness guarantees that an action aimed at one variable will have side effects and long-term repercussions. A large number of variables will make it easy to overlook them." (Dietrich Dorner, "The Logic of Failure: Recognizing and Avoiding Error in Complex Situations", 1989)

"If we want to solve problems effectively [...] we must keep in mind not only many features but also the influences among them. Complexity is the label we will give to the existence of many interdependent variables in a given system. The more variables and the greater their interdependence, the greater the system's complexity. Great complexity places high demands on a planner's capacity to gather information, integrate findings, and design effective actions. The links between the variables oblige us to attend to a great many features simultaneously, and that, concomitantly, makes it impossible for us to undertake only one action in a complex system." (Dietrich Dorner, "The Logic of Failure: Recognizing and Avoiding Error in Complex Situations", 1989)

"We might think of complexity could be regarded as an objective attribute of systems. We might even think we could assign a numerical value to it, making it, for instance, the product of the number of features times the number of interrelationships. If a system had ten variables and five links between them, then its 'complexity quotient', measured in this way would be fifty. If there are no links, its complexity quotient would be zero. Such attempts to measure the complexity of a system have in fact been made." (Dietrich Dorner, "The Logic of Failure: Recognizing and Avoiding Error in Complex Situations", 1989)

"The real leverage in most management situations lies in understanding dynamic complexity, not detail complexity. […] Unfortunately, most 'systems analyses' focus on detail complexity not dynamic complexity. Simulations with thousands of variables and complex arrays of details can actually distract us from seeing patterns and major interrelationships. In fact, sadly, for most people 'systems thinking' means 'fighting complexity with complexity', devising increasingly 'complex' (we should really say 'detailed') solutions to increasingly 'complex' problems. In fact, this is the antithesis of real systems thinking." (Peter M Senge, "The Fifth Discipline: The Art and Practice of the Learning Organization", 1990)

"Dynamical systems that vary continuously, like the pendulum and the rolling rock, and evidently the pinball machine when a ball’s complete motion is considered, are technically known as flows. The mathematical tool for handling a flow is the differential equation. A system of differential equations amounts to a set of formulas that together express the rates at which all of the variables are currently changing, in terms of the current values of the variables." (Edward N Lorenz, "The Essence of Chaos", 1993)

"Dynamical systems that vary in discrete steps […] are technically known as mappings. The mathematical tool for handling a mapping is the difference equation. A system of difference equations amounts to a set of formulas that together express the values of all of the variables at the next step in terms of the values at the current step. […] For mappings, the difference equations directly express future states in terms of present ones, and obtaining chronological sequences of points poses no problems. For flows, the differential equations must first be solved. General solutions of equations whose particular solutions are chaotic cannot ordinarily be found, and approximations to the latter are usually determined by numerical methods." (Edward N Lorenz, "The Essence of Chaos", 1993)

"Just as few concrete physical systems are strictly deterministic in their behavior, so very few are strictly linear. The great importance of linearity lies in a combination of two circumstances. First, many tangible phenomena behave approximately linearly over restricted periods of time or restricted ranges of the variables, so that useful linear mathematical models can simulate their behavior. A pendulum swinging through a small angle is a nearly linear system. Second, linear equations can be handled by a wide variety of techniques that do not work with nonlinear equations." (Edward N Lorenz, "The Essence of Chaos", 1993)

"Complex adaptive systems have the property that if you run them - by just letting the mathematical variable of 'time' go forward - they'll naturally progress from chaotic, disorganized, undifferentiated, independent states to organized, highly differentiated, and highly interdependent states. Organized structures emerge spontaneously. [...]A weak system gives rise only to simpler forms of self-organization; a strong one gives rise to more complex forms, like life. (J Doyne Farmer, "The Third Culture: Beyond the Scientific Revolution", 1995)

"In addition to dimensionality requirements, chaos can occur only in nonlinear situations. In multidimensional settings, this means that at least one term in one equation must be nonlinear while also involving several of the variables. With all linear models, solutions can be expressed as combinations of regular and linear periodic processes, but nonlinearities in a model allow for instabilities in such periodic solutions within certain value ranges for some of the parameters." (Courtney Brown, "Chaos and Catastrophe Theories", 1995)

"Small changes in the initial conditions in a chaotic system produce dramatically different evolutionary histories. It is because of this sensitivity to initial conditions that chaotic systems are inherently unpredictable. To predict a future state of a system, one has to be able to rely on numerical calculations and initial measurements of the state variables. Yet slight errors in measurement combined with extremely small computational errors (from roundoff or truncation) make prediction impossible from a practical perspective. Moreover, small initial errors in prediction grow exponentially in chaotic systems as the trajectories evolve. Thus, theoretically, prediction may be possible with some chaotic processes if one is interested only in the movement between two relatively close points on a trajectory. When longer time intervals are involved, the situation becomes hopeless." (Courtney Brown, "Chaos and Catastrophe Theories", 1995)

"System dynamics is a method for studying the world around us. Unlike other scientists, who study the world by breaking it up into smaller and smaller pieces, system dynamicists look at things as a whole. The central concept to system dynamics is understanding how all the objects in a system interact with one another. A system can be anything from a steam engine, to a bank account, to a basketball team. The objects and people in a system interact through 'feedback' loops, where a change in one variable affects other variables over time, which in turn affects the original variable, and so on." (Edward Yourdon, "Death March", 1997)

"By a variable we will mean an attribute, measurement or inquiry that may take on one of several possible outcomes, or values, from a specified domain. If we have beliefs (i.e., probabilities) attached to the possible values that a variable may attain, we will call that variable a random variable." (Judea Pearl, "Causality: Models, Reasoning, and Inference", 2000)

"The greatest plus of data modeling is that it produces a simple and understandable picture of the relationship between the input variables and responses [...] different models, all of them equally good, may give different pictures of the relation between the predictor and response variables [...] One reason for this multiplicity is that goodness-of-fit tests and other methods for checking fit give a yes–no answer. With the lack of power of these tests with data having more than a small number of dimensions, there will be a large number of models whose fit is acceptable. There is no way, among the yes–no methods for gauging fit, of determining which is the better model." (Leo Breiman, "Statistical modeling: The two cultures" Statistical Science 16(3), 2001)

"Trimming potentially theoretically meaningful variables is not advisable unless one is quite certain that the coefficient for the variable is near zero, that the variable is inconsequential, and that trimming will not introduce misspecification error." (James Jaccard, "Interaction Effects in Logistic Regression", 2001)

"A smaller model with fewer covariates has two advantages: it might give better predictions than a big model and it is more parsimonious (simpler). Generally, as you add more variables to a regression, the bias of the predictions decreases and the variance increases. Too few covariates yields high bias; this called underfitting. Too many covariates yields high variance; this called overfitting. Good predictions result from achieving a good balance between bias and variance. […] fiding a good model involves trading of fit and complexity." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Nonetheless, the basic principles regarding correlations between variables are not that diffcult to understand. We must look for patterns that reveal potential relationships and for evidence that variables are actually related. But when we do spot those relationships, we should not jump to conclusions about causality. Instead, we need to weigh the strength of the relationship and the plausibility of our theory, and we must always try to discount the possibility of spuriousness." (Joel Best, "More Damned Lies and Statistics: How numbers confuse public issues", 2004)

"Humans have difficulty perceiving variables accurately […]. However, in general, they tend to have inaccurate perceptions of system states, including past, current, and future states. This is due, in part, to limited ‘mental models’ of the phenomena of interest in terms of both how things work and how to influence things. Consequently, people have difficulty determining the full implications of what is known, as well as considering future contingencies for potential systems states and the long-term value of addressing these contingencies. " (William B. Rouse, "People and Organizations: Explorations of Human-Centered Design", 2007)

"Swarm intelligence can be effective when applied to highly complicated problems with many nonlinear factors, although it is often less effective than the genetic algorithm approach discussed later in this chapter. Swarm intelligence is related to swarm optimization […]. As with swarm intelligence, there is some evidence that at least some of the time swarm optimization can produce solutions that are more robust than genetic algorithms. Robustness here is defined as a solution’s resistance to performance degradation when the underlying variables are changed." (Michael J North & Charles M Macal, "Managing Business Complexity: Discovering Strategic Solutions with Agent-Based Modeling and Simulation", 2007)

"Graphical displays are often constructed to place principal focus on the individual observations in a dataset, and this is particularly helpful in identifying both the typical positions of data points and unusual or influential cases. However, in many investigations, principal interest lies in identifying the nature of underlying trends and relationships between variables, and so it is often helpful to enhance graphical displays in ways which give deeper insight into these features. This can be very beneficial both for small datasets, where variation can obscure underlying patterns, and large datasets, where the volume of data is so large that effective representation inevitably involves suitable summaries." (Adrian W Bowman, "Smoothing Techniques for Visualisation" [in "Handbook of Data Visualization"], 2008)

"System dynamics is a top-down approach for modelling system changes over time. Key state variables that define the behaviour of the system have to be identified and these are then related to each other through coupled, differential equations." (Peer-Olaf Siebers & Uwe Aickelin, "Introduction to Multi-Agent Simulation", 2008)

"All forms of complex causation, and especially nonlinear transformations, admittedly stack the deck against prediction. Linear describes an outcome produced by one or more variables where the effect is additive. Any other interaction is nonlinear. This would include outcomes that involve step functions or phase transitions. The hard sciences routinely describe nonlinear phenomena. Making predictions about them becomes increasingly problematic when multiple variables are involved that have complex interactions. Some simple nonlinear systems can quickly become unpredictable when small variations in their inputs are introduced." (Richard N Lebow, "Forbidden Fruit: Counterfactuals and International Relations", 2010)

"Given the important role that correlation plays in structural equation modeling, we need to understand the factors that affect establishing relationships among multivariable data points. The key factors are the level of measurement, restriction of range in data values (variability, skewness, kurtosis), missing data, nonlinearity, outliers, correction for attenuation, and issues related to sampling variation, confidence intervals, effect size, significance, sample size, and power." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"Outliers or influential data points can be defined as data values that are extreme or atypical on either the independent (X variables) or dependent (Y variables) variables or both. Outliers can occur as a result of observation errors, data entry errors, instrument errors based on layout or instructions, or actual extreme values from self-report data. Because outliers affect the mean, the standard deviation, and correlation coefficient values, they must be explained, deleted, or accommodated by using robust statistics." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"System dynamics is an approach to understanding the behaviour of over time. It deals with internal feedback loops and time delays that affect the behaviour of the entire system. It also helps the decision maker untangle the complexity of the connections between various policy variables by providing a new language and set of tools to describe. Then it does this by modeling the cause and effect relationships among these variables." (Raed M Al-Qirem & Saad G Yaseen, "Modelling a Small Firm in Jordan Using System Dynamics", 2010)

"There are several key issues in the field of statistics that impact our analyses once data have been imported into a software program. These data issues are commonly referred to as the measurement scale of variables, restriction in the range of data, missing data values, outliers, linearity, and nonnormality." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"There are three possible reasons for [the] absence of predictive power. First, it is possible that the models are misspecified. Second, it is possible that the model’s explanatory factors are measured at too high a level of aggregation [...] Third, [...] the search for statistically significant relationships may not be the strategy best suited for evaluating our model’s ability to explain real world events [...] the lack of predictive power is the result of too much emphasis having been placed on finding statistically significant variables, which may be overdetermined. Statistical significance is generally a flawed way to prune variables in regression models [...] Statistically significant variables may actually degrade the predictive accuracy of a model [...] [By using] models that are constructed on the basis of pruning undertaken with the shears of statistical significance, it is quite possible that we are winnowing our models away from predictive accuracy." (Michael D Ward et al, "The perils of policy by p-value: predicting civil conflicts" Journal of Peace Research 47, 2010)

"[…] a conceptual model is a diagram connecting variables and constructs based on theory and logic that displays the hypotheses to be tested." (Mary W Celsi et al, "Essentials of Business Research Methods", 2011)

"Complexity is a relative term. It depends on the number and the nature of interactions among the variables involved. Open loop systems with linear, independent variables are considered simpler than interdependent variables forming nonlinear closed loops with a delayed response." (Jamshid Gharajedaghi, "Systems Thinking: Managing Chaos and Complexity A Platform for Designing Business Architecture" 3rd Ed., 2011)

"Simplicity in a system tends to increase that system's efficiency. Because less can go wrong with fewer parts, less will. Complexity in a system tends to increase that system's inefficiency; the greater the number of variables, the greater the probability of those variables clashing, and in turn, the greater the potential for conflict and disarray. Because more can go wrong, more will. That is why centralized systems are inclined to break down quickly and become enmeshed in greater unintended consequences." (Lawrence K Samuels, "Defense of Chaos: The Chaology of Politics, Economics and Human Action", 2013)

"When statisticians, trained in math and probability theory, try to assess likely outcomes, they demand a plethora of data points. Even then, they recognize that unless it’s a very simple and controlled action such as flipping a coin, unforeseen variables can exert significant influence." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"A basic problem with MRA is that it typically assumes that the independent variables can be regarded as building blocks, with each variable taken by itself being logically independent of all the others. This is usually not the case, at least for behavioral data. […] Just as correlation doesn’t prove causation, absence of correlation fails to prove absence of causation. False-negative findings can occur using MRA just as false-positive findings do—because of the hidden web of causation that we’ve failed to identify." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"Accuracy and coherence are related concepts pertaining to data quality. Accuracy refers to the comprehensiveness or extent of missing data, performance of error edits, and other quality assurance strategies. Coherence is the degree to which data - item value and meaning are consistent over time and are comparable to similar variables from other routinely used data sources." (Aileen Rothbard, "Quality Issues in the Use of Administrative Data Records", 2015)

"One technique employing correlational analysis is multiple regression analysis (MRA), in which a number of independent variables are correlated simultaneously (or sometimes sequentially, but we won’t talk about that variant of MRA) with some dependent variable. The predictor variable of interest is examined along with other independent variables that are referred to as control variables. The goal is to show that variable A influences variable B 'net of' the effects of all the other variables. That is to say, the relationship holds even when the effects of the control variables on the dependent variable are taken into account." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"The fundamental problem with MRA, as with all correlational methods, is self-selection. The investigator doesn’t choose the value for the independent variable for each subject (or case). This means that any number of variables correlated with the independent variable of interest have been dragged along with it. In most cases, we will fail to identify all these variables. In the case of behavioral research, it’s normally certain that we can’t be confident that we’ve identified all the plausibly relevant variables." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"The theory behind multiple regression analysis is that if you control for everything that is related to the independent variable and the dependent variable by pulling their correlations out of the mix, you can get at the true causal relation between the predictor variable and the outcome variable. That’s the theory. In practice, many things prevent this ideal case from being the norm." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"The correlational technique known as multiple regression is used frequently in medical and social science research. This technique essentially correlates many independent (or predictor) variables simultaneously with a given dependent variable (outcome or output). It asks, 'Net of the effects of all the other variables, what is the effect of variable A on the dependent variable?' Despite its popularity, the technique is inherently weak and often yields misleading results. The problem is due to self-selection. If we don’t assign cases to a particular treatment, the cases may differ in any number of ways that could be causing them to differ along some dimension related to the dependent variable. We can know that the answer given by a multiple regression analysis is wrong because randomized control experiments, frequently referred to as the gold standard of research techniques, may give answers that are quite different from those obtained by multiple regression analysis." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"The theory behind multiple regression analysis is that if you control for everything that is related to the independent variable and the dependent variable by pulling their correlations out of the mix, you can get at the true causal relation between the predictor variable and the outcome variable. That’s the theory. In practice, many things prevent this ideal case from being the norm." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"In the classical deterministic scenario, a model consists of a few variables and physical constants. The relational structure of the model is conceptualized by the scientist via intuition gained from thinking about the physical world. Intuition means that the scientist has some mental construct regarding the interactions beyond positing a skeletal mathematical system he believes is sufficiently rich to capture the interactions and then depending upon data to infer the relational structure and estimate a large number of parameters." (Edward R Dougherty, "The Evolution of Scientific Knowledge: From certainty to uncertainty", 2016) 

"Validity of a theory is also known as construct validity. Most theories in science present broad conceptual explanations of relationship between variables and make many different predictions about the relationships between particular variables in certain situations. Construct validity is established by verifying the accuracy of each possible prediction that might be made from the theory. Because the number of predictions is usually infinite, construct validity can never be fully established. However, the more independent predictions for the theory verified as accurate, the stronger the construct validity of the theory." (K  N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"Decision trees are considered a good predictive model to start with, and have many advantages. Interpretability, variable selection, variable interaction, and the flexibility to choose the level of complexity for a decision tree all come into play." (Ralph Winters, "Practical Predictive Analytics", 2017)

"The degree to which one variable can be predicted from another can be calculated as the correlation between them. The square of the correlation (R^2) is the proportion of the variance of one that can be 'explained' by knowledge of the other." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"To be any good, a sample has to be representative. A sample is representative if every person or thing in the group you’re studying has an equally likely chance of being chosen. If not, your sample is biased. […] The job of the statistician is to formulate an inventory of all those things that matter in order to obtain a representative sample. Researchers have to avoid the tendency to capture variables that are easy to identify or collect data on - sometimes the things that matter are not obvious or are difficult to measure." (Daniel J Levitin, "Weaponized Lies", 2017)

"Bayesian networks inhabit a world where all questions are reducible to probabilities, or (in the terminology of this chapter) degrees of association between variables; they could not ascend to the second or third rungs of the Ladder of Causation. Fortunately, they required only two slight twists to climb to the top." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"We humans are reasonably good at defining rules that check one, two, or even three attributes (also commonly referred to as features or variables), but when we go higher than three attributes, we can start to struggle to handle the interactions between them. By contrast, data science is often applied in contexts where we want to look for patterns among tens, hundreds, thousands, and, in extreme cases, millions of attributes." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"Decision trees show the breakdown of the data by one variable then another in a very intuitive way, though they are generally just diagrams that don’t actually encode data visually." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"One very common problem in data visualization is that encoding numerical variables to area is incredibly popular, but readers can’t translate it back very well." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Random forests are essentially an ensemble of trees. They use many short trees, fitted to multiple samples of the data, and the predictions are averaged for each observation. This helps to get around a problem that trees, and many other machine learning techniques, are not guaranteed to find optimal models, in the way that linear regression is. They do a very challenging job of fitting non-linear predictions over many variables, even sometimes when there are more variables than there are observations. To do that, they have to employ 'greedy algorithms', which find a reasonably good model but not necessarily the very best model possible." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Exponentially growing systems are prevalent in nature, spanning all scales from biochemical reaction networks in single cells to food webs of ecosystems. How exponential growth emerges in nonlinear systems is mathematically unclear. […] The emergence of exponential growth from a multivariable nonlinear network is not mathematically intuitive. This indicates that the network structure and the flux functions of the modeled system must be subjected to constraints to result in long-term exponential dynamics." (Wei-Hsiang Lin et al, "Origin of exponential growth in nonlinear reaction networks", PNAS 117 (45), 2020) 

No comments:

Post a Comment

Related Posts Plugin for WordPress, Blogger...