"[…] the intrinsic value of a small-scale model is that it compensates for the renunciation of sensible dimensions by the acquisition of intelligible dimensions." (Claude Levi- Strauss, "The Savage Mind", 1962)
"The idea of knowledge as an improbable structure is still a good place to start. Knowledge, however, has a dimension which goes beyond that of mere information or improbability. This is a dimension of significance which is very hard to reduce to quantitative form. Two knowledge structures might be equally improbable but one might be much more significant than the other." (Kenneth E Boulding, "Beyond Economics: Essays on Society", 1968)
"A time series is a sequence of observations, usually ordered in time, although in some cases the ordering may be according to another dimension. The feature of time series analysis which distinguishes it from other statistical analysis is the explicit recognition of the importance of the order in which the observations are made. While in many problems the observations are statistically independent, in time series successive observations may be dependent, and the dependence may depend on the positions in the sequence. The nature of a series and the structure of its generating process also may involve in other ways the sequence in which the observations are taken." (Theodore W Anderson, "The Statistical Analysis of Time Series", 1971)
"The greatest plus of data modeling is that it produces a simple and understandable picture of the relationship between the input variables and responses [...] different models, all of them equally good, may give different pictures of the relation between the predictor and response variables [...] One reason for this multiplicity is that goodness-of-fit tests and other methods for checking fit give a yes–no answer. With the lack of power of these tests with data having more than a small number of dimensions, there will be a large number of models whose fit is acceptable. There is no way, among the yes–no methods for gauging fit, of determining which is the better model." (Leo Breiman, "Statistical modeling: The two cultures" Statistical Science 16(3), 2001)
"With the ever increasing amount of empirical information that scientists from all disciplines are dealing with, there exists a great need for robust, scalable and easy to use clustering techniques for data abstraction, dimensionality reduction or visualization to cope with and manage this avalanche of data." (Jörg Reichardt, "Structure in Complex Networks", 2009)
"The more dimensions used in quantitative comparisons, the larger are the disparities that can be accommodated. As irony would have it, however, the ease of comparison generally diminishes in direct proportion to the number of dimensions involved." (Joel Katz, "Designing Information: Human factors and common sense in information design", 2012)
"Dimensionality reduction is essential for coping with big data - like the data coming in through your senses every second. A picture may be worth a thousand words, but it’s also a million times more costly to process and remember. [...] A common complaint about big data is that the more data you have, the easier it is to find spurious patterns in it. This may be true if the data is just a huge set of disconnected entities, but if they’re interrelated, the picture changes." (Pedro Domingos, "The Master Algorithm", 2015)
"The correlational technique known as multiple regression is used frequently in medical and social science research. This technique essentially correlates many independent (or predictor) variables simultaneously with a given dependent variable (outcome or output). It asks, 'Net of the effects of all the other variables, what is the effect of variable A on the dependent variable?' Despite its popularity, the technique is inherently weak and often yields misleading results. The problem is due to self-selection. If we don’t assign cases to a particular treatment, the cases may differ in any number of ways that could be causing them to differ along some dimension related to the dependent variable. We can know that the answer given by a multiple regression analysis is wrong because randomized control experiments, frequently referred to as the gold standard of research techniques, may give answers that are quite different from those obtained by multiple regression analysis." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)
"Dimensionality reduction is a way of reducing a large number of different measures into a smaller set of metrics. The intent is that the reduced metrics are a simpler description of the complex space that retains most of the meaning." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)
"The higher the dimension, in other words, the higher the number of possible interactions, and the more disproportionally difficult it is to understand the macro from the micro, the general from the simple units. This disproportionate increase of computational demands is called the curse of dimensionality." (Nassim N Taleb, "Skin in the Game: Hidden Asymmetries in Daily Life", 2018)
No comments:
Post a Comment