22 August 2021

Science: On Classification (Quotes)

"No occupation is more worthy of an intelligent and enlightened mind, than the study of Nature and natural objects; and whether we labour to investigate the structure and function of the human system, whether we direct our attention to the classification and habits of the animal kingdom, or prosecute our researches in the more pleasing and varied field of vegetable life, we shall constantly find some new object to attract our attention, some fresh beauties to excite our imagination, and some previously undiscovered source of gratification and delight." (Sir Joseph Paxton, "A Practical Treatise on the Cultivation of the Dahlia", 1838)

"Are our systems the inventions of naturalists, or only their reading of the Book of Nature? and can that book have more than one reading? If these classifications are not mere inventions, if they are not an attempt to classify for our own convenience the objects we study, then they are thoughts which, whether we detect them or not, are expressed in Nature, - then Nature is the work of thought, the production of intelligence carried out according to plan, therefore premeditated, - and in our study of natural objects we are approaching the thoughts of the Creator, reading His conceptions, interpreting a system that is His and not ours." (Jean L R Agassiz, "Methods of Study in Natural History", 1863)

"Science is the systematic classification of experience." (George H Lewes, "The Physical Basis of Mind", 1877)

"The classification of facts, the recognition of their sequence and relative significance is the function of science, and the habit of forming a judgment upon these facts unbiased by personal feeling is characteristic of what may be termed the scientific frame of mind." (Karl Pearson, "The Grammar of Science", 1892)

"The sole purpose of physical theory is to provide a representation and classification of experimental laws; the only test permitting us to judge a physical theory and pronounce it good or bad is the comparison between the consequences of this theory and the experimental laws it has to represent and classify." (Pierre-Maurice-Marie Duhem, "The Aim and Structure of Physical Theory", 1908)

"Science works by the slow method of the classification of data, arranging the detail patiently in a periodic system into groups of facts, in series like the strata of the rocks. For each series there must be a vocabulary of special words which do not always make good sense when used in another series. But the laws of periodicity seem to hold throughout, among the elements and in every sphere of thought, and we must learn to co-ordinate the whole through our new conception of the reign of relativity." (William H Pallister, "Poems of Science", 1931)

"A […] difference between most system-building in the social sciences and systems of thought and classification of the natural sciences is to be seen in their evolution. In the natural sciences both theories and descriptive systems grow by adaptation to the increasing knowledge and experience of the scientists. In the social sciences, systems often issue fully formed from the mind of one man. Then they may be much discussed if they attract attention, but progressive adaptive modification as a result of the concerted efforts of great numbers of men is rare." (Lawrence J Henderson, "The Study of Man", 1941)

"Statistics is the fundamental and most important part of inductive logic. It is both an art and a science, and it deals with the collection, the tabulation, the analysis and interpretation of quantitative and qualitative measurements. It is concerned with the classifying and determining of actual attributes as well as the making of estimates and the testing of various hypotheses by which probable, or expected, values are obtained. It is one of the means of carrying on scientific research in order to ascertain the laws of behavior of things - be they animate or inanimate. Statistics is the technique of the Scientific Method." (Bruce D Greenshields & Frank M Weida, "Statistics with Applications to Highway Traffic Analyses", 1952)

"It might be reasonable to expect that the more we know about any set of statistics, the greater the confidence we would have in using them, since we would know in which directions they were defective; and that the less we know about a set of figures, the more timid and hesitant we would be in using them. But, in fact, it is the exact opposite which is normally the case; in this field, as in many others, knowledge leads to caution and hesitation, it is ignorance that gives confidence and boldness. For knowledge about any set of statistics reveals the possibility of error at every stage of the statistical process; the difficulty of getting complete coverage in the returns, the difficulty of framing answers precisely and unequivocally, doubts about the reliability of the answers, arbitrary decisions about classification, the roughness of some of the estimates that are made before publishing the final results. Knowledge of all this, and much else, in detail, about any set of figures makes one hesitant and cautious, perhaps even timid, in using them." (Ely Devons, "Essays in Economics", 1961)

"The purpose of a classification scheme is to arrange information, in documents on shelves or on cards in indexes, in a sequence that will be helpful to the user." (Douglas J Foskett, "Classification and Indexing in the Social Sciences", 1963)

"Ultimately, discovery and invention are both problems of classification, and classification is fundamentally a problem of finding sameness. When we classify, we seek to group things that have a common structure or exhibit a common behavior." (Grady Booch, "Object-oriented design: With Applications", 1991)

"Many of the basic functions performed by neural networks are mirrored by human abilities. These include making distinctions between items (classification), dividing similar things into groups (clustering), associating two or more things (associative memory), learning to predict outcomes based on examples (modeling), being able to predict into the future (time-series forecasting), and finally juggling multiple goals and coming up with a good-enough solution (constraint satisfaction)." (Joseph P Bigus, "Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"The methods of science include controlled experiments, classification, pattern recognition, analysis, and deduction. In the humanities we apply analogy, metaphor, criticism, and (e)valuation. In design we devise alternatives, form patterns, synthesize, use conjecture, and model solutions." (Béla H Bánáthy, "Designing Social Systems in a Changing World", 1996) 

"While classification is important, it can certainly be overdone. Making too fine a distinction between things can be as serious a problem as not being able to decide at all. Because we have limited storage capacity in our brain (we still haven't figured out how to add an extender card), it is important for us to be able to cluster similar items or things together. Not only is clustering useful from an efficiency standpoint, but the ability to group like things together (called chunking by artificial intelligence practitioners) is a very important reasoning tool. It is through clustering that we can think in terms of higher abstractions, solving broader problems by getting above all of the nitty-gritty details." (Joseph P Bigus, "Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"We build models to increase productivity, under the justified assumption that it's cheaper to manipulate the model than the real thing. Models then enable cheaper exploration and reasoning about some universe of discourse. One important application of models is to understand a real, abstract, or hypothetical problem domain that a computer system will reflect. This is done by abstraction, classification, and generalization of subject-matter entities into an appropriate set of classes and their behavior." (Stephen J Mellor, "Executable UML: A Foundation for Model-Driven Architecture", 2002)

"The domain of systems science consists thus of all kinds of relational properties which are valid for particular classes of systems, or, in some rare instances, are valid for all systems. The chosen relational classification of systems determines the way in which the domain of systems is divided into subdomains, in a similar fashion as the domain of traditional science has been divided into subdomains of the various disciplines and specializations." (George J Klir & Doug Elias, "Architecture of Systems Problem Solving" 2nd Ed, 2003) 

"Compared to traditional statistical studies, which are often hindsight, the field of data mining finds patterns and classifications that look toward and even predict the future. In summary, data mining can (1) provide a more complete understanding of data by finding patterns previously not seen and (2) make models that predict, thus enabling people to make better decisions, take action, and therefore mold future events." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"A problem in data mining when random variations in data are misclassified as important patterns. Overfitting often occurs when the data set is too small to represent the real world." (Microsoft, "SQL Server 2012 Glossary", 2012)

"The power of deep learning models comes from their ability to classify or predict nonlinear data using a modest number of parallel nonlinear steps. A deep learning model learns the input data features hierarchy all the way from raw data input to the actual classification of the data. Each layer extracts features from the output of the previous layer." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

"Decision trees are important for a few reasons. First, they can both classify and regress. It requires literally one line of code to switch between the two models just described, from a classification to a regression. Second, they are able to determine and share the feature importance of a given training set." (Russell Jurney, "Agile Data Science 2.0: Building Full-Stack Data Analytics Applications with Spark", 2017)

"Multilayer perceptrons share with polynomial classifiers one unpleasant property. Theoretically speaking, they are capable of modeling any decision surface, and this makes them prone to overfitting the training data." (Miroslav Kubat, "An Introduction to Machine Learning" 2nd Ed., 2017)

"The main reason why pruning tends to improve classification performance on future examples is that the removal of low-level tests, which have poor statistical support, usually reduces the danger of overfitting. This, however, works only up to a certain point. If overdone, a very high extent of pruning can (in the extreme) result in the decision being replaced with a single leaf labeled with the majority class." (Miroslav Kubat, "An Introduction to Machine Learning" 2nd Ed., 2017)

"There are other problems with Big Data. In any large data set, there are bound to be inconsistencies, misclassifications, missing data - in other words, errors, blunders, and possibly lies. These problems with individual items occur in any data set, but they are often hidden in a large mass of numbers even when these numbers are generated out of computer interactions." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"An advantage of random forests is that it works with both regression and classification trees so it can be used with targets whose role is binary, nominal, or interval. They are also less prone to overfitting than a single decision tree model. A disadvantage of a random forest is that they generally require more trees to improve their accuracy. This can result in increased run times, particularly when using very large data sets." (Richard V McCarthy et al, "Applying Predictive Analytics: Finding Value in Data", 2019)

"The classifier accuracy would be extra ordinary when the test data and the training data are overlapping. But when the model is applied to a new data it will fail to show acceptable accuracy. This condition is called as overfitting." (Jesu V  Nayahi J & Gokulakrishnan K, "Medical Image Classification", 2019)
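
As a small aside to the overfitting quotes above, here is a minimal sketch of the effect they describe. Everything in it is an assumption made for illustration and comes from none of the quoted books: the one-dimensional toy data, the ground-truth rule `true_label`, and the 1-nearest-neighbour "memorizer" used as the classifier.

```python
# Toy illustration only: the data, the ground-truth rule and the
# 1-nearest-neighbour "memorizer" are assumptions made for this sketch.

def true_label(x):
    """Assumed ground-truth rule: class 1 when x >= 5, otherwise class 0."""
    return 1 if x >= 5 else 0

# Training set with two deliberately flipped (noisy) labels, at x = 3.0 and x = 7.0.
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0),
         (6.0, 1), (7.0, 0), (8.0, 1), (9.0, 1)]

def predict(x):
    """Return the label of the closest training point (1-nearest neighbour)."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label

def accuracy(samples):
    """Fraction of (x, label) pairs whose label matches the prediction."""
    hits = sum(1 for x, y in samples if predict(x) == y)
    return hits / len(samples)

# Fresh points that do not appear in the training set, labelled by the true rule.
fresh = [(x, true_label(x))
         for x in (0.8, 1.9, 2.9, 3.2, 4.1, 5.8, 6.9, 7.2, 8.1, 9.3)]

train_accuracy = accuracy(train)   # evaluated on the data it memorized
fresh_accuracy = accuracy(fresh)   # evaluated on new data
print(train_accuracy, fresh_accuracy)
```

On this toy data the memorizer scores 1.0 on its own training points, noise included, but only 0.6 on the fresh points, because it reproduces the two flipped labels for every new point that lands near them.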
