Saturday, July 23, 2011
I tried Soovle with the keyword “knowledge” as input:
As can be seen, apart from a few common terms (e.g. “knowledge”, “knowledge management”), the output of each engine is quite different. Because there is no visual aid, spotting the terms shared between engines is not as easy as it should be. Something could be done in this direction, for example using colors, font weight or sorting.
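To make the sorting idea concrete, here is a minimal sketch that ranks terms by the number of engines returning them. The suggestion lists are made up for illustration and are not actual Soovle output, and `rank_by_engine_count` is a hypothetical helper, not part of the tool.

```python
from collections import Counter

# Hypothetical suggestion lists; the real Soovle output will differ.
suggestions = {
    "Google": ["knowledge", "knowledge management", "knowledge graph"],
    "Bing": ["knowledge", "knowledge management", "knowledge base"],
    "Wikipedia": ["knowledge", "knowledge economy", "knowledge management"],
}


def rank_by_engine_count(suggestions):
    """Return (term, number_of_engines) pairs, most widely shared first."""
    counts = Counter()
    for terms in suggestions.values():
        counts.update(set(terms))  # count each term once per engine
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranked = rank_by_engine_count(suggestions)
for term, n in ranked:
    # Mark the terms returned by every engine.
    marker = "*" if n == len(suggestions) else " "
    print(f"{marker} {term} ({n}/{len(suggestions)})")
```

The same count could just as well drive a color or a font weight instead of a sort order.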
When multi-word queries are used (e.g. “data information knowledge wisdom”), the output becomes difficult to read, so some styling would help to determine where each group of terms starts and ends.
Currently, the tool allows selecting a group of 7, 11 or 15 search engines. It would be useful to be able to reduce the number of search engines and increase the number of terms, so that the results of only 2-3 search engines can be compared. A matrix (terms vs. engines) could make the data easier to visualize.
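A rough sketch of what such a terms vs. engines matrix could look like in plain text is given below. The engine names and suggestion lists are illustrative assumptions, and `build_matrix` and `render` are hypothetical helpers.

```python
# Hypothetical suggestions for two engines; not actual tool output.
engine_terms = {
    "Google": ["knowledge management", "knowledge graph", "knowledge is power"],
    "Bing": ["knowledge management", "knowledge base", "knowledge is power"],
}


def build_matrix(engine_terms):
    """Return the engine list and a {term: [present per engine]} mapping."""
    engines = list(engine_terms)
    terms = sorted({t for ts in engine_terms.values() for t in ts})
    rows = {t: [t in engine_terms[e] for e in engines] for t in terms}
    return engines, rows


def render(engines, rows):
    """Render the matrix as text, one row per term, one column per engine."""
    width = max(len(t) for t in rows)
    lines = [" " * width + "  " + "  ".join(engines)]
    for term, flags in rows.items():
        cells = "  ".join(
            ("x" if present else ".").center(len(engine))
            for present, engine in zip(flags, engines)
        )
        lines.append(term.ljust(width) + "  " + cells)
    return "\n".join(lines)


engines, rows = build_matrix(engine_terms)
print(render(engines, rows))
```

With only 2-3 engines, the columns stay narrow enough to leave room for many more terms per engine.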
It seems that spaces influence the output: for example, “DIKW”, “DIKW ” and “ DIKW” return different results.
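How the tool or the engines treat whitespace internally is not known to me. As an illustration only, this is how a client could normalize queries before submitting them, so that the three variants collapse into one. `normalize_query` is a hypothetical helper.

```python
def normalize_query(query):
    """Trim both ends and collapse inner runs of whitespace to one space."""
    return " ".join(query.split())


variants = ["DIKW", "DIKW ", " DIKW"]

# The raw strings are all different...
print(len(set(variants)))
# ...but they normalize to the same query.
print({normalize_query(v) for v in variants})
```

A trailing space may be meaningful to an autocomplete service (it can mark the previous word as complete), so normalizing is not necessarily the desired behavior.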
I really like the fact that by providing the first letter of the second term (e.g. “knowledge a”), the output can be limited to the terms whose second word starts with the specified letter. (I needed this kind of functionality some time ago and I had to rely entirely on Google’s autocomplete feature.)
I searched for my favorite quote from J. Keats, “a thing of beauty is a joy forever” (with quotes). Unfortunately, none of the default 7 engines returned the quote, even though Bing’s output included some “joy”-related results. I tried the same search directly in Google and Bing, and matches were found?! The result was the same for “joy+forever”. Without deeper knowledge of the architecture of the search engines and of the tool itself, it’s hard to find what causes this behavior or to identify the differences in processing.
Comment: It seems I misquoted Keats in the initial post. I can't recall whether at the time I used the misquoted chunk of text or the actual quote. Rechecking Soovle, it actually returns results for Google, YouTube and Bing.
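The same kind of filtering can be reproduced on any list of suggestions. The sketch below is an assumption about the visible behavior, not about how the engines implement it, and the sample terms are made up.

```python
def filter_by_second_word(terms, prefix):
    """Keep the terms whose second word starts with the given prefix."""
    prefix = prefix.lower()
    kept = []
    for term in terms:
        words = term.lower().split()
        if len(words) >= 2 and words[1].startswith(prefix):
            kept.append(term)
    return kept


sample = [
    "knowledge acquisition",
    "knowledge management",
    "knowledge and power",
    "knowledge",
]
print(filter_by_second_word(sample, "a"))
# ['knowledge acquisition', 'knowledge and power']
```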
For quite some time, Google has provided an autocomplete feature extended to combinations of words. That’s quite a useful feature because it has often saved me from typing full words or combinations of words. What’s interesting is that the autocomplete algorithm suggests terms based on users’ search activity. I was asking myself if we could do more with search queries. This evening, while browsing, I discovered A. Smarty’s post “How To Visualize and Play with Google Suggest Results”, in which she briefly presents three interesting tools: Web Seer, What do you suggest and Soovle. As I found out, there are several other tools like Übersuggest, Quintura, etc. In this post I will focus only on Web Seer, and will briefly review several similar tools in the next posts.
Web Seer allows users to compare the “matches” between two Google queries, for example “are man” vs. “are women”, “will he” vs. “will she”. To stay within the blog’s theme, I checked the tool’s output for “data” vs. “information” and “information” vs. “knowledge”:
The query results for both terms are somewhat predictable: “data mining”, “data warehouses”, “data entry” and “data values”, and respectively “information architecture”, “information management”, “information security”, “information technology”, “information is beautiful” (see also the book) are quite popular terms in the scientific and non-scientific literature. I would expect the comparison to be based on the most popular terms, because the two concepts don’t share many common terms, and even if there are some common terms within the above results (e.g. “data architecture”, “data systems”), they aren’t highly ranked. The arrows’ weight depicts the number of occurrences of the respective terms, which, combined with the terms themselves, helps give an idea of the strength of the association and the resemblance between the two concepts.
Climbing the DIKW scale, here are the comparisons between “information” and “knowledge”, respectively “knowledge” and “wisdom”:
It seems the results are consistent between relations, the same combinations being used in both comparisons in which the same term is involved, like in the above diagrams. It’s natural that the results are also commutative; in other words, “knowledge” vs. “information” renders the same result as “information” vs. “knowledge”.
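The comparison described above can be approximated with a few lines of code. This is only a sketch of the idea, not Web Seer’s actual algorithm: the completions and their weights are invented for illustration, and `compare` is a hypothetical helper.

```python
def compare(left, right):
    """Split two {completion: weight} maps into left-only, shared, right-only."""
    shared = {c: (left[c], right[c]) for c in left if c in right}
    left_only = {c: w for c, w in left.items() if c not in right}
    right_only = {c: w for c, w in right.items() if c not in left}
    return left_only, shared, right_only


# Made-up weights standing in for the number of occurrences (arrow weight).
data = {"mining": 9, "entry": 7, "architecture": 2, "systems": 1}
information = {"technology": 9, "security": 6, "architecture": 8, "systems": 3}

data_only, shared, information_only = compare(data, information)
print("only data:", sorted(data_only))
print("shared:", shared)
print("only information:", sorted(information_only))
```

In this sketch, swapping the two inputs only mirrors the result: the shared completions stay the same and the two “only” groups trade places.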
The association is also reflexive:
And transitive, as “data” vs. “information”, and “information” vs. “knowledge” lead to “data” vs. “knowledge”:
The algebraic properties are not so important, though some consistency of the results is needed between representations. It’s interesting that the comparison is influenced by a space placed at the beginning (e.g. “ data”) or at the end (“data ”), as can be seen in the following representation of the two:
I would expect other characters (e.g. punctuation marks, special characters) to influence the comparisons too. Talking about DIKW, the knowledge pyramid, let’s see the comparison between “DIKW” and “data information knowledge wisdom”:
The two concepts have close semantics, “DIKW” being the acronym for “data information knowledge wisdom”. Here’s the comparison between two synonyms, “distribution” vs. “diffusion” (as in distribution/diffusion of knowledge). As can be seen, the association is stronger.
Actually, my first attempt with the tool was the comparison “concept map” vs. “mind map”:
This looks slightly different from “concept maps” vs. “mind maps” (so the plural form of the words introduces variation):
Considering the few examples run, the tool is quite intuitive and catchy. I would consider its utility as relative, even if the above examples are not representative and the relationships between them are more contextual. Still, it’s a good tool for automatically identifying the relations/associations between concepts, the strength of those associations and maybe several semantic connotations. It would be interesting to be able to see only the common terms, as many K-maps focus on this aspect, to take language and context into account, and to compare more than two terms (for example using Venn diagrams) or to show more/fewer common terms.
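For more than two terms, the regions of a Venn diagram can be computed by grouping each completion by the exact set of concepts that produce it. The sketch below uses invented completions for three DIKW concepts, and `venn_regions` is a hypothetical helper, not something the tool offers.

```python
from collections import defaultdict


def venn_regions(completions):
    """Group each completion by the exact set of concepts that produce it."""
    membership = defaultdict(set)
    for concept, terms in completions.items():
        for term in terms:
            membership[term].add(concept)
    regions = defaultdict(list)
    for term, concepts in membership.items():
        regions[frozenset(concepts)].append(term)
    return {region: sorted(terms) for region, terms in regions.items()}


# Made-up completions; real suggestions will differ.
completions = {
    "data": ["management", "mining", "architecture"],
    "information": ["management", "architecture", "security"],
    "knowledge": ["management", "base"],
}

regions = venn_regions(completions)
for region, terms in sorted(regions.items(), key=lambda kv: -len(kv[0])):
    print(" & ".join(sorted(region)), "->", terms)
```

Showing only the region shared by all concepts would give the “common terms only” view.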