Archive for August, 2011

Aug 05 2011

Tool: Corpex – Wikipedia Corpora Explorer

Developed within the RENDER project by KIT Karlsruhe, the Wikipedia corpora explorer Corpex lets you swiftly browse through all the words of Wikipedia. Select your language, and as you start typing, the system shows you two statistics in four graphs:
  1. the ten most frequent words that start with the typed sequence of letters (as a bar chart and a pie chart), and
  2. the most frequent letters following the already typed sequence of letters (again, as a bar chart and a pie chart).
Additionally, the ten most frequent words following any input word are visualized (as a bar chart and a pie chart).
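To make these three statistics concrete, here is a minimal Python sketch that computes them over a toy word list. It is only an illustration under simple assumptions (lowercased, whitespace-separated tokens); the function names are made up for this post and are not Corpex's actual implementation or API.

```python
from collections import Counter


def tokenize(text):
    """Assumption: lowercase, whitespace-separated tokens are good enough here."""
    return text.lower().split()


def top_words_with_prefix(tokens, prefix, n=10):
    """The n most frequent words that start with the typed prefix."""
    counts = Counter(t for t in tokens if t.startswith(prefix))
    return counts.most_common(n)


def next_letter_counts(tokens, prefix):
    """How often each letter directly follows the typed prefix inside a word."""
    counts = Counter(
        t[len(prefix)]
        for t in tokens
        if t.startswith(prefix) and len(t) > len(prefix)
    )
    return counts.most_common()


def top_following_words(tokens, word, n=10):
    """The n most frequent words that directly follow the given word."""
    counts = Counter(b for a, b in zip(tokens, tokens[1:]) if a == word)
    return counts.most_common(n)


if __name__ == "__main__":
    tokens = tokenize("the cat sat on the mat the cat ate the cheese")
    print(top_words_with_prefix(tokens, "c"))
    print(next_letter_counts(tokens, "c"))
    print(top_following_words(tokens, "the"))
```

On a real Wikipedia dump the counts would of course be precomputed and indexed rather than recalculated on every keystroke.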
This can be used for many applications where the occurrence of words in different language editions of Wikipedia is of use. An API is also provided for easy use of the data.
Corpex is currently available in the following languages: German (de), English (en), Spanish (es), French (fr), Hungarian (hu), Romanian (ro), Albanian (sq), Bulgarian (bg), Czech (cs), Italian (it), Swedish (sv), Serbian (sr), Croatian (hr), Serbo-Croatian (sh), Bosnian (bs), and Simple English (simple). It is also available for the Brown Corpus (brown). Further languages are being prepared.
Corpex is still under development. The code is fully open source, and all the data is freely available. Feedback, and especially suggestions for cooperation, is welcome.


Aug 02 2011

Personalization or What the Internet is hiding from you

The Economist’s article about the dangers of the Internet goes into detail about the so-called “filter bubble”, a unique universe of information for each individual, which we can experience every day on Google, Amazon, Facebook and Co. Our location, interests, previous browsing behavior, etc. are taken into account by these sites, and we are presented with a personalized result. That sounds good, but is this all we want to have?

Eli Pariser and other critics think this is dangerous: because we are not presented with information that doesn’t fit into our own universe of opinions and interests, they believe this approach prevents us from seeing and using the full potential of the Internet. Eli Pariser calls this “invisible autopropaganda, indoctrinating us with our own ideas”. In his book “The Filter Bubble: What the Internet Is Hiding from You” he goes into detail about how such a filtered Internet can be dangerous and how sites can give users more control over their personal data.
