Aug 05 2011

Tool: Corpex – Wikipedia Corpora Explorer

Developed within the RENDER project by KIT Karlsruhe, the Wikipedia corpora explorer Corpex lets you swiftly browse through all the words of Wikipedia. Select your language, and as you start typing, the system shows you two statistics in four graphs:
  1. the ten most frequent words that start with the typed sequence of letters (as a bar chart and a pie chart), and
  2. the most frequent letters following the already typed sequence of letters (again, as a bar chart and a pie chart).
Additionally, the ten words that most frequently follow any input word are visualized (as a bar chart and a pie chart).
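To make these three statistics concrete, here is a minimal Python sketch that computes them on a toy sentence. It is only an illustration, not Corpex's actual implementation: the function names and the sample text are hypothetical, and a real system would work from precomputed counts over a full Wikipedia dump.

```python
from collections import Counter

def top_completions(word_counts, prefix, n=10):
    """The n most frequent words that start with `prefix`."""
    matches = Counter({w: c for w, c in word_counts.items() if w.startswith(prefix)})
    return matches.most_common(n)

def next_letter_counts(word_counts, prefix):
    """How often each letter directly follows `prefix`, weighted by word frequency."""
    letters = Counter()
    for word, count in word_counts.items():
        if word.startswith(prefix) and len(word) > len(prefix):
            letters[word[len(prefix)]] += count
    return letters

def top_following_words(tokens, word, n=10):
    """The n words that most frequently come right after `word` in the token stream."""
    followers = Counter(b for a, b in zip(tokens, tokens[1:]) if a == word)
    return followers.most_common(n)

# Hypothetical toy corpus standing in for a Wikipedia language edition.
tokens = "the cat sat on the mat and the cat ate the tart".split()
word_counts = Counter(tokens)

print(top_completions(word_counts, "t"))
print(next_letter_counts(word_counts, "t").most_common())
print(top_following_words(tokens, "the"))
```

Each function returns plain counts, which is the kind of data a bar chart or pie chart could be drawn from.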
These statistics are useful for any application that depends on how often words occur in the different language editions of Wikipedia. An API is also provided for easy access to the data.
Corpex is currently available in the following languages: German (de), English (en), Spanish (es), French (fr), Hungarian (hu), Romanian (ro), Albanian (sq), Bulgarian (bg), Czech (cs), Italian (it), Swedish (sv), Serbian (sr), Croatian (hr), Serbo-Croatian (sh), Bosnian (bs), and Simple English (simple). It is also available for the Brown Corpus (brown). Further languages are being prepared.
Corpex is still under development. The code is fully open source, and all the data is freely available. Feedback, and especially suggestions for cooperation, are welcome.