The Polish language of the 1960s
This site is dedicated to the corpus of the frequency
dictionary of contemporary Polish. The original purpose of the
corpus was to create a general frequency dictionary of contemporary
Polish. The work started in 1967. Partial results were published
between 1972 and 1977, and the completed dictionary in 1990. The corpus
was later augmented in various respects, both by manual editing and
automated procedures.
The corpus data contain 10,000 samples divided into 5 parts: essays,
news, scientific texts, fiction and plays. Every sample is approximately
50 words long. All samples come from texts published between 1963 and 1967,
and each contains a bibliographic description of its source. Each word is
tagged with its base form and some morphological properties. Sentence
boundaries are also marked.
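The structure described above can be pictured with a minimal Python sketch. It is only an illustration: the names Token, Sample and GENRES are hypothetical, the tag field is left as a free-form string, and the actual corpus files use their own format, which is described in the corpus documentation.

```python
from dataclasses import dataclass, field

# Hypothetical model of one corpus sample; not the real file format.
# The five parts of the corpus, as listed in the description above.
GENRES = ("essays", "news", "scientific texts", "fiction", "plays")


@dataclass
class Token:
    form: str       # the word as it appears in the text
    base: str       # its base form
    tags: str = ""  # morphological properties (representation assumed)


@dataclass
class Sample:
    genre: str      # one of GENRES
    source: str     # bibliographic description of the source text
    # Sentence boundaries are modelled as a list of sentences,
    # each sentence being a list of tokens.
    sentences: list[list[Token]] = field(default_factory=list)

    def word_count(self) -> int:
        """Number of tagged words in the sample (about 50 in the corpus)."""
        return sum(len(sentence) for sentence in self.sentences)


# Usage with an invented one-sentence sample (not taken from the corpus).
sample = Sample(
    genre="fiction",
    source="(bibliographic description goes here)",
    sentences=[[Token("Ala", "Ala"), Token("ma", "mieć"), Token("kota", "kot")]],
)
print(sample.word_count())
```

With 10,000 samples of roughly 50 words each, a model like this would hold about 500,000 tagged words in total.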
In 2001 the corpus authors agreed to publish the data on the Internet
under a GNU licence. This site presents the corpus data in base and extended
(enhanced) versions, as well as additional materials and corpus
documentation.
Contact information
Webmaster: Maciej Ogrodniczuk (Maciej.Ogrodniczuk [at] ipipan.waw.pl).
Last update: Maciej Ogrodniczuk, 20/10/2008