May 2014
Polish Summaries Corpus
A corpus of Polish news summaries based on texts extracted from the Rzeczpospolita corpus, released under the CC BY 3.0 licence. Co-funded by the ATLAS project and by the European Union.
This page hosts the official release of the corpus of Polish news summaries under the Creative Commons Attribution 3.0 Unported License. Its creation was co-funded by the ATLAS project and by the European Union from the resources of the European Social Fund, Project PO KL "Information technologies: Research and their interdisciplinary applications". By downloading the corpus data you accept the conditions of that licence.
Contact person: Mateusz Kopeć
License: CC BY 3.0

The texts to be summarized were extracted from the Rzeczpospolita corpus and are currently available under the terms stated on that corpus's webpage.
Documentation
Description of the corpus (in English).
Downloads
A preliminary version of the corpus is available for download at the following link:
A Java API for the corpus is also available:
- the source code is available in a git repository
- Maven users may add the following dependency:
<dependency>
  <groupId>pl.waw.ipipan.zil.summ</groupId>
  <artifactId>pscapi</artifactId>
  <version>1.0</version>
</dependency>
and the following repository:
<repository>
  <id>zil-maven-repo</id>
  <name>ZIL maven repository</name>
  <url>http://maven.nlp.ipipan.waw.pl/content/repositories/releases/</url>
</repository>
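For orientation, the dependency and repository entries could sit in a project's pom.xml roughly as sketched below. This is only a minimal illustration: the project coordinates (com.example, psc-demo, 0.1.0-SNAPSHOT) are placeholder assumptions, while the dependency and repository values are copied unchanged from this page.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical coordinates of your own project; replace as needed -->
  <groupId>com.example</groupId>
  <artifactId>psc-demo</artifactId>
  <version>0.1.0-SNAPSHOT</version>

  <!-- Repository hosting the corpus API, as listed on this page -->
  <repositories>
    <repository>
      <id>zil-maven-repo</id>
      <name>ZIL maven repository</name>
      <url>http://maven.nlp.ipipan.waw.pl/content/repositories/releases/</url>
    </repository>
  </repositories>

  <!-- Polish Summaries Corpus Java API, as listed on this page -->
  <dependencies>
    <dependency>
      <groupId>pl.waw.ipipan.zil.summ</groupId>
      <artifactId>pscapi</artifactId>
      <version>1.0</version>
    </dependency>
  </dependencies>
</project>
```

The repository entry goes inside a repositories element and the dependency inside a dependencies element; Maven does not accept either fragment directly under the project root.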
Citing
When using the Polish Summaries Corpus, please cite the following article:
Maciej Ogrodniczuk and Mateusz Kopeć. The Polish Summaries Corpus. In Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC 2014, Reykjavík, Iceland, pp. 3712–3715. [PDF]