LREC Workshop: Leveraging Derived Text Formats to Unlock Copyrighted Collections for Open Science
PROGRAM
Tuesday, May 12, 2026
Session 1: Overview
14:00–15:30 · Room 9 · Chair: Philippe Genêt (Deutsche Nationalbibliothek)
| Time | Title | Authors |
|---|---|---|
| 14:00–14:10 | Welcome and Introduction | — |
| 14:10–14:30 | Derived Text Formats as Strategic Transformations of In-Copyright Materials to Support Open Science: A Survey | Christof Schöch |
| 14:30–14:50 | A Multi-dimensional Constrained Framework for Derived Text Formats | Keli Du, Christof Schöch |
| 14:50–15:10 | Legal implications of Derived Text Formats – a copyright perspective | Gianna Iacino, Pawel Kamocki, Keli Du |
| 15:10–15:30 | Revisiting Masking After Fifteen Years: Early Approaches to Non-Reconstructable Linguistic Data in the current context | Georg Rehm, Thorsten Trippel, Andreas Witt |
| 15:30–16:00 | Break | — |
Session 2: Applications
16:00–18:00 · Room 9 · Chair: Piroska Lendvai (Bavarian Academy of Sciences and Humanities)
| Time | Title | Authors |
|---|---|---|
| 16:00–16:20 | Multi-Label Text Classification of Derived Text Formats with DistilBERT | Jennifer Ecker, Roman Schneider |
| 16:20–16:40 | Training data generation for context-dependent rubric-based short answer grading | Pavel Šindelář, Filip Prášil, Dávid Slivka, Christopher Bouma, Ondrej Bojar |
| 16:40–17:00 | DUO_DE A1: An Annotated Corpus of Online Learning Material for Beginning Learners of German as a Foreign Language | Jammila Laâguidi, Vitaliia Ruban, Ronja Laarmann-Quante, Anastasia Drackert |
| 17:00–17:20 | Why Reconstructing Scrambled Texts Fails | Keli Du, Christof Schöch |
| 17:20–17:40 | DIN 19461: A National Standard for Derived Text Formats | Thorsten Trippel, Florian Barth, José Calvo Tello, Keli Du, Philippe Genêt, Daniel Kurzawe, Peter Leinen, Piroska Lendvai, Christof Schöch, Andreas Witt, Arden Zimmermann |
| 17:40–18:00 | Final discussion and closing | — |
Call for Papers
The workshop "Leveraging Derived Text Formats to Unlock Copyrighted Collections for Open Science" will be held at the Language Resources and Evaluation Conference (LREC 2026).
Derived Text Formats (DTF), also known as extracted features, offer a promising solution for enabling research on textual data that cannot be shared in its original form due to copyright or privacy restrictions. This workshop brings together researchers, legal experts, and infrastructure providers to explore the creation, standardization, legal framing, and scientific use of derived data in linguistics, digital humanities, and language technology.
We invite contributions from the community that address practical experiences, challenges, and solutions related to:
- The creation and processing of DTF
- Legal and ethical considerations in publishing derived data
- Use cases from digital humanities, linguistic research, corpus linguistics, or NLP
- Infrastructure and tools supporting DTF flows
- Standardization efforts (e.g., TEI, SynAF, MAF, ISO standards)
The workshop will be held as a hybrid event on Tuesday, 12 May 2026.
Submission Format
Submissions should be 4 to 8 pages in length (excluding references and potential Ethics Statements). Submissions should follow the LREC stylesheet, available on the conference website on the Author’s kit page. Submissions will be reviewed by the workshop organizers and the programme committee.
Important Dates
- Extended Submission Deadline: 2 March 2026 (AoE)
- Reviewing period: 21 February 2026 – 10 March 2026
- Notification of Acceptance: 11 March 2026
- Camera-ready paper submission deadline: 30 March 2026
- Workshop Date: 12 May 2026
Submission
Submissions will be handled via the conference submission system Softconf.
When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of their research. Moreover, ELRA encourages all LREC authors to share the described language resources (LRs: data, tools, services, etc.) to enable their reuse and the replicability of experiments (including evaluation experiments).
Workshop Organisers
- Florian Barth, Göttingen State and University Library
- Keli Du, University of Trier
- José Calvo Tello, Göttingen State and University Library
- Philippe Genêt, German National Library
- Piroska Lendvai, Bavarian Academy of Sciences and Humanities
- Christof Schöch, University of Trier
- Thorsten Trippel, University of Tübingen and Leibniz-Institut für Deutsche Sprache
Contact
For questions, please contact: dtf-at-lrec2026@googlegroups.com
Event page last updated: 7 May 2026