LREC Workshop: Leveraging Derived Text Formats to Unlock Copyrighted Collections for Open Science

Palma de Mallorca
12 May 2026

PROGRAM

Tuesday, May 12, 2026

Session 1: Overview

14:00–15:30 · Room 9 · Chair: Philippe Genêt (Deutsche Nationalbibliothek)

Time · Title · Authors
14:00–14:10 · Welcome and Introduction
14:10–14:30 · Derived Text Formats as Strategic Transformations of In-Copyright Materials to Support Open Science: A Survey · Christof Schöch
14:30–14:50 · A Multi-dimensional Constrained Framework for Derived Text Formats · Keli Du, Christof Schöch
14:50–15:10 · Legal implications of Derived Text Formats – a copyright perspective · Gianna Iacino, Pawel Kamocki, Keli Du
15:10–15:30 · Revisiting Masking After Fifteen Years: Early Approaches to Non-Reconstructable Linguistic Data in the current context · Georg Rehm, Thorsten Trippel, Andreas Witt
15:30–16:00 · Break

Session 2: Applications

16:00–18:00 · Room 9 · Chair: Piroska Lendvai (Bavarian Academy of Sciences and Humanities)

Time · Title · Authors
16:00–16:20 · Multi-Label Text Classification of Derived Text Formats with DistilBERT · Jennifer Ecker, Roman Schneider
16:20–16:40 · Training data generation for context-dependent rubric-based short answer grading · Pavel Šindelář, Filip Prášil, Dávid Slivka, Christopher Bouma, Ondrej Bojar
16:40–17:00 · DUO_DE A1: An Annotated Corpus of Online Learning Material for Beginning Learners of German as a Foreign Language · Jammila Laâguidi, Vitaliia Ruban, Ronja Laarmann-Quante, Anastasia Drackert
17:00–17:20 · Why Reconstructing Scrambled Texts Fails · Keli Du, Christof Schöch
17:20–17:40 · DIN 19461: A National Standard for Derived Text Formats · Thorsten Trippel, Florian Barth, Jose Calvo Tello, Keli Du, Philippe Genêt, Daniel Kurzawe, Peter Leinen, Piroska Lendvai, Christof Schöch, Andreas Witt, Arden Zimmermann
17:40–18:00 · Final discussion and closing

Call for Papers

The workshop Leveraging Derived Text Formats to Unlock Copyrighted Collections for Open Science will be held at the Language Resources and Evaluation Conference (LREC 2026).

Derived Text Formats (DTF), also known as extracted features, offer a promising solution for enabling research on textual data that cannot be shared in its original form due to copyright or privacy restrictions. This workshop brings together researchers, legal experts, and infrastructure providers to explore the creation, standardization, legal framing, and scientific use of derived data in linguistics, digital humanities, and language technology.

We invite contributions from the community that address practical experiences, challenges, and solutions related to:

  • The creation and processing of DTF
  • Legal and ethical considerations in publishing derived data
  • Use cases from digital humanities, linguistic research, corpus linguistics, or NLP
  • Infrastructure and tools supporting DTF workflows
  • Standardization efforts (e.g., TEI, SynAF, MAF, ISO standards)

The workshop will be held as a hybrid event. The exact workshop date will be communicated in due time.

Submission Format

Submissions should be 4 to 8 pages in length (excluding references and potential Ethics Statements). Submissions should follow the LREC stylesheet, available on the conference website on the Author’s kit page. Submissions will be reviewed by the workshop organizers and the programme committee.

Important Dates

  • Extended Submission Deadline: 2 March 2026 (AoE)
  • Reviewing period: 21 February 2026 – 10 March 2026
  • Notification of Acceptance: 11 March 2026
  • Camera-ready paper submission deadline: 30 March 2026
  • Workshop Date: 11, 12 or 16 May 2026

Submission

Submissions will be handled via the conference submission system Softconf.

When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of their research. Moreover, ELRA encourages all LREC authors to share the described language resources (data, tools, services, etc.) to enable their reuse and the replicability of experiments (including evaluation experiments).

Workshop Organisers

  • Florian Barth, Göttingen State and University Library
  • Keli Du, University of Trier
  • José Calvo Tello, Göttingen State and University Library
  • Philippe Genêt, German National Library
  • Piroska Lendvai, Bavarian Academy of Sciences and Humanities
  • Christof Schöch, University of Trier
  • Thorsten Trippel, University of Tübingen and Leibniz-Institut für Deutsche Sprache

Contact

For questions, please contact: dtf-at-lrec2026@googlegroups.com


Last updated: 7 May 2026