The EWC’s AI toolkit for the book sector

Recommendations by the European Writers’ Council (EWC) for writers and translators, publishers, booksellers, event organisers and further stakeholders of the book sector for bilateral and contractual agreements and technical requirements including opt-out.

This Tool Kit can be used by political decision-makers as well as GenAI providers who need to install transparent usage documentation and remuneration obligations.

This AI Tool Kit is neither binding nor intended to replace or prescribe national agreements.

The European Writers’ Council (EWC) is the world’s only and largest representation of solely writers in the book sector and of all genres (fiction, non-fiction, academic, children’s book, poetry, etc.). With 50 organisations and professional guilds from 32 countries of the EU, the EEA, and the non-EU areas, the EWC represents 220,000 writers and translators. These individuals write and publish in 35 languages, also worldwide.

Writers, illustrators, cover designers, translators, audio book narrators as performers, as well as publishers, editors, and Collective Management Organisations (CMOs, RROs) are directly and immediately affected by the consequences of so-called AI in the book sector and in particular by the production and use of generative informatics. The EWC together with its expert task force has developed the following recommendations, to help to establish a set of fair practices within the book sector and between writers, their possible agents, translators, publishers, and, where applicable, book sellers or event organisers, as well as (generative) AI developers.

In this toolkit, you will find:

A definition framework of “AI”
10 recommendations for a fair practice-based relation
Specific handouts and proposals grounding the recommendations

Note: The recommendations are aimed at territories where the Directive 2019/790 (EU) on Copyright in the Digital Single Market (CDSM Directive) applies and has accordingly introduced the TDM Art. 4 exception regime; and they also apply to translations published in the EU Member States. All other points are also applicable internationally.

Please note that the interpretation of whether text and data mining (TDM) laid down in the CDSM Directive Art. 4 covers the further use of works for the development of generative AI remains legally controversial. The AI Act (of 21 May 2024) refers to the CDSM Directive in its recitals but does not formally clarify this aspect either. We expect corresponding litigation, and/or clarification by the Commission during the evaluation of the CDSM Directive in the course of 2025-2027. The EWC position is to declare the TDM reservation of rights in any case (‘opt out’) to avoid usage for generative AI (GAI) but, at the same time, we see the use for the development of generative AI as a very probable new exclusive right. In this sense, we are using the opt-out as a protective shield until the legal situation that TDM does not mean GAI development has been formally clarified.

PART I

NOT ALL AI IS AI:

A DEFINITION FRAMEWORK ON WHICH KIND OF APPLICATIONS THIS TOOL KIT ADDRESSES

Already, numerous wrongful and damaging “AI business models” have developed in the book sector – with fake authors, fake books and also fake readers. It can be assumed that the foundations for large language models (LLMs) such as GPT, Meta’s models, StableLM or BERT have been generated from copyrighted book works whose sources include shadow libraries such as Library Genesis (LibGen), Z-Library (Bok), Sci-Hub and Bibliotik, all of them piracy websites.

Without legal regulation, generative technologies accelerate and enable the expansion of exploitation, legitimisation of copyright infringement, information and communication distortion, royalty fraud and collective licensing remuneration fraud.

At the same time, a close look and assessment is needed to categorise and regulate the individual aspects of advanced informatics; because not all smart software is “AI”, not every application is equally risky.

For a start, the EWC classifies the following three systems:

Assistive Informatics and Software – not considered as AI or a risk;
Analysing Informatics – partly considered as AI and a potential risk;
Generative Informatics – the AI category considered as risk, and related to text, voice, image works: generative artificial intelligence, in short “GAI” or GenAI.

In this paper, we focus on the legal, administrative, and technical aspects of so-called “generative AI” (GAI) and related practices.

We define the legal situation regarding input (contractual and technical routines) and output (labeling issues and transparency requirements).
The paper examines automated text robotics (example: GPT), automated translation machines (example: DeepL), generative image production (example: Midjourney), synthetic cloning of human voices or otherwise AI generated voices.

This follows in part with a view to the AI Act, as well as with regard to the EU and non-European legal framework on text and data mining or intellectual property issues that still require clarification. These include the definition of “machine learning”, which we consider to be closer to “algorithm programming” and which includes reproduction processes in preparation for GAI development, such as scraping; temporary conversion of .pdf, .mobi or .epub files into .xml; continuous copying to create a corpus (corpora) of words; and the deposition of source files for reproducibility or verifiability purposes. They further include the copying, storing, contextual breaking up and reproduction of individual expression within artificial large language models, image diffusion and synthetic voice cloning, as well as aspects of proximity and style imitation.

Please note: Analytical or assistive technology and software, such as semantic and proofreading analysis (example: Word Editor, Grammarly), image refinement (Photoshop), database management, filing, converting, citation indexes, storyboard software, text summarisation for metadata generation, sound mixing studio editing or automated inventory processes including CAT tools used by translators are not covered by this tool kit.

PART II

10 RECOMMENDATIONS FOR A FAIR AND PRACTICE-BASED RELATION

Authorisation of exploitation for Text and Data Mining or GenAI-related algorithm coding: Informed consent and written permission by authors, visual artists and audio book performers are the basis for respecting their intellectual property rights for any use of their text, visual or audio works for (a) TDM, (b) scraping, and (c) any related steps within algorithm programming to develop generative AI. Therefore, contractual and communication routines for TDM opt-outs, as well as possible individual or collective licences, shall be obtained as such. Authors who want to protect their already published or soon-to-be-published works from TDM, scraping and usage for GenAI shall request in writing that their publishers declare an opt-out.
Remuneration: Authors, artists, and performers who permit their works or performances informed and unforced to be exploited in text and data mining or programming of generative AI shall be remunerated appropriately and proportionately in line with time- and scope limited licensing models. Licensing models should include transparency on the purpose and how the works will be used as well as clear reporting obligations over these uses for adequate and recurring remuneration. Whether this is administered through individual, or collective licences via CMOs and RROs is a question of national legislation and negotiations among all national stakeholders.
Transparency on Input: Scrapers and crawlers, as well as corpora builders and ultimately AI developers, are required through the AI Act to provide a sufficiently detailed summary about legally accessed titles, authors, sources, acquisition methods of protected works incl. their IP (meta and other relevant) data. For this purpose, the book sector could develop and implement suitable and harmonised standards, e.g. for Meta Data, ONIX, the International Standard Content Code (ISCC), the Digital Object Identifier (DOI) or the International Standard Book Number (ISBN), or work with entities such as W3C or Creators’ Credentials to facilitate the tracking of works incl. online and other sources.
Transparency on Output: Every automated text including machine translation, AI generated visual product, as well as synthetic audio product published should be labelled as AI-generated. Upon consideration, 100 % human works can also be labelled, following the concept of “Trusted Shops” models.
Clear communication and respecting the moral rights of authors on the integrity of their work. Publishers and other contractual counterparts shall seek the author’s approval before using generative AI in relation to their works and establish a mutual understanding on the utilisation of different kinds of software and advanced informatics, such as, but not limited to, synthetic voices in audio books, machine translation, generative covers, and any other adaptation of the work by generative AI. Authors should have the right to choose to use human work and refuse AI covers, AI audiobook adaptations or AI translations of their work without being disadvantaged and without negative consequences, such as a lower royalty. Publishers should also be able to be sure that they know whether a work contains AI-generated components.
Respecting the writers’, translators’, or artists’ and performers’ own choice of working: Authors, translators, performers, or illustrators should not be forced to use any generative AI or to work from AI-generated text incl. machine translation or GAI images.
A clear information and sublicensing chain: Publishers need to declare to third parties including platforms, aggregators, libraries, or trade distributors, whether or not they have the right(s) to sublicense or to reproduce and/or otherwise use the work in any manner for purposes of text and data mining or for programming of generative technologies (GAI). These rights reservation protocols (“TDM / GAI Opt-out”) should be communicated for each title file, e.g., in meta data or other digital rights management procedures such as, but not limited to, the ISCC or the TDM Reservation Protocol (TDMRep).
When applicable, personal websites of authors, artists, and performers could, and official websites of publishers and retailers should, declare the TDM reservation right under Art. 4 CDSM Directive 2019/790 and make clear that no permission is granted for TDM, scraping and usage for machine learning or algorithm programming of generative AI. This is possible and complies with requirements when stated in the general terms and conditions (T&C) or the imprint, but especially in a machine-readable way, for example in the robots.txt of a website or via the TDM Reservation Protocol (TDMRep). Publishers are required to raise a text and data mining opt-out flag on their company websites for each title or their entire portfolio, as are retailers and booksellers on their online websites. Beyond websites, i.e. for the work itself, other technical and machine-readable standards can be used, such as the ISCC, or meta data within the work.
Check the T&C of the software used within a publishing or aggregator’s company as well as by agents, editors or translators, and make sure that the software does not claim within its T&C to be allowed to scrape, use, copy or store the content(s) for developing, improving or enhancing AI incl. generative AI; this also applies to platforms, social media and portals, if video recordings from reading events or panel discussions are published.
Everyone should be aware of their own ethical responsibilities. As it is well known, most larger GAI systems in existence today are allegedly built on copyright infringement. The works and investments of authors and publishers have been used without knowledge, authorisation, remuneration or transparency, partly through piracy sites and well before the non-retroactive TDM exceptions of the CDSM Directive 2019/790 came into force. A concerted effort to denounce and monetarily compensate for this damage, as well as to push for the systems to be shut down, if necessary, should be an aspiration of the sector to secure its future. The repeated misinterpretation that machine “learning” is the same as human reading and therefore a “right” is wrong. To assess the damages, together with CMOs or responsible ministries, but also class action, if necessary, as well as pushing for European-wide and internationally applicable regulations that specifically hold AI developers accountable, are necessary. Working together for an “AI and IP traffic code” and regulations for a fair, ethical, and regulated future with (G)AI protects the power of innovation and prevents the disruption of culture and its human creativity.
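As an illustration of the machine-readable website opt-out named in the recommendation on TDM reservation above, the following minimal Python sketch writes robots.txt rules for a list of crawlers. The function name `build_robots_txt` and the list `EXAMPLE_AI_AGENTS` are hypothetical, and the user-agent tokens are examples only that should be checked against each crawler operator's current documentation. robots.txt is a voluntary convention, so it is a signal to crawlers rather than a technical barrier.

```python
def build_robots_txt(ai_user_agents, disallow_path="/"):
    """Return robots.txt text asking each listed crawler not to fetch `disallow_path`."""
    blocks = []
    for agent in ai_user_agents:
        # One rule group per crawler: its user-agent token, then the blocked path.
        blocks.append(f"User-agent: {agent}\nDisallow: {disallow_path}\n")
    # Rule groups are separated by a blank line.
    return "\n".join(blocks)


# Example tokens only (assumption): verify the current names with each operator.
EXAMPLE_AI_AGENTS = ["GPTBot", "CCBot", "Google-Extended"]

if __name__ == "__main__":
    print(build_robots_txt(EXAMPLE_AI_AGENTS))
```

The resulting text would be served at the root of the website as /robots.txt, alongside the human-readable reservation in the T&C or imprint.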

We are at the beginning of a continued discussion in the book sector that will shape future generations. May these recommendations by the EWC serve as an initial impetus.

PART III

GROUNDS FOR THE 10 RECOMMENDATIONS

On Contractual Matters

Overview of the three legal grounds for the recommendations:

1.1. Authors’ intellectual property rights (copyright, authors’ right) related to commercial text and data mining (TDM), and to the new forms of exploitation: scraping, copying, storing, and other forms of machine “learning” aka algorithm programming for generative AI.

1.2 Authors’ moral rights: integrity of the work incl. translation, cover art, or audio narration

1.3 Practical agreements on labelling and assessment of remuneration entitlement

1.1. Authors’ intellectual property rights: (I) Right to opt-out from TDM and (II) Reservation rights for all processes involved in algorithm programming to develop (G)AI:

Every author, artist and audio performer has the right to decide how their work is published, distributed, copied, and used, according to the Berne Convention Art. 9.1, Art. 9.2, Art. 9.3, unless national or transnational laws impose restrictions in the form of exceptions, limitations, or other binding agreements. This includes the decision whether the work may be copied and used (a) for TDM for non-commercial and commercial general purposes, as well as for the practices of (b) scraping, copying, and storing, and (c) machine programming (“training”) for general purposes and to develop generative informatics (text, image, voice) and GAI. The uses in (b) and (c) are, from the EWC’s perspective, new types of uses not previously within the remit of author contracts or legislative exceptions. Exploitation of the works needs the authors’ written consent.

In the EU, the exceptions of Art. 3 (non-commercial TDM) and Art. 4 (commercial TDM) of Directive 2019/790 on Copyright in the Digital Single Market came into force on 7 June 2021. Art. 4 allows text and data mining for commercial purposes; authors and publishers only have the right to declare a rights reservation in a machine-readable or otherwise sufficient manner, the so-called “TDM opt-out” or “TDM rights reservation”, to object to this non-remunerated exploitation.

There are still many unresolved issues to be clarified, such as: how should existing works from before 2021 be dealt with, do they need a machine-readable opt-out, and how is their use by AI companies to be proven, tracked and, if necessary, remunerated? Also, as long as it is not clarified under current EU law, e.g. by the Commission or the courts, whether algorithm programming of economic substitutions of authors’ works (GenAI) is at all covered by the exception of Art. 4 CDSM Directive 2019/790 and its national implementations, the contractually declared or otherwise agreed opt-out to TDM for commercial purposes by the author (writer, translator, illustrator, performer) is the parachute with which authors and their publishers can reserve the rights until clarification is provided. Also, an opt-out allows for licensing if the author so wishes, which can be carried out by publishers or CMOs.

An addendum about TDM rights reservation is recommended for existing contracts. However, amending old contracts (up to 13.8-20 million in Europe alone) requires a high administrative and personnel effort. Here, publishers and authors need to discuss procedures for works already on the market, and whether, for example, publishers will opt out by default unless an author expressly authorises TDM under Art. 4. In parallel, authors should request in writing that their publishers declare the opt-out for their published and upcoming works.

New contracts including foreign rights / translation agreements shall include a clause or another form of written confirmation to this effect of coordinating the opt-out. Examples:

The right to commercial text and data mining is not transferred with this contract (TDM reservation right under Art. 4, CDSM Directive 2019/790 (EU)). The author requests that the publisher informs them about this TDM opt-out declaration in a binding way.
The right for text and data mining, scraping, copying and storage, as well as algorithm programming for general purposes including but not limited to the purpose of producing any generative AI, is explicitly reserved by the author. The author requests that the publisher informs them about this TDM opt-out declaration in a binding way.
The publisher will apply all necessary measures to communicate the commercial TDM rights reservation, in an appropriate and effective manner, including but not limited, within the meta data and ONIX, with machine-readable indications on websites or the imprint, and ensures that licensees or retailers also communicate any associated reservations of rights.

With this paper trail, publishers can act in a compliant manner and label books to be published (e-books, digitised audio books as well as printed books) with this TDM reservation right in compliance with the current Art. 4 CDSM Directive 2019/790 requirements (Meta Data and ONIX, Imprint, ISCC identifier, or via TDMRep).
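For the TDMRep route mentioned above, here is a minimal sketch, assuming the publisher's website uses the `/.well-known/tdmrep.json` file described in the W3C community group's TDM Reservation Protocol. The property names `location`, `tdm-reservation` and `tdm-policy` come from that protocol; the function name `build_tdmrep`, the paths and the policy URL are hypothetical placeholders.

```python
import json


def build_tdmrep(entries):
    """Build the JSON text for a /.well-known/tdmrep.json file.

    `entries` is a list of (location_pattern, policy_url_or_None) pairs.
    Every entry is written with tdm-reservation = 1, i.e. TDM rights reserved.
    """
    records = []
    for location, policy_url in entries:
        record = {"location": location, "tdm-reservation": 1}
        if policy_url is not None:
            # Optional URL of a TDM policy describing how a licence can be obtained.
            record["tdm-policy"] = policy_url
        records.append(record)
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    # Hypothetical paths and policy URL, for illustration only.
    print(build_tdmrep([
        ("/ebooks/*", "https://publisher.example/tdm-policy"),
        ("/samples/*", None),
    ]))
```

A file like this covers web locations only; the reservation for the work itself would still need to travel with the title, for example in its meta data, as the recommendations describe.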

To complete the picture about TDM rights reservation, please note the following:

Translators also must comply with a reservation of rights to text and data mining or to the programming of generative AIs, and should not upload texts to non-secure machine translation software without the knowledge or consent of the author or rights holder;
Booksellers or other intermediaries such as librarians should also not feed texts or files into a GPT, e.g., to have summaries generated, as this would be contrary to opt-out;
Libraries that give access to digitised works under the Ulmer-TU Darmstadt case (C‑117/13, 2014) are not entitled to sublicense or give access for commercial TDM;
Within a publishing system, it must be clearly communicated whether and how a publisher enters texts into systems, e.g., to extract keywords, summaries, or other knowledge from them, as this could also be ruled out by the authors’ reservation of rights. Here, all parties involved must exchange information openly and transparently and come to an agreement and mutual understanding on the publishing workflow;
During the implementation of the EUIPO Portal for out of commerce works, entitled entities who digitise out of commerce works are not entitled to give access to works for commercial TDM or algorithm coding (GenAI development). Entitled entities shall secure that a machine-readable opt-out is applied.

1.2 Authors’ moral rights

Moral rights of authors include the respect of the integrity of the work – it is called the “integrity right”. In practice, this has been relevant for decades, for example, when an author checks the proofs and approves them before they go to print or gives their written consent to an abridged version for audiobooks or digest editions. Also, authors never imagined their works and individual expressions being scattered into millions of pieces to help structure GAI systems. This is a violation of their right of integrity, as they face an unexpected digital use of their work for a purpose other than its original aim of conveying art.

In the field of GAI, the author’s moral right on the integrity of their work is extended to:

Audio editions
Translations
Equipment, especially covers and illustrations
Transforming and presenting their work.

The moral right shall also include the right to attribution (i.e., the right to have the authors’ name on the work and no other). GenAI software shall get no right to appear in the imprint like a human author.

Agreements on the audio book

Authors and publishers should agree that the writer has the right to refuse licensed audiobook editions of their works using artificial and/or synthetic voices. In principle, the writer must give their permission in written form and must not fear any disadvantages if they reject GAI voices or insist on human narrators when a licensed audio book is made. Some publishers and audiobook producers may have different views, for example on economic considerations. It is to be hoped that the common interest in the appreciation of human work and cultural skills will continue to prevail.
Attention: the requirements of the European Accessibility Act (EAA, Directive 2019/882), coming into force in 2025, allow a text-to-speech adaptation of an e-book, which will be carried out by GAI voices on devices. This presumably cannot be contractually excluded, but can be defined specifically, such as “AI voice output is only permitted in the context of legally authorised text-to-speech under Directive 2019/882. Any further use of the text set to audio in this way for TDM or algorithm coding is not permitted.” Politically, it would be desirable to debate with legislators that a text-to-speech AI edition of an e-book set to a synthetic voice shall not compete with the audiobook edition by a human narrator.
If authors record a book themselves, it must be stipulated in a separate contractual agreement that the recording may not be used for voice cloning without their consent.
Accordingly, the audiobook publisher should respond to the author’s wishes to opt out of audio TDM or the use of the narrator’s performance for synthetic voice replication.

Agreements on translations in case of transfer of foreign rights

To preserve the integrity of the work in translation as well as to reserve the right of use for TDM and of any GAI developing or enhancing, an author has the right to refuse to have their work translated in whole or in part by machine translation, and to exercise their right of approval over machine translation incl. to reject a pre-MT and post-editing by a translator. This is particularly important if the author wants to exercise their legitimate right not to have their texts used for GAI developing, which already happens when manuscripts are fed into machine translation software. Unfortunately, authors may experience that the refusal of machine translation might lead to the effect that a publisher decides to not pursue a translation of the work. We hope that the publishing industry will remain true to its USP and the value of human work.
At the same time, translators also have copyright and moral rights, and can refuse to post-edit a pre-machine-translated text. Here, agreements and clear communication in the author-publisher-translator triangle are important, especially about responsibility for the result of the translation and the labelling requirements. Unfortunately, authors may experience that the refusal of GenAI by author and/or translator might lead to the effect that a publisher decides to not pursue a particular exploitation of a work.
Translators also have the right to reserve their right for TDM and exploitation for developing (G)AI as far as their language version is concerned. Both author and translator should, at best, agree that the final work may not be used for TDM, scraping-copying-storage, or machine programming and any production of generative informatics, and request, within the contract or in another binding written form, that the publisher declare the opt-out. The foreign publishing house shall apply the necessary measures to declare the machine-readable rights reservation, and to ensure that all sublicensees and further distributors are informed of, respect and apply the rights reservation protocol.
Both author and translator should not be afraid to reject the use of machine translation.

Agreements on cover art and other design (graphics, illustrations, pictures)

Publishers should not, without the author’s written consent, use covers, graphics, other design, or illustrations that have been created exclusively or to a significant extent by generative image reproduction (GAI, text to image) to equip their work.
Neither cover designers, illustrators, or other visual artists nor authors should fear any disadvantages, e.g. lower royalty, if they reject the use of GAI.
In general, parties must be transparent about the extent to which advanced informatics is used as an assistive tool, such as image enhancement of human-created artwork, or automatically generated ALT text for accessible e-book formats. When producing ALT text, publishers should ensure that the images they load into AI description software are not used by the software developer, e.g. for GAI training.
In principle, corresponding clauses can already be included in new contracts.

Respecting the writers’, translators’, or visual artists’ own choice of working

Writers, translators, visual artists, and performers should not be required or forced to use any generative AI or to work from AI-generated text or GAI images against their wish.

1.3 Self-Declarations by authors and publishers’ requirements of labelling

GAI products are not human creations and, therefore, not protected by intellectual property rights. Accordingly, they do not acquire a claim to remuneration and, strictly speaking, cannot be licensed, or transferred for rights exploitation – and can be copied and used by anyone again.

This also applies to authors who offer a manuscript; if it is GenAI-produced, they do not have the right to grant licences of exploitation or to receive remuneration. Accordingly, agents and editors need to know whether the author has used GenAI. In this way, the transparency chain can be built up to the reader and the public: Knowing whether a to be published or already published work – book, audio book or visual, for example – has been made by humans completely or generated by a software will be relevant, when it comes e.g. to remuneration from collective management organisations (copies in copy shops, print book loan via PLR, equipment levies, but also performers’ rights), on remuneration splits and royalties, or also to the labelling obligations within the AI Act. Institutions that award prizes or scholarships must also be sure that they continue to honour human achievement. Furthermore, machine outputs must not benefit from reduced VAT, fixed book prices or other subsidies dedicated to cultural human works.

As to whether the writer or translator must provide a corresponding self-declaration: in principle, the usual clause in today’s contracts stating that the writer or translator is the (sole) originator of the work in accordance with national copyright law about the level of creation (= 100% own human work) is appropriate enough. However, publishers, out of legitimate concern about e.g. copyright infringement and plagiarism, as well as their duty to label AI products and to ensure they have the rights to sublicense, may like to have a self-declaration by the author of (non-)generative AI use.

Labelling of the published work, whether fully or partly AI-produced, is necessary for all liability reasons: firstly, to clearly determine which authors and other rightsholders should be remunerated; secondly, to determine which intellectual property rights are engaged; and thirdly, for any allocations of liability (infringement, plagiarism, violation of personal rights, disinformation). In this context, it is equally important to equip AI products with human-readable labels: on the one hand, to allow the reader to make a fully informed decision about what they spend their money on; on the other hand, so that the privileges enjoyed by books as a cultural asset in many member states, such as reduced VAT, publishing subsidies or grants and prizes, are not applied to machine products. Likewise, book aggregators and distributors are insisting that AI products be labelled.

However, it is to be expected that there will be different views in the publishing industry on the duty of labelling of all generative technologies used.

The EWC follows the principle that only full, reliable transparency and trust in human endeavour will make our sector future-proof and trustworthy for readers.

Accordingly, bilateral contractual and communication practices between authors and publishers must be established:

In existing contracts, the author already states that they have created the work entirely through their own creative powers, in accordance with the respective national provisions on authors’ rights and copyright and the required level of protected creation, e.g., Germany: §2(2) UrhG, “Schöpfungshöhe”, and that they have the full rights to transfer exploitation rights. In principle, this typical clause already excludes that generative AI is contained in the work.
For the future, publishers might prefer a more specific self-declaration by the author that they have not used automatically generated text, images, or machine translation in the work to be published, or one indicating any usage of generative technologies. The issue of making distinctions on the amount of incorporated GAI is volatile and far from settled; in the U.S., for example, a “subordinate extent of GAI” is considered to be a maximum of 5 % of the total work for it still to be accepted as a copyright-protected human work. Whether this is resolvable with percentages or by specifying the assistive or analytical AI applications used will occupy associations and the book sector for some time. In general, there should be transparency from all sides and a mutual understanding.
The usage of assisting or analysing software applications (for example, automatic citation index, automated synonym or rephrasing suggestions, Photoshop as a purely assistive software, word editor), or having been mentally inspired by looking at an AI image or reading AI-generated “poems”, shall not be subject to declaration. The EWC is of the opinion that assistive technologies that are not creative and do not automatically or mechanically generate the work or parts of the work are exempt from the obligation to declare. In return, generatively produced “works” or parts of works must be indicated.

Fair play among colleagues: if you use ChatGPT, BERT or other generative text, image or speech robots, you may accidentally infringe the copyrights of your fellow authors and performers.

It is alleged that the foundation models on which large language models are developed have been built upon collections of over 4 million copyright-protected works. 194,000 titles have already been identified from the corpora Pile and Books1, Books2 and Books3, and their source: BitTorrent piracy sites. GAI software copies and ‘memorises’ word chains and individual expressions from existing works, and often produces output with close proximity to the originals or even word-for-word paragraphs. Everyone who uses these current applications risks infringing the copyright of authors whose works have been reproduced in this process, especially as these applications do not provide traceability of the underlying works that have been used to generate the text delivered.

Content control software within large e-book distributors is examining each book before allowing the upload, and more and more anti-plagiarism software is used, with the goal to eliminate (a) AI products as well as (b) detect copyright infringement. It is recommended to not use any GAI for texts with the purpose of being published.

From B2B (“Business to Business”) to General Terms & Conditions of your Software – considerations on TDM rights reservation:

Some practical recommendations for reserving rights under the CDSM Art. 4 TDM exception, and a protocol for dealing with scraping and machine learning within the software you use.

2.1. Everyone using software in the book sector: Is your software scraping you?

In 2023 and 2024, nearly all software manufacturers extended their T&Cs. This concerns text, software, image, management software, collaboration tools, online storage, cloud providers, mail providers, social media, etc. The new T&Cs now contain clauses that allow the manufacturers to copy, store, reproduce and use text, images and further content for the development or optimisation of AI, including GAI. This is not allowed under EU copyright, data protection and privacy law; nevertheless, opting out is often made difficult or impossible. If consent is denied, the manufacturer restricts the full functionality of the software. This is the trick Microsoft, Adobe, Apple, Google, Meta and others rely on: either you give us access, or you no longer have all services available. Accordingly, it is up to everyone, from authors to agents to editors and publishers, to check the software they use so as not to inadvertently open a loophole that allows access to works, work data and other sensitive business information.

2.2. Website developers of publishers, book trade, and authors: Be clear in opt-out of TDM and the prohibition for scraping and using works to develop (generative) AI

TDM reservation in written format: The {your company or author name} expressly reserves the right to use its content for commercial text and data mining within the meaning of {your national legislation on TDM}. Similarly, we/I expressly reserve all rights to grant scraping and machine learning for purposes such as, but not limited to, generative AI development. To obtain a license to use this material, please contact {your email}.
→ Please note that it is very likely that crawlers will not “understand” human language, but only codes, cryptograms, or other machine-readable means, for instance:
Opt-out via robots.txt by manual coding: https://www.iubenda.com/en/help/137640-block-openai-crawlers
The W3C TDM rights reservation protocol within EPUB and pdf files of a book, and every URL where a book title is listed: https://www.w3.org/2022/tdmrep/
Opt-out via ISCC (International Standard Content Code). The EWC recommends the use of the ISCC to the book sector. This code can be used for works in all formats (print, digital, audio, image), and carries all essential information such as rights reservations, work data, author details, etc. in an irreplaceable form. In this way, not only can opt-outs be effectively declared, but AI developers are also able to use the ISCC code to document when they use or license works, and to easily compile the title lists for proof of use. For illustrators, the combination with the Creators’ Credentials system is also a good way of proving the original (and human) provenance of an image.
Further information:
https://iscc.codes
https://iscc.io/
https://www.youtube.com/watch?v=S1vK8LMK0f4
https://docs.tdmai.org/
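
As a minimal sketch of the robots.txt opt-out mentioned above: the following Python snippet generates a robots.txt that asks a few AI crawlers to stay away from the whole website and then checks the result the way a well-behaved crawler would read it. The crawler names in the list and the example.org address are illustrative assumptions, not an exhaustive or authoritative list; each provider's current documentation should be consulted. A robots.txt is only a request, and crawlers that ignore it are not technically stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative, non-exhaustive list of crawler names (assumption: verify the
# current names in each provider's own documentation before relying on them).
AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "CCBot", "Google-Extended"]


def build_robots_txt(crawlers):
    """Return robots.txt text that disallows the whole site for each crawler."""
    blocks = [f"User-agent: {name}\nDisallow: /\n" for name in crawlers]
    # All other robots (e.g. ordinary search engines) remain allowed.
    blocks.append("User-agent: *\nAllow: /\n")
    return "\n".join(blocks)


def is_allowed(robots_txt, user_agent, url):
    """Evaluate the rules as a crawler that respects robots.txt would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(AI_CRAWLERS)
print(robots_txt)  # save this text as robots.txt in the root of the website
```

A quick check such as is_allowed(robots_txt, "GPTBot", "https://example.org/books/my-title") can be run after every change to confirm that the listed crawlers are refused while ordinary search engines can still index the site.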

Please note: The EWC will organise a webinar for its members on the ISCC identifier for book works as well as routines for a robots.txt opt-out for websites in the course of 2024.

2.3. Publishers: Close the backdoor of sub-licensing or ignoring your TDM opt-out.

Publishers need to declare to third parties, including platforms, aggregators (incl. libraries for e-lending) and distributors (incl. print-on-demand services), that they do NOT have the right(s) to sublicense, reproduce and/or otherwise use the work in any manner for purposes of TDM under Art. 4 of Directive (EU) 2019/790 or for training artificial intelligence technologies to generate text, images, or voices. The TDMRep may be useful for this purpose. Awareness must also be raised within the book sector that booksellers, librarians, literary critics, translators, scouts, agents, reviewers, etc., should not enter protected texts into software systems such as ChatGPT or Llama to generate summaries, keywords, or other information, as this already constitutes a further copying and storage process for the (further) development of generative AI, to which the rightsholder has objected.

2.4. Publishers: Make your metadata fit for the era of scraping and crawling.

Publishers have the duty to make the opt-out visible by means of machine-readable metadata and ONIX, the T&Cs on the website, ISCC identifiers, TDMRep, or the imprint. The imprint alone, however, is by no means sufficient to signal the rights reservation to crawlers, as it only works when read by a human eye. Publishers must also agree on a declaration protocol with online retailers and book trade portals, which have the duty to signal to any robot, crawler, or scraper that the opt-out applies to the book work online, for example with the TDMRep, which is recognised by the ONIX for Books standard as a way to communicate the opt-out to online retailers. Therefore, a harmonised standard for the bilateral publisher-bookseller information flow, the “Opt-out-Chain”, shall be established, and the interfaces of online bookshops updated.
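
To illustrate what a machine-readable rights reservation can look like on a publisher or bookshop website, here is a hedged Python sketch that prepares the three website-level signals described in the W3C TDMRep report: a tdmrep.json file served under /.well-known/, HTTP response headers, and HTML meta tags. The field names follow our reading of that report and should be verified against it before deployment; the function names, the /books/ path and the example.org policy address are hypothetical.

```python
import json


def build_tdmrep_rules(policy_url, locations):
    """Rules for /.well-known/tdmrep.json: one entry per path on the site."""
    return [
        {
            "location": location,      # path the reservation applies to
            "tdm-reservation": 1,      # 1 = TDM rights are reserved
            "tdm-policy": policy_url,  # where a licence can be requested
        }
        for location in locations
    ]


def build_tdmrep_headers(policy_url):
    """HTTP response headers carrying the same reservation for one page."""
    return {"tdm-reservation": "1", "tdm-policy": policy_url}


def build_tdmrep_meta_tags(policy_url):
    """HTML meta tags for the <head> of a single book page."""
    return [
        '<meta name="tdm-reservation" content="1">',
        f'<meta name="tdm-policy" content="{policy_url}">',
    ]


# Hypothetical values for illustration only.
POLICY_URL = "https://example.org/tdm-policy.json"
rules = build_tdmrep_rules(POLICY_URL, ["/books/"])
print(json.dumps(rules, indent=2))  # content of /.well-known/tdmrep.json
print(build_tdmrep_headers(POLICY_URL))
print("\n".join(build_tdmrep_meta_tags(POLICY_URL)))
```

Which of the three signals is used depends on what the website operator controls: the well-known file covers a whole site in one place, while headers and meta tags can be set per page, for example on every URL where a book title is listed.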

Two sources to learn more about ISCC + AI / TDMRep and ONIX + TDMRep:

2.5. Event organisers: Take care if you stream or video record a reading, panel, or a lecture.

Whether it’s a reading, a symposium, or a round-table discussion at a book fair: events are often recorded on video and later posted on websites, YouTube channels or social media. YouTube started to sublicense videos to AI developers like OpenAI; the videos are transcribed from speech to text and fed into GAI models. This is neither licensed nor remunerated and is considered copyright infringement. Find out at the earliest opportunity about the terms and conditions of the respective platform and whether your content will be reused for the development of any AI, i.e., also voice, likeness, or the content of the lecture. Establish in bilateral agreements with the artist and author whether they agree to recording and republishing. In turn, the author and artist should have the opportunity to opt out of the use of their likeness, voice, and content.

Related resources