AI Training and Pirated Books: Authors Speak Out

Artificial intelligence (AI) is advancing at a breathtaking pace, but not without its share of controversies. One of the latest involves Books3, a dataset that several tech giants use to train their AI systems. What is causing outrage across the literary world is that the authors of the nearly 200,000 books included in the dataset were never informed that their works were being used in this way.

Books3, a collection of pirated e-books spanning various genres, was brought to light by an investigation conducted by The Atlantic. The appeal of books over internet articles is clear: high-quality AI requires high-quality text from which to learn language. The use of pirated works without permission, however, has led to multiple lawsuits against companies like Meta, which have been training their AI on Books3.

Mary H. K. Choi, whose debut novel “Emergency Contact” is part of the collection, voiced her distress on social media, saying that the use of her profoundly personal story without consent feels like a violation. This sentiment was echoed by Min Jin Lee, who labeled the practice “theft.” Nora Roberts, who has more than 200 books included, also expressed her dissatisfaction, emphasizing the exploitation of writers by tech companies seeking a quick and inexpensive way to develop AI.

However, not all authors share this sentiment. James Chappel, for instance, expressed indifference to his book being included, stating that he wanted it to educate.

The conversation surrounding Books3 has intensified alongside wider disputes over AI in creative work: the Writers Guild of America went on strike in part over concerns about the use of AI in creative processes. This isn’t the first time the artistic community has faced such problems. Visual artists, too, have found their work used to train AI systems without their consent.

Amid these developments, US President Joe Biden plans to introduce an executive order on AI, emphasizing responsible innovation. Yet the situation has left many authors feeling disillusioned. Mary H. K. Choi captured this sentiment poignantly, saying that the challenges writers face in the realm of AI felt “insultingly inconsequential” yet “absolutely inevitable.”

As AI continues its rapid development and integration into various sectors, the ethical considerations around its use grow increasingly complex. The Books3 controversy underscores the need for a more thoughtful and inclusive approach to technology, ensuring that the rights and sentiments of creators aren’t sidelined in the rush for innovation.