There’s a famous moment in the history of copyright, Colm Tóibín is telling me down the line from America. “St Colmcille copied a manuscript and everybody said, ‘that’s great, thank you’, but he said, ‘no, no, I’m taking my copy away’. And there was this enormous conference and a big monk had to make the decision ‘who owns the copy?’, and the monk said, ‘to the cow its calf,’ meaning the monastery that owns the original owns the copy.”
One wonders what that 6th-century monk would make of today’s copyright mess involving artificial intelligence (AI) and pirated eBooks.
Tóibín is one of many Irish writers whose work has reportedly been used without permission to train AI. Late last month The Atlantic reported that a data set of 191,000 pirated eBooks had been used without permission to train generative-AI systems by Meta, Bloomberg and others. The data set was known as Books3 and has resulted in several active court cases against Meta, some of which are being brought by high-profile authors like Sarah Silverman and Michael Chabon.
Alex Reisner, The Atlantic journalist who discovered the data set, is also a computer programmer, and he developed a search tool which enabled authors to easily discover if their work was in the data set. Irish authors like Marian Keyes, Sally Rooney, Anne Enright and Cecelia Ahern are all in there, as well as poets like Vona Groarke.
“What I fear is not only is [my work] being used to teach the machine how to write better,” says Groarke, “but I strongly suspect it’s being used to teach the algorithm to feel better. I think that’s why they’re interested in poetry. The article in The Atlantic says it’s about teaching [AI] to write better but I’m not altogether sure you would use books of poetry for that. Perhaps it is being taught to be more empathetic, to be more sensitive, to use images, to be more creative. I think my voice is being appropriated, my life, in a strange way, I fear, is also being appropriated, my sensitivity, my sensibility.”
Liz Nugent, author of five bestselling novels, including her latest Strange Sally Diamond, is also in the data set. “The only thing I’m sure of is that it’s bad news for writers,” she said. “As far as I can tell anyone can access those files and write a book ‘in the style of Liz Nugent’. It’s theft, but worse because it’s not just theft of a book, but of my personal skills as a writer, honed over many years of writing and life experience.”
Joanna Walsh is a multidisciplinary writer and author of 11 books, some of which are co-written with AI that she coded herself. She won the Markievicz Award in 2021 for her project Miss-Communication.ie, an AI-generative work funded by the Arts Council. “I’m well aware of [AI’s] problems,” Walsh says. “These problems are human, not technological, and they’re to do with prejudice and exploitation. Most AI scrapes data indiscriminately from the net, reproducing the dominant voices found there, which, as we all know, are often unsavoury.”
Two of her books were included in the data set. How does she feel about her work being used for AI learning? “I’m happy for my writing to be used to train AI. I want my voice there – the problem is the uses it’s put to and the profits made from it. I don’t want my AI created by Meta. Given its positive uses in science and medicine at the very least, AI should be an international collaborative state project for the benefit of everyone.”
It is conceivable that AI-generated novels could be incredibly lucrative for publishing groups. Hachette, one of the “big five” publishing conglomerates, provided a statement saying: “Our industry relies on the creative talent of humans and would not exist without it. The value of our partnerships with our creative contributors cannot be overstated. We draw a distinction between ‘operational’ uses – ie uses that help us fulfil our mission and make it easy for more people to access our books and other products – and ‘creative’ uses – ie uses that harness AI to replace the creative work of a human author, designer, illustrator or translator. For this reason we are opposed to ‘machine creativity’ to protect original creative content produced by humans.”
Novelist and playwright Belinda McKeon is another author whose work is included in the data set. She is also the co-ordinator of the Creative Writing MA at Maynooth University.
“It’s probably the tip of the iceberg, the data-scraping that we do know about,” she says. “I think it’s probably wisest to assume that this is going on in the background all the time, and anything you write and put into some sort of public forum is vulnerable to theft in some way. To take a stronger line on it, it is theft.
“The reason I would feel strongly about that aspect of it is that these tech companies are massive, run by right-wing plutocrats – they’ve gutted the independent arts scenes, made it really impossible to be an independent artist, to live as an independent artist in many cities, so there is something just sort of predictably vile about novels being stolen essentially to train a system that we don’t really know yet the extent to which it could wreak havoc on the planet and the world.”
Does she see it as a threat to human creativity?
“Doing research for creative writing is not about getting the right answer, it’s all the little unexpected things that you stumble upon along the way. It’s the serendipity of the unexpected detail. Finding ways to think and see things completely differently but ways that were available to you all along. I don’t think artificial intelligence can ever come close to that.”
At the moment AI-generated text is usually still relatively easy to spot. As an educator, McKeon says “it’s instantly recognisable for what it is, which is this sort of smooth-brained politeness”. But as AI learns from the work of some of the best writers in the world and becomes more sophisticated in its ability to mimic human creativity, will it always be so easy to tell?
“One day with nothing to do I did ChatGPT,” Colm Tóibín says. “I put in something like, ‘write the first page of a novel by me’. I did it to see how ridiculous it would be, but it wasn’t. I mean it wasn’t perfect and it wasn’t right in some way, there was something funny about it, but there were interesting images in it that I have never used and maybe I wouldn’t use but it didn’t seem wrong and I was surprised by it.”
Author Paul Murray is currently shortlisted for the Booker Prize with his novel The Bee Sting, and his debut novel Skippy Dies was found in the data set. He is unambiguous in his conviction that AI is a disaster for creativity and creative industries.
“One thing we need to be clear about is that ‘artificial intelligence’ is a misnomer,” he said. “Even words like ‘trained’ and ‘learned’ are misleading. There’s no consciousness at work here. Large language models are just reproducing patterns. So it’s very hard, on the one hand, to imagine ChatGPT producing a novel that was anything more than a series of cliches, and, on the other, to imagine anyone wanting to read a book or look at a piece of art produced by a machine that didn’t even know what it was doing.
“But this is just getting started. My fear would be that in the long term AI will do for art what Facebook did for friendship – that the world will be so flooded with fakes that we stop being able to tell what real art even looks like, we forget what the meaning of the original concept even was. We won’t even look to literature as a means of self-expression or for consolation or as a way of making sense of or celebrating or mourning the crazy flux of reality. It’ll just be another app on our phone, producing a few seconds of distraction.”