Is there such a thing as a Fairly Trained AI model? Apparently, there is

The Authors Guild, alongside several prominent organisations, is backing Fairly Trained, an organisation that certifies generative AI companies and has now certified a new AI model known as the Kelvin Legal Large Language Model (KL3M). Unlike many recent large language models (LLMs) that are trained on unlicensed data, KL3M is attempting to challenge the status quo established by industry giants like OpenAI, and it happens to be backed by the French government.

Read: Is it ‘fair use’ for OpenAI and AI firms to use copyrighted works?

Historically, the practicality of training LLMs without resorting to copyrighted materials has been contentious. OpenAI’s admission to the UK government in 2023, stating that avoiding copyrighted content in training leading AI models was “impossible,” has opened the door to numerous copyright infringement lawsuits. However, recent developments suggest a shift towards a more responsible and legally compliant approach to AI training.

Breaking the mould: the birth of KL3M

The LLM KL3M was created by the Chicago-based legal tech consultancy startup, 273 Ventures, using a carefully curated dataset exclusively comprising licensed content or public domain material. This contrasts with the prevailing practices within the AI industry, characterised by the widespread scraping of copyrighted content without permission.

Fairly Trained, founded in January 2024 by former Stability AI executive Ed Newton-Rex, provides certification to companies demonstrating a commitment to using legally compliant data sources. It claims to help promote transparency and accountability in the AI landscape. Newton-Rex, motivated by the belief that AI can and should be developed responsibly, told WIRED, “There’s no fundamental reason why someone couldn’t train an LLM fairly.”

Certifying ethical AI

Law firms, among many other “risk-averse” clients, have been wary of the possible legal repercussions associated with traditional AI models, hence the need for a model like KL3M. According to Jillian Bommarito, co-founder of 273 Ventures, the initiative was born out of the need to assure clients of the integrity of the data underpinning the AI outputs. This concern is well-founded, considering the legal challenges faced by entities such as OpenAI and Stability AI over alleged intellectual property infringements.

“Generative AI can exist without exploiting copyrighted work without permission. We’re pleased that we continue to meet and certify great AI companies and developers who prove this.”

Fairly Trained

Although KL3M was trained on a relatively small dataset of around 350 billion tokens, its success showcases the potential for specialised, high-quality data to deliver strong performance without the need for massive, indiscriminately compiled data troves. This approach could not only help mitigate legal risks but also improve the model’s efficacy in specific applications such as legal document analysis and contract drafting.

AI without copyright infringement

The certification of KL3M by Fairly Trained has gained support from a large group of stakeholders, including the Authors Guild, the Association of American Publishers, SAG-AFTRA, and Universal Music Group.

Read: Authors Guild releases guidelines on ethical use of AI

Mary Rasenberger, CEO of the Authors Guild, described Fairly Trained’s certification as a milestone in advocating for the rights of creators. “Too many generative AI companies exploit copyrighted work without permission. The certification available through Fairly Trained incentivizes AI companies to train on licensed data and centers human creators in the AI landscape,” Rasenberger said in a statement, highlighting the initiative’s role in ensuring that authors maintain control over their works in the age of AI.

Fairly Trained also recently announced the expansion of its Licensed Model certification to include LLMs and voice AI, with certification for several new companies.
