A college research team from Jiangsu Province in China has unveiled the country’s first Large Language Model (LLM) tailored for ancient books. The model, named “Xunzi” after the renowned ancient Chinese philosopher Xun Zi (荀況), is specifically designed to process and analyse ancient Chinese texts. It uses deep learning techniques and large datasets to support the research and preservation of ancient Chinese books.

The release of this LLM marks a remarkable advancement in the field of AI and historical research. According to Chinese state media Global Times, the team’s objectives for this project are multilayered, aiming to boost innovation, improve the quality of Chinese ancient book preservation, and facilitate a deeper connection between LLMs and the processing of historical manuscripts.
Professor Wang Dongbo, leading the research team from the College of Information Management at Nanjing Agricultural University, has been looking to digitise ancient books and documents for over a decade. The team’s expertise, combined with the university’s computing resources, has culminated in the creation of this LLM.
Unlocking ancient wisdom: Xunzi’s unique features and applications
The LLM “Xunzi” draws on a vast collection of ancient Chinese books and documents, including the monumental “Complete Library in Four Sections,” or “Siku Quanshu.” With a corpus of over 2 billion Chinese characters and words, Xunzi is reported to analyse, summarise, and extract key information from these ancient texts quickly.
One of Xunzi’s features is its ability to generate poems in the ancient style that adhere to the rules of grammar and prosody, producing new verse in the classical tradition. Additionally, Xunzi is said to be able to translate ancient texts into modern Chinese, helping researchers understand the original meaning and significance of these writings.
Open-source collaboration: Xunzi’s contribution to historical preservation
The research team has published the LLM “Xunzi” on open-source platforms such as github.com and modelscope.cn. This means that the LLM is freely available for download and use by researchers and enthusiasts alike, promoting further exploration and innovation in the field.
Professor Wang Dongbo also expressed his vision for this LLM, stating, “We trained Xunzi using big data built on ancient books which can be obtained for free on the internet just like the way OpenAI trained ChatGPT. Although we spent great effort, labor force and money into it, we still share it for free with the aim to encourage more people to study and pay attention to traditional Chinese culture.”
As China’s first LLM tailored for ancient books, “Xunzi” marks a significant milestone in AI and cultural preservation. While it appears to be beneficial in supporting China’s rich historical culture, it remains to be seen whether it will face challenges over copyright infringement similar to those faced by other LLMs. The country has increasingly moved to revive historical cultural practices in recent years, including calls to reinstate the teaching of traditional Chinese characters, while banning books that do not conform to what it deems a unified Chinese approach.