Chinese venture capitalist Kai-Fu Lee’s artificial intelligence startup 01.AI has been embroiled in controversy over its use of open-source large language model architecture. The startup, already valued at over US$1 billion since its founding in July, issued a statement today to clarify its use of existing model architecture.
01.AI, founded by Kai-Fu Lee, chairman and CEO of Chinese venture firm Innovation Works, completed a round of financing led by Alibaba Cloud at a valuation reportedly exceeding US$1 billion. Last month it released two open-source pre-trained large models, Yi-34B and Yi-6B, on the open-source community Hugging Face.
After the unveiling of the two models, Jia Yangqing, former Vice President of Technology at Alibaba and the inventor of the deep learning framework Caffe, hinted that the Yi models were just a shell of the LLaMA architecture.
LLaMA is a large language model created by Meta, released in July of this year and fully open-sourced. Some developers have stated that, apart from two renamed tensors, Yi uses the LLaMA architecture in its entirety.
In response to these doubts, 01.AI released a statement about the training process of Yi-34B, mentioning that "the core of the continuous development and breakthrough of large models lies not only in the architecture but also in the parameters obtained through training."
It clarified that in training the model, 01.AI used the basic GPT/LLaMA architecture, which allowed for a quick start and was more developer-friendly, but that the Yi-34B and Yi-6B models were trained from scratch by 01.AI and underwent extensive original optimization work.
Regarding the oversight of leaving some inference code borrowed from LLaMA renamed after experimentation, the startup said the original intention was to fully test the model and run comparative experiments, and that the renaming of some inference parameters was not intended to deliberately conceal anything.
To further dispel the controversy, 01.AI said today that after several weeks of internal legal analysis, the company has confirmed that it is not involved in any shell or plagiarism issues.
After the open-source release of Yi-34B, a developer named Eric Hartford suggested that the Yi model should reverse the renaming of the two tensors in order to maintain consistency in tensor names across all models based on the open-sourced LLaMA architecture.
Following this, 01.AI resubmitted the model and code to various open-source platforms, reversing the renaming of the two tensor names.
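Reverting such a change amounts to mapping the renamed tensor keys in the released checkpoint back to the LLaMA naming convention. As a purely illustrative sketch (the key names below are hypothetical placeholders, not the actual tensor names involved in the Yi release), renaming keys in a PyTorch-style state dict could look like:

```python
# Hypothetical mapping from renamed keys back to reference names.
# These names are placeholders for illustration only -- they are NOT
# confirmed to be the two tensors renamed in the Yi release.
RENAME_MAP = {
    "ln1": "input_layernorm",
    "ln2": "post_attention_layernorm",
}

def revert_tensor_names(state_dict, rename_map=RENAME_MAP):
    """Return a new state dict with renamed key components mapped back."""
    reverted = {}
    for key, tensor in state_dict.items():
        new_key = key
        for old, new in rename_map.items():
            # Replace the renamed component wherever it appears in the
            # dotted key path, e.g.
            # "model.layers.0.ln1.weight" -> "model.layers.0.input_layernorm.weight"
            new_key = new_key.replace(f".{old}.", f".{new}.")
        reverted[new_key] = tensor
    return reverted

# Example with placeholder values standing in for weight tensors:
ckpt = {"model.layers.0.ln1.weight": [0.1], "model.layers.0.ln2.weight": [0.2]}
print(sorted(revert_tensor_names(ckpt)))
```

Because only the key strings change, the model's weights and behavior are untouched; the rename simply restores interoperability with tooling that expects LLaMA-style tensor names.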
However, critics still point out that the controversy is really about how 01.AI presented its Yi models. The company stated that the Yi models are "the first Chinese-made model to top the global open-source large model rankings" and emphasized that its models are "domestically made," while saying nothing about its use of existing open-source models or giving any credit to the LLaMA architecture.
Currently, the Yi model has been downloaded 168,000 times in the Hugging Face community and has gained over 4,900 stars on GitHub. Several well-known companies and institutions have also released fine-tuned models based on the Yi model platform.
For example, OrionStar, a subsidiary of Cheetah Mobile, released the OrionStar-Yi-34B-Chat model. Separately, the Cognitive Computing and Natural Language Research Center of Southern University of Science and Technology and the Guangdong-Hong Kong-Macao Greater Bay Area Digital Economy Research Institute jointly released SUS-Chat-34B.