Alibaba Cloud unveiled three new large language models (LLMs) today: Qwen-72B, Qwen-1.8B, and Qwen-Audio, each designed for a different use case.
The move comes one month after Alibaba Cloud officially released version 2.0 of its Tongyi Qianwen model at the Alibaba Cloud Conference.
The most notable of the three is Qwen-72B, which has 72 billion parameters and was trained on 3 trillion tokens. The 72B model is positioned against top-tier open-source models such as Meta’s Llama models.
The 1.8B model targets on-device consumer applications, while Qwen-Audio is an early step into multimodality, an important direction for generative AI.
Alibaba Cloud claims that Qwen-72B achieved the best results among open-source models in 10 authoritative benchmark evaluations, even surpassing the closed-source GPT-3.5 and GPT-4 in some of these evaluations.
However, experts caution against directly equating these benchmark successes with superior real-world performance.
LLMs currently fall into two main camps: closed-source and open-source. In July this year, Meta released Llama 2 in three sizes: 7B (7 billion parameters), 13B (13 billion parameters), and 70B (70 billion parameters).
With Qwen-72B, Alibaba Cloud appears to be giving the Chinese market an open-source model of similar size to Llama 2-70B.
Alibaba’s Tongyi Qianwen LLM family now includes models with 1.8 billion, 7 billion, 14 billion, and 72 billion parameters.
A wider range of model sizes opens up more application scenarios. With Qwen-72B, medium and large enterprises can build commercial applications on top of it, while universities and research institutes can use it to assist in scientific research. These tasks require complex computation and depend on the continued expansion of the model’s capabilities.
The Qwen-1.8B model is designed specifically for edge computing. It requires only 3GB of VRAM to process 2K-length text, so it can run on consumer-level devices such as smartphones and computers.
The training and inference costs of LLMs remain high, and models with very large parameter counts can only be deployed in the cloud. The Qwen-1.8B model, by contrast, can run offline on devices such as smartphones and computers and handle light tasks such as document and image processing.
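A rough calculation helps show why model size decides where a model can run. The sketch below estimates only the memory needed to hold a model’s weights. The bytes-per-parameter values for each precision and the use of 1 GB = 10^9 bytes are assumptions made for illustration, not figures from Alibaba Cloud, and the helper name `weight_memory_gb` is hypothetical. Activations and the attention cache are ignored, so real usage will be higher.

```python
# Back-of-envelope estimate of the memory needed just to hold model weights.
# Assumptions (not from the article): the bytes-per-parameter values below
# and 1 GB = 10**9 bytes. Activations and the KV cache are ignored.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Parameter counts as reported in the article.
MODEL_SIZES = {"Qwen-1.8B": 1.8e9, "Qwen-72B": 72e9}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the approximate weight footprint in gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, params in MODEL_SIZES.items():
        for precision in BYTES_PER_PARAM:
            gb = weight_memory_gb(params, precision)
            print(f"{name} {precision}: ~{gb:.1f} GB")
```

Under these assumptions, the weights of a 72-billion-parameter model take roughly 144 GB at 16-bit precision, far beyond a phone or laptop, while a 1.8-billion-parameter model needs only a few gigabytes. The article does not say which precision its 3GB figure refers to, so these numbers indicate orders of magnitude only.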
Alibaba Cloud has also open-sourced its audio understanding model, Qwen-Audio, marking an exploration in the multimodal field. Qwen-Audio can perceive and understand various types of audio signals, including human voices, natural sounds, animal sounds, and music.
Users can input an audio clip and ask the model to interpret it, or use the clip as the basis for literary creation, logical reasoning, or story continuation.