When OpenAI’s GPT-3 model debuted in May 2020, its performance was widely regarded as state of the art. GPT-3 set a new standard in deep learning by generating text nearly indistinguishable from human writing. What a difference a year makes, though. On Tuesday, researchers from the Beijing Academy of Artificial Intelligence (BAAI) announced the release of Wu Dao, their own generative deep learning model, which appears capable of everything GPT-3 can do, and more.
First and foremost, Wu Dao is vast. It has 1.75 trillion parameters (the weights the model learns during training), ten times the 175 billion parameters of GPT-3 and 150 billion more than Google’s Switch Transformers.
To train a model with this many parameters in such a short time — Wu Dao 2.0 arrived barely three months after version 1.0 in March — the BAAI researchers first built FastMoE, an open-source training system similar to Google’s Mixture of Experts. FastMoE runs on PyTorch and allows the model to be trained on supercomputer clusters as well as ordinary GPUs. Because it does not require proprietary hardware like Google’s TPUs and can therefore run on off-the-shelf gear in addition to supercomputing clusters, it offers more flexibility than Google’s approach.
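The article does not show FastMoE's actual API, but the core Mixture-of-Experts idea it implements can be sketched with a dependency-free toy: a gating function scores each input and routes it to a single "expert," so only a fraction of the total parameters is active for any given example. The expert and gate functions below are hypothetical stand-ins for illustration, not FastMoE code.

```python
def make_expert(weight):
    """A toy 'expert': scales every element of its input by a fixed weight."""
    return lambda x: [weight * v for v in x]

# Three toy experts; a real MoE layer would hold many large neural sub-networks.
experts = [make_expert(w) for w in (0.5, 1.0, 2.0)]

def gate(x):
    """Toy gate: pick an expert index from the input's mean magnitude.
    Real gates are small learned networks, not hand-written thresholds."""
    score = sum(abs(v) for v in x) / len(x)
    if score < 1.0:
        return 0
    if score < 3.0:
        return 1
    return 2

def moe_forward(x):
    """Top-1 routing: only the selected expert runs for this input."""
    idx = gate(x)
    return idx, experts[idx](x)

idx, out = moe_forward([2.0, 4.0])
# mean magnitude is 3.0, so the gate routes to expert 2 (weight 2.0)
print(idx, out)  # → 2 [4.0, 8.0]
```

The sparsity is the point: because each input touches only one expert, the total parameter count can grow far beyond what any single forward pass actually computes, which is how trillion-parameter models stay trainable.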
Unlike many other deep learning models, Wu Dao is multimodal, similar in concept to Facebook’s anti-hate-speech AI or Google’s recently introduced MUM. BAAI researchers demonstrated Wu Dao’s abilities in natural language processing, text generation, image recognition, and image generation. Not only can the model compose essays, poems, and couplets in traditional Chinese, but it can also generate alt text for a static image and produce nearly photorealistic visuals from natural-language descriptions.
“Big models and big computers are the roads to artificial general intelligence,” stated Dr Zhang Hongjiang, head of BAAI, during the conference on Tuesday. “What we are constructing is a power plant for the future of AI; with mega data, mega computing capacity, and mega models, we will be able to transform data to feed future AI applications.”