Zhifei Li, founder of Chinese artificial intelligence firm Mobvoi, once joked in a WeChat post that the best business model for a Chinese AI company was to become an Internet celebrity via smart marketing and then pivot to become an e-commerce firm. But, he added self-deprecatingly, “because I’m not handsome enough and my Mandarin is terrible, I choose to sell premium hardware powered by our own proprietary AI technology.”
Behind the joke, however, lies Li’s personal struggle to answer the billion-dollar question: What is the ideal business model for a successful AI company? A native of China’s Hunan province, Li still spoke with a slight accent when China Money Network recently visited Mobvoi’s Beijing headquarters. But there was no ambiguity in his conviction.
“Our strategy is to develop the core voice recognition technology and then apply them to different hardware scenarios, where the technology makes sense for users,” the former Google scientist with a computer science Ph.D. from Johns Hopkins University told China Money Network. “We are an AI company…(and) have the capability to combine algorithm, software and hardware all together.”
Mobvoi, a name derived from combining the first three letters of “mobile” and “voice”, is among the leading players of a new pack of Chinese AI companies that are primarily focused on the so-called AI application layer. Instead of drilling deep into AI technology research, as DeepMind Technologies does, the vast majority of Chinese AI companies – 71% to be specific – are focused on generating real returns from applied artificial intelligence, according to a recent report released by Vertex Holdings, a member of Singapore’s Temasek Holdings.
This means Chinese AI companies have struggled with the brutal “business model” question longer and more deeply than their American peers, of which only 44% are application focused, according to a Tencent study. Discarding invalid answers (Li’s WeChat post featured the survey choice “Continue painting an ever brighter picture for VCs to raise more money”, which got the most votes from his friends), the choices for AI companies include: 1. Provide AI technology solutions to enterprises for a fee; 2. Go downstream to smart hardware.
For Li, the first choice is a poor one. Under option one, the only way to scale up is to try to grab more orders from enterprises, which inevitably leads to price competition and margin compression. Not to mention that Chinese technology giants such as Baidu, Alibaba and Tencent (BAT) could offer such services for free. In fact, a company in this camp, Face++, already provides free basic face and image recognition technology, while charging a fee for premium services, in a freemium (free+premium) model.
As a veteran researcher with over a decade of experience, first at the Natural Language Processing Lab of Johns Hopkins University and later on Google’s translation team, Li believes that his and his team’s AI technology experience puts Mobvoi at an advantage over other Chinese hardware makers. Other benefits of the software-hardware combo approach include brand awareness among end-users, the potential to grow exponentially, and various ways to monetize the hardware.
Investors seem ready to buy Li’s vision. Having secured US$250 million in financing from Volkswagen Group China, Google, Sequoia Capital’s angel investment unit, SIG Asia Investment and others, Mobvoi is reportedly valued at a near-unicorn level, or approaching US$1 billion. Declining to disclose the company’s valuation, Li prefers to explain his philosophy on what an AI company must do to survive and thrive.
“If you want to make an AI company successful, you have to have products and integrate the technology in the products,” Li said while flashing Mobvoi’s Ticwatch S smart watch on his wrist. Mobvoi’s first step of re-tooling itself to shift from being a voice interaction mobile app maker to a hardware developer came in 2015, when it launched its first product, the Ticwatch smart watch. Today, the company’s product portfolio includes various Ticwatch products, smart speakers, smart rear-view mirrors for cars, voice-powered airpods, as well as a virtual personal assistant platform and a smart watch operating system.
But the challenges in hardware are equally daunting. For one, price competition is extreme. Mobvoi has to compete with Chinese tech giants Alibaba, Tencent and JD.com, which have all released or are about to release their smart speaker products. While Mobvoi’s smart speakers are priced at RMB1,399 (US$214), a comparable product sold by Xiaomi, known for selling products at prices approaching zero profitability, costs only RMB299 (US$46).
Xiaomi, founded by famed entrepreneur Lei Jun, also has a more complete product portfolio encompassing an air purifier, cleaning robot, rice cooker and hundreds more items. It potentially offers many more things that could be controlled by Xiaomi’s smart speakers, and therefore provides a better user experience.
Li said that Mobvoi targets high-end consumers in China, and his company is trying to get its smart speakers to control more things at home, in the car and elsewhere in the future. A US$180 million series D financing round led by Volkswagen Group China is certain to help. Outside of smart rear-view mirrors, cars produced by Volkswagen China may have an embedded smart dashboard with voice controls powered by Mobvoi’s technology.
“(Smart hardware) is a gigantic market because there will be billions of devices connected…It will be as prevalent as the smartphone…and I expect a few players will survive in the end,” Li told China Money Network. What goes unsaid is obvious: Li certainly believes Mobvoi will be among those that survive and thrive.
You can listen to our conversation above or read a Q&A below.
Q: When did you find artificial intelligence to be the field you wanted to study, research or even start a company in?
A: Back in 2005, when I was pursuing my Ph.D., I found that voice recognition can be very useful when humans interact with machines. I was so curious about why machines can recognize people’s voices. It was 12 years ago, even two years before the iPhone was launched. Back then, it was a completely blank field, and there was not much research going on at that time, even in the U.S.
After I completed my Ph.D. in computer science in 2010 from Johns Hopkins University, I joined Google Research developing Google Translate. At that time, mobile phones were so popular, and I thought it was time to redefine interaction between humans and machines. Previously, we were using touch screens or keyboards, which were not very convenient for mobile phones, while voice commands would be a much better and more natural way. So later, in 2012, I came back to China and started Mobvoi.
Q: Did you speak to Google or Zhen Fund before you started the company?
A: Yes, I was really lucky to receive funding from Sequoia Capital and Zhen Fund at the very beginning when I established the company.
Our products are quite innovative, especially our smart wearable and smart watch. We also developed Ticwear, a smart watch operating system based on Android. So when Google wanted to launch the Android Wear device in China, they partnered with some local companies providing voice search engines. That’s why Google invested in us.
Q: When you started your company in 2012, what product did you want to develop?
A: The first idea we had was to use voice recognition technology to develop mobile apps. You can talk to the app and it will provide the answers to you, just like Apple’s Siri, a voice assistant.
Later, our strategy gradually shifted to hardware. The main problem of the app is that it is not so convenient (for users). People have to find the app, download it, press a button and then get the results. So we thought about what other applications could make voice technology more useful for customers. Then we started to focus on smart wearables.
Q: So you entered a brand new sector that was very different from developing software and technology. What were some major challenges?
A: The first problem was that we didn’t have any hardware experience and we had to find talent. We were lucky that Nokia Beijing was laying off a lot of staff at that time. Our team stayed at the Nokia Beijing office and hired many experienced engineers.
Once we had the talent, the second thing was to understand the development cycle of hardware. For software, once you have the idea, you just write the code, and then launch it on the Internet. But for hardware, it takes a few months or longer. So, we had to completely change our mindset.
Q: Looking forward, what are some sectors that Mobvoi will get into in the future, and what are the sectors that you are definitely staying away from?
A: In the future, we will likely make smart wireless earphones. Our strategy is to develop the core voice-in-action software and then apply it to different hardware scenarios, where the technology makes sense for users.
But we will definitely avoid smartphones, because the competition is already really fierce. It is also difficult to change user behavior, as they are very used to touch screens on their smartphones.
Q: When we talk about smart home appliances, we can’t ignore Xiaomi Inc., which arguably has the most complete smart home ecosystem in China. What do you think are Mobvoi’s competitive advantages?
A: First of all, Xiaomi is a strong player in China. But I don’t think their smart home devices will dominate the Chinese market. More than 80% of its market share are devices that are provided by other manufacturers. So I think we still have a lot of opportunities.
We are an AI company and we have the core voice-in-action technology, as well as hardware products to apply our AI technology. We have the capability to combine algorithms, software and hardware all together, which Xiaomi does not have.
Q: Many companies are crowding into the smart home and IoT sectors, each wanting to set the standard and be the platform that integrates others. What do you think this will lead to?
A: We don’t have a good solution right now. It’s true that there are lots of players with different standards.
Currently, many companies claim that they have the voice recognition technology to control home appliances like TVs and air conditioners, but people are not really using it. Once the market becomes mature and customers get used to the technology, I am sure some leading players will emerge and they will set the standard.
Q: What’s your outlook for the smart home appliance and IoT sectors?
A: I think it’s a big market, because you will have billions of devices connected to the IoT platforms. For example, you have devices detecting the environment and activities to make smart decisions. For now, the most important thing for a company is to grab market share and have as many devices as possible connected to your platform.
Q: Regarding smart speakers. Xiaomi’s smart speaker costs RMB299 (US$46) a piece. What’s your product’s price?
A: Our price is RMB1,399 (US$214) and we are targeting high-end consumers, a different customer group from Xiaomi.
If you look at the smart watch market in China, some smart watches only cost RMB300 or RMB200, but ours is priced at RMB999. Still, we were one of the top three smart watch companies in China in terms of sales. So I think some people prefer a lower price, but others are willing to pay a higher price.
In China, we have 200,000 to 300,000 units of smart watches sold each year. For the overseas market, our sales are around 100,000 units. Currently, the international version supports only English. Later we will use Android Wear and will have more languages available, which will support our expansion to more countries.
Q: Currently, there is no proven case in the smart watch industry. Are you optimistic about the industry’s future?
A: Absolutely. If you think about what smart devices have the ability to generate good sales each year, I think smart speakers and smart watches are the only two. Amazon.com’s Echo has over ten million units sold every year, and it’s a huge number compared with other smart devices.
Q: As an AI company in China, what do you think is the right business model to achieve revenue and make the firm sustainable?
A: If you want to make an AI company successful, you have to have products, and integrate the technology in the products, which is what we are doing.
The other solution is to identify some partners who already have a product. For example, you can collaborate with automotive companies and integrate your voice-in-action technology in their vehicles. We partner with Volkswagen AG. We also cooperate with a Chinese real estate firm, China Resources Land Ltd., and install our smart home devices in their buildings.
Q: You just formed a strategic partnership with Volkswagen to establish a joint venture. Besides the AI rear-view mirror for cars, what are some new products that we can expect?
A: The smart rear-view mirror is really the first step. In the future, the cars produced by Volkswagen will have an embedded smart dashboard with voice recognition.
Q: Many people think the future disruptor of the BAT will be an AI company. Do you agree?
A: Absolutely. In five or ten years, most devices will be voice-enabled devices instead of mobile phones. Companies like us have entered the market before it’s mature, and we are the natural players in the industry, because most voice-in-action technologies that other companies are doing will be based on our software and our devices.
About Zhifei Li:
Zhifei Li is the founder of Mobvoi, also known as Chumen Wenwen, a Chinese start-up developing voice recognition technology as well as smart hardware. Prior to that, he was a research scientist at Google. Li holds a Ph.D. in computer science from Johns Hopkins University.