Inside Rakuten AI: Lee Xiong on Japanese LLMs and the future of AI
In this series, we sit down with Rakuten AI leaders for a deep dive into the stories behind this transformative technology and the inspiring individuals driving Rakuten’s vision of AI for all. Watch this interview and others in the Inside Rakuten AI Series on our YouTube channel.
At the helm of Rakuten’s Machine Learning and Deep Learning Engineering Department, Lee Xiong, Rakuten Group Vice Director of AI Research Supervisory Department, is charting an ambitious course for Japanese artificial intelligence (AI). His journey into AI began during college, shortly before the field exploded into the popular consciousness.
“It was an era just before deep learning became mainstream,” he recalls. “At that time, machine learning and deep learning were considered very niche topics.”
In 2017, transformers burst onto the scene, revolutionizing the way text is interpreted by computers: “We realized that this was going to be revolutionary for natural language processing and all text-based tasks.”
Xiong has since made significant contributions to the field of AI, including the oft-cited ANCE (Approximate Nearest Neighbor Negative Contrastive Estimation) research paper, which has served as a cornerstone for RAG (retrieval-augmented generation)-driven applications. He went on to pioneer GPU model deployment for search applications at Microsoft before joining Rakuten and helping to launch the Machine Learning and Deep Learning Engineering Department in 2022.
Foundation, innovation and frontier research
Xiong explains Rakuten’s three overarching strategies for AI development.
“The first is what we call foundational scenarios,” he begins. “Our search, ads, recommendations – these have already proven that machine learning can generate value and significant improvement year over year across the industry.”
“LLMS ARE IMPORTANT IN MULTIPLE ASPECTS NOW. AND IF AGI (ARTIFICIAL GENERAL INTELLIGENCE) IS UNLOCKED ONE DAY, THE IMPACT TO SOCIETY AND ECONOMIES WILL BE TREMENDOUS.”
LEE XIONG, RAKUTEN GROUP VICE DIRECTOR OF AI RESEARCH SUPERVISORY DEPARTMENT
Xiong and team have already developed and implemented an AI-driven semantic search system for several of Rakuten’s services.
“Search is one of the largest industry applications of machine learning,” he says. “Still today, together with recommendation and ads, it remains among the biggest.”
The second strategy is what Xiong’s team describes as innovations and prototypes.
“These are chat-based, image-based, voice-based, and many new emerging interaction models that frankly, people are still testing,” he continues. “We believe something different from today’s interaction will emerge in the coming years. And that will dominate how you interact, how you make purchases on e-commerce websites.”
Lastly, frontier research: “This means LLMs. LLMs are important in multiple aspects for AI today. And if AGI (artificial general intelligence) is unlocked one day, the impact to society as well as economies will be tremendous. I don’t think anyone needs to be reminded of that.”
Powerful data and strong fundamentals
One powerful advantage Xiong’s team enjoys is Rakuten’s diverse ecosystem of over 70 different services in Japan and abroad. This allows for powerful horizontal collaboration across different fields of machine learning – something not available to many other companies and universities.
“We do not just do LLMs alone. We actually do search, recommendation – and the code and engines we build are good for all machine learning workloads. The improvement from each of them helps each other,” Xiong states. “We think we are operating with very high efficiency, especially given the size of our team compared with some other hyperscalers.”
“WE DON’T TAKE SHORTCUTS. OUR LATEST MODELS WON’T HAVE A STRONG DEPENDENCY ON ANY PRETRAINED CHECKPOINT, SO WE CAN ACTUALLY BUILD OUR OWN FOUNDATIONAL MODEL FROM SCRATCH.”
“Rakuten probably has one of the most diverse data sets of any company, especially in Japanese,” Xiong notes.
To make the most of this powerful data, Xiong and team have committed to advancing Rakuten’s fundamental AI capabilities.
“Typical LLM development is broken down into two stages – one is called pre-training, the other fine-tuning,” he explains. “Pre-training is when you run the model on a corpus of natural text and ask it to predict the next token. Fine-tuning is when you actually train the model for specific tasks.”
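The pre-training objective Xiong describes – predict the next token – can be illustrated at toy scale. The sketch below is not how an LLM works internally; it simply “pre-trains” a character-level model by counting which token follows which in a small corpus, then predicts the most likely successor:

```python
from collections import Counter, defaultdict

def pretrain(corpus: str) -> dict:
    """Toy 'pre-training': learn next-token statistics by
    counting which character follows which in raw text."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(corpus, corpus[1:]):
        counts[cur][nxt] += 1
    return counts

def predict_next(model: dict, token: str) -> str:
    """Next-token prediction: return the most frequent successor."""
    return model[token].most_common(1)[0][0]

model = pretrain("hello hello hello help")
print(predict_next(model, "h"))  # 'e' — the character that most often follows 'h'
```

An LLM optimizes the same objective with a neural network over billions of tokens; fine-tuning then continues training on task-specific examples rather than raw text.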
In the world of AI development, it’s tempting to simply fine-tune a pretrained model to rate highly on a particular leaderboard, without making something meaningfully more useful. Xiong reveals that despite Rakuten’s recent success on the LLM performance charts, his team is focused on its own metrics.
“We don’t take shortcuts,” he stresses. “Our latest models will not have a strong dependency on any pretrained checkpoint, meaning that if a major developer does not open source their next model, or other foundational models don’t open source, we can actually build our own foundational model from scratch. I don’t think many companies can do that.”
Breakthrough in Japanese language AI
“English models just don’t work that well in Japanese,” Xiong explains, noting that without fine-tuning, many popular open-source models reply to Japanese queries in English. “Sometimes people also feel, if you’re typing Japanese or Chinese, or some other language other than English into most GenAI chatbots for the exact same question, you get a degraded answer quality. This shows that these models are overlooking non-English languages.”
This is understandable, Xiong notes, given the abundance of AI innovation coming out of the U.S.
“WE HAVE THE BEST DATA AND THE MODEL CAN SCALE. AND IT’S ABLE TO LEARN JAPANESE SUPER EFFICIENTLY.”
“Obviously most GenAI leaders from Silicon Valley will optimize for English, first and foremost. But that creates a gap as well as an opportunity for other players such as Rakuten,” Xiong argues. “We can build a model that’s much better in Japanese, and to a level that it’s actually practical to apply in production.”
Xiong’s team has done just that: In early 2024, Rakuten made waves in the AI community with the release of a suite of models, including Rakuten AI 7B. The model topped performance charts in Japanese at the time of launch, addressing crucial gaps in Japanese language processing capabilities among global AI models.
“I think we have a more efficient tokenizer for sure. We did this with a proprietary compression algorithm that we have proven gives optimal token compression for the Japanese language at a given vocabulary size – without hurting the English language.”
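A common way to quantify what a “more efficient tokenizer” buys is characters per token: the higher the ratio, the fewer tokens the same Japanese text consumes. The sketch below only illustrates the metric – the two tokenizations are hypothetical examples, not output of Rakuten’s tokenizer:

```python
def chars_per_token(text: str, tokens: list[str]) -> float:
    """Tokenizer efficiency: average number of characters each token covers."""
    assert "".join(tokens) == text, "tokens must reconstruct the text"
    return len(text) / len(tokens)

text = "東京タワー"
coarse = ["東京", "タワー"]              # hypothetical word-level tokenization
fine = ["東", "京", "タ", "ワ", "ー"]    # hypothetical character-level fallback
print(chars_per_token(text, coarse))  # 2.5 — fewer tokens for the same text
print(chars_per_token(text, fine))    # 1.0
```

Since model cost scales with token count, a tokenizer that covers more characters per token makes Japanese both cheaper to train on and cheaper to serve.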
Xiong says that the key to LLM success is scalability and data quality.
“We have a lot of data, and a lot of the data is in Japanese, in pretty reasonable quality. And the team understands it, so they’re able to do better filtering,” he explains. “When we combine those together, then the chemistry happens. We have the best data and the model can scale. And it’s able to learn Japanese super efficiently.”
A mission to empower all with AI
The importance of developing native Japanese AI capabilities extends beyond mere functionality into the realm of technological competitiveness on the global stage, Xiong highlights.
“The scary situation would be the case where only one country, and only two or three companies in that country can build the most powerful AI. Then all the other companies and countries are at their mercy to some extent,” he proposes.
The key, Xiong argues, is to have more active players in the field.
“If there are many players in the world who can build super AI, then it’s not a problem,” Xiong notes. “I fundamentally believe that we as a tech company should be not just applying AI technology, but making AI as well… Hopefully we can be one driving force for the democratization of such powerful technology!”
Preparing for a future of AGI
Xiong sees two paths into the future for AI at Rakuten. “The first track is where everyone is competing today: scale up the model, while reducing the cost.”
“THE VALUE SOMETHING LIKE AGI CREATES MAY FAR EXCEED EXPECTATIONS, SO IT’S EXTREMELY IMPORTANT TO HAVE MODELS THAT WE CAN BUILD AND OFFER OUR CUSTOMERS.”
One method to achieve this is called Mixture of Experts.
“How it works is you train a gigantic model made of many small models, and when you run the model, only a subset of the small models needs to be triggered,” Xiong explains. “You’re only paying a fraction of the cost compared to running an entire model, but you get performance that is almost as good as if you were running the full model.”
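The routing Xiong describes can be sketched in a few lines. This is a minimal illustration of sparse top-k expert selection – the gate here is a simple linear scorer and the “experts” are toy functions, stand-ins for the feed-forward sub-networks of a real MoE layer:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Sparse Mixture of Experts: the gate scores every expert,
    but only the top-k experts actually run on the input."""
    scores = softmax([sum(w * xi for w, xi in zip(gw, x)) for gw in gate_weights])
    ranked = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    norm = sum(scores[i] for i in ranked)  # renormalize over the chosen experts
    return sum(scores[i] / norm * experts[i](x) for i in ranked)

calls = []
def make_expert(idx, scale):
    def expert(x):
        calls.append(idx)  # record which experts actually executed
        return scale * sum(x)
    return expert

experts = [make_expert(i, s) for i, s in enumerate([2.0, -1.0, 1.0, 0.5])]
gates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
y = moe_forward([3.0, 1.0], experts, gates, top_k=2)
print(sorted(calls))  # [0, 2] — only 2 of the 4 experts ran
```

Because only `top_k` experts execute, inference cost grows with `top_k` rather than with the total number of experts – which is how an MoE model can carry a huge parameter count yet run at a fraction of a dense model’s cost.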
The technology is rapidly gaining steam, and Rakuten is an early adopter. “Our latest model is a Mixture of Experts model optimized for Japanese.”
Secondly, Xiong’s team is looking at things from a broader perspective, reexamining the very architecture of modern AI.
“The limitations of transformers are well recognized in the industry. They don’t scale with longer inputs,” he explains. “Performance drops as you put more turns in the chat and the model has to handle longer context.”
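The scaling limit he refers to comes from self-attention: every token attends to every other token, so the score matrix – and with it compute and memory – grows quadratically with input length. A back-of-the-envelope illustration:

```python
def attention_scores_count(n_tokens: int) -> int:
    """Full self-attention forms an n×n score matrix:
    every token attends to every token, so cost grows as n²."""
    return n_tokens * n_tokens

# Doubling the context length quadruples the attention cost:
print(attention_scores_count(2048))  # 4,194,304
print(attention_scores_count(4096))  # 16,777,216
```

This quadratic growth is why long-context handling and alternative architectures are such active areas of research.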
Breaking through this barrier is especially important to achieving artificial general intelligence – something for which Xiong believes Rakuten needs to be in on the ground floor.
“We believe that for AGI to happen, for something like human-level test proficiency in a professional setting to happen, a model needs to handle millions and millions of tokens of context,” Xiong remarks. “Once something like AGI happens, the value that it creates may far exceed anyone’s expectations. At that point, it will be extremely important to have models that we can build and offer to our customers.”