Rakuten AI and Google are driving the next generation of autonomous agents

Superintelligent virtual assistants and autonomous robots still feel like something out of a science fiction novel. But are we really that far off? One session at the most recent Rakuten Technology Conference in Tokyo explored this very question.

Rakuten’s Director of the AI Services Supervisory Department Taku Okoshi was joined by Hamidou Dia, VP of Applied AI Engineering at Google, for a discussion on the next generation of autonomous agents. Together, they walked the audience through how we arrived at this stage of AI evolution, and what might lie beyond the horizon.

“We are on the cusp of the biggest technology platform shift of our lifetime,” Dia stressed. “AI agents are the next frontier in this platform shift, and we are just at the very, very beginning of this AI era.”

No longer just chatbots

Rakuten Group’s Taku Okoshi presents the company’s shift beyond chatbots toward an AI “agent” ecosystem integrating services.

For Okoshi, who serves as an executive officer at Rakuten Group, this shift is already visible. Rakuten operates more than 70 different services in Japan alone, engaging with over 100 million users in e-commerce, banking, payments, travel, mobile, and more.

This scale and diversity present a unique challenge for Rakuten: Can AI graduate from simply answering questions, to taking action across the Rakuten Ecosystem?

“We set AI-nization* as a keyword across Rakuten Group. On a daily basis, we are fully utilizing our AI capability,” Okoshi said. “We want to set AI as a gate of the Rakuten Ecosystem. This is our latest vision and mission.”

This mission has advanced rapidly from internal optimizations to consumer-facing products. Rakuten AI can now act as a central agent, connecting shopping, hotel recommendations, and even music streaming.

Rakuten Travel has launched an intelligent concierge that can help plan trips, discover hotels, and explore destinations. Okoshi highlighted how a collaboration with Google has allowed the agent to combine Rakuten’s travel data with public web information to overlay on Google Maps.

“Thanks to Google’s technology, we can fully integrate Google Maps capability into our agent.”

A process that previously required searching, filtering, and navigating between apps is rapidly transforming into a single conversation that remembers context.

“These agents are very different from what we used to know about AI bots,” Dia explained. “These intelligent agents can reason, plan, take action on behalf of a human, and most importantly, they can also have memory.”

How did we get here, and where are we going?

“This whole thing started with the chatbot, right?” Dia said. “You put in a prompt and get a lot of information.”

Around two years ago, the AI scene was abuzz with talk of retrieval-augmented generation (RAG), a technique that lets LLMs ground their answers in concrete retrieved data, reducing hallucinations.
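The core idea behind RAG can be sketched in a few lines: retrieve the passages most relevant to a query, then prepend them to the prompt so the model answers from that concrete data. This is a minimal illustration only; the keyword-overlap retriever and document list below are toy stand-ins, where a production system would use a vector database and an LLM API.

```python
# Minimal RAG sketch: retrieve relevant passages, then build a
# grounded prompt. All names here are illustrative assumptions.

def retrieve(query, documents, top_k=2):
    """Rank documents by naive keyword overlap with the query."""
    query_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(query, documents):
    """Prepend retrieved passages so the model answers from concrete data."""
    sources = "\n".join(f"- {p}" for p in retrieve(query, documents))
    return (
        "Answer using only the sources below.\n"
        f"Sources:\n{sources}\n"
        f"Question: {query}"
    )

docs = [
    "Rakuten Travel lists hotels across Japan.",
    "The cafeteria opens at noon.",
    "Rakuten operates e-commerce, banking, and mobile services.",
]
prompt = build_grounded_prompt("Which services does Rakuten operate?", docs)
```

The grounded prompt is then sent to the LLM in place of the raw question, so the model cites retrieved facts rather than relying on its parametric memory.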

“Then LLMs started introducing reasoning capabilities, and also the ability to do function calling,” Dia continued. “Then we entered the tooling phase – reasoning, multi-step reasoning – and we started building agents. Now we are really in the multiple-agent system era.”
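Function calling, the step Dia describes, works by having the model emit a structured request naming a tool, which the surrounding runtime then executes and feeds back. The sketch below is a generic illustration under assumed names (the tool registry and the simulated model output are not any specific vendor's API):

```python
# Illustrative function-calling loop: the model emits a JSON tool
# call, and the runtime dispatches it to a registered function.
import json

def get_weather(city: str) -> str:
    """Toy tool the agent can call (hypothetical)."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse the model's JSON tool call and run the matching function."""
    call = json.loads(model_output)
    func = TOOLS[call["name"]]
    return func(**call["arguments"])

# In a real system this JSON would come from the LLM's response.
simulated_output = '{"name": "get_weather", "arguments": {"city": "Tokyo"}}'
result = dispatch(simulated_output)
# The result is returned to the model as an observation, enabling the
# multi-step reasoning loops that agent frameworks build on.
```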

But the journey is far from over.

“We are all marching toward what we call AGI – artificial general intelligence, or superintelligence,” Dia said. “Multimodality – models that can reason across text, image, video, audio, coding and music. Then world models – models that generate environments dynamically. Then multiple-agent systems that leverage real-world, multimodal generation and feed it into physical agents.”

This trajectory doesn’t stop at software. “That’s what’s going to enable this paradigm shift – allowing us to get to physical agents with advanced reasoning.”

So where are the robots?

Dia discusses the challenges of bringing AI from code into the complexity of the physical world.

“Once we move beyond software-based intelligent agents, how do we get into the real world?” Dia posed. “That’s where we are heading.”

He showed a demo video of an AI-powered robot sorting waste according to San Francisco rules – green bin for compost, blue for recycling, and black for regular trash.

“Sounds easy, but it’s extremely complex for a robot to understand those types of instructions and perform those tasks,” he stressed. “The physical environment is very challenging – it’s unforgiving compared to software.”

This is one of three reasons, Dia argued, that robotics is lagging behind the software side of AI. “In the digital world, when you’re dealing with bytes and pixels, if an error happens, you can undo it,” he noted. “In robotics, you can kill someone or break something. The physical environment is much more complex.”

The second reason: a lack of data.

“For LLMs, for example, Meta Llama 3 was trained on 15 trillion tokens. There’s a massive amount of internet data to train software agents,” he said. “The largest publicly available dataset for robotic actions is about 2.4 million examples. That’s nothing.”

This is one major challenge facing Google’s efforts in AI-powered robotics. “It’s extremely challenging to get real robotic action data to train these models. That’s why we’re focused on building generative models for the real world.”

And finally: expense. “Costs have gone down significantly, but physical robots are still very expensive.”

According to Dia, Google’s future lies in the models powering future robots.

“Google’s focus is not robot engineering itself, but model development that runs on top of robotics,” he explained. “We’re not building physical robots. We’re building the best robotic models that organizations can use to power robots.”

A new battleground: trust, safety, and control

As AI agents become more capable and autonomous robots become a possibility, the challenges facing the AI industry shift. Sheer intelligence is no longer enough; enterprises must now grapple with the question of whether these systems can be trusted.

“Security is also about looking at principles,” Dia remarked. “Privacy for all your data. Making sure when you are building this agent, your IP is protected. Your data is within the perimeter of your enterprise.”

It’s a question of particular importance for companies like Rakuten, with many different businesses, spanning everything from finance to telecommunications.

“Each business has regulations from the Japanese government or industry,” Okoshi said. “According to these regulations, we need flexibility in our AI agent platform and systems.”

“For Google, what’s important is openness – open source, open standards,” Dia offered. “In this agentic era, it’s important to define open standards and build frameworks that allow every organization to leverage an open system.”

All of this must reside atop a foundation of strong governance, incorporating role-based access, enterprise policies, compliance certifications and data residency.

“How do you make sure that if there are strong sovereignty requirements, not only is the data residing in the country, but the processing is also happening in the country?”

As we enter this new, agentic phase, the future of AI may not be decided by raw intelligence, but by who can make it safe, flexible and predictable enough to operate inside real organizations.

* AI-nization is Rakuten’s initiative to implement AI in every aspect of its business to drive further growth, and deliver on the commitment to realizing a world where everyone can enjoy the benefits of AI.
