Computer vision: Helping us to make sense of our world
In 1985, everyone thought the future belonged to hoverboards and giant TV phones. Three and a half decades later, we have instead turned to ride-sharing services and to messaging each other on handheld mobile phones.
But what is the next great technological innovation?
Imagine, if you will, a not-so-distant future. Imagine using your face as a “ticket” to a sports event or as a way to get menu recommendations at a noodle restaurant. It might sound far-fetched, but Rakuten Institute of Technology (RIT) has already conducted real-life trials of the technology. It’s only a small part of how RIT is teaching computers to “see” and recognize patterns to better understand our world and relieve people of mundane, repetitive tasks.
Improving everyday life with technology
At an event earlier this year, RIT staff carried out an experiment in which roughly 150 people volunteered to submit headshots of themselves before attending a baseball game at a stadium in Tokyo. At the gates, a computer system scanned participants’ faces. Participants were then asked to enter a pre-determined sequence of athletes as an added layer of identification security (and to rule out identical twins and other look-alikes). The system then compared their faces and sequences to those in its database. If there was a match, the participants were admitted.
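For readers curious how such a two-step check might fit together, here is a minimal sketch in Python. It is not RIT's system: the article does not say how faces are compared, so the numeric face "embeddings", the cosine-similarity comparison, the 0.8 threshold and every name below (`Enrollment`, `find_match`) are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Enrollment:
    """One pre-registered participant (hypothetical structure)."""
    face_embedding: List[float]        # assumed: vector derived from the submitted headshot
    athlete_sequence: Tuple[str, ...]  # sequence of athletes chosen ahead of time


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_match(
    scanned_embedding: Sequence[float],
    entered_sequence: Sequence[str],
    database: Dict[str, Enrollment],
    threshold: float = 0.8,  # assumed value, chosen only for this example
) -> Optional[str]:
    """Return the ID of the first enrollment matching BOTH face and sequence, else None."""
    for participant_id, enrollment in database.items():
        face_ok = cosine_similarity(scanned_embedding, enrollment.face_embedding) >= threshold
        sequence_ok = tuple(entered_sequence) == enrollment.athlete_sequence
        if face_ok and sequence_ok:
            return participant_id
    return None


# Two look-alikes with nearly identical (made-up) embeddings but different sequences.
database = {
    "fan_a": Enrollment([1.0, 0.0], ("Pitcher", "Catcher")),
    "fan_b": Enrollment([0.99, 0.05], ("Catcher", "Pitcher")),
}
scan = [1.0, 0.01]  # resembles both enrolled faces
print(find_match(scan, ["Catcher", "Pitcher"], database))
```

Because the scanned face resembles both enrolled faces, only the entered sequence decides which participant is matched, which is the role the article gives the second factor.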
In another experiment with a major noodle restaurant chain in Japan, a separate computer vision system analyzed imagery of participants and recommended menu items based on age and gender.
“A large part of our brain is dedicated to processing visual information, and we’re trying to teach the same thing to machines,” says Bjorn Stenger, head of RIT’s vision program. “Computer vision is trying to understand what is going on in images and video, taking visual information as input and outputting actionable data or high-level abstract data.”
From its beginnings at the Massachusetts Institute of Technology in the 1960s, computer vision eventually became sophisticated enough to read zip codes and then detect faces in images, with real-time detection achieved some 20 years ago. In recent years, graphics hardware, better cameras and AI approaches such as deep learning have accelerated progress in the field. Now face detection is standard in many new smartphones, and face recognition is becoming more common on social media platforms.
RIT’s vision program is made up of seven units with a total of around 20 people. Stenger and his fellow researchers have been developing computer vision technologies for various applications that go beyond face detection and recognition. Examples include using augmented reality on mobile devices to let shoppers “try out” furniture before purchase, and creating 3D models from 2D floor plans.
Streamlined online shopping
A major focus at RIT is computer vision tools that will help to improve online shopping. One effort involves training an AI system to scan a set of images and marketing material and automatically generate banners for online advertising campaigns or landing pages. Another tool can review thousands of merchant pages on Rakuten Ichiba, the e-commerce marketplace, and automatically flag imagery that may be confusing, unclear or too distracting, such as photos with too much text. Meanwhile, photo-enhancement tools can instantly improve color, contrast and other image properties, boosting the chances that customers will take an interest in a product.
“One challenge is really to improve Rakuten’s catalog and search functionality,” says Stenger. “Merchants have a high degree of freedom to post visuals and text on Rakuten Ichiba, but that means cataloging an item is challenging. We use both text and computer vision to better know whether a product is in the right place in a catalog.”
Born in Germany, Stenger completed a PhD at the University of Cambridge before starting a computer vision career in industrial research. With his 15 years of experience and extensive research track record, he was recruited to join RIT’s Reality Domain Group. Reflecting Rakuten’s embrace of open innovation, Stenger and his colleagues share their results at academic conferences and collaborate with universities through research programs and internships.
“In contrast to a university research lab, it’s very easy here to apply research to the real world,” he says. “We have lots of training data — not just e-commerce — because Rakuten has over 70 different businesses. RIT is working on problems in a huge variety of domains such as fintech and customer verification, and we apply solutions to all these different areas like sports events, travel and marketing.”
RIT’s strengths in computer vision lie in resolution and image enhancement techniques. It’s also working on augmented and virtual reality demos, as well as visual simultaneous localization and mapping (vSLAM), which allows sensing devices to visually map their environment and locate themselves within it. Meanwhile, RIT researchers are also working with California-based Rakuten Medical to harness computer vision in order to detect biomarkers as part of the fight against cancer.
“RIT has quite an international environment, with high-quality research output,” says Stenger. “The mission of our lab is to bridge the gap between cutting-edge research and business. Computer vision is a really exciting field, and we hope to bring more research into production at Rakuten.”
Find out more about the research conducted and open positions at Rakuten Institute of Technology here: https://rit.rakuten.co.jp/