Google’s latest innovation in artificial intelligence, the Robotics Transformer 2 (RT-2), is breaking new ground in the world of robotics. Leveraging cutting-edge AI tech similar to the systems powering AI chatbots like Bard and ChatGPT, Google aims to enable robots to perform tasks more efficiently and effectively. Vincent Vanhoucke, the head of robotics for Google DeepMind, unveiled RT-2 as a “first-of-its-kind vision-language-action (VLA) model” that allows users to control robots using natural language.
This groundbreaking development enables robots to interpret text and image data from the web and execute corresponding actions. With RT-2, Google is reshaping how robots comprehend the world around them, enhancing their capabilities across a range of fields, starting with the mundane but essential task of throwing away trash.
How Google’s RT-2 moves beyond conventional robot control
Google’s RT-2, a vision-language-action (VLA) model, has propelled the robotics field forward by leaps and bounds. Built on technology similar to that behind AI chatbots such as Bard and ChatGPT, RT-2 transcends the confines of conventional robot control. Unlike chatbots, which process text for human interaction, robots face the more intricate challenge of understanding their physical environment: they must differentiate objects, interpret context, and execute precise actions based on their perceptions.
Vincent Vanhoucke emphasized the complexity of robot comprehension, pointing out that recognizing a simple object like an apple is much easier than telling a delicious red apple apart from a red ball and then accurately picking up the desired item. RT-2 bridges this gap, empowering robots to comprehend real-world scenarios and respond to instructions given in natural language. By combining language understanding with visual perception, RT-2 ushers in a new era of robotics, with potential applications spanning manufacturing, healthcare, disaster response, and beyond, and opens the door to far closer human-robot collaboration.
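To make the “vision-language-action” idea concrete, here is a minimal sketch of what such a control loop could look like. Everything in it (the `vla_model` object, its `generate` method, the four-token action scheme) is a hypothetical illustration of the concept, not Google’s actual RT-2 interface, which is not public.

```python
# Hypothetical sketch of a VLA perceive-and-act cycle. The model object, its
# generate() method, and the token scheme are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class Action:
    """A simple robot command: a gripper displacement plus open/close."""
    dx: float
    dy: float
    dz: float
    close_gripper: bool

def decode_action(tokens: list[int]) -> Action:
    # Assume the model emits discrete tokens in 0..255; map each back to a
    # continuous value in [-1, 1], a common discretization scheme.
    def to_float(t: int) -> float:
        return (t / 255.0) * 2.0 - 1.0
    return Action(
        dx=to_float(tokens[0]),
        dy=to_float(tokens[1]),
        dz=to_float(tokens[2]),
        close_gripper=tokens[3] > 127,
    )

def control_step(camera_image, instruction: str, vla_model) -> Action:
    # One cycle: the model sees the scene and the natural-language instruction
    # and produces action tokens instead of words.
    tokens = vla_model.generate(image=camera_image, text=instruction)
    return decode_action(tokens)
```

The key difference from a chatbot is in the last line: the model’s output tokens are decoded into motor commands rather than displayed as text.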
Empowering robots to dispose of trash
Previously, training robots to perform even seemingly simple tasks like throwing away trash involved an arduous and time-consuming process. Engineers had to teach the robot to identify the trash, grasp it appropriately, locate a suitable trash can, and then deposit the waste carefully. This intricate choreography demanded extensive training and fine-tuning of numerous parameters, rendering the process slow and monotonous.
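That older approach can be sketched as a chain of separately engineered components, each of which had to be built and tuned on its own. All names below are hypothetical placeholders, not a real robotics API.

```python
# The hand-engineered pipeline style described above: one component per
# stage, each built and tuned separately. Names are hypothetical placeholders.

def detect_object(scene: dict, label: str) -> tuple[float, float, float]:
    # A dedicated detector per object category. Here the "scene" is just a
    # lookup table standing in for real perception.
    return scene[label]

def throw_away_trash(robot, scene: dict) -> None:
    trash_pos = detect_object(scene, "trash")    # identify the trash
    bin_pos = detect_object(scene, "trash_can")  # locate a suitable bin
    robot.grasp_at(trash_pos)                    # hand-tuned grasping routine
    robot.place_at(bin_pos)                      # careful deposit of the waste
```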
With the advent of RT-2 and its ability to draw on vast troves of online image and text data, the training process for robots has been transformed. The new AI model lets robots quickly learn to identify trash and autonomously execute the steps needed to pick it up and dispose of it correctly. Remarkably, with only a small amount of robot-specific training data, RT-2 allows robots to transfer concepts embedded in their language and vision training, enabling them to perform complex tasks they have never been explicitly trained for.
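The transfer idea can be sketched as fine-tuning: start from a model pretrained on web-scale image and text data, then continue training on a comparatively tiny set of robot demonstrations so that web-learned concepts (“trash,” “extinct animal”) carry over into actions. The names below (`pretrained_vlm`, `robot_demos`, `loss`, `update`) are assumptions for illustration, not Google’s training code.

```python
# Hypothetical sketch of transfer learning for a VLA model: keep the
# web-pretrained vision-language backbone and fine-tune it on a small set of
# robot demonstrations. All objects and methods here are illustrative.

def fine_tune_vla(pretrained_vlm, robot_demos, steps: int = 1000):
    model = pretrained_vlm  # retain concepts learned from web image/text data
    for _ in range(steps):
        image, instruction, action_tokens = robot_demos.sample()
        # The model is trained to emit action tokens the same way it emits
        # words, so language understanding and control share one network.
        loss = model.loss(image=image, text=instruction, targets=action_tokens)
        model.update(loss)
    return model
```

Because the backbone already knows what “trash” looks like from the web, only the mapping from perception to action has to be learned from robot data.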
In a striking demonstration, a robot effortlessly identified and lifted a toy dinosaur when asked to pick up an extinct animal among a group of toys. Another challenge saw the robot proficiently moving a small toy Volkswagen car towards a German flag. These real-life examples illustrate how RT-2’s language and vision capabilities propel robotics to new heights of versatility and responsiveness.