Google DeepMind has developed a new family of Gemini Robotics models, designed to help robots perform complex physical tasks with greater adaptability and dexterity. The models build on Gemini 2.0 and are fine-tuned with robot-specific data, adding physical action as an output alongside Gemini's existing multimodal handling of text, video, and audio.
Rather than training on one task at a time, the team trained the Gemini Robotics models on a broad range of tasks. This approach, described as broad task learning, lets the models generalize across tasks and environments. To validate these capabilities, the team ran extensive tests, including putting pens into a shoe and slam-dunking a ball through a toy basketball hoop.
Google DeepMind is collaborating with partners like Apptronik to integrate these AI models into humanoid robots. The models are designed to adapt to multiple embodiments, from robots used in academic research to humanoids, so the same task, such as packing a lunchbox or wiping a whiteboard, can be carried out on different hardware. Potential applications span both consumer and industrial settings, though no specific commercial products or timelines for wider availability have been announced.
Safety is a key consideration in the development of Gemini Robotics models. Google DeepMind has incorporated traditional robotics safeguards and leveraged Gemini's core safety features. The company has also introduced a new dataset called ASIMOV to help researchers measure the safety implications of robotic actions in real-world scenarios.
The introduction of Gemini Robotics models represents a significant step towards bringing AI into the physical world. While challenges remain in refining robotic dexterity, real-time decision-making, and broader generalization, these models lay the groundwork for AI-driven robots that can assist in homes, workplaces, and beyond.
For more detail, see the official blog post and the Maginative article.