The ‘GPT moment’ in AI robotics is upon us

It’s no secret that foundation models are transforming AI in the digital world. Large language models (LLMs) such as ChatGPT, LLaMA, and Bard are revolutionizing AI for language. While OpenAI’s GPT models are not the only large language models available, they have achieved the most mainstream recognition for taking text and image inputs and delivering human-like responses – even in some tasks that require complex problem-solving and advanced reasoning.

The viral and widespread adoption of ChatGPT has largely shaped how society perceives this new opportunity for artificial intelligence.

The next development that will define AI for generations is robotics. Creating AI-powered robots that can learn how to interact with the physical world will improve all forms of repetitive work in sectors from logistics, transportation, and manufacturing to retail, agriculture, and even health care. It will also unlock as many efficiencies in the physical world as we have seen in the digital world over the past few decades.

While there is a unique set of problems to solve within robotics compared to language, there are similarities in the core foundational concepts. And some of the brightest minds in AI have made significant progress in building “GPT for robotics.”

What makes GPT successful?

To understand how “GPT for robotics” was created, first look at the core pillars that enable the success of LLMs like GPT.

Foundation model approach

GPT is an AI model trained on a broad, diverse dataset. Engineers used to collect data and train a specific AI for a specific problem; to solve the next problem, they had to collect new data, and then new data again for the problem after that. With a foundation model approach, the exact opposite happens.

Instead of building a niche AI for each use case, one general model can be used by everyone, and that one general model is more successful than each specialized model. A foundation model is better at a specific task because it can apply learnings from other tasks and generalize to new tasks more effectively, having acquired more skills by performing well across many different tasks.
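As a rough sketch of this idea, the snippet below shows one shared network backbone serving several task-specific heads, so representations learned on one task benefit the others. This is a hypothetical illustration in PyTorch, not how GPT itself is built; all names, task labels, and dimensions are invented for the example.

```python
# A minimal sketch of the foundation model idea, assuming PyTorch.
# One shared backbone learns representations used by every task head,
# so skills learned on one task can transfer to others.
# All names and sizes here are illustrative, not any real system's design.
import torch
import torch.nn as nn

class SharedBackboneModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, task_output_dims):
        super().__init__()
        # Shared representation, trained on data from all tasks at once.
        self.backbone = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # One lightweight head per task; a new task reuses the backbone.
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden_dim, out_dim)
            for task, out_dim in task_output_dims.items()
        })

    def forward(self, x, task):
        return self.heads[task](self.backbone(x))

model = SharedBackboneModel(128, 256, {"sentiment": 3, "topic": 10})
logits = model(torch.randn(4, 128), task="sentiment")  # shape: (4, 3)
```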

Train on a large, proprietary, and high-quality dataset

To build a general AI, you first need access to a vast amount of diverse data. OpenAI captures the real-world data needed to train GPT models reasonably efficiently: GPT models are trained on a large and diverse dataset collected from across the internet, including books, news articles, social media posts, code, and more.

It’s not just the size of the dataset that matters; curating high-quality, high-value data also plays a big role. GPT models achieve unprecedented performance because their training data is curated around the tasks users care about most and the answers that are most helpful.

Role of reinforcement learning (RL)

OpenAI uses reinforcement learning from human feedback (RLHF) to align the model’s responses with human preferences (ie, what a user considers useful). Pure supervised learning (SL) is not enough here, because SL can only tackle problems with a clear pattern or set of examples to imitate. LLMs require the AI to achieve a goal that has no unique, correct answer. Enter RLHF.

RLHF allows the algorithm to work toward a goal through trial and error while a human validates correct answers (high reward) or rejects errors (low reward). The AI learns the reward function that best explains human preferences and then uses RL to figure out how to maximize it. By learning from human feedback, ChatGPT can provide responses that mirror or exceed human-level capabilities.
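As a rough, hypothetical sketch of this mechanism, the snippet below trains a small reward model on pairwise human preferences (the Bradley-Terry formulation commonly used in RLHF). The names and the random stand-in data are invented for illustration; this is not OpenAI’s implementation.

```python
# A minimal sketch of the reward-modeling step in RLHF, assuming PyTorch.
# A small network learns to score responses so that human-preferred
# responses score higher than rejected ones (a Bradley-Terry style loss).
# The random "embeddings" below are stand-ins for real response data.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.score(x).squeeze(-1)  # one scalar reward per response

model = RewardModel(dim=32)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    preferred = torch.randn(16, 32) + 0.5  # stand-in: answers humans chose
    rejected = torch.randn(16, 32) - 0.5   # stand-in: answers humans rejected
    # Push the preferred response's reward above the rejected one's.
    loss = -F.logsigmoid(model(preferred) - model(rejected)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# The trained reward model then serves as the objective that an RL
# algorithm (eg, PPO) optimizes the language model against.
```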

The next frontier in foundational models lies in robotics

The same core technology that allows GPT to see, think, and even speak also enables machines to see, think, and act. Robots powered by a foundational model can understand their physical environment, make informed decisions, and adapt their actions to changing conditions.

“GPT for robotics” was built in the same way as GPT – laying the groundwork for a revolution that will, once again, change AI as we know it.

Foundation model approach

By taking a foundation model approach, you can also build one AI that can perform many tasks in the physical world. A few years ago, experts advised building a specialized AI for robots that select and pack grocery items, a different model for sequencing electrical parts, and yet another model for unloading pallets from a truck.

This paradigm shift to a foundation model enables the AI to better handle the edge-case scenarios that are frequent in unstructured real-world environments and would otherwise stump models with narrower training. Building one general AI for all of these scenarios is more successful: by training on everything, you get the human-level autonomy that was missing from previous generations of robots.

Train on a large, proprietary, and high-quality dataset

Teaching a robot which actions lead to success and which lead to failure is extremely difficult. It requires a vast amount of high-quality data based on real-world physical interactions. A lab setting or video examples are not reliable or rich enough sources (eg, YouTube videos fail to capture the details of physical interactions, and academic datasets tend to be limited in scope).

Unlike AI for language or image processing, no preexisting dataset represents how robots should interact with the physical world. This makes large, high-quality datasets a harder challenge to solve in robotics, and deploying a fleet of robots in production is the only way to build a diverse dataset.

The role of reinforcement learning

Like answering text questions with human-level competence, robotic control and manipulation requires an agent to seek progress toward a goal that has no single, unique, correct answer (eg, “What is a successful way to pick up this red onion?”). Once again, more than pure supervised learning is needed.

You need deep reinforcement learning (deep RL) to succeed in robotics. This autonomous, self-learning approach combines RL with deep neural networks to unlock higher levels of performance: the AI automatically adapts its learning strategies and continues to improve its skills as it experiences new scenarios.
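To make this concrete, here is a minimal deep RL sketch using the REINFORCE policy-gradient algorithm in PyTorch. The toy task (step left or right to reach a target position) is an invented stand-in for a real manipulation problem; there is no single “correct” trajectory, only trajectories that earn more or less reward.

```python
# A minimal deep RL sketch (REINFORCE policy gradient), assuming PyTorch.
# A small neural policy learns by trial and error to reach a target.
# Everything here is an invented illustration, not a production system.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))  # state -> logits for {left, right}
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def run_episode():
    pos, target = 0.0, 3.0
    log_probs = []
    for _ in range(10):
        state = torch.tensor([pos, target])
        dist = torch.distributions.Categorical(logits=policy(state))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        pos += 1.0 if action.item() == 1 else -1.0
    # Reward the agent for ending near the target; many different action
    # sequences can earn the same reward.
    return torch.stack(log_probs), -abs(pos - target)

baseline = 0.0
for episode in range(500):
    log_probs, reward = run_episode()
    advantage = reward - baseline             # compare against a running average
    baseline = 0.95 * baseline + 0.05 * reward
    loss = -(log_probs.sum() * advantage)     # REINFORCE: weight log-likelihood by advantage
    opt.zero_grad(); loss.backward(); opt.step()
```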

Challenging, explosive growth is coming

Over the past few years, some of the world’s brightest AI and robotics experts have laid the technical and commercial groundwork for a robotic foundation model revolution that will change the future of artificial intelligence.

While these AI models have been built in much the same way as GPT, achieving human-level autonomy in the physical world is a distinct scientific challenge for two reasons:

  1. Building an AI-based product that can serve a variety of real-world settings comes with a unique set of complex physical requirements. The AI must adapt to different hardware applications, since it is unlikely that a single hardware platform will work across industries (logistics, transportation, manufacturing, retail, agriculture, health care, etc.) and across the activities within each sector.
  2. Warehouses and distribution centers are an ideal learning environment for AI models in the physical world. It’s common to have hundreds of thousands or even millions of different stock-keeping units (SKUs) flowing through any facility at any given time – delivering the large, proprietary, and high-quality datasets needed to train the “GPT for robotics.”

The AI robotics “GPT moment” is just around the corner

The development of robotic foundation models is advancing rapidly. Robotic applications, especially within tasks that require precise object manipulation, are already being deployed in real-world production environments – and we will see an exponential number of commercially viable robotic applications deployed at scale in 2024.
