Sandhini Agarwal: We have a lot of next steps. I really think the virality of ChatGPT has caused a lot of issues that we knew existed to really pop up and become critical – things we want to fix as soon as possible. For example, we know the model is still very biased. And yes, ChatGPT is very good at refusing bad requests, but it's also quite easy to write prompts that make it not refuse what we wanted it to refuse.
Liam Fedus: It has been exciting to watch the diverse and creative applications from users, but we're still focused on areas for improvement. We think that through an iterative process where we deploy, get feedback, and refine, we can produce the most aligned and capable technology. As our technology evolves, new problems inevitably emerge.
Sandhini Agarwal: In the weeks since launch, we've looked at the most egregious examples people have found, the worst things people have seen in the wild. We kind of assessed each of them and talked about how we should fix it.
Jan Leike: Sometimes it's something that has gone viral on Twitter, but we also have people who reach out to us quietly.
Sandhini Agarwal: A lot of the things we found were jailbreaks, which is definitely an issue we need to address. But because users have to try these convoluted methods to get the model to say something bad, it isn't as if this was something we completely missed, or something that was very surprising to us. Still, it's something we're actively working on right now. When we find jailbreaks, we add them to our training and testing data. All the data we see feeds into a future model.
Jan Leike: Every time we have a better model, we want to put it out and test it. We're very optimistic that targeted adversarial training can significantly improve the situation with jailbreaking. It's not clear whether these problems will go away entirely, but we think we can make jailbreaking a lot harder. Again, it's not like we didn't know jailbreaking was possible before the release. I think it's very difficult to really anticipate what the real safety problems with these systems are going to be once you've deployed them. So we put a lot of emphasis on monitoring what people are using the system for, seeing what happens, and then reacting to that. That's not to say we shouldn't proactively mitigate safety problems when we do anticipate them. But yes, it is very hard to foresee everything that will actually happen when a system hits the real world.
In January, Microsoft unveiled Bing Chat, a search chatbot that many assume to be a version of OpenAI's officially unannounced GPT-4. (OpenAI says: "Bing is powered by one of our next-generation models that Microsoft customized specifically for search. It incorporates advancements from ChatGPT and GPT-3.5.") The use of chatbots by tech giants with multibillion-dollar reputations to protect creates new challenges for those tasked with building the underlying models.