A new paper penned by researchers from Google DeepMind, ETH Zurich, University of Washington, OpenAI, and McGill University reveals that OpenAI and Google's AI models have been cracked open.
The paper describes how thirteen computer scientists from those institutions launched an attack on OpenAI and Google's closed AI services and recovered a significant hidden portion of the underlying transformer models. Specifically, the attack extracted a model's embedding projection layer using nothing but ordinary API queries. The technique builds on model-extraction attacks first proposed in 2016, extended here to break into OpenAI and Google's production AI models.
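The core observation behind such attacks, as described in the paper, is that a transformer's output logits are a linear projection of a low-dimensional hidden state, so the logit vectors returned by an API all lie in a subspace whose dimension equals the model's hidden dimension. The sketch below illustrates that idea on synthetic data; the matrix shapes and thresholds are illustrative assumptions, not details taken from the paper's actual attack on production APIs.

```python
import numpy as np

# Toy illustration: logits = W @ h, where W is the (vocab_size x hidden_dim)
# embedding projection matrix. Stacking many logit vectors from different
# queries yields a matrix whose rank reveals hidden_dim.
rng = np.random.default_rng(0)
vocab_size, hidden_dim, n_queries = 500, 64, 200  # assumed toy sizes

W = rng.normal(size=(vocab_size, hidden_dim))   # stand-in projection matrix
H = rng.normal(size=(hidden_dim, n_queries))    # hidden states for each query
logits = W @ H                                  # what an API would expose

# Singular values beyond the true rank collapse to numerical noise, so
# counting those above a small relative threshold recovers hidden_dim.
s = np.linalg.svd(logits, compute_uv=False)
recovered_dim = int(np.sum(s > 1e-6 * s[0]))
print(recovered_dim)  # 64
```

Real APIs return only partial logit information per query, so the published attack needs additional machinery to assemble full logit vectors, but the rank argument above is the reason the hidden dimension leaks at all.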
The team disclosed the attack to Google and OpenAI, and both companies responded by deploying mitigations against this specific class of attack. The researchers also chose to withhold one finding, the exact size of OpenAI's GPT-3.5-turbo model, judging that publishing it could harm the product by handing bad actors details of the model such as its hidden dimension and total parameter count.
"For under $20 USD, our attack extracts the entire projection matrix of OpenAI's ada and babbage language models," the researchers state in their paper. "We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix."
"If you have the weights, then you just have the full model. What Google [et al.] did was reconstruct some parameters of the full model by querying it, like a user would. They were showing that you can reconstruct important aspects of the model without having access to the weights at all," explained Edouard Harris, CTO at Gladstone AI, in an email to The Register.