Three remarkable things that happened in AI this week: former OpenAI employees speak out, Microsoft tells fraudsters not to use its new fraud tool, and better scaling data

Techtonic
4 min read · Jun 4, 2024


Former OpenAI employees warn about AI dangers

What happened: A group of mostly former OpenAI employees published an open letter warning of the significant safety risks created by AI, ranging from misinformation to human extinction. The letter noted that “advanced AI companies” have the most information about current and future AI capabilities, and therefore about the risks, but that this information is not widely shared and government oversight of AI is currently weak. The letter therefore called on those companies to commit to a series of principles, mostly around transparency and the ability to flag risks.

Why it matters: The phrase “human extinction” certainly grabs your attention. The letter didn’t single out any companies by name, but most of the signatories were connected to OpenAI, which is currently under the most scrutiny for a cavalier approach to safety. Two senior OpenAI employees, Ilya Sutskever and Jan Leike, left the company in May, with Leike explicitly criticizing OpenAI on safety. CEO Sam Altman has also recently apologized for aggressive non-disclosure clauses in separation agreements, which have reportedly been used to try to prevent former employees from speaking out on safety. The open letter comes at a time when artificial intelligence faces a growing backlash over whether the companies behind it are adequately considering fairness, safety, and the economic and social impact of their work.

Microsoft creates a tool for doing frauds but asks people not to do frauds with it

What happened: Microsoft announced VASA, a generative-AI tool that can use a single photo of a person and a short clip of their voice to create hyper-realistic video of that person talking.

Why it matters: I mean, seriously, Microsoft? There have been a number of video-call scams using generative AI, including one in February where a manager sent $25 million to scammers following a videoconference with what he thought was his company’s CFO. But don’t worry, because there’s a note at the bottom of the announcement saying that “It is not intended to create content that is used to mislead or deceive,” so, you know, problem solved. Microsoft, to its very partial credit, has also said that it won’t release VASA until “we are certain that the technology will be used responsibly and in accordance with proper regulations,” although it’s not at all clear what could give them that certainty, or what these proper regulations might be.

A smarter approach to measuring model scalability

What happened: A joint team from the Swiss university EPFL and Hugging Face released a paper showing that a simpler approach to training models, holding the learning rate constant throughout training and adding only a short cooldown at the end, delivers results similar to the cosine schedule commonly used today. Say what, now? Here’s a picture from the paper of the two different approaches:

Source: Hägele, A., Bakouch, E., Kosson, A., Ben Allal, L., von Werra, L., and Jaggi, M. 2024. Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations. arXiv preprint arXiv:2405.18392 [cs.LG]
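If you can’t see the figure, here is a rough sketch of what the two schedules look like in code. The function names, the 10% cooldown fraction, and the omission of warmup are my simplifications for illustration, not the paper’s exact recipe.

```python
import math

def cosine_lr(step, total_steps, peak_lr, min_lr=0.0):
    # Cosine decay from peak_lr down to min_lr over the whole run.
    # The shape depends on total_steps, so each training budget needs
    # its own run from scratch.
    progress = step / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

def constant_lr_with_cooldown(step, total_steps, peak_lr, cooldown_frac=0.1):
    # Constant learning rate for most of training, then a short linear
    # decay to zero at the end. A snapshot taken before the cooldown can
    # be cooled down on its own to stand in for a shorter run.
    cooldown_start = int(total_steps * (1 - cooldown_frac))
    if step < cooldown_start:
        return peak_lr
    return peak_lr * (total_steps - step) / (total_steps - cooldown_start)
```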

Why it matters: Look, I’m not going to lie; this is mostly a palate-cleanser after feeling depressed about OpenAI and Microsoft. But it’s still good news. I’ve written in the past about how inefficient much of modern LLM training is. In part, that’s because the large model builders have a vested interest in everyone believing that you need massive amounts of compute to build a good model, and in part, it’s because we don’t have good numbers on how much training is really required to produce the best possible model from a given architecture and training data.

Okay, take a deep breath, because this is a little technical. A good way to understand how your model scales with compute is to train multiple versions, each with a different amount of compute, and see how much value the incremental compute adds. But with a cosine schedule, the shape of the learning-rate curve depends on the length of the run, so if you want five different data points you have to train five separate models, which takes time and money. With a constant learning rate, like the blue line above, you essentially only need to train once, with the maximum compute. Then you simply take snapshots along the way, give each one a short cooldown (which is neither hard nor costly), and those are your other data points. If people started doing this, we would have a much better understanding of model scaling, and we could probably train our models a lot more efficiently, saving people’s time, compute, and energy; it would also probably lead to more research into better architectures, instead of just the ongoing race for more money, hardware, and energy.
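To put rough numbers on the savings, suppose you want five scaling points. The step budgets below are made up, and the 10% cooldown fraction is an assumption for illustration, not the paper’s exact figure.

```python
budgets = [20_000, 40_000, 60_000, 80_000, 100_000]  # made-up step counts for five scaling points
cooldown_frac = 0.1                                   # assumed short cooldown, ~10% of each budget

# Cosine schedule: every budget needs its own full training run.
cosine_total = sum(budgets)

# Constant learning rate: one run to the largest budget, plus a short
# cooldown branched from the snapshot saved at each smaller budget.
constant_total = max(budgets) + sum(int(b * cooldown_frac) for b in budgets[:-1])

print(f"cosine schedule:         {cosine_total:,} training steps")   # 300,000
print(f"constant LR + cooldowns: {constant_total:,} training steps") # 120,000
```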

Three remarkable things is a more-or-less weekly roundup of interesting events from the AI community over the past more-or-less week
