Three remarkable things that happened in AI this week: Ilya Sutskever is back, Apple -> Santa, and a smarter approach to CNN layer reduction
Ilya Sutskever starts something
What happened: Ilya Sutskever, the co-founder of OpenAI who left the company last month, has announced the launch of Safe Superintelligence Inc. (or SSI). SSI plans to do what it says on the label: build a superintelligent AI system designed with a strong safety mindset. In an interview with Bloomberg, Sutskever explained that he intends to take a different approach to safety, building it into the system rather than bolting guardrails on at the end, telling Bloomberg, “By safe, we mean safe like nuclear safety as opposed to safe as in ‘trust and safety.’”
Why it matters: Sutskever was part of the group that famously fired Sam Altman as OpenAI CEO last November, and ever since the attempt failed, his future at OpenAI had been the subject of speculation. When he left in May, many observers linked it to concerns about safety (although much of this speculation stemmed from the departure of Jan Leike around the same time, which Leike explicitly said was for safety reasons). The launch of SSI reinforces that view, with the name and announcement so clearly focused on safety.
There’s a second reason why the launch of SSI is interesting. Sutskever said that SSI was not going to create any interim products before the launch of “safe superintelligence.” This is presumably to avoid the kinds of commercial considerations that many people think have led OpenAI to make safety tradeoffs; it also presumably means SSI will have no revenue and considerable costs. Whether investors are willing to keep funding a company, probably for years, on those terms will be a useful barometer of whether the current enthusiasm for spending money on AI persists.
You know it’s bad when Apple is releasing things to open-source
What happened: Apple released 20 machine-learning models to Hugging Face. The models cover a range of applications from object detection to depth estimation, and have been optimized to run directly on user devices (which is another way of saying that they’re small). Apple also released some datasets and benchmarks. This follows Apple’s release of four LLMs onto Hugging Face in April.
Why it matters: Apple has always been famously guarded with its IP, and releasing models into the open-source community is a significant departure. There are two possible interpretations (well, if we rule out the theory that the company has just become extremely generous). The first is that Apple is acting from a position of weakness–it’s far behind in the AI race, and it needs to bolster its AI credentials. The second stems from the fact that these models are meant to be run on-device–this could be a Meta-like play to shape the future of AI in a way that lines up with Apple’s existing strategy. Apple would like to keep the device at the center of everything, with as much power as possible in the actual phone/tablet/computer. If AI models are genuinely running on-device, then customers are more likely to pay up for a premium iPhone instead of a cheap Android device that’s just connecting back to OpenAI’s servers anyway.
The relentless march of sparsification
What happened: Researchers from Seoul National University, Samsung, and Google published a paper proposing a new method of pruning convolutional neural networks. Pruning is normally done by removing activation layers and then, since that leaves two convolution layers sitting next to each other, merging the adjacent pair into one. This reduces the time required for inference, but the merged layer needs a larger kernel to do the work of two, which eats into the net efficiency gain. Their solution is to remove an activation layer and a convolution layer at the same time, holding the kernel size down.
Here’s an ugly analogy: what happens if you take the middle bun out of a Big Mac? You end up with two all-beef patties next to each other. You could combine those two meat patties into a bigger one, like from a quarter-pounder, but then of course you’d need a larger…um…mouth? So these guys take out the middle bun and one of the patties. Do I regret this analogy? Very much so.
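If you want the bigger-patty problem without the hamburger, here is a minimal numpy sketch (not the paper’s code, and simplified to a single channel with stride 1): once the activation between two 3×3 convolution layers is removed, the pair collapses into a single convolution whose kernel is their composition, and that composed kernel is 5×5. Both helper functions below are hypothetical names written for this illustration.

```python
import numpy as np

def corr2d_valid(x, k):
    """'Valid' 2-D cross-correlation: what a single-channel, stride-1
    conv layer computes on input x with kernel k."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def compose_kernels(k1, k2):
    """The single kernel equivalent to applying k1 then k2 with no
    nonlinearity in between (a full 2-D convolution of the kernels)."""
    out = np.zeros((k1.shape[0] + k2.shape[0] - 1,
                    k1.shape[1] + k2.shape[1] - 1))
    for a in range(k1.shape[0]):
        for b in range(k1.shape[1]):
            out[a:a + k2.shape[0], b:b + k2.shape[1]] += k1[a, b] * k2
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))     # toy input
k1 = rng.standard_normal((3, 3))    # first conv layer's kernel
k2 = rng.standard_normal((3, 3))    # second conv layer's kernel

merged = compose_kernels(k1, k2)
print(merged.shape)  # (5, 5): merging two 3x3 layers forces a 5x5 kernel
print(np.allclose(corr2d_valid(corr2d_valid(x, k1), k2),
                  corr2d_valid(x, merged)))  # True: same output, one layer
```

The merged layer gives identical outputs in one pass, but the 5×5 kernel costs 25 multiplies per output pixel versus 9 + 9 = 18 for the original pair, which is exactly the efficiency leak the paper is trying to plug.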
Why it matters: As I’ve said before, we can’t just keep growing the compute requirements for LLMs exponentially. A lot of the most interesting advances in recent months have been about doing more with less, or at least doing the same with less, through clever design and through removing unnecessary bulk from models, a process called “sparsification.” This is another step forward, finding a way to squeeze more efficiency out of layer reduction.