Three remarkable things that happened in AI this week: Meta, Meta, and model sparsification

Techtonic
3 min read · Apr 20, 2024


Llama 3 released: score one for the…little…guy?

What happened: Meta announced the release of Llama 3, the next version of its (arguably) open-source LLM. It comes in 8B and 70B variants, like Llama 2, but with a range of improvements in architecture, pre-training, and fine-tuning that give it substantially better performance. Llama 3 easily surpassed other open-source models on the major benchmarks and gave the closed-source ones a run for their money, although Meta must have forgotten to include GPT-4 on its list of comps. Meta has also dialed back some of the safety measures that drew complaints about Llama 2.
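If you want to kick the tires yourself, here is a minimal sketch of running the 8B Instruct variant through Hugging Face transformers. Treat the details as assumptions: the model id follows Meta's published naming, the repo is gated (you have to accept Meta's license on Hugging Face first), and you'll want transformers plus accelerate installed and a GPU with roughly 16 GB of memory to run it comfortably.

```python
# Minimal sketch: generate text with Llama 3 8B Instruct via transformers.
# Assumes you have accepted Meta's license for the gated repo and installed
# transformers + accelerate; the model id follows Meta's published naming.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",  # use available GPUs, fall back to CPU
)

result = generator(
    "Summarize in one sentence: what is new in Llama 3?",
    max_new_tokens=64,
    do_sample=False,  # deterministic output for a quick smoke test
)
print(result[0]["generated_text"])
```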

Why it matters: Meta has emerged as the standard-bearer for open-source LLMs. Llama 3’s improved performance doesn’t give open source the lead over closed source, but it closes a lot of the gap. If relative performance holds where it is, closed-source LLMs will no doubt carry on, but it’s hard to see how much of a premium they’ll be able to charge. Advances in open source also make misuse, from deepfakes to genuinely dangerous activities, harder to prevent. Meta’s release used cute language around safety, saying that “We envision Llama models as part of a broader system that puts the developer in the driver’s seat,” which seems to mean that they’re tired of people complaining that Llama 2 won’t answer questions, so they’re just going to let developers do what they want.

Meta decides deepfakes aren’t really a problem, or at least not its problem

What happened: Meta recently announced that it was changing its policy on deepfakes. Instead of automatically removing deepfakes when detected, it will now label AI-generated content as “Made with AI.” It will only remove content that also violates its policies in other ways (such as interfering with elections or promoting vaccine misinformation).

Why it matters: This is actually a walkback from the old policy, and it almost amounts to giving up on deepfake detection entirely. Meta will identify AI-generated content in one of two ways: either the uploader self-reports it, or the content carries industry-standard metadata or watermarks showing an AI origin. There doesn’t seem to be any way for users to report other content (presumably because Meta doesn’t want to adjudicate those claims), and industry standards around labeling are patchy at best and will always be easy for a malicious actor to defeat. In my tests, I was able to slip this sophisticated deepfake past the detectors:

[Image: an AI-generated photograph. Caption: As part of my commitment to responsible AI, I would like to declare that this is not a real photograph. Source: Microsoft Designer]
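Joking aside, the metadata route really is that fragile. As a minimal sketch (filenames are placeholders), simply re-encoding an image with Pillow discards the EXIF block where provenance tags typically live, and embedded C2PA manifests don’t survive a plain re-save either:

```python
# Minimal sketch: re-encoding an image strips its provenance metadata.
# Pillow does not carry EXIF (or embedded C2PA manifests) across a save
# unless you explicitly pass them along. Filenames are placeholders.
from PIL import Image

original = Image.open("made_with_ai.jpg")        # image labeled as AI-generated
print("EXIF tags before:", len(original.getexif()))

original.save("laundered.jpg", quality=95)       # plain re-encode, no exif= argument

laundered = Image.open("laundered.jpg")
print("EXIF tags after:", len(laundered.getexif()))  # 0 -- the label is gone
```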

Model sparsification continues to advance while no one’s looking

What happened: Researchers from MIT and elsewhere released JetMoE-8B, a fully open-source LLM trained entirely on public datasets that outperforms the comparably sized Llama 2 7B. What’s exciting about JetMoE is that it was trained on an extremely tight compute budget (or “0.1M dollars,” as the paper puts it), a tiny fraction of what Meta spent on Llama 2. The team did this through a combination of smart architectural decisions and, to oversimplify, a sparsely activated mixture-of-experts design that only puts the most relevant portions of the model to work on any given input (sketched below). JetMoE also requires only about one-third of the compute of Llama 2 to run, although I can confirm it’s significantly harder to type.
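To make that sparse-activation idea concrete, here is a toy mixture-of-experts layer in PyTorch: a small router scores a set of expert networks and only the top-k of them run on each token, so most of the parameters sit idle on any given input. This is an illustrative sketch of the general technique, not JetMoE’s actual architecture, and all the sizes are made up.

```python
# Toy mixture-of-experts layer: a router picks the top-k experts per token,
# so only a fraction of the parameters are active on any given input.
# Illustrative sizes only; this is not JetMoE's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=128, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (n_tokens, d_model)
        scores = self.router(x)                    # (n_tokens, n_experts)
        weights, picks = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):                 # run only the selected experts
            for idx, expert in enumerate(self.experts):
                mask = picks[:, slot] == idx
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64]); only 2 of 8 experts ran per token
```

The same idea pays off at training time: gradients only flow through the experts the router selected, which is (very roughly) how a model can be far cheaper to train and serve than its total parameter count suggests.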

Why it matters: Sam Altman is reportedly seeking as much as $5–7 trillion to build the chips and infrastructure needed to train the next generation of LLMs, which, if true, would imply a major reshaping of the economy. It would also mean that LLM creation would be constrained to a handful of huge players. But projects like JetMoE (and I could have picked several other examples) show that there’s tremendous upside in working smarter, not harder, by making better decisions about what and how we’re training. That could save us, well, trillions of dollars, and it also implies a more diverse landscape with more LLM creators.

3 remarkable things is a more-or-less weekly roundup of interesting events from the AI community over the past more-or-less week


Techtonic

I'm a company CEO and data scientist who writes on artificial intelligence and its connection to business. Also at https://www.linkedin.com/in/james-twiss/