OpenAI Shipped Its Most Capable Model Today. About 20 Companies Can Use It.
GPT-5.6 launched as a government-vetted preview. The cyber benchmarks are why.

Janet Torvalds
June 26, 2026
OpenAI released its most capable model on Friday, and then made sure almost nobody can use it. GPT-5.6 went out as a "limited preview" to roughly 20 companies, all of them named to the U.S. government in advance. Everyone else waits.
The model comes in three sizes under a new naming scheme. Sol is the flagship, aimed at hard coding and security work. Terra is the middle tier, which OpenAI says matches its previous model, GPT-5.5, at half the price. Luna is the cheap, fast one for high-volume jobs like summarizing and drafting. The number is the generation; the names are meant to be durable tiers that get updated on their own schedule. So a future "Sol" can get smarter without becoming GPT-5.7.
For now the only way in is through OpenAI's API and Codex, its coding tool, and only if you are one of the trusted partners on the list. OpenAI says it expects to add more companies next week and reach a broad release "in the coming weeks."
What you actually get, if you can get it
Pricing is public even if the model mostly isn't. Per million tokens, Sol runs $5 for input and $30 for output. Terra is $2.50 and $15. Luna is $1 and $6. A token is roughly three-quarters of a word, so a long document costs real money on Sol and pocket change on Luna. That spread is the whole point of the tiers.
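To make the spread concrete, here is a rough back-of-the-envelope calculator. It is a sketch, not an official pricing tool: the per-million-token prices are the ones quoted above, the three-quarters-of-a-word rule is only an approximation, and the function names and the 75,000-word example job are made up for illustration.

```python
# Prices per million tokens in USD, (input, output), as quoted in the article.
PRICES = {
    "Sol": (5.00, 30.00),
    "Terra": (2.50, 15.00),
    "Luna": (1.00, 6.00),
}

# Rule of thumb from the article: one token is about 0.75 words.
WORDS_PER_TOKEN = 0.75


def words_to_tokens(words: int) -> int:
    """Estimate a token count from a word count (approximation only)."""
    return round(words / WORDS_PER_TOKEN)


def job_cost(tier: str, input_words: int, output_words: int) -> float:
    """Estimated USD cost of one request on the given tier."""
    in_price, out_price = PRICES[tier]
    in_tokens = words_to_tokens(input_words)
    out_tokens = words_to_tokens(output_words)
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical job: read a 75,000-word document, write a 1,500-word summary.
    for tier in PRICES:
        print(f"{tier}: ${job_cost(tier, 75_000, 1_500):.3f}")
```

On that hypothetical job the estimate works out to about 56 cents on Sol, 28 cents on Terra and 11 cents on Luna. Real bills depend on the actual tokenizer and on how many reasoning tokens the model spends, neither of which this sketch models.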
Two new knobs ship with the family. A max reasoning setting lets Sol spend more time thinking before it answers, which costs more output tokens but is supposed to help on problems that reward deliberation. And an ultra mode spins up subagents, meaning the model breaks a task into pieces and runs several copies of itself in parallel to get through complex work faster. Both are levers for trading money and time against quality, which is what most of the recent "reasoning" features actually are.
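OpenAI has not described how ultra mode is built, so the following is only a generic illustration of the fan-out idea: split a job into pieces, run the pieces at the same time, collect the results in order. The names split_task, run_subagent and run_in_parallel are hypothetical, and run_subagent is a stub standing in for a real model call.

```python
from concurrent.futures import ThreadPoolExecutor


def split_task(task: str) -> list[str]:
    """Hypothetical planner: treat each non-empty line as one subtask."""
    return [line.strip() for line in task.splitlines() if line.strip()]


def run_subagent(subtask: str) -> str:
    """Stub for a model call; a real worker would send the subtask to a model."""
    return f"done: {subtask}"


def run_in_parallel(task: str, max_workers: int = 4) -> list[str]:
    """Fan the subtasks out to worker threads and gather results in order."""
    subtasks = split_task(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(run_subagent, subtasks))


if __name__ == "__main__":
    job = "audit the parser\nfuzz the decoder\nwrite the report"
    for result in run_in_parallel(job):
        print(result)
```

The trade is visible even in the toy version: more workers finish sooner, but every worker is another copy of the model to pay for.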
OpenAI also says Sol will run on Cerebras hardware at up to 750 tokens per second starting in July, for select customers. At that rate a 750-token answer arrives in about a second, which is fast enough to feel close to instant on most prompts. Cache pricing changed too: cache writes now cost 1.25 times the normal input rate, cache reads still get the 90 percent discount, and a cached prompt now sticks around for at least 30 minutes.
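The cache math is easier to see with numbers. This sketch uses Sol's $5 input price and the multipliers quoted above. It assumes, without confirmation from OpenAI, that the first call pays the write rate in place of the normal input rate, that every later call lands inside the cache window and pays the read rate, and that nothing else is billed. The function names are illustrative.

```python
SOL_INPUT_PER_MILLION = 5.00   # USD, from the published price list
CACHE_WRITE_MULTIPLIER = 1.25  # cache writes cost 1.25x the input rate
CACHE_READ_MULTIPLIER = 0.10   # cache reads get a 90 percent discount


def prefix_cost(prefix_tokens: int, calls: int, cached: bool) -> float:
    """Input-side USD cost of sending the same prompt prefix `calls` times."""
    if calls < 1:
        raise ValueError("calls must be at least 1")
    per_token = SOL_INPUT_PER_MILLION / 1_000_000
    if not cached:
        return prefix_tokens * per_token * calls
    # Assumption: one cache write on the first call, then reads only.
    write = prefix_tokens * per_token * CACHE_WRITE_MULTIPLIER
    reads = prefix_tokens * per_token * CACHE_READ_MULTIPLIER * (calls - 1)
    return write + reads


def generation_seconds(output_tokens: int, tokens_per_second: float = 750) -> float:
    """Time to stream an answer at the quoted peak rate."""
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical workload: a 100,000-token prefix reused across 10 calls.
    print(f"uncached: ${prefix_cost(100_000, 10, cached=False):.3f}")
    print(f"cached:   ${prefix_cost(100_000, 10, cached=True):.3f}")
    print(f"1,500 tokens at 750 tok/s: {generation_seconds(1_500):.1f} s")
```

Under those assumptions a single call is cheaper without the cache (50 cents against 62.5 cents for the 100,000-token prefix), and caching comes out ahead from the second call on.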
The benchmarks are real, and partial
OpenAI published a handful of evaluation results and openly said the full set is coming later, "when we make the model broadly available." That is worth holding onto. What got released is a curated slice chosen by the company shipping the model.
On coding, OpenAI claims Sol sets a new high on Terminal-Bench 2.1, a test of command-line tasks that require planning and tool use. On biology, it points to GeneBench v1, a genomics benchmark, where Sol beats GPT-5.5 while using fewer tokens. The one number that comes with outside methodology is ExploitGym, a security benchmark built by UC Berkeley researchers with OpenAI and other labs, published on arXiv. The others are OpenAI's own evals, which does not make them wrong, but does mean nobody outside has reproduced them yet.
The security results are the reason this whole thing is gated.
Cyber is the real story
OpenAI says Sol is its strongest model yet at cybersecurity, and it does not pretend otherwise. On its own ExploitBench, Sol is competitive with Anthropic's Mythos preview while using about a third of the output tokens. In tests against Chromium and Firefox, the company says, the model found bugs and "exploitation primitives," the building blocks of an attack, but did not, under the conditions tested, chain them into a working end-to-end exploit on its own.
That last clause is doing a lot of work. OpenAI's framing is that Sol "is better at helping people find and fix vulnerabilities than reliably carrying out end-to-end attacks." The model lands below the "Cyber Critical" line in the company's own preparedness framework. But OpenAI also concedes a benchmark cannot capture every way a model gets combined with other tools, which is why it paired the release with a heavier safety stack: refusals trained in, real-time classifiers that can pause a response mid-generation for a bigger model to review, and account-level checks that look across a user's conversations. It says it spent more than 700,000 A100-equivalent GPU hours having its own models attack the safeguards to find jailbreaks that work across many prompts rather than one.
Washington is now in the release pipeline
The 20-company cap did not come from OpenAI. The company says it previewed GPT-5.6 and its capabilities with the government over the past month, including meetings Sam Altman had at the White House in early June, and that the government asked it to start with a small, vetted group before going wider. This is the same pattern that hit Anthropic's Fable 5 and Mythos 5 models, which means Anthropic is no longer the only lab negotiating its launches with the federal government.
OpenAI is cooperating and clearly unhappy about it. "We don't believe this kind of government access process should become the long-term default," the company wrote. "It keeps the best tools from users, developers, enterprises, cyber defenders, and global partners who need them." It framed the limited preview as a short-term step while it works with the administration on a "repeatable process for future model releases."
There is a deadline behind all of this. Under an executive order signed June 2, the administration has until August to stand up a classified process for assessing the cyber capabilities of new AI models and deciding which count as "covered frontier models." GPT-5.6 is landing in the gap: the government has announced it will review models like this but has not finished saying how. So for now the review looks like a phone call, a list of names, and a number, 20.
The takeaway is less about one model than about what "launch" now means for a frontier lab. A year ago, shipping a model meant turning on an API. This week it meant getting sign-off on who gets to touch it first. The capabilities OpenAI is most proud of, the ones that find security holes, are exactly the ones that bought it a government chaperone.
Sources (3)
- Previewing GPT-5.6 Sol: a next-generation model (openai.com)
- OpenAI releases powerful new GPT-5.6 model under restrictions (www.axios.com)
- GPT-5.6 Preview system card (deploymentsafety.openai.com)