NVIDIA’s Nemotron‑3 Ultra is a frontier‑scale LLM with 550B total parameters (55B active), a hybrid Latent Mixture‑of‑Experts (LatentMoE) architecture, and a context length of up to 1M tokens. It runs on NVIDIA GPUs (a minimum of four B200 or H100 GPUs) and offers configurable reasoning traces, multi‑token speculative decoding, and multilingual support. Benchmarks show strong performance on agentic, reasoning, and long‑context tasks.
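As a rough illustration of the parameter counts above, the sketch below computes the share of parameters that is active and a back-of-the-envelope weight footprint at a few numeric precisions. The bytes-per-parameter values are generic assumptions, not published figures for this model, and the estimate ignores KV cache, activations, and runtime overhead.

```python
# Parameter counts taken from the summary above.
TOTAL_PARAMS = 550e9
ACTIVE_PARAMS = 55e9

# Assumed storage cost per parameter (generic values, not model-specific).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "4-bit": 0.5}


def active_fraction(total: float, active: float) -> float:
    """Share of the total parameters that is active in a sparse MoE model."""
    return active / total


def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name}: ~{weight_footprint_gb(TOTAL_PARAMS, size):.0f} GB of weights")
```

All parameters must still be stored even though only a fraction is active, which is why the footprint is computed from the total count rather than the active count.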
NVIDIA and Microsoft announced a full‑stack solution for building and running AI agents on Windows PCs, enterprise workstations, and Azure. New hardware (RTX Spark laptops, DGX Station for Windows, RTX PRO 6000 Blackwell servers) pairs with NVIDIA OpenShell runtime, open‑source models on Microsoft Foundry, and GPU‑accelerated Microsoft Fabric. The goal is to let developers code, tune, and deploy long‑running, secure agents locally or in the cloud.
Google Gemma: Announces Gemma 4 12B, a unified encoder‑free multimodal model for laptops released under an Apache 2.0 license. Google AI Developers: Highlights that Gemma 4 12B bridges their mobile E4B and larger 26B MoE models, offering frontier‑class reasoning and native audio. NVIDIA: Notes local AI agents advancing on DGX Spark and RTX PCs, with OpenShell arriving on Windows, new agentic AI optimizations, Broadcast 2.2, and upcoming RTX acceleration for Adobe apps and Blender. Visual Studio Code: Reports May updates: the Agents window is now stable, BYOK gains air‑gapped support, and an integrated browser can emulate devices and preview HTML without extensions. Ideogram: Introduces Ideogram 4.0, an open image model with downloadable weights, fine‑tuning on personal data, and availability across all plans and the API.
NVIDIA unveiled RTX Spark, a new class of Windows PCs built for on‑device AI agents. With up to 1 petaflop of AI compute, 128 GB of unified memory, and new security primitives (OpenShell), the platform promises faster, private inference for popular open‑source agents such as Hermes and OpenClaw. Similar capabilities are extended to Linux via DGX Spark, while multi‑GPU optimizations boost llama.cpp and ComfyUI performance.
Qwen: Introduces Qwen3.7-Plus, a multimodal agent model that unifies vision and language, operates through both GUI and CLI, and serves as a coding and productivity assistant. OpenAI: Makes its frontier models and Codex generally available on AWS via Amazon Bedrock, extending enterprise security, compliance, and governance workflows. xAI: Adds Composer 2.5 to Grok Build, describing it as a fast, highly intelligent model for long‑running tasks and complex instructions. LangChain: Highlights Fleet for secure agent access to private resources and adds LangSmith LLM Gateway spend limits that return a 402 error when caps are hit. Google Antigravity: Positions itself as a scientific workbench with a Science Skills bundle that runs complex workflows such as protein analysis using Alpha* models and dozens of databases. Google Gemma: Releases the first gemma‑skills iteration, enabling agents to build with Gemma, use MTP for speed, pick a model size, and locate up‑to‑date resources. ClaudeDevs: Resets 5‑hour and weekly rate limits for Pro/Max plans and fixes an issue with excessive parallel subagents. Cursor: Raises usage limits for Teams and adds a Premium seat with 5× usage at 3× cost. Visual Studio Code: Demos orchestrating agents via the VS Code Agents window. NVIDIA: Adds real‑time AI media tools, including Synthetic Video Detector (up to 92% accuracy, 22 ms latency), RTX Video Super Resolution, and Frame Generation. Vercel: Enables remote execution of Conductor’s parallel coding agents on fast Sandboxes. Perplexity: Launches Search as Code, a new architecture that writes Python to call its search stack directly, now the default in the Perplexity Agent API.
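The LangSmith LLM Gateway item above mentions a 402 response when a spend cap is hit. The sketch below shows one way a client might treat that status differently from transient errors. It is a generic illustration that uses only the Python standard library; `classify_gateway_status` and its return labels are hypothetical names, not part of any LangChain or LangSmith API.

```python
from http import HTTPStatus


def classify_gateway_status(status_code: int) -> str:
    """Map an HTTP status code to a client action.

    Returns one of: "ok", "stop", "retry", "fail" (hypothetical labels).
    """
    if 200 <= status_code < 300:
        return "ok"
    # 402 Payment Required: per the summary above, returned when a spend cap
    # is hit. Retrying would not help, so the caller should stop.
    if status_code == HTTPStatus.PAYMENT_REQUIRED:
        return "stop"
    # Rate limiting and server errors are usually transient, so retry later.
    if status_code == HTTPStatus.TOO_MANY_REQUESTS or status_code >= 500:
        return "retry"
    # Any other client error is treated as a permanent failure.
    return "fail"


if __name__ == "__main__":
    for code in (200, 402, 429, 503, 404):
        print(code, classify_gateway_status(code))
```

Separating "stop" from "retry" keeps a long‑running agent from looping on requests that cannot succeed until the budget is raised.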
OpenRouter (@OpenRouter) has integrated with ComfyUI, so users can call OpenRouter models directly within ComfyUI workflows. GitHub (@github) highlights the 2026 Partner Pack, offering exclusive discounts and perks for maintainers. The GitHub Innovation Graph provides economic data on trends in GDP, inequality, and emissions, which researchers find valuable. Google AI Developers (@googleaidevs) showcases successful implementations of Managed Agents in the Gemini API, including Eigent_AI's root cause analysis and llama_index's document processing template. NVIDIA (@nvidia) discusses the Dell AI Factory with NVIDIA, which enables companies to build, run, and scale AI, with NemoClaw powering agentic AI on-prem. OpenAI (@OpenAI) shares Terence Tao's experience that AI gives researchers more freedom to experiment and pursue unconventional ideas. LangChain (@LangChain) emphasizes the efficiency of LangSmith Sandboxes, which pause automatically when idle, and encourages users to create agents using everyday language with LangSmith Fleet.
OpenRouter now supports "apply_patch," a server tool that lets models propose file edits as V4A diffs through the Responses API. The model generates a patch, and OpenRouter validates the diff syntax server-side, which is intended to make file editing more efficient and accurate. xAI has released grok-build-0.1 in public beta via the xAI API. This model powers the Grok Build CLI and excels at agentic coding, priced at $1 per million input tokens and $2 per million output tokens. Google AI has released an episode of Release Notes featuring the architects of Gemini, including @JeffDean, @koraykv, @OriolVinyalsML, and @NoamShazeer. They discuss their journey and the people behind the model. LangChain has released LangSmith LLM Gateway, which enforces spend limits and redacts PII before requests reach the model. They also announced Deep Agents v0.6, which makes harness profiles a first-class abstraction, allowing for production-grade performance at lower costs. NVIDIA has announced a new era of PC, but the details are unclear. OpenAI has launched Rosalind Biodefense to help trusted builders develop new biodefense and pandemic preparedness capabilities. They are also expanding trusted access to GPT-Rosalind for select U.S. government and allied partners.
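To make the grok-build-0.1 pricing above concrete, here is a minimal cost estimate that reads the quoted rates as US dollars per million tokens (an assumption about the pricing unit). The function name and the token counts in the example are hypothetical.

```python
# Rates quoted in the summary above, read as USD per one million tokens.
INPUT_USD_PER_MILLION = 1.0
OUTPUT_USD_PER_MILLION = 2.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical agentic coding session: large context in, smaller diff out.
    cost = estimate_cost_usd(input_tokens=200_000, output_tokens=50_000)
    print(f"estimated cost: ${cost:.2f}")
```

Because output tokens cost twice as much as input tokens at these rates, sessions that generate long outputs shift the total faster than sessions that mostly read context.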