Google Open Source Blog: 2026

Posts from 2026

Gemma 4: Expanding the Gemmaverse with Apache 2.0

Thursday, April 2, 2026

by Nia Castelly & amanda casari, Google Open Source & Olivier Lacombe, Google DeepMind


For over 20 years, Google has maintained an unwavering commitment to the open-source community. Our belief has been simple: open technology is good for our company, good for our users, and good for our world. This commitment to fostering collaborative learning and rigorous testing has consistently proven more effective than pursuing isolated improvements. It's been our approach ever since the 2005 launch of Google Summer of Code, and through our open-sourcing of Kubernetes, Android, and Go, and it remains central to our ongoing, daily work alongside maintainers and organizations.

Today, we are taking a significant step forward in that journey. Since Gemma's first launch, the community has downloaded Gemma models over 400 million times and built a vibrant universe of over 100,000 inspiring variants, known in the community as the Gemmaverse.

The release of Gemma 4 under the Apache 2.0 license — our most capable open models ranging from edge devices to 31B parameters — provides cutting-edge AI models for this community of developers. The industry-standard Apache license broadens the horizon for Gemma 4's applicability and usefulness, providing well-understood terms for modification, reuse, and further development.

A long legacy of open research

We are committed to making helpful, accessible AI technology and research so that everyone can innovate and grow. That's why many of our innovations are freely available, easy to deploy, and useful to developers across the globe. We have a long history of making our foundational machine-learning research, including word2vec, JAX, and the seminal Transformer paper, publicly available for anyone to use and study.

We accelerated this commitment last year. By sharing models that interpret complex genomic data and identify tumor variants, we contributed to the "magic cycle" of research breakthroughs that translate into real-world impact. This week, however, marks a pivotal moment — Gemma 4 models are the first in the Gemmaverse to be released under the OSI-approved Apache 2.0 license.

Empowering developers and researchers to deliver breakthrough innovations

Since we first launched Gemma in 2024, the community of early adopters has grown into a vast ecosystem of builders, researchers, and problem solvers. Gemma is already supporting sovereign digital infrastructure, from automating state licensing in Ukraine to scaling Project Navarasa across India's 22 official languages. And we know that developers need autonomy, control, and clarity in licensing for further AI innovation to reach its full potential.

Gemma 4 brings three essential elements of free and open-source software directly to the community:

Autonomy: By letting people build on and modify the Gemma 4 models, we are empowering researchers and developers with the freedom to advance their own breakthrough innovations however they see fit.
Control: We understand that many developers require precise control over their development and deployment environments. Gemma 4 allows for local, private execution that doesn't rely on cloud-only infrastructure.
Clarity: By applying the industry-standard Apache 2.0 license terms, we are providing clarity about developers' rights and responsibilities so that they can build freely and confidently from the ground up without the need to navigate prescriptive terms of service.

Building together to drive real-world impact

Gemma 4, as a release, is an invitation. Whether you are a scientific researcher exploring the language of dolphins, an industry developer building the next generation of open AI agents, or a public institution looking to provide more effective, efficient, and localized services to your citizens, Google is excited to continue building with you. The Gemmaverse is your playground, and with Apache 2.0, the possibilities are more boundless than ever.

We can't wait to see what you build.

Google Cloud: Investing in the future of PostgreSQL

Tuesday, March 31, 2026

by Dilip Kumar, Cloud SQL for PostgreSQL

At Google Cloud, we are deeply committed to open source, and PostgreSQL is a cornerstone of our managed database offerings, including Cloud SQL & AlloyDB.

Continuing our work with the PostgreSQL community, we've been contributing to the core engine and participating in the patch review process. Below is a summary of that technical activity, highlighting our efforts to enhance the performance, stability, and resilience of the upstream project. By strengthening these core capabilities, we aim to drive innovation that benefits the entire global PostgreSQL ecosystem and its diverse user base.

Our investments in PostgreSQL logical replication aim to unlock critical capabilities for all users. By enhancing conflict detection, we are paving the way for robust active-active replication setups, increasing write scalability and high availability. We are also focused on expanding logical replication to cover missing objects. This is key to enabling major version upgrades with minimal downtime, offering a more flexible alternative to pg_upgrade. Furthermore, our ongoing contributions to bug fixes are dedicated to improving the overall stability and resilience of PostgreSQL for everyone in the community.

Technical contributions: July 2025 – December 2025

The following sections detail technical enhancements and bug fixes contributed to the PostgreSQL open source project between July 2025 and December 2025. Primary engineering efforts were dedicated to advancing logical replication toward active-active capabilities, implementing missing features, optimizing pg_upgrade, and fixing bugs.

Logical Replication Enhancements

Logical replication is a critical PostgreSQL feature, enabling capabilities such as near-zero-downtime major version upgrades, selective replication, and active-active replication. We have been working toward closing some of its key gaps.

Automatic Conflict Detection

Active-active replication is a mechanism for increasing PostgreSQL write scalability. One of the most significant hurdles for active-active PostgreSQL setups is handling row-level conflicts when the same data is modified on two different nodes. Historically, these conflicts could stall replication, requiring manual intervention.

In this cycle, the community committed Automatic Conflict Detection, the first phase of the broader Automatic Conflict Detection and Resolution effort. This foundation allows the replication worker to automatically detect when an incoming change (Insert, Update, or Delete) conflicts with the local state.
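To make the detection step concrete, here is a toy Python model of how an apply worker might classify an incoming change against local rows. It is an illustrative sketch, not PostgreSQL's implementation: the `Row` and `Change` types and the `detect_conflict` function are hypothetical, and the labels are modeled on conflict types named in the PostgreSQL documentation (such as `insert_exists` and `update_missing`). Real detection also covers cases this sketch omits, such as unique-constraint violations on non-key columns.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Row:
    value: str
    origin: str  # node that last modified this row


@dataclass
class Change:
    op: str  # "INSERT", "UPDATE", or "DELETE"
    key: int
    value: Optional[str]
    origin: str  # node the replicated change came from


def detect_conflict(local: Dict[int, Row], change: Change) -> Optional[str]:
    """Return a conflict label for `change`, or None if it applies cleanly."""
    existing = local.get(change.key)
    if change.op == "INSERT":
        # Inserting a key that already exists locally.
        return "insert_exists" if existing is not None else None
    if change.op in ("UPDATE", "DELETE"):
        prefix = change.op.lower()
        if existing is None:
            # The target row is gone on this node.
            return f"{prefix}_missing"
        if existing.origin != change.origin:
            # The row was last modified by a different node than the sender.
            return f"{prefix}_origin_differs"
        return None
    raise ValueError(f"unknown operation: {change.op}")


local_rows = {1: Row("a", origin="node_b"), 2: Row("b", origin="node_a")}
print(detect_conflict(local_rows, Change("INSERT", 1, "x", "node_a")))   # insert_exists
print(detect_conflict(local_rows, Change("UPDATE", 3, "x", "node_a")))   # update_missing
print(detect_conflict(local_rows, Change("DELETE", 1, None, "node_a")))  # delete_origin_differs
print(detect_conflict(local_rows, Change("UPDATE", 2, "c", "node_a")))   # None
```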

Contributors: Dilip Kumar helped by performing code and design reviews. He is currently advancing the project's second phase, focusing on implementing conflict logging into a dedicated log table.

Logical replication of sequences

Until recently, logical replication in PostgreSQL was primarily limited to table data. Sequences did not synchronize automatically. This meant that during a migration or a major version upgrade, DBAs had to manually sync sequence values to prevent "duplicate key" errors on the new primary node. Since many databases rely on sequences, this was a significant hurdle for logical replication.
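That manual step typically amounted to reading each sequence's current value on the old primary (for example, from the `last_value` column of the `pg_sequences` view) and applying it on the new primary with `setval()`. The Python sketch below only generates those statements from a dictionary of values; the input data and the `setval_statements` helper are hypothetical, and this is not how the new sequence replication feature works internally.

```python
from typing import Dict, List, Optional


def setval_statements(last_values: Dict[str, Optional[int]]) -> List[str]:
    """Build SQL that advances each sequence on the new primary.

    `last_values` maps a schema-qualified sequence name to the last value seen
    on the old primary; None means the sequence was never used there.
    """
    statements = []
    for name, last_value in sorted(last_values.items()):
        if last_value is None:
            continue  # never used on the source: nothing to copy
        literal = name.replace("'", "''")  # escape quotes for a SQL string literal
        # is_called = true, so the next nextval() returns last_value + 1
        statements.append(f"SELECT setval('{literal}', {int(last_value)}, true);")
    return statements


for stmt in setval_statements({"public.orders_id_seq": 1042, "public.unused_seq": None}):
    print(stmt)  # SELECT setval('public.orders_id_seq', 1042, true);
```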

Contributors: Dilip Kumar helped by performing code and design reviews.

Drop subscription deadlock

The DROP SUBSCRIPTION command previously held an exclusive lock while connecting to the publisher to delete a replication slot.

If the publisher was a new database on the same server, the connection process would stall while trying to access that same locked catalog.

This conflict created a "self-deadlock," where the command was essentially waiting for itself to finish.
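The pattern is easy to reproduce outside PostgreSQL. This generic Python illustration (not PostgreSQL code) holds a lock while waiting on a second thread that needs the same lock; a timeout is added only so the demonstration terminates. All names are hypothetical, and the lock-free variant merely contrasts the two orderings rather than describing how the upstream fix was implemented.

```python
import threading

# Stands in for the lock held on the catalog (illustrative only).
catalog_lock = threading.Lock()


def open_connection(timeout_s: float) -> bool:
    """Simulated connection startup that must briefly read the locked catalog."""
    if catalog_lock.acquire(timeout=timeout_s):
        catalog_lock.release()
        return True
    return False  # timed out; without a timeout this would wait forever


def drop_subscription(hold_lock_while_connecting: bool, timeout_s: float = 0.2) -> bool:
    """Return True if the simulated publisher connection succeeded."""
    outcome = {}

    def connect() -> None:
        outcome["connected"] = open_connection(timeout_s)

    worker = threading.Thread(target=connect)
    if hold_lock_while_connecting:
        with catalog_lock:
            worker.start()
            worker.join()  # the command waits on a connection that waits on the command
    else:
        worker.start()
        worker.join()
    return outcome["connected"]


print(drop_subscription(hold_lock_while_connecting=True))   # False
print(drop_subscription(hold_lock_while_connecting=False))  # True
```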

Contributors: Dilip Kumar analyzed and authored the fix.

Upgrade Resilience

Operational ease of use and frictionless upgrades are important to PostgreSQL users. We have been working on improving the upgrade experience.

pg_upgrade optimization for Large Objects

For databases with massive volumes of Large Objects, upgrades could previously span several days. This bottleneck is resolved by exporting the underlying data table directly rather than executing individual Large Object commands, resulting in an upgrade process that is several orders of magnitude faster.

Contributors: Hannu Krosing, Nitin Motiani, and Saurabh Uttam highlighted the severity of the issue, proposed the initial fix, and actively drove it to resolution.

Prevent logical slot invalidation during upgrade

Upgrades to PG17 could fail if max_slot_wal_keep_size was not set to -1. This fix improves pg_upgrade's resilience by eliminating the need for users to set max_slot_wal_keep_size to -1 manually. The server now automatically retains the WAL data needed to upgrade logical replication slots, simplifying the upgrade process and reducing the risk of errors.

Contributors: Dilip Kumar analyzed and authored the fix.

pg_upgrade NOT NULL constraint related bug fix

A bug in pg_dump previously failed to preserve non-inherited NOT NULL constraints on inherited columns during upgrades from version 17 or older.

The fix updates the underlying query to ensure these specific schema constraints are correctly identified and migrated during the pg_upgrade process.

Contributors: Dilip Kumar analyzed and authored the fix.

Miscellaneous Bug Fixes

We continue to contribute bug fixes to help improve the stability and quality of PostgreSQL.

Make pgstattuple more robust about empty or invalid index pages

pgstattuple is a PostgreSQL extension for analyzing the physical storage of tables and indexes at the row (tuple) level, for example to determine whether a table needs maintenance. However, pgstattuple would raise errors when it encountered empty or invalid index pages in the hash and GiST code. This fix handles empty and invalid index pages, making pgstattuple more robust.

Contributors: Nitin Motiani and Dilip Kumar participated as author and reviewer.

Loading extensions from a different path

A bug incorrectly stripped the prefix from nested module paths when dynamically loading shared library files, causing libraries in subdirectories to fail to load. The fix ensures the prefix is removed only for simple filenames, allowing the dynamic library expander to find nested paths correctly.
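A Python sketch of the corrected rule follows, next to the earlier behavior for contrast. It assumes the prefix in question is `$libdir/` (the placeholder PostgreSQL uses for its library directory); the function names are hypothetical, and the real logic lives in PostgreSQL's C code.

```python
LIBDIR_PREFIX = "$libdir/"  # assumed prefix (see lead-in)


def strip_libdir_prefix(module_path: str) -> str:
    """Fixed behavior: strip the prefix only for simple filenames."""
    if module_path.startswith(LIBDIR_PREFIX):
        remainder = module_path[len(LIBDIR_PREFIX):]
        if "/" not in remainder:
            # Simple filename: drop the prefix so the library search path applies.
            return remainder
    # Nested or absolute paths are left untouched for the expander to resolve.
    return module_path


def strip_libdir_prefix_buggy(module_path: str) -> str:
    """Earlier behavior: the prefix was stripped even from nested paths."""
    if module_path.startswith(LIBDIR_PREFIX):
        return module_path[len(LIBDIR_PREFIX):]
    return module_path


print(strip_libdir_prefix("$libdir/myext"))            # myext
print(strip_libdir_prefix("$libdir/sub/myext"))        # $libdir/sub/myext
print(strip_libdir_prefix_buggy("$libdir/sub/myext"))  # sub/myext
```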

Contributors: Dilip Kumar reported and co-authored the fix for this bug.

WAL flush logic hardening

XLogFlush() and XLogNeedsFlush() are internal PostgreSQL functions that keep write-ahead log (WAL) records durable: XLogFlush() flushes WAL up to a given position, and XLogNeedsFlush() reports whether such a flush is still required. In certain edge cases, such as the end-of-recovery checkpoint, the two functions relied on inconsistent criteria to decide which code path to follow. This inconsistency posed a risk for upcoming features, such as asynchronous I/O for writes, that require XLogNeedsFlush() to work reliably.
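The general design principle, having both functions branch on one shared criterion so they cannot disagree, can be shown with a toy Python model. Everything here is hypothetical and heavily simplified: `WalState`, its fields, and the recovery flag only loosely mirror the idea that PostgreSQL advances a minimum recovery point, rather than flushing WAL, while replaying it. This is not PostgreSQL's code.

```python
from dataclasses import dataclass


@dataclass
class WalState:
    flushed_upto: int = 0        # LSN known to be durable (normal operation)
    min_recovery_point: int = 0  # LSN tracked instead while replaying WAL
    in_recovery: bool = False

    def _tracks_recovery_point(self) -> bool:
        # One shared predicate: both methods below branch on the same criterion.
        return self.in_recovery

    def needs_flush(self, lsn: int) -> bool:
        if self._tracks_recovery_point():
            return lsn > self.min_recovery_point
        return lsn > self.flushed_upto

    def flush(self, lsn: int) -> None:
        if self._tracks_recovery_point():
            self.min_recovery_point = max(self.min_recovery_point, lsn)
        else:
            self.flushed_upto = max(self.flushed_upto, lsn)


# Invariant callers can rely on: after flush(lsn), needs_flush(lsn) is False.
for recovering in (False, True):
    wal = WalState(in_recovery=recovering)
    wal.flush(100)
    assert not wal.needs_flush(100) and wal.needs_flush(101)
```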

Contributors: Dilip Kumar co-authored the fix for this bug.

Major Features in Development

Beyond our recent commits, the team is actively working on several high-impact proposals to further strengthen the PostgreSQL ecosystem.

Conflict Log Table for Detection: Dilip Kumar is developing a proposal for a conflict log table designed to offer a queryable, structured record of all logical replication conflicts. This feature would include a configuration option to determine whether conflict details are recorded in the history table, server logs, or both.
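Because the proposal is still under discussion, no final schema or option name exists yet. The Python sketch below uses the standard library's `sqlite3` as a stand-in for a PostgreSQL table to show why a structured, queryable log is useful. The table layout, the `destination` values `"table"`, `"log"`, and `"both"`, and the helper functions are all hypothetical.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("conflicts")

DESTINATIONS = {"table", "log", "both"}  # hypothetical option values


def create_conflict_log(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE conflict_log (
               conflict_type TEXT NOT NULL,
               relation      TEXT NOT NULL,
               row_key       TEXT NOT NULL,
               detected_at   TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )


def record_conflict(conn: sqlite3.Connection, destination: str,
                    conflict_type: str, relation: str, row_key: object) -> None:
    if destination not in DESTINATIONS:
        raise ValueError(f"destination must be one of {sorted(DESTINATIONS)}")
    if destination in ("table", "both"):
        conn.execute(
            "INSERT INTO conflict_log (conflict_type, relation, row_key) VALUES (?, ?, ?)",
            (conflict_type, relation, str(row_key)),
        )
    if destination in ("log", "both"):
        logger.info("conflict %s on %s, key=%s", conflict_type, relation, row_key)


conn = sqlite3.connect(":memory:")
create_conflict_log(conn)
record_conflict(conn, "both", "insert_exists", "public.orders", 1)
record_conflict(conn, "table", "update_missing", "public.orders", 3)
record_conflict(conn, "log", "delete_missing", "public.orders", 7)  # server log only

# Conflicts recorded in a table can be queried like any other data.
rows = conn.execute(
    "SELECT conflict_type, COUNT(*) FROM conflict_log "
    "GROUP BY conflict_type ORDER BY conflict_type"
).fetchall()
print(rows)  # [('insert_exists', 1), ('update_missing', 1)]
```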

Dumping table data in multiple chunks in pg_dump: Hannu Krosing is working on this feature, which lets parallel workers process a single large table (terabytes in size) in chunks, saturating hardware limits and speeding up exports.
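One way to split a single table into independently exportable pieces is by physical block range, using `ctid` range conditions (which PostgreSQL 14 and later can execute as TID range scans). The Python sketch below only generates such queries. It illustrates the general idea under that assumption and is not the design of the pending pg_dump patch; `block_ranges`, `chunk_queries`, and the table name are hypothetical.

```python
from typing import List, Tuple


def block_ranges(total_blocks: int, chunks: int) -> List[Tuple[int, int]]:
    """Split [0, total_blocks) into up to `chunks` contiguous, near-equal ranges."""
    if total_blocks < 0 or chunks < 1:
        raise ValueError("need total_blocks >= 0 and chunks >= 1")
    base, extra = divmod(total_blocks, chunks)
    ranges, start = [], 0
    for i in range(chunks):
        end = start + base + (1 if i < extra else 0)  # spread the remainder
        if end > start:
            ranges.append((start, end))
        start = end
    return ranges


def chunk_queries(table: str, total_blocks: int, chunks: int) -> List[str]:
    """One COPY per block range; each could be handed to a separate worker.

    `table` is interpolated as-is, so it must be a trusted, already-quoted name.
    """
    return [
        f"COPY (SELECT * FROM {table} "
        f"WHERE ctid >= '({lo},0)' AND ctid < '({hi},0)') TO STDOUT"
        for lo, hi in block_ranges(total_blocks, chunks)
    ]


print(block_ranges(10, 3))  # [(0, 4), (4, 7), (7, 10)]
for query in chunk_queries("public.big_table", total_blocks=10, chunks=3):
    print(query)
```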

Adding a pg_dump flag for parallel export to pipes: Nitin Motiani is working on this feature, which introduces a flag that lets users provide pipe commands when doing parallel export/import with pg_dump/pg_restore (in directory format).

Leadership

Beyond code, our team supports the ecosystem through community leadership. We are pleased to share that Dilip Kumar has been selected for the PGConf.dev 2026 Program Committee to help shape the project's premier developer conference.

Community Roadmap: Your Feedback Matters

We encourage you to use the comments area to propose new capabilities or refinements you wish to see in future iterations, and to identify key areas where the PostgreSQL open-source community should focus its investments.

Acknowledgement

We want to thank our open source contributors for their dedication to improving the upstream project.

Dilip Kumar: PostgreSQL significant contributor

Hannu Krosing: PostgreSQL significant contributor

Nitin Motiani: Contributing features and bug fixes

Saurabh Uttam: Contributing bug fixes

We also extend our sincere gratitude to the wider PostgreSQL open source members, especially the committers and reviewers, for their guidance, reviews, and for collaborating with us to make PostgreSQL the most advanced open source database in the world.

Advanced TPU optimization with XProf: Continuous profiling, utilization insights, and LLO bundles

Monday, March 23, 2026

by Yogesh SY, AI Infra @ Google


In our previous post, we introduced the updated XProf and the Cloud Diagnostics XProf library, which are designed to help developers identify model bottlenecks and optimize memory usage. As machine learning workloads on TPUs continue to grow in complexity—spanning both massive training runs and large-scale inference—developers require even deeper visibility into how their code interacts with the underlying hardware.
Today, we are exploring three advanced capabilities designed to provide "flight recorder" visibility and instruction-level insights: Continuous Profiling Snapshots, the Utilization Viewer, and LLO Bundle Visualization.

Continuous Profiling Snapshots: The "Flight Recorder" for ML

Standard profiling often relies on "sampling mode," where users manually trigger high-fidelity traces for short, predefined durations. While effective for general optimization, this traditional approach can miss transient anomalies, intermittent stragglers, or unexpected performance regressions that occur during long-running training jobs.
To address this visibility gap, XProf is introducing Continuous Profiling Snapshots. This feature functions as an "always-on" flight recorder for your TPU workloads.
How it works: Continuous profiling snapshots (Google Colab) operate quietly in the background with minimal system overhead (approximately 7 µs of CPU overhead per packet). They use a host-side circular buffer of roughly 2 GB to retain the last ~90 seconds of performance data. This architecture lets developers snapshot performance data programmatically at the moment an anomaly occurs, bypassing the overhead and unpredictability of traditional one-shot profiling.
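The circular-buffer idea can be sketched in a few lines of Python. This is a generic illustration, not XProf's implementation or API: `TraceEvent`, `RingBuffer`, and `snapshot()` are hypothetical names, events are assumed to arrive in timestamp order, and the default limits simply echo the approximate figures above (about 2 GB and about 90 seconds).

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class TraceEvent:
    timestamp_s: float
    name: str
    size_bytes: int


class RingBuffer:
    """Keeps only the most recent events, bounded by bytes and by time."""

    def __init__(self, max_bytes: int = 2 * 1024**3, window_s: float = 90.0):
        self.max_bytes = max_bytes
        self.window_s = window_s
        self._events: Deque[TraceEvent] = deque()
        self._bytes = 0

    def append(self, event: TraceEvent) -> None:
        self._events.append(event)
        self._bytes += event.size_bytes
        # Evict the oldest events until both the byte budget and the time window hold.
        while self._events and (
            self._bytes > self.max_bytes
            or event.timestamp_s - self._events[0].timestamp_s > self.window_s
        ):
            oldest = self._events.popleft()
            self._bytes -= oldest.size_bytes

    def snapshot(self) -> List[TraceEvent]:
        """Copy of the retained window, e.g. taken when an anomaly is detected."""
        return list(self._events)


buf = RingBuffer()
for t in range(0, 201, 10):  # one event every 10 s for 200 s
    buf.append(TraceEvent(float(t), "step", 1_000))
snap = buf.snapshot()
print(snap[0].timestamp_s, len(snap))  # 110.0 10
```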

A diagram illustrating the limitation of traditional trace capturing, where a transient performance anomaly is missed because the trace capture was manually triggered before or after the anomaly occurred. — Figure 1: Traditional trace capturing without Continuous Profiling Snapshots.

A diagram showing how Continuous Profiling Snapshots capture comprehensive context. A performance anomaly occurs, and the 'always-on' circular buffer allows the user to snapshot the performance data, capturing the anomaly and the preceding 90 seconds of context. — Figure 2: Comprehensive context captured via Continuous Profiling Snapshots.

Key technical features include:

Circular Buffer Management: Continuously holds recent trace data to ensure you can capture the exact moments leading up to an anomaly or regression.
Out-of-band State Tracking: A lightweight service polls hardware registers for P-state (voltage and frequency) and trace-drop counters, ensuring the snapshot contains the necessary environmental context for accurate analysis.
Context Reconstruction: The system safely decouples state capture from the trace stream. This ensures that any arbitrary snapshot retains the ground truth required for precise, actionable debugging.

Visualizing Hardware Efficiency with the Utilization Viewer

Raw performance counters are powerful, but interpreting thousands of raw hardware metrics can be a daunting, time-consuming process. The new Utilization Viewer bridges the gap between raw data streams and actionable optimization strategies.
This tool translates raw performance counter values into easily understandable utilization percentages for specific hardware components, such as the TensorCore (TC), SparseCore (SC), and High Bandwidth Memory (HBM).

A visualization of raw performance counter data, presented as a long, detailed list of thousands of uninterpreted hardware metrics and event counts. Figure 3: Deriving actionable insights from raw performance counters.

From Counters to Insights: Instead of requiring developers to manually analyze a raw list of event counts, the Utilization Viewer automatically derives high-level metrics. For example, it can translate raw bus activity into a clear utilization percentage (e.g., displaying an average MXU bus utilization of 7.3%). This immediate clarity allows you to determine at a glance whether your model is compute-bound or memory-bound.
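The arithmetic behind such a percentage is simple: busy cycles divided by total cycles. The Python sketch below shows that derivation plus a deliberately crude compute-bound versus memory-bound heuristic. The counter values, function names, and the 70% saturation threshold are hypothetical; XProf's actual counters and derivation rules are not described here.

```python
def utilization_pct(active_cycles: int, total_cycles: int) -> float:
    """Share of cycles in which a unit (e.g. the MXU bus) was busy, as a percentage."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return 100.0 * active_cycles / total_cycles


def boundedness(compute_pct: float, memory_pct: float, saturated_pct: float = 70.0) -> str:
    """Crude verdict; the 70% saturation threshold is an arbitrary assumption."""
    if compute_pct >= saturated_pct and compute_pct >= memory_pct:
        return "compute-bound"
    if memory_pct >= saturated_pct:
        return "memory-bound"
    return "neither unit saturated (look for stalls, overheads, or input bottlenecks)"


mxu = utilization_pct(active_cycles=73, total_cycles=1_000)
hbm = utilization_pct(active_cycles=850, total_cycles=1_000)
print(f"MXU {mxu:.1f}%, HBM {hbm:.1f}% -> {boundedness(mxu, hbm)}")
# MXU 7.3%, HBM 85.0% -> memory-bound
```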

A visualization from the Utilization Viewer showing automatically derived high-level metrics, displaying clear utilization percentages for key hardware components like TensorCore (TC), SparseCore (SC), and High Bandwidth Memory (HBM), to help determine if a model is compute-bound or memory-bound. — Figure 4: Perf Counters Visualization in Utilization Viewer

Inspecting the Metal: Low-Level Operations (LLO) Bundles

For advanced users and kernel developers utilizing Pallas, we are now exposing Low-Level Operations (LLO) bundle data. LLO bundles represent the specific machine instructions issued to the TPU's functional units during every clock cycle.

This feature is critical for "Instruction Scheduling" verification—ensuring that the compiler is honoring your programming intentions and correctly re-ordering instructions to maximize hardware performance.

New Visualizations via Trace View Integration: You can now visualize LLO bundles directly within the trace viewer. Through dynamic instrumentation, XProf inserts traces exactly when a bundle executes. This provides exact execution times and block utilization metrics, rather than relying on static compiler estimates.

Why it matters: Accessing this level of granularity enables hyper-specific bottleneck analysis. For instance, developers can now identify idle cycles within the Matrix Multiplication Unit (MXU) pipeline, making it easier to spot and resolve latency between vmatmul and vpop instructions.
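As a rough illustration of that analysis, the Python sketch below measures the gap in cycles between each `vmatmul` and the `vpop` that retrieves its result; unusually large gaps hint at idle MXU cycles. It assumes a simplified, hypothetical trace of `(cycle, opcode)` records and strict first-in, first-out pairing. It does not parse XProf output, and real MXU result handling may be more involved.

```python
from collections import deque
from typing import Deque, List, Tuple


def matmul_to_pop_latencies(bundles: List[Tuple[int, str]]) -> List[int]:
    """Pair each vmatmul with the next unmatched vpop (FIFO) and return cycle gaps.

    `bundles` is a list of (cycle, opcode) records sorted by cycle; this format
    is hypothetical and much simpler than real LLO bundle data.
    """
    pending: Deque[int] = deque()
    latencies = []
    for cycle, opcode in bundles:
        if opcode == "vmatmul":
            pending.append(cycle)
        elif opcode == "vpop" and pending:
            latencies.append(cycle - pending.popleft())
    return latencies


trace = [(0, "vmatmul"), (1, "vmatmul"), (2, "vadd"), (10, "vpop"), (12, "vpop")]
print(matmul_to_pop_latencies(trace))  # [10, 11]
```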

Conclusion

Whether you are trying to capture a fleeting performance regression with Continuous Profiling, verifying kernel efficiency with LLO Bundles, or assessing overall hardware saturation with the Utilization Viewer, these new features bring internal-grade Google tooling directly to the open-source community. These tools are engineered to provide the absolute transparency required to optimize high-scale ML workloads.

Get started by checking out the updated resources:

XProf GitHub Repository: https://github.com/openxla/XProf
Official Documentation: https://openxla.org/xprof

Open Source, Open Doors, Apply Now for Google Summer of Code!

Monday, March 16, 2026

by Stephanie Taylor, Mary Radomile & Lucila Ortíz, GSoC Program Admins

Join Google Summer of Code (GSoC) and start contributing to the world of open source development! Applications for GSoC are open now through March 31, 2026 at 18:00 UTC.

Google Summer of Code is celebrating its 22nd year in 2026! GSoC started back in 2005 and has brought over 22,000 new contributors from 123 countries into the open source community. This is an exciting opportunity for students and beginners to open source (18+) to gain real-world experience during the summer. You will spend 12+ weeks coding and learning about open source development under the guidance of experienced mentors, and you will earn a stipend.

Apply and get started!

First things first. Read the Contributor Guide and Advice for people applying for GSoC for application basics.
Elevate your proposal! Review the Writing a proposal doc written by former contributors and the Guidance for GSoC Contributors using AI tooling in GSoC 2026.
Read the Program Rules, FAQ, and join us in our Discord Channel to connect with the GSoC community.
Explore the 184 mentoring organizations, find a couple that align with your interests/skills and reach out immediately by using their preferred contact methods listed on the GSoC site. Do not email mentors directly unless explicitly told to do so in their instructions.
Watch our Intro to GSoC video, as well as the GSoC Org Highlight videos and Community Talks Series to get inspired about projects that contributors have worked on in the past.

Please remember that mentors are volunteers and they are being inundated with hundreds of requests from interested participants. It may take time for them to respond to you. Follow their Contributor Guidance instructions exactly. Do not just start submitting PRs without reading their guidance section first.

Complete your registration and submit your project proposals on the GSoC site before the deadline on Tuesday, March 31, 2026 at 18:00 UTC.

We wish all our applicants the best of luck!

OpenTitan shipping in production

Wednesday, March 4, 2026

by Cyrus Stoller & Miguel Osorio, OpenTitan

Last year, we shared the exciting news that fabrication of production OpenTitan silicon had begun. Today, we're proud to announce that OpenTitan® is now shipping in commercially available Chromebooks.

The first OpenTitan part is being produced by Nuvoton, a leader in silicon security.

A close-up of a blue circuit board focused on an IC.

What is OpenTitan?

Over the past seven years, Google has worked with the open source communities to build OpenTitan, the first open source silicon Root of Trust (RoT). The RoT is the foundation from which all other security properties of a device are derived, and anchoring it in silicon provides the strongest possible security guarantees that the code being executed is authorized and verified.

The OpenTitan project and its community are actively supported and maintained by lowRISC C.I.C., an independent non-profit.

OpenTitan provides the community with a high-quality, low-cost, commoditized hardware RoT that can be used across the Google ecosystem and also facilitates the broader adoption of Google-endorsed security features across the industry. Because OpenTitan is open source, you can choose to buy it from a commercial partner or manufacture it yourself based on your use case. In either case, you can review and test OpenTitan's capabilities with a degree of transparency never afforded before in security silicon. This allows optimization for the use case at hand, whether it is having multiple reliable suppliers or ensuring complete end-to-end control of the manufacturing process.

With OpenTitan, we are pushing the boundaries of what can be expected from a silicon RoT. For example, OpenTitan is the first commercially available open source RoT to support post-quantum cryptography (PQC) secure boot based on SLH-DSA. This helps future-proof the security posture of these devices against potential adversaries with the capability to break classical public-key cryptography (e.g., RSA) via quantum computing. In addition, by applying commercial-grade design verification (DV) and top-level testing to an open source design, we have pushed for the highest quality while still allowing these chips to be transparent and independently verifiable. An added advantage of this approach is that we expect the high-quality IP developed for OpenTitan to be reusable in other projects going forward.
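SLH-DSA belongs to the family of hash-based signatures, whose security rests only on properties of a hash function rather than on problems such as factoring that a quantum computer could attack. The Python sketch below shows the core idea with a Lamport one-time signature, a much simpler ancestor of that family. It is not SLH-DSA, which builds a stateless, many-time scheme from related one-time and few-time components; each Lamport key pair here must sign only one message, and the "firmware image" is a placeholder.

```python
import hashlib
import secrets
from typing import List, Tuple

Key = List[List[bytes]]  # 256 pairs of 32-byte values


def keygen() -> Tuple[Key, Key]:
    # Private key: two random secrets per digest bit; public key: their hashes.
    sk = [[secrets.token_bytes(32) for _ in range(2)] for _ in range(256)]
    pk = [[hashlib.sha256(s).digest() for s in pair] for pair in sk]
    return sk, pk


def _digest_bits(message: bytes) -> List[int]:
    digest = hashlib.sha256(message).digest()
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(sk: Key, message: bytes) -> List[bytes]:
    # Reveal one secret per bit of the message digest (one-time use only).
    return [sk[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(pk: Key, message: bytes, signature: List[bytes]) -> bool:
    if len(signature) != 256:
        return False
    return all(
        hashlib.sha256(sig).digest() == pk[i][bit]
        for i, (sig, bit) in enumerate(zip(signature, _digest_bits(message)))
    )


sk, pk = keygen()
image = b"placeholder firmware image"
sig = sign(sk, image)
print(verify(pk, image, sig))                # True: boot proceeds
print(verify(pk, image + b"tampered", sig))  # False: boot is refused
```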

In addition to delivering this first instance of OpenTitan silicon as a product, we are proud of the processes that we have collaboratively developed along the way. In particular, both individual IP blocks and the top-level Earl Grey design have functional and code coverage above 90%, meeting the highest industry standards, with 40k+ tests running nightly. Regressions are caught and resolved quickly, ensuring design quality is maintained over the long term. Ownership transfer gives confidence that the silicon is working for you and helps move away from co-signing, so that you are in full control of your own update schedule. And since any IP is of little value without the ability to navigate and deploy it, we've prioritized thorough and accurate documentation, together with onboarding materials that streamline welcoming new developers to the project.

With lowRISC CIC and in collaboration with our OpenTitan partners, we pioneered open source security silicon development. While challenges are expected when doing something for the first time, the benefits of working in open source have been clear: fast and efficient cross-organizational collaboration, retention of expertise regardless of employer, shared maintenance burdens, and high levels of academic research engagement.

What's next?

Firstly, bringup to deploy OpenTitan in Google's datacenters is underway and expected to land later this year.

Secondly, while we're thrilled about the advantages that this first generation OpenTitan part brings to Google's security posture, we have more on our roadmap, and have already begun work on a second generation part that will support lattice-based PQC (e.g., ML-DSA and ML-KEM) for secure boot and attestation. Stay tuned – more info on this coming soon!

Thirdly, OpenTitan started with the security use case because it is the hardest to get right. Having successfully demonstrated that we are able to deliver secure open silicon, we're confident that the same methodology can be used to develop additional open source designs targeting a wide range of use cases (whether the focus is on security, safety, or elsewhere). We're excited to see re-use of IP that was developed for OpenTitan being adapted for Caliptra, a RoT block that can be integrated into datacenter-class SoCs.

Getting Involved

OpenTitan shipping in production is a defining milestone for us and all contributors to the project. We're excited to see more open source silicon developed for commercial use cases in the future, and to see this ecosystem grow with lowRISC's introduction of new membership tiers.

As the following metrics show (baselined from the project's public launch in 2019), the OpenTitan community is rapidly growing:

Over ten times the number of commits at launch: from 2,500 to over 29,200.
275+ contributors to the code base
3.2k GitHub stars

If you are interested in learning more, contributing to OpenTitan, or using OpenTitan IP in one of your projects, visit the open source GitHub repository or reach out to the OpenTitan team.

Announcing CEL-expr-python: the Common Expression Language in Python, now open source

Tuesday, March 3, 2026

by Olena Huang, CEL (Common Expression Language) team

We're excited to announce the open source release of CEL-expr-python, a Python implementation of the Common Expression Language (CEL)! CEL (cel.dev) is a powerful, non-Turing-complete expression language designed for simplicity, speed, safety, and portability. CEL is designed to be embedded in an application, and you can use CEL to make decisions, validate data, or apply rules based on the information your application has.

What is CEL-expr-python?

CEL-expr-python provides a native Python API for compiling and evaluating CEL expressions that's maintained by the CEL team. We'd like to acknowledge the fantastic work already done by the open source communities around support for CEL in Python, and look forward to your contributions to help us further enrich the CEL ecosystem.

The CEL team has chosen to develop CEL-expr-python by wrapping our official C++ implementation to ensure maximum consistency with CEL semantics while enabling Python users to extend and enrich the experience on top of this production-ready core in Python directly. Additionally, new features and optimizations implemented in CEL C++ will automatically and immediately become available in CEL-expr-python.

Who is it for?

If you're working on a Python project that needs to:

Evaluate expressions defined dynamically (e.g., loaded from a database, configuration, or user input).
Implement and enforce policies in a clear, concise, and secure manner.
Validate data against a set of rules.

...then CEL-expr-python is for you!

Why use CEL-expr-python?

CEL has become a prevalent technology for applications like policy enforcement, data validation, and dynamic configuration. CEL-expr-python allows Python developers to leverage the same benefits, including:

Safety: CEL expressions are side-effect free and guaranteed to terminate.
Speed: Designed for efficient evaluation.
Portability: Expressions are language-agnostic.
Familiarity: Builds upon established CEL concepts.

With CEL-expr-python, you can now seamlessly integrate this technology within your Python stack.

Get Started!

Check out the CEL-expr-python Repository here: https://github.com/cel-expr/cel-python

We're thrilled to bring CEL-expr-python to the open source communities and can't wait to see what you build with it!

Here's a code snippet demonstrating how to initialize CEL-expr-python and evaluate an expression.

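```python
from cel_expr_python import cel

# Declare the variables that expressions may reference, with their types
cel_env = cel.NewEnv(variables={"who": cel.Type.STRING})

# Compile the expression against that environment
expr = cel_env.compile("'Hello, ' + who + '!'")

# Evaluate the compiled expression with concrete data and print the result
print(expr.eval(data={"who": "World"}))  # Hello, World!
```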

For a more in-depth tutorial, check out our codelab here: https://github.com/cel-expr/cel-python/blob/main/codelab/index.lab.md

The CEL-expr-python repository will initially be available as read-only. We encourage you to try it out in your projects and share your experiences. Feel free to leave feedback in our GitHub issue queue; we are eager to hear from you and will work promptly to address any issues or suggestions.

While we are not accepting external contributions at this moment, we are committed to building a community around CEL-expr-python and plan to open up for contributions in the future. Stay tuned for updates.

This Week in Open Source #15

Friday, February 20, 2026

by Daryl Ducharme, Google Open Source

This Week in Open Source for February 20th, 2026

A look around the world of open source

We're preparing for a busy conference season, with events like SCALE 23x and KubeCon + CloudNativeCon Europe on the horizon. A core part of our mission is "learning and sharing what we learn" so that our communities can continue to thrive together. Conferences are a great place to fulfill that mission.

This week, we're highlighting a few "Open Source Reads" that tackle some of the biggest questions facing our ecosystem today—from the complex ethics of AI-generated content to the global impact of open AI models. We hope these links provide valuable context as we work together to sustain the critical infrastructure we all rely on.

Upcoming Events

February 24 - 25: The Linux Foundation Member Summit is happening in Napa, California. It is the annual gathering for Linux Foundation members that fosters collaboration, innovation, and partnerships among the leading projects and organizations working to drive digital transformation with open source technologies.
March 5 - 8: SCALE 23x is happening in Pasadena, California. It is North America's largest community-run open source conference and includes four days of sessions, workshops, and community activities focused on open source, security, DevOps, cloud native, and more.
March 9 - 10: FOSSASIA Summit 2026 is happening in Bangkok, Thailand. It will be a two-day hybrid event that showcases the latest in open technologies, fostering collaboration across enterprises, developers, educators, and communities.
March 16 - 17: FOSS Backstage is happening in Berlin, Germany. This conference brings together the brightest minds in the industry to discuss and explore all about FOSS community, management and compliance.
March 22: Maintainer Summit EU is happening just before CloudNativeCon in Amsterdam, The Netherlands. This is an exclusive event for the people behind our projects to gather face-to-face, collaborate, and celebrate cloud first projects.
March 23 - 26: KubeCon + CloudNativeCon Europe is happening in Amsterdam, The Netherlands. This is the flagship conference for the Cloud Native Computing Foundation (CNCF) and brings together adopters and technologists from leading open source and cloud first communities.
March 26 - 29: ATmosphereConf is happening in Vancouver, British Columbia. The AT Protocol community took its two-day conference and also booked the venue for the two days prior (March 26 and 27), with smaller theaters and breakout rooms for everything from extended events to developer training to building together.
April 7 - 8: PyTorch Conference EU is happening in Paris, France. Hosted by the PyTorch Foundation, this conference gathers top-tier AI pioneers, researchers, and developers to explore the future of AI.

Open Source Reads and Links

[Blog] An AI Agent wrote a hit piece on me - An AI agent wrote a harmful article about a maintainer after he rejected its code for a popular Python library. This shows a new risk: AI agents attacking people to get what they want. We must be careful, as AI misbehavior can hurt reputations and trust in software. A part 2 followed as the story developed.
[Blog] Stop closing the door; fix the house - A different take on the crossover between AI and open source. Instead of closing contributions due to poor AI generated code, maintainers should guide contributors and AI tools with clear rules and automation. This helps keep quality high and keeps the community open.
[Article] Everyone uses open source, but patching moves too slowly - "Maintenance is the highest form of creation." Open source requires maintenance, especially when 60% of security incidents hit unpatched code. How can we work together to keep our communities healthy and secure?
[Paper] AI-powered open-source infrastructure for accelerating materials discovery and advanced manufacturing - Gen AI isn't the only type of AI in the game. This paper explains how AI and open-source tools help speed up the discovery of new materials. By combining data, simulations, and machine learning, we can build efficient and sustainable platforms.
[Blog] AI is destroying open source and it's not even good yet - Here's another post about how maintainers face more work and frustration because AI often makes mistakes and doesn't help fix problems. This growing issue could get worse as AI becomes more widely used without careful oversight. So, what types of conversations should we be having to create that oversight?
[Article] What's next for Chinese open source AI? - Chinese companies are creating and sharing powerful open AI models that anyone can use and modify. Because these models are cheaper and widely adopted globally, they challenge Western AI models. This open approach is changing how AI innovation happens and who controls its future.

Which of these stories will you be chatting about at your next meetup or conference? Let us know! Share with us on our @GoogleOSS X account or our @opensource.google Bluesky account.

Introducing the 185 Organizations for GSoC 2026

Thursday, February 19, 2026

by Stephanie Taylor, Mary Radomile & Lucila Ortíz, Google Open Source

The complete list of Google Summer of Code (GSoC) Mentoring Organizations is now available! 2026 brings us 185 open source communities who are eager to mentor a new group of open source contributors. Now is the time for prospective contributors to start looking for a community to participate with. Visit the full list of 2026 organizations to learn about each community, their project ideas, and read the specific contributor guidance to apply.

Who can apply as a GSoC contributor?
If you are 18 or older and a student or just starting out in open source (less than 2 years of open source experience), GSoC is for you!

Why participate in Google Summer of Code?
GSoC offers a unique opportunity to gain real-world experience and build new skills through open source contributions while being mentored by experienced maintainers and developers.

The application period starts on March 16th at 1800 UTC. Here are some things you can do to get started:

Visit our GSoC website and read the FAQ, the Contributor Guide, Advice for people applying for GSoC, Program Rules and the videos for Potential GSoC Contributors to learn the basics about GSoC.
Browse the Organization pages—use filters (languages, categories) to narrow down your choices.
Look at the Project Ideas for the Orgs you like. Pick one that excites you and reach out to them ASAP to talk about it.
Heads up! Each Org has its own application steps and maybe some required pre-tasks. Check their Contributor guidance link in their profile. Chatting and contributing early is HUGE for getting accepted!
Write a proposal based on the organization's guidelines. Remember that some orgs do not allow the use of AI, so be aware of their policies.
We strongly recommend submitting your proposal on the GSoC site at least 3 days before the hard deadline March 31, 1800 UTC.

Mark your calendars with the upcoming important GSoC 2026 dates!

Contributor application period: March 16 - March 31, 1800 UTC
GSoC 2026 Accepted Contributors announced: April 30 1800 UTC
Coding Starts: May 25

Thank you for being part of this wonderful community and we wish the best of luck to all the 2026 applicants!

The End of an Era: Transitioning Away from Ingress NGINX

Thursday, February 12, 2026

by Kaslin Fields, Cloud DevRel & Rob Scott, GCP Networking

For many of us, the first time we successfully routed traffic into a Kubernetes cluster, we did it using Ingress NGINX. It was the project that turned a complex networking API into something we could actually use.

However, the Kubernetes community recently announced that Ingress NGINX is officially entering retirement. Maintenance will cease in March 2026.

Here is what you need to know about why this is happening, what comes next, and why this "forced" migration is actually a great opportunity for your infrastructure.

Clarifying Terminology

First, there are some confusing, overlapping terms here. Let's clarify.

Ingress API - Kubernetes introduced the Ingress API as a Generally Available (GA) feature in 2020 with the release of Kubernetes version 1.19. This API is still available in Kubernetes with no immediate plans for deprecation or removal. However, it is "feature-frozen" meaning it is no longer being actively worked on or updated. The community has instead moved to Gateway API, which we'll talk more about later in this post.
Ingress NGINX - "Ingress" is an API object available by default in Kubernetes, as described above. You can define your Ingress needs as an Ingress resource. But that resource won't actually do anything without a controller. Ingress NGINX is a very popular controller that uses NGINX as a reverse proxy and load balancer. This will no longer be maintained as of March 2026.

As it says in the What You Need To Know blog from the Kubernetes project, "Existing deployments of Ingress NGINX will continue to function and installation artifacts will remain available." However "there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered."

NGINX Ingress Controller - To make things more confusing, there is another controller called "NGINX Ingress." This controller implements ingress for your Kubernetes resources via NGINX and NGINX Plus, and it is owned and maintained by F5 / NGINX Inc. It will continue to be maintained and available in both its open source and commercial forms.

In this blog post, we are going to talk about "Ingress NGINX," the controller being retired. We will also talk about "Ingress" or the "Ingress API," which is still around but feature-frozen.

What Problem Did Ingress NGINX Solve?

In the early days of Kubernetes, getting external traffic to your pods was a nightmare. You either had to use expensive, cloud-specific LoadBalancers for every single service or manage complex NodePorts.

While the Kubernetes Ingress API was introduced as a standard specification for Layer 7 routing (HTTP/HTTPS), it was inherently limited: it was designed for a simpler time in Kubernetes' history and offered minimal features. Features like advanced routing, traffic splitting, and non-HTTP protocols were not natively supported by the API.

Ingress NGINX solved this problem by serving as a robust Ingress controller that executed the API's rules. Leveraging the widely adopted NGINX reverse proxy, the controller provided a powerful, provider-agnostic entry point for cluster traffic. It was able to:

Consolidate multiple services under a single IP address.
Provide robust Layer 7 capabilities, including SSL/TLS termination and basic load balancing.
Use familiar NGINX configuration logic inside a cloud-native environment.
Extend the basic Ingress API to support advanced features, such as rate limiting, custom headers, and sophisticated traffic management, by allowing users to inject familiar, raw NGINX configuration logic using custom nginx.ingress.kubernetes.io annotations (often called "snippets").

This flexibility, achieved by translating standard Ingress objects into feature-rich NGINX configurations, made Ingress NGINX the de-facto controller and the "Swiss Army Knife" of Kubernetes networking.

Why is it Retiring?

If it's so popular, why kill it? The very flexibility that made it so popular also (at least partially) led to its demise. The announcement points to two primary "silent killers":

The "Snippet" Security Debt: Ingress NGINX gained popularity through its flexibility, specifically "snippets" that let users inject raw NGINX config via annotations. Today, these are viewed as major security risks, as they can allow for configuration injection attacks. Fixing this architectural "feature" has become an insurmountable task.
The Maintainership Gap: Despite having millions of users, the project was sustained by only one or two people working in their spare time. In an industry where security vulnerabilities move fast, "best-effort" maintenance isn't enough to protect the global ecosystem.
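The snippet problem is easier to see in a concrete case. The following deliberately naive Go sketch pastes an annotation-supplied snippet straight into a rendered NGINX `location` block. It is hypothetical: `renderLocation` and `countLocations` are illustrative names, and this is not how Ingress NGINX actually renders its configuration. The sketch only shows that text inserted verbatim into a config file can do more than the template's author intended. In this case, the snippet closes the intended block and declares a new one.

```go
package main

import (
	"fmt"
	"strings"
)

// renderLocation is a hypothetical, deliberately naive template that pastes a
// user-supplied snippet verbatim into an NGINX location block.
func renderLocation(path, snippet string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "location %s {\n", path)
	b.WriteString("    proxy_pass http://app-backend;\n")
	// The snippet is trusted blindly: nothing stops it from closing the block.
	b.WriteString("    " + snippet + "\n")
	b.WriteString("}\n")
	return b.String()
}

// countLocations reports how many location blocks the rendered config declares.
func countLocations(conf string) int {
	return strings.Count(conf, "location ")
}

func main() {
	benign := renderLocation("/app", `add_header X-Team "payments";`)

	// This snippet closes the intended block and opens a new one that the
	// Ingress author was never supposed to be able to define.
	malicious := renderLocation("/app",
		`add_header X-Team "payments"; } location /internal { proxy_pass http://internal-only;`)

	fmt.Print(benign)
	fmt.Println("---")
	fmt.Print(malicious)
	fmt.Println("location blocks:", countLocations(benign), countLocations(malicious))
}
```

Escaping or filtering snippet text can only partially close this hole. That is why structured, typed API fields are the safer design.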

Time for Gateway API

The retirement of the popular Ingress NGINX controller opens up an opportunity to transition to Gateway API. While the Ingress API in Kubernetes is not going anywhere (only the Ingress NGINX controller is retiring), development on it is frozen, and there are reasons for that.

Think of Gateway API as "Ingress 2.0." While the Ingress API is a single, limited resource, Gateway API is role-oriented. It separates the concerns of the Infrastructure Provider (who sets up the LB), the Cluster Operator (who defines policies), and the Application Developer (who routes the traffic).

On the Kubernetes Podcast from Google, we've interviewed Kubernetes maintainers working on Gateway API (like in this episode featuring Lior Lieberman), and they tell a great story about why it was developed. In the early days of Kubernetes, the maintainers & contributors weren't sure exactly what users would need with regard to ingress management for workloads running on Kubernetes. The early Kubernetes Ingress object was an attempt to address the problems the maintainers thought users would need to solve, and they didn't get it all right. The annotations Ingress-NGINX supported on top of the Ingress API helped cover the many gaps in the API, but those annotations tied you to Ingress-NGINX. Gateway API has now largely closed those gaps, and because it is supported by many conformant implementations, you can have confidence in its portability.

An important feature of Gateway API's design is that it is an API standard defined by the community but implemented by your infrastructure or networking solution provider. Networking ultimately boils down to cables transmitting electrical signals between machines. What kind of machines you have and how they're connected has a big impact on the ingress capabilities available to you, or at least on how they're actually implemented. Gateway API provides a standard set of capabilities that you can access in a standardized way, while allowing for the reality of different networking implementations across providers. It's meant to help you get the most out of your infrastructure, regardless of what that infrastructure actually is.

How Gateway API solves the old problems with Ingress NGINX:

Security by Design: No more "configuration snippets." Features are built into the API natively, reducing the risk of accidental misconfiguration.
Standardization: Unlike the old Ingress API, which required custom annotations for almost everything (like traffic splitting), Gateway API builds these features into the spec, offering greater portability.
Extensibility: It is designed to handle more than just HTTP—it brings the same power to TCP, UDP, and gRPC.
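Traffic splitting is a good example of a feature that moved from annotations into the spec. In an HTTPRoute rule, each entry in `backendRefs` can carry a `weight`, and traffic is divided in proportion to those weights. The Go sketch below models only that arithmetic. `backendRef` and `trafficShares` are hypothetical names, not the real Gateway API Go types.

```go
package main

import "fmt"

// backendRef is a simplified, hypothetical stand-in for an HTTPRoute
// backendRef: a Service name and its relative weight.
type backendRef struct {
	Name   string
	Weight int
}

// trafficShares returns each backend's expected share of traffic in percent,
// computed as its weight divided by the sum of all weights in the rule.
func trafficShares(refs []backendRef) map[string]float64 {
	total := 0
	for _, r := range refs {
		total += r.Weight
	}
	shares := make(map[string]float64, len(refs))
	for _, r := range refs {
		if total == 0 {
			shares[r.Name] = 0 // every weight is zero: no backend receives traffic
			continue
		}
		shares[r.Name] = 100 * float64(r.Weight) / float64(total)
	}
	return shares
}

func main() {
	// A canary rollout: 90 parts to the stable version, 10 parts to the canary.
	rule := []backendRef{{"checkout-v1", 90}, {"checkout-v2", 10}}
	shares := trafficShares(rule)
	for _, r := range rule {
		fmt.Printf("%s: %.1f%%\n", r.Name, shares[r.Name])
	}
}
```

With Ingress NGINX, the same canary typically required controller-specific annotations. With Gateway API, the weights are part of the route itself, so the split travels with the manifest to any conformant implementation.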

The Challenges of Transitioning

Migration is rarely "click and play." Users moving away from Ingress NGINX should prepare for:

Annotation Mapping: Most of your nginx.ingress.kubernetes.io annotations won't work on new controllers. You'll need to map these to the new Gateway API "HTTPRoute" logic.
Learning Curve: Gateway API has more "objects" to manage (Gateways, GatewayClasses, Routes). It takes a moment to wrap your head around the hierarchy, but it was implemented that way based on experience - these objects should help you manage your workloads' ingress needs more efficiently.
Feature Parity: If you rely on very specific, obscure NGINX modules, you'll need to verify that your new controller (be it Envoy-based like Emissary or Cilium, or a different NGINX-based provider) supports them.

Why It's Worth It

The retirement of Ingress NGINX is not just a chore; it is a forcing function for adopting more sustainable architecture. By migrating to Gateway API, you gain:

Stability and Active Development: Gateway API is a General Availability (GA) networking standard that has maintained a "standard channel" without a single breaking change or API version deprecation for over two years. Unlike many Ingress controllers where development has largely paused, most Gateway controllers are far more actively maintained and continue to add new features like CORS and timeouts.
Portability: Choosing a different Ingress controller might seem easier, but if you rely on Ingress-NGINX annotations, you will likely have to migrate to another set of implementation-specific annotations. Gateway API provides more portable features directly in the core API and ensures a consistent experience across different implementations. When you select an implementation that is conformant with the latest v1.4 release, you can be confident that the behavior of these features will be consistent.
Future-Proof Extensibility: While Gateway API supports many more features than the core Ingress API, if you find a needed feature missing, an implementation is likely to provide a similar or equivalent feature as an implementation-specific extension. For example, GKE Gateway and Envoy Gateway extend the API with their own custom policies.

Next Steps

Start your migration planning today to capitalize on the opportunity and meet the deadline.

Audit Your Usage: Run kubectl get pods --all-namespaces -l app.kubernetes.io/name=ingress-nginx to see where you are still using the legacy controller.
Utilize Automation: Check out the ingress2gateway project. A lot of work is going into this tool to make the migration experience better, including adding support for the most widely used Ingress-NGINX annotations.
Experiment and Provide Feedback: Give Gateway API a try! Start a PoC with a conformant Gateway API implementation (like GKE Gateway, Cilium, or Envoy Gateway). The community welcomes help and feedback on ingress2gateway and encourages users to share feedback on what Gateway API is getting right and wrong.
Adhere to the Timeline: You have until March 2026 before the security updates stop. Start your migration planning sooner rather than later!

For more details on migrating from Ingress to Gateway API refer to our documentation.

This Week in Open Source #14

Friday, February 6, 2026

by Daryl Ducharme & amanda casari, Google Open Source

This Week in Open Source for February 6, 2026

A look around the world of open source

Here we are at the beginning of February, and the world of open source is navigating a fascinating landscape of innovation and challenge. The main focus of many articles this week is the evolving relationship between AI and software maintenance. But open source is about more than just the code; it's about the people and the spirit of collaboration. With that in mind, we also look at the Open Gaming Collective, which is pushing Linux gaming further, and at the SLSA framework and why it is foundational to software security.

Dive in to see what's happening this week in open source!

Upcoming Events

February 24 - 25: The Linux Foundation Member Summit is happening in Napa, California. It is the annual gathering for Linux Foundation members that fosters collaboration, innovation, and partnerships among the leading projects and organizations working to drive digital transformation with open source technologies.
March 5 - 8: SCALE 23x is happening in Pasadena, California. It is North America's largest community-run open source conference and includes four days of sessions, workshops, and community activities focused on open source, security, DevOps, cloud native, and more.
March 9 - 10: FOSSASIA Summit 2026 is happening in Bangkok, Thailand. It will be a two-day hybrid event that showcases the latest in open technologies, fostering collaboration across enterprises, developers, educators, and communities.

Open Source Reads and Links

[Article] Curl shutters bug bounty program to remove incentive for submitting AI slop - The maintainer of popular open-source data transfer tool cURL has ended the project's bug bounty program after maintainers struggled to assess a flood of AI-generated contributions.
[Article] Vibe Coding Is Killing Open Source Software, Researchers Argue - So much open source software is utilized when people vibe code with LLMs. However, vibe coders don't give back, according to research. What can be done to make vibe coders understand the importance of the open source ecosystem and giving back?
[Blog] AI Slopageddon and the OSS Maintainers - AI-generated low-quality code, called "AI slop," is overwhelming open source maintainers and harming collaboration. Some projects have banned AI contributions, while others require disclosure and careful review to manage the problem. How can we make changes when platforms benefit from AI tools but often ignore the burden this puts on maintainers?
[Paper] Will It Survive? Deciphering the Fate of AI-Generated Code in Open Source - AI-generated code lasts longer in open-source projects than human-written code. It is changed less often but has more bug fixes and security updates. Predicting when AI code will be modified is hard because many outside factors affect it.
[Article] Open Gaming Collective (OGC) formed to push Linux gaming even further - On the fun side of open source, the Open Gaming Collective is a new group uniting many Linux gaming projects to work together. They will share important tools and kernel patches to make Linux gaming better and less fragmented. Bazzite and other members will use OGC's shared improvements for better hardware support and gaming experience.
[Blog] Supply Chain Robots, Electric Sheep, and SLSA - Securing the software supply chain is crucial to protect against attacks that can compromise code and build systems. SLSA is a practical framework that helps organizations improve supply chain security step-by-step by verifying source code and build integrity. A good read to understand this aspect of software security.

As we like to say, "a community is a garden, not a building; it requires tending, not just construction".

How is your team tending to your open source "garden" this month? We'd love to hear your stories! Share them on our @GoogleOSS X account or our @opensource.google Bluesky account.

ZetaSQL is being renamed to GoogleSQL

Tuesday, February 3, 2026

by Olena Huang, GoogleSQL team

AI Generated image of the word ZetaSQL followed by a double arrow then the word GoogleSQL.

We're excited to announce a small but significant change: the open-source project known as ZetaSQL has been officially renamed to GoogleSQL (https://github.com/google/googlesql). This move unifies the name of our powerful SQL dialect, analysis, and parsing libraries under a single, consistent banner, whether you're using it within Google's cloud and internal services or as part of the open-source community.

For years, GoogleSQL has been the standard SQL dialect across many Google services like BigQuery and Spanner. Originally, while we called the language component GoogleSQL internally, we weren't using that name to describe the dialect in our public-facing products. Since then, we've started using the GoogleSQL name in our public-facing products and documentation, to emphasize that it's the same shared dialect across products.

Now, we're renaming the open source package too, to emphasize that it supports the same SQL dialect used in BigQuery, Spanner, and other products. The goal of open sourcing our work was always to allow developers outside of Google to leverage the same robust and compliant SQL foundation. With the name change, we aim to reduce confusion and make it easier for everyone to find and discuss the same great technology. Whether you're an internal engineer, a Google Cloud customer, or an open-source developer, you're using GoogleSQL.

This is primarily a branding change. The technology, features, and the team behind it remain the same. The open-source repository will continue to thrive, now proudly bearing the GoogleSQL name. We believe this unification will strengthen the GoogleSQL ecosystem, making it more accessible and understandable for our growing community of users and contributors.

We're enthusiastic about this next chapter for GoogleSQL in the open-source world and look forward to continued collaboration and innovation with the community.

This Week in Open Source #13

Friday, January 23, 2026

by Daryl Ducharme, Google Open Source

This Week in Open Source for January 23, 2026

A look around the world of open source

Can you believe we're already wrapping up the first month of the year? January is coming to a close. The open source ecosystem is buzzing with activity, from the upcoming community gatherings at FOSDEM in Brussels to new conversations around AI standards and cloud flexibility.

Google Open Source believes that "a community is a garden, not a building". It requires constant tending to thrive. This week, we're looking at how we can all contribute to that growth—whether it's by securing the software supply chain, standardizing AI agents, or simply learning from the legends of our field like Linus Torvalds.

Dive in to see what's happening this week in open source!

Upcoming Events

January 29: CHAOSScon Europe 2026 is co-located with FOSDEM in Brussels, Belgium. This conference revolves around discussing open source project health, CHAOSS updates, use cases, and hands-on workshops for developers, community managers, project managers, and anyone interested in measuring open source project health. It also shares insights from the CHAOSS context working groups including OSPOs, University Open Source, and Open Source in Science and Research.
January 31 - February 1: FOSDEM 2026 is happening at the Université Libre de Bruxelles in Brussels, Belgium. It is a free event for software developers to meet, share ideas and collaborate. Every year, thousands of developers of free and open source software from all over the world gather at the event in Brussels.
February 24 - 25: The Linux Foundation Member Summit is happening in Napa, California. It is the annual gathering for Linux Foundation members that fosters collaboration, innovation, and partnerships among the leading projects and organizations working to drive digital transformation with open source technologies.
March 5 - 8: SCALE 23x is happening in Pasadena, California. It is North America's largest community-run open source conference and includes four days of sessions, workshops, and community activities focused on open source, security, DevOps, cloud native, and more.
March 9 - 10: FOSSASIA Summit 2026 is happening in Bangkok, Thailand. It will be a two-day hybrid event that showcases the latest in open technologies, fostering collaboration across enterprises, developers, educators, and communities.

Open Source Reads and Links

[Article] The state of trusted open source - This review of the state of trusted open source report goes over many statistics. One of the interesting ones is that vulnerabilities most often hide in the smaller dependencies of the larger projects we might be focused on. What does this mean for your approach to security? How should various open source communities deal with this?
[Blog] Software Heritage Archive recognized as a digital public good - As the Software Heritage Archive celebrates its 10th anniversary, the Archive has scaled to protect over 27 billion unique source files, even solving the "2PB problem" by deploying protocols that compressed 78TB of graph data into a 3TB research dataset. This ensures that humanity's executable history remains a global commons rather than a proprietary secret, aligning with our belief at Google that Code is for today, Open Source is forever.
[Blog] Agent Definition Language: The open standard AI agents have been missing - The Agent Definition Language (ADL) creates a clear, shared way to describe AI agents so they work well across different systems. This helps teams understand what agents do, how they behave, and how to govern them safely. As an open standard, ADL makes AI agents easier to build, review, and share in the open-source community.
[Blog] AI Agent Engineering in Go with the Google ADK - AI, agents, and the related protocols touch on many open source projects. This post gives you a technical, hands-on walkthrough of the Agent Starter Pack. By following it, you'll learn how to build, test, and securely deploy a Go AI agent using Google Cloud services.
[Article] How Kubernetes Broke the AWS Cloud Monopoly - Before Kubernetes, companies felt locked into AWS because of its unique APIs. Kubernetes allowed apps to run on any cloud, giving users more choice and helping other cloud providers grow. This has made multi-cloud the way forward for many enterprises. Are you utilizing a multi-cloud strategy? Has Kubernetes helped you get there?
[Article] Even Linux Creator Linus Torvalds is Using AI to Code in 2026 - Opinions vary on where, and whether, AI is useful. One area where it has shown the greatest benefit is as a tool for writing code, and it seems Linus Torvalds has started using it to assist with part of his AudioNoise side project. A side project is a good place to find out how AI can work best for you. How have you been using AI with your code?

What exciting open source events and news are you hearing about? Let us know on our @GoogleOSS X account or our new @opensource.google Bluesky account.

A JSON schema package for Go

Wednesday, January 21, 2026

by Jonathan Amsterdam & Sam Thanawalla, The Go Team

JSON Schema is a specification for describing JSON values that has become a critical part of LLM infrastructure. We recently released github.com/google/jsonschema-go/jsonschema, a comprehensive JSON Schema package for Go. We use it in the official Go SDK for MCP and expect it to become the canonical JSON Schema package for Google's Go SDKs that work with LLMs.

JSON Schema has been around for many years. Why are we doing this now, and what do LLMs have to do with it?

JSON is a flexible way to describe values. A JSON value can be null, a string, a number, a boolean, a list of values, or a mapping from strings to values. In programming language terms, JSON is dynamically typed. For example, a JSON array can contain a mix of strings, numbers, or any other JSON value. That flexibility can be quite powerful, but sometimes it's useful to constrain it. Think of JSON Schema as a type system for JSON, although its expressiveness goes well beyond typical type systems. You can write a JSON schema that requires all array elements to be strings, as you could in a typical programming language type system, but you can also constrain the length of the array or insist that its first three elements are strings of length at least five while the remaining elements are numbers.
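Here is the example from the end of that paragraph, written out in full. The schema says "at least three items, the first three being strings of length at least five, and every later item a number." It comes with a tiny validator for just the keywords it uses. This is a hand-rolled, standard-library-only sketch that illustrates the semantics. It is not the jsonschema-go API, and `miniSchema`, `validate`, and `check` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// schemaJSON encodes the constraint from the text: at least three items, the
// first three being strings of length >= 5, and every later item a number.
// prefixItems and items use their JSON Schema draft 2020-12 meanings.
const schemaJSON = `{
  "type": "array",
  "minItems": 3,
  "prefixItems": [
    {"type": "string", "minLength": 5},
    {"type": "string", "minLength": 5},
    {"type": "string", "minLength": 5}
  ],
  "items": {"type": "number"}
}`

// miniSchema models only the handful of keywords this sketch understands.
type miniSchema struct {
	Type        string        `json:"type"`
	MinLength   *int          `json:"minLength"`
	MinItems    *int          `json:"minItems"`
	PrefixItems []*miniSchema `json:"prefixItems"`
	Items       *miniSchema   `json:"items"`
}

// validate checks a value decoded by json.Unmarshal into an `any` against s.
// Unlike a full validator, it stops at the first violation.
func validate(s *miniSchema, v any) error {
	switch s.Type {
	case "string":
		str, ok := v.(string)
		if !ok {
			return fmt.Errorf("want string, got %T", v)
		}
		// JSON Schema measures string length in code points, not bytes.
		if s.MinLength != nil && utf8.RuneCountInString(str) < *s.MinLength {
			return fmt.Errorf("string %q is shorter than %d", str, *s.MinLength)
		}
	case "number":
		if _, ok := v.(float64); !ok {
			return fmt.Errorf("want number, got %T", v)
		}
	case "array":
		arr, ok := v.([]any)
		if !ok {
			return fmt.Errorf("want array, got %T", v)
		}
		if s.MinItems != nil && len(arr) < *s.MinItems {
			return fmt.Errorf("array has %d items, want at least %d", len(arr), *s.MinItems)
		}
		for i, elem := range arr {
			// prefixItems covers the leading positions; items covers the rest.
			sub := s.Items
			if i < len(s.PrefixItems) {
				sub = s.PrefixItems[i]
			}
			if sub == nil {
				continue
			}
			if err := validate(sub, elem); err != nil {
				return fmt.Errorf("item %d: %w", i, err)
			}
		}
	}
	return nil
}

// check decodes the schema and a JSON instance, then validates the instance.
func check(instance string) error {
	var s miniSchema
	if err := json.Unmarshal([]byte(schemaJSON), &s); err != nil {
		return err
	}
	var v any
	if err := json.Unmarshal([]byte(instance), &v); err != nil {
		return err
	}
	return validate(&s, v)
}

func main() {
	for _, in := range []string{
		`["alpha", "bravo", "charlie", 1, 2.5]`,
		`["alpha", "bro", "charlie"]`,
		`["alpha", "bravo", "charlie", "delta"]`,
		`["alpha", "bravo"]`,
	} {
		fmt.Printf("%-40s -> %v\n", in, check(in))
	}
}
```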

The ability to describe the shape of JSON values like that has always been useful, but it is vital when trying to coax JSON values out of LLMs, whose output is notoriously hard to constrain. JSON Schema provides an expressive and precise way to tell an LLM how its JSON output should look. That's particularly useful for generating inputs to tools, which are usually ordinary functions with precise requirements on their input. It also turns out to be useful to describe a tool's output to the LLM. So frameworks like MCP use JSON Schema to specify both the inputs to and outputs from tools. JSON Schema has become the lingua franca for defining structured interactions with LLMs.
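The next sketch shows how a tool might be advertised to an LLM with JSON Schemas for both its input and its output. The field names `inputSchema` and `outputSchema` mirror MCP's tool listing. However, `toolDefinition`, `weatherTool`, and `requiredFields` are hypothetical, and this is not the API of the Go MCP SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolDefinition is a hypothetical type, loosely modeled on how MCP advertises
// a tool: a name, a description, and JSON Schemas for its input and output.
type toolDefinition struct {
	Name         string          `json:"name"`
	Description  string          `json:"description"`
	InputSchema  json.RawMessage `json:"inputSchema"`
	OutputSchema json.RawMessage `json:"outputSchema,omitempty"`
}

// weatherTool describes a made-up tool that takes a city and returns a temperature.
func weatherTool() toolDefinition {
	return toolDefinition{
		Name:        "get_weather",
		Description: "Look up the current temperature for a city.",
		InputSchema: json.RawMessage(`{
			"type": "object",
			"properties": {"city": {"type": "string"}},
			"required": ["city"]
		}`),
		OutputSchema: json.RawMessage(`{
			"type": "object",
			"properties": {"celsius": {"type": "number"}},
			"required": ["celsius"]
		}`),
	}
}

// requiredFields reads the top-level "required" list from a schema, the way a
// client might check which arguments it must supply before calling the tool.
func requiredFields(schema json.RawMessage) ([]string, error) {
	var s struct {
		Required []string `json:"required"`
	}
	if err := json.Unmarshal(schema, &s); err != nil {
		return nil, err
	}
	return s.Required, nil
}

func main() {
	tool := weatherTool()
	out, err := json.MarshalIndent(tool, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))

	req, err := requiredFields(tool.InputSchema)
	if err != nil {
		panic(err)
	}
	fmt.Println("required inputs:", req)
}
```

Because both directions are plain JSON Schema, the same document can guide the LLM when it generates arguments and let the host check the tool's result afterward.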

Requirements for a JSON Schema package

Before writing our own package, we took a careful look at the existing JSON Schema packages; we didn't want to reinvent the wheel. But we couldn't find one that had all the features that we felt were important:

Schema creation: A clear, easy-to-use Go API to build schemas in code.
Serialization: A way to convert a schema to and from its JSON representation.
Validation: A way to check whether a given JSON value conforms to a schema.
Inference: A way to generate a JSON Schema from an existing Go type.

We looked at the following packages:

https://github.com/invopop/jsonschema provides inference, but not validation.
https://github.com/santhosh-tekuri/jsonschema does not provide inference.
https://github.com/xeipuuv/gojsonschema does not provide a way to construct a schema in code.
https://github.com/qri-io/jsonschema does not provide inference or a way to construct a schema in code.

It didn't seem feasible to cobble together what we needed from multiple packages, so we decided to write our own.

A Tour of jsonschema-go

A Simple, open Schema struct

At the core of the package is a straightforward Go struct that directly represents the JSON Schema specification. This open design means you can create complex schemas by writing a struct literal:

var schema = &jsonschema.Schema{
  Type:        "object",
  Description: "A simple person schema",
  Properties: map[string]*jsonschema.Schema{
    "name": {Type: "string"},
    "age": {Type: "integer", Minimum: jsonschema.Ptr(0.0)},
  },
  Required: []string{"name"},
}

A Schema will marshal to a valid JSON value representing the schema, and any JSON value representing a schema can be unmarshalled into a Schema.

The Schema struct defines fields for all standard JSON Schema keywords that are defined in popular specification drafts. To handle additional keywords not present in the specification, Schema includes an Extra field of type map[string]any.
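One common way to implement an `Extra`-style field is a custom `UnmarshalJSON`/`MarshalJSON` pair that routes unrecognized keywords into a map. The sketch below assumes that approach, using a hypothetical two-keyword `keywordSchema` type. The real `Schema` type may handle extra keywords differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// keywordSchema is a hypothetical, heavily trimmed schema type: two known
// keywords plus an Extra map for everything else.
type keywordSchema struct {
	Type        string         `json:"type,omitempty"`
	Description string         `json:"description,omitempty"`
	Extra       map[string]any `json:"-"`
}

// knownKeywords lists the keywords that have dedicated struct fields.
var knownKeywords = []string{"type", "description"}

// UnmarshalJSON fills the known fields and stores any other keyword in Extra.
func (s *keywordSchema) UnmarshalJSON(data []byte) error {
	type plain keywordSchema // same fields, no methods, so no recursion
	var p plain
	if err := json.Unmarshal(data, &p); err != nil {
		return err
	}
	var all map[string]any
	if err := json.Unmarshal(data, &all); err != nil {
		return err
	}
	for _, k := range knownKeywords {
		delete(all, k)
	}
	*s = keywordSchema(p)
	if len(all) > 0 {
		s.Extra = all
	}
	return nil
}

// MarshalJSON writes the known fields, then merges Extra back in. An Extra
// entry never overwrites a keyword already written from a field.
func (s keywordSchema) MarshalJSON() ([]byte, error) {
	type plain keywordSchema
	known, err := json.Marshal(plain(s))
	if err != nil {
		return nil, err
	}
	merged := map[string]any{}
	if err := json.Unmarshal(known, &merged); err != nil {
		return nil, err
	}
	for k, v := range s.Extra {
		if _, taken := merged[k]; !taken {
			merged[k] = v
		}
	}
	return json.Marshal(merged)
}

func main() {
	var s keywordSchema
	in := `{"type":"number","description":"temperature","x-unit":"celsius"}`
	if err := json.Unmarshal([]byte(in), &s); err != nil {
		panic(err)
	}
	fmt.Println(s.Type, s.Extra["x-unit"])

	out, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```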

Validation and resolution

Before using a schema to validate JSON values, the schema itself must be validated, and its references to other schemas must be followed so that those schemas can themselves be checked. We call this process resolution. Calling Resolve on a Schema returns a jsonschema.Resolved, an opaque representation of a valid schema optimized for validation. Resolved.Validate accepts almost any value that can be obtained from calling json.Unmarshal: null, basic types like strings and numbers, []any, and map[string]any. It returns an error describing all the ways in which the value fails to satisfy the schema.

rs, err := schema.Resolve(nil)
if err != nil {
  return err
}
err = rs.Validate(map[string]any{"name": "John Doe", "age": 20})
if err != nil {
  fmt.Printf("validation failed: %v\n", err)
}

Originally, Validate accepted a Go struct. We removed that feature because it is not possible to validate some schemas against a struct. For example, if a struct field has a non-pointer type, there is no way to determine whether the corresponding key was present in the original JSON, so there is no way to enforce the required keyword.
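The presence problem can be reproduced with encoding/json alone. In this sketch, Person is a simplified version of the type that appears in the next section, and decodeBoth is a hypothetical helper: a missing "age" key and an explicit zero decode to the same struct, while decoding into map[string]any keeps the distinction that the required keyword depends on.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person is a simplified stand-in with a non-pointer int field.
type Person struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// decodeBoth decodes the same JSON into a struct and into a generic map.
func decodeBoth(data string) (Person, map[string]any) {
	var p Person
	var m map[string]any
	if err := json.Unmarshal([]byte(data), &p); err != nil {
		panic(err)
	}
	if err := json.Unmarshal([]byte(data), &m); err != nil {
		panic(err)
	}
	return p, m
}

func main() {
	p1, m1 := decodeBoth(`{"name": "Ann"}`)
	p2, m2 := decodeBoth(`{"name": "Ann", "age": 0}`)

	// The structs are identical: Age is 0 whether or not the key was sent.
	fmt.Println(p1 == p2) // true

	// The maps still record which keys were actually present.
	_, had1 := m1["age"]
	_, had2 := m2["age"]
	fmt.Println(had1, had2) // false true
}
```

Declaring Age as *int would make a missing key decode to nil, which is why the problem is specific to non-pointer fields.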

Inference from Go types

While it's always possible to create a schema by constructing a Schema value, it's often convenient to create one from a Go type, typically a struct. This operation, which we call inference, is provided by the functions For and ForType. Here is For in action:

type Person struct {
    Name string `json:"name" jsonschema:"person's full name"`
    Age  int    `json:"age,omitzero"`
}

schema, err := jsonschema.For[Person](nil)

/* schema is:
{
    "type": "object",
    "required": ["name"],
    "properties": {
        "age":  {"type": "integer"},
        "name": {
            "type": "string",
            "description": "person's full name"
        }
    },
    "additionalProperties": false
}
*/

For gets information from struct field tags. As this example shows, it uses the name in the json tag as the property name, and interprets omitzero or omitempty to mean that a field is optional. It also looks for a jsonschema tag to get property descriptions. (We considered adding support for other keywords to the jsonschema tag as some other packages do, but that quickly gets complicated. We left an escape hatch in case we decide to support other keywords in the future.)

ForType works the same way, but takes a reflect.Type. It's useful when the type is known only at runtime.

A foundation for the Go community

By providing a high-quality JSON Schema package, we aim to strengthen the entire Go ecosystem for AI applications (and, indeed, any application that needs to validate JSON). This library is already a critical dependency for Google's own AI SDKs, and we're committed to its long-term health. We welcome external contributions, whether they are bug reports, bug fixes, performance enhancements, or support for additional JSON Schema drafts. Before beginning work, please file an issue on our issue tracker.
