Polaris Catalog Is Now Open Source
In June 2024, Snowflake announced Polaris Catalog to provide organizations and the Iceberg community new levels of choice, flexibility and control over their data. It enables more open, secure lakehouse architectures with broad read-and-write interoperability and cross-engine access controls. Apache Iceberg™ has greatly improved data mobility by establishing a vast community around an open standard, and the next logical step is an open, community-driven catalog to complement Iceberg. This opens the door for truly vendor-neutral interoperability that many organizations want.
As of today, Polaris Catalog is open source under the Apache 2.0 license and is now available on GitHub. Snowflake’s new service powered by Polaris Catalog is now available in public preview for Snowflake customers.
Interoperability through community
Just as large communities have grown in support of open source projects for open file and table formats, there is a community emerging to collaborate on standards for metadata catalogs. Diversity of ideas and community contributions creates the most interoperable catalog across the widest variety of tools.
Polaris Catalog implements Apache Iceberg’s REST catalog specification, which means it already enables interoperability with Apache Doris™, Apache Flink™, Apache Spark™, Daft, DuckDB, Presto, Snowflake, Starburst, Trino, Upsolver and more. In addition, Alation, ALTR, Atlan, Collibra, dbt Labs, data.world, Dremio, Confluent, Fivetran, Google Cloud, Immuta, Microsoft, Project Nessie, and Salesforce also intend to add integrations or make contributions to the Polaris Catalog open source project.
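Because the catalog speaks the Iceberg REST protocol, pointing an engine at it is mostly a matter of configuration. As a rough sketch of what that could look like for Apache Spark, the helper below builds the catalog properties that Iceberg's Spark integration reads. The function name, the catalog name "polaris", the endpoint URL, the warehouse name and the credential are all placeholders chosen for illustration; they are not taken from the Polaris Catalog project, so check the values for your own deployment.

```python
from typing import Dict, Optional


def rest_catalog_spark_conf(
    catalog_name: str,
    uri: str,
    warehouse: str,
    credential: Optional[str] = None,
    scope: Optional[str] = None,
) -> Dict[str, str]:
    """Build Spark properties that register an Iceberg REST catalog.

    Hypothetical helper for illustration only. The property keys are
    standard Iceberg catalog settings; every value passed in is an
    assumption about your own deployment.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    conf = {
        # Register the catalog name with Iceberg's Spark catalog class.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Select the REST catalog implementation.
        f"{prefix}.type": "rest",
        # Base URL of the REST catalog endpoint (placeholder).
        f"{prefix}.uri": uri,
        # Catalog or warehouse identifier understood by the server.
        f"{prefix}.warehouse": warehouse,
    }
    # Optional OAuth2 client credentials in "client_id:client_secret" form.
    if credential is not None:
        conf[f"{prefix}.credential"] = credential
    # Optional OAuth2 scope requested along with the token.
    if scope is not None:
        conf[f"{prefix}.scope"] = scope
    return conf


# Example with placeholder values only.
example_conf = rest_catalog_spark_conf(
    catalog_name="polaris",
    uri="https://catalog.example.invalid/api/catalog",
    warehouse="my_catalog",
    credential="<client_id>:<client_secret>",
)
```

Each key and value in the returned dictionary can be passed to `SparkSession.builder.config(key, value)`, provided the Iceberg Spark runtime is on the classpath. Other engines that support Iceberg REST catalogs take the same information (endpoint, warehouse, credentials) through their own configuration formats.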
Contributing Project Nessie capabilities to Polaris Catalog
Project Nessie is an open source intelligent metastore and catalog for Apache Iceberg™ with Git-like semantics. Created by Dremio co-founders, it became an Apache-licensed project in 2020.
The team at Dremio is excited to help bring the various functions and capabilities of Nessie into the Polaris project. Contributing the capabilities of Project Nessie to Polaris Catalog will form an inclusive community dedicated to developing the most robust open source catalog for open lakehouse architectures. Innovating in one project reduces catalog sprawl and enables a broader group of contributors to drive rapid advancements. This partnership not only accelerates technical progress but also brings more contributors into the Nessie community, further strengthening the growing ecosystem around Polaris.
“As co-founders of Apache Arrow™, creators of Project Nessie and significant contributors to Apache Iceberg™, openness is ingrained in Dremio’s culture. We are delighted to support the launch of Polaris Catalog as open source under the Apache license and look forward to actively contributing to its success. With over four years of experience building Project Nessie as an open source Apache Iceberg™ Catalog, we’re excited to share its differentiated capabilities, such as catalog-level versioning, multi-engine support, multi-table transactions and Git for data, with Polaris Catalog and the broader community.”
—Tomer Shiran, Co-Founder and CPO, Dremio
Snowflake service, powered by Polaris Catalog, now in public preview
In addition to the open source release, Snowflake’s service powered by Polaris Catalog is now available in public preview for Snowflake customers. The service runs the open source implementation of Polaris Catalog and is an easy way to get started even if you don’t use Snowflake. You can use it with the many engines listed above to both read from and write to Iceberg tables with cross-engine security.
While other vendor-hosted catalogs deviate from the open source specification, which leads to lock-in, Snowflake’s service for Polaris Catalog is designed to be fully compatible with Polaris Catalog’s open source implementation both now and in the future. Snowflake handles the responsibilities of running the service, such as providing an endpoint and deploying bug fixes, while users get a completely portable catalog for their data that can be used with any Iceberg REST catalog-compatible tool.