Streamline Data Ingestion with New Connectors & Updates
The journey toward achieving a robust data platform that secures all your data in one place can seem like a daunting one. But at Snowflake, we’re committed to making the first step the easiest — with seamless, cost-effective data ingestion to help bring your workloads into the AI Data Cloud with ease.
Snowflake is launching native integrations with some of the most popular databases, including PostgreSQL and MySQL. With other ingestion improvements and our new database connectors, we are smoothing out the data ingestion process, making it radically simple and efficient to bring data to Snowflake. That means fewer tools and licenses, lower costs, and a more frictionless experience for your organization.
As that first step, data ingestion is a critical foundational block, and ingestion with Snowflake should feel like a breeze. Because there are many ways to ingest data, in this blog we will walk through the main methods, calling out the latest announcements and improvements we’ve made.
Bringing in batch and streaming data efficiently and cost-effectively
Ingest and transform batch or streaming data in <10 seconds: Use COPY for batch ingestion, Snowpipe to auto-ingest files, or bring in rows of data with single-digit-second latency using Snowpipe Streaming.
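To make those options concrete, here is a minimal sketch using the snowflake-connector-python package. The connection parameters and the table, stage and pipe names (raw_events, landing_stage, raw_events_pipe) are placeholders, and the auto-ingest pipe additionally assumes that event notifications have been configured on the stage’s cloud storage location.

```python
# Minimal sketch: a one-off batch load with COPY INTO, plus a serverless Snowpipe
# that auto-ingests new files as they arrive. All object names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="INGEST_WH",
    database="RAW",
    schema="EVENTS",
)
cur = conn.cursor()

# Batch: load any staged JSON files that have not been loaded yet.
cur.execute("""
    COPY INTO raw_events
    FROM @landing_stage/events/
    FILE_FORMAT = (TYPE = 'JSON')
""")

# Continuous: a pipe that picks up new files on the stage automatically
# (requires event notifications on the underlying cloud storage location).
cur.execute("""
    CREATE PIPE IF NOT EXISTS raw_events_pipe
      AUTO_INGEST = TRUE
      AS
      COPY INTO raw_events
      FROM @landing_stage/events/
      FILE_FORMAT = (TYPE = 'JSON')
""")
```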
COPY INTO now supports unstructured data use cases through new ingestion capabilities for Document AI (generally available soon). Users can create a Document AI model and use it to automate batch ingestion of unstructured documents in formats such as PDF, JPEG and HTML. With Document AI, Snowflake customers can take the analytical insights they extract from documents and operationalize them directly in their data pipelines.
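As a rough illustration of how an extraction model could feed a pipeline, the sketch below (continuing with the cursor from the sketch above) lands model output in a table. The model name (invoice_model), stage (@doc_stage, with a directory table enabled) and target table are hypothetical, and the !PREDICT call follows the existing Document AI pattern; the exact syntax of the upcoming batch-ingest capability may differ.

```python
# Hypothetical sketch: extract fields from staged PDFs with a Document AI model
# and store the results. Model, stage and table names are illustrative, and the
# target table invoice_extractions is assumed to already exist.
cur.execute("""
    INSERT INTO invoice_extractions (file_name, extraction)
    SELECT
        RELATIVE_PATH,
        invoice_model!PREDICT(GET_PRESIGNED_URL(@doc_stage, RELATIVE_PATH), 1)
    FROM DIRECTORY(@doc_stage)
""")
```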
Both Snowpipe and Snowpipe Streaming are serverless, which improves scalability and cost efficiency. Compared to Snowpipe, Snowpipe Streaming can handle high volumes of data at lower cost and with lower latency, without complex manual client configuration and management. Exactly-once delivery, data ordering and availability are managed automatically by Snowflake, freeing up expensive developer resources for more mission-critical work. Users can also unify data pipelines, no longer needing to separate streaming and batch data: ingest and transform easily in a single system without having to stitch solutions together or build additional pipelines to move data around.
Snowpipe and Snowpipe Streaming also serve as foundations for Snowflake’s native connectors and partner integrations, such as Amazon Data Firehose, Striim and Streamkap. Customers benefit from the same cost efficiency and low latency.
Simplifying ingestion with Snowflake native connectors
Building on the success of Snowflake native connectors such as the Snowflake Connector for Kafka and connectors for SaaS applications like ServiceNow and Google Analytics, we have just announced that connectors for two of the leading open source relational databases, PostgreSQL and MySQL, will soon be in public preview. The new database connectors are built on top of Snowpipe Streaming, which means they also provide more cost-effective, lower-latency pipelines for customers. They further our commitment to offering simple native connectors for change data capture (CDC) from the top online transaction processing (OLTP) database systems, and we expect to soon expand the connectors roster to leading proprietary databases as well.
These native connectors are built with the Snowflake Native App Framework, which means customers can connect their data through the Snowflake Marketplace with built-in security and reliability. Instead of transporting files between systems, data flows directly from the source right into Snowflake, and the data is always encrypted, whether in motion or at rest. Additionally, you can pay as you consume, with no need for additional licenses or procurement processes.
Developers can operationalize their analytics, AI and ML workflows by bringing Postgres and MySQL data into Snowflake with lower latency. Customers have already unlocked incredible value from these connectors across retail, healthcare, high-tech, media, financial services and other industries.
Now, let’s take a deeper look at how the native connectors work in Snowflake.
The OLTP database connectors, built on a strong foundation of capabilities that have already been highly recognized by our customers, offer the same set of benefits — ease of use, high scalability, cost-effectiveness, low latency — as our SaaS native connectors and Snowpipe Streaming, with little operational oversight.
Snowflake database connectors consist of two components:
- The agent, a standalone application distributed as a Docker image (available on Docker Hub) and deployed in the customer’s infrastructure. It is responsible for the initial snapshot load and for incremental loads, reading data changes from the source database’s CDC stream.
- The Snowflake Native App, an object that resides in the customer’s Snowflake account and is the brain behind the connector. It is primarily responsible for managing the replication process, controlling the agent state and creating all database objects, including the target database.
Users can connect a single agent to multiple data sources and synchronize the data — whether continuously or at prescribed intervals — into a single Snowflake account. From inside the Snowflake Native App, they can select which tables and columns are replicated. In case of errors (e.g., a network issue or lost connection with the agent), users will be notified with an email alert. And, soon available in public preview, if a table in the source database changes its schema (e.g., a column is added, removed or renamed), the connector will automatically adjust and continue syncing the table with a new schema.
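For a rough sense of what that configuration could look like from the SQL side, here is a hypothetical sketch (again via the Python connector used above). The application, procedure and object names below are illustrative stand-ins rather than the connector’s actual interface, so consult the connector documentation for the real commands.

```python
# Hypothetical sketch of configuring the PostgreSQL connector's Native App.
# The app name, procedure names and arguments are illustrative stand-ins only.
cur.execute("USE DATABASE SNOWFLAKE_CONNECTOR_FOR_POSTGRESQL")  # hypothetical app name

# Register a data source that the agent connects to, then choose a table to replicate.
cur.execute("CALL PUBLIC.ADD_DATA_SOURCE('ORDERS_DB', 'ORDERS_DEST_DB')")        # hypothetical
cur.execute("CALL PUBLIC.ENABLE_SOURCE_TABLE('ORDERS_DB', 'public', 'orders')")  # hypothetical
```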
Customer use cases across industries
E-commerce and retail: A developer at an e-commerce platform, tasked with personalizing the shopping experience for millions of users, can now use Snowflake’s native database connectors to tap into near real-time website interaction data from globally distributed Postgres databases, continuously analyze that data in Snowflake and serve personalized recommendations without expensive ETL.
Healthcare: A healthcare company, planning to optimize patient-care experiences through data-driven insights, can securely integrate patient-interaction data from the Postgres database behind its hospital management system into Snowflake, without needing a third-party processor, and leverage Snowflake Cortex AI to analyze trends and improve service quality in real time.
Gaming: With Snowflake’s native connectors, developers can quickly and continuously stream billing and customer usage data from thousands of Postgres databases into Snowflake, enabling them to make lightning-fast decisions to optimize user engagement in their game and user portals.
You will soon be able to try out the Snowflake Connectors for PostgreSQL or MySQL by installing them from Snowflake Marketplace and downloading the agent from Docker Hub.
Connecting to more data with the Marketplace ecosystem and Connector SDK
In addition to the connectors delivered natively by Snowflake, customers can also benefit from a broad ecosystem of partners who have built Snowflake Native Apps to distribute connectors via the Snowflake Marketplace. For example, SNP developed SNP Glue to ingest SAP data directly into Snowflake; Omnata offers out-of-the-box connectors for SaaS applications such as Monday.com, HubSpot and Zendesk; and many other providers, including Nimbus and Informatica, offer similar connectors.
Additionally, developers have the option to build their own connectors. The Snowflake Native SDK for Connectors offers core libraries and templates so they can build connectors faster.
Of course, one of the key reasons data engineering with Snowflake is revolutionary is that easy data sharing means fewer data pipelines are needed in the first place. Customers have access to live data sets from Snowflake Marketplace, which reduces the costs and burden associated with traditional ETL pipelines and API-based integrations.
Continuous improvement with performance optimization and usability
To make data ingestion even more cost-effective and effortless, Snowflake continues to invest in higher performance and a better user interface. We have improved loading performance by up to 25% for JSON files and up to 50% for Parquet files, with no action required from customers.
Snowsight makes it simple to get your data into Snowflake, and it is now even easier to navigate, with a centralized location for a range of generally available features: creating stages, uploading files to create a table, loading files into an existing table, installing a connector, and automatic schema inference with the ability to update or override.
Snowsight now allows users to directly create tables using Schema Detection and load tables and stages with little to no coding, and users can now upload files as large as 250 MB, up from 50 MB. Learn more here.
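The schema detection behind that experience corresponds to schema inference you can also run yourself in SQL via the INFER_SCHEMA table function. Below is a minimal sketch (continuing with the cursor from the earlier sketches) that infers a table definition from staged Parquet files; the stage, named file format and table names are placeholders and are assumed to already exist.

```python
# Minimal sketch: infer the schema of staged Parquet files and create a matching table.
# The stage (@landing_stage), named file format (parquet_format) and table name are placeholders.
cur.execute("""
    CREATE TABLE sales USING TEMPLATE (
        SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
        FROM TABLE(
            INFER_SCHEMA(
                LOCATION => '@landing_stage/sales/',
                FILE_FORMAT => 'parquet_format'
            )
        )
    )
""")
```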
You can learn more about data ingestion here. Or simply visit Snowflake Marketplace or Snowsight to jumpstart your ingestion pipelines.