A practical introduction to data engineering on the powerful Snowflake cloud data platform.
Data engineers create the pipelines that ingest raw data, transform it, and funnel it to the analysts and professionals who need it. The Snowflake cloud data platform provides a suite of productivity-focused tools and features that simplify building and maintaining data pipelines. In Snowflake Data Engineering, Snowflake Data Superhero Maja Ferle shows you how to get started.
In Snowflake Data Engineering you will learn how to:
• Ingest data into Snowflake from both cloud and local file systems (a minimal SQL sketch follows this list)
• Transform data using functions, stored procedures, and SQL
• Orchestrate data pipelines with streams and tasks, and monitor their execution
• Use Snowpark to run Python code in your pipelines
• Deploy Snowflake objects and code using continuous integration principles
• Optimize performance and costs when ingesting data into Snowflake
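For a taste of what these skills look like in practice, here is a minimal Snowflake SQL sketch. It is an illustration rather than an excerpt from the book: the stage raw_stage and the tables raw_orders and clean_orders are hypothetical names, standing in for a load of staged CSV files followed by a simple SQL transformation.

-- Illustrative sketch only: load staged CSV files into a raw table,
-- then apply a simple SQL transformation into a clean table.
-- The stage raw_stage and tables raw_orders / clean_orders are hypothetical.
COPY INTO raw_orders
  FROM @raw_stage/orders/
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

INSERT INTO clean_orders (order_id, customer_id, order_total, order_date)
SELECT order_id,
       customer_id,
       order_total,
       TO_DATE(order_date, 'YYYY-MM-DD')
FROM raw_orders
WHERE order_total IS NOT NULL;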
Snowflake Data Engineering reveals how Snowflake makes it easy to work with unstructured data, set up continuous ingestion with Snowpipe, and keep your data safe and secure with best-in-class data governance features. Along the way, you’ll practice the most important data engineering tasks as you work through relevant hands-on examples. Throughout, author Maja Ferle shares design tips drawn from her years of experience to ensure your pipeline follows the best practices of software engineering, security, and data governance.
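For example, continuous ingestion with Snowpipe amounts to a pipe object wrapping a COPY statement that loads new files as they arrive. The sketch below is illustrative rather than taken from the book; the stage landing_stage and the table raw_events are hypothetical names.

-- Illustrative sketch only: a pipe that automatically loads new JSON files
-- arriving in a stage into a landing table (all names are hypothetical).
CREATE PIPE ingest_events_pipe
  AUTO_INGEST = TRUE
  AS
  COPY INTO raw_events
    FROM @landing_stage/events/
    FILE_FORMAT = (TYPE = 'JSON');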
Foreword by Joe Reis.
About the technology
Pipelines that ingest and transform raw data are the lifeblood of business analytics, and data engineers rely on Snowflake to help them deliver those pipelines efficiently. Snowflake is a full-service cloud data platform that provides near-infinite storage, fast elastic compute, and built-in AI/ML capabilities such as vector search, text-to-SQL, and code generation. This book gives you what you need to create effective data pipelines on the Snowflake platform.
About the book
Snowflake Data Engineering guides you skill-by-skill through accomplishing on-the-job data engineering tasks using Snowflake. You’ll start by building your first simple pipeline and then expand it with increasingly powerful features, including data governance and security, CI/CD for your pipelines, and even data augmentation with generative AI. You’ll be amazed how far you can go in just a few short chapters!
What's inside
• Ingest data from the cloud, APIs, or Snowflake Marketplace
• Orchestrate data pipelines with streams and tasks
• Optimize performance and cost
About the reader
For software developers and data analysts. Readers should know the basics of SQL and cloud computing.
About the author
Maja Ferle is a Snowflake Subject Matter Expert and a Snowflake Data Superhero who holds the SnowPro Advanced Data Engineer and the SnowPro Advanced Data Analyst certifications.
Table of Contents
Part 1
1 Data engineering with Snowflake
2 Creating your first data pipeline
Part 2
3 Best practices for data staging
4 Transforming data
5 Continuous data ingestion
6 Executing code natively with Snowpark
7 Augmenting data with outputs from large language models
8 Optimizing query performance
9 Controlling costs
10 Data governance and access control
Part 3
11 Designing data pipelines
12 Ingesting data incrementally
13 Orchestrating data pipelines
14 Testing for data integrity and completeness
15 Data pipeline continuous integration