Modern Analytics — A Data Stack For Successful Reporting

Photo Credit: Suzanne D Williams

What exactly is a data stack? A data stack is the set of tools and processes a company uses to collect, store, and analyze data in order to make better decisions. Its components work together as an end-to-end solution for reporting on your company’s most critical data.

These components include ETL (Extract, Transform, Load) pipelines, which extract raw data from multiple sources such as social media or web portals, transform it into a useful format such as tables with typed columns, and load it into a data warehouse, where business users analyze it with BI tools such as Tableau or Power BI. ELT (Extract, Load, Transform) uses the same three steps in a different order: raw data is loaded into the warehouse first and transformed there, usually with SQL. Because managed connectors can handle the extraction and loading, ELT saves teams much of the manual data preparation work.
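The difference in ordering can be sketched in a few lines. The example below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for a real warehouse, and the names `RAW_ORDERS`, `run_etl`, and `run_elt` are made up for this sketch.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    ("1001", " Alice ", "19.99"),
    ("1002", "BOB", "5.00"),
]


def clean(row):
    """Transform one raw row: cast types and normalize the customer name."""
    order_id, customer, amount = row
    return int(order_id), customer.strip().lower(), float(amount)


def run_etl(conn):
    """ETL: transform in application code first, then load the finished rows."""
    conn.execute(
        "CREATE TABLE orders_etl (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders_etl VALUES (?, ?, ?)",
        [clean(row) for row in RAW_ORDERS],
    )
    return conn.execute("SELECT * FROM orders_etl ORDER BY order_id").fetchall()


def run_elt(conn):
    """ELT: load the raw rows untouched, then transform with SQL in the warehouse."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        """
    )
    return conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall()
```

Both functions end with the same cleaned table. What differs is where the transformation runs: in application code before the load (ETL), or as SQL inside the warehouse after the load (ELT), which also keeps the raw rows available for later reprocessing.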

If a human operator needs to touch your system during normal operations, you have a bug. The definition of normal changes as your systems grow — Carla Geisser, Google SRE

The Modern Data Stack

Data Ingestion

A data warehouse is only as good as the data it holds, and it’s only valuable if you can actually get useful data into it. To make this happen, companies employ data ingestion tools. These extract raw datasets from different sources and load them into a central location, where they can be transformed by ELT or ETL processes. Popular data ingestion tools include Fivetran, Stitch, and Airbyte.

Fivetran: A managed pipeline service that connects your existing database infrastructure to leading data warehouses, visual analytics tools, and other applications.

Stitch: Connects directly to more than 100 sources, in the cloud or on-premises, for fast, easy, and secure access to all of your critical business data.

Airbyte: An open-source data integration platform, developed by the company of the same name, that offers a large catalog of connectors for moving data from databases, APIs, and files into one central location.

Pro Tip: You may need to use more than one data ingestion tool at the same time in order to get all your data sets into a single place — so don’t put too much effort into just choosing one when there’s no clear winner yet!
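To make the "many sources, one place" idea concrete, here is a minimal sketch of what an ingestion step does. It is not how Fivetran, Stitch, or Airbyte are implemented; the payloads, the `raw_landing` table, and the function names are all hypothetical, and sqlite3 stands in for the warehouse.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source payloads; in practice a connector would fetch these.
CRM_CSV = "email,plan\nann@example.com,pro\nraj@example.com,free\n"
BILLING_JSON = '[{"email": "ann@example.com", "amount": 49.0}]'


def extract_csv(text):
    """Parse a CSV export into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Parse a JSON API response into a list of dict records."""
    return json.loads(text)


def load_raw(conn, source, records):
    """Append records, unmodified, to a shared landing table tagged by source."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_landing (source TEXT, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_landing VALUES (?, ?)",
        [(source, json.dumps(record, sort_keys=True)) for record in records],
    )
    return len(records)


def ingest_all(conn):
    """Run every extractor and land its output in the same place."""
    return {
        "crm": load_raw(conn, "crm", extract_csv(CRM_CSV)),
        "billing": load_raw(conn, "billing", extract_json(BILLING_JSON)),
    }
```

Each source gets its own extractor, but everything lands in the same raw table. That mirrors the point of the tip above: several ingestion paths can coexist as long as they converge on one central location.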

Transformation

No self-respecting data engineer can describe the modern data stack without discussing data build tool, abbreviated as dbt. dbt is a SQL-based tool that helps you transform your data into the exact format you need for reporting or analysis. dbt takes raw data that has already been loaded into the warehouse, transforms it, and generates curated datasets for use by analytics tools or machine learning models. You can then schedule these transformations to run monthly, weekly, daily, or even hourly, depending on your use case.

The fact that transformations are written as SQL statements is what makes dbt so impressive. It’s used by companies like Spotify, Lyft, and Airbnb to clean up and prepare their data once it has been loaded into their data warehouses.
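The core idea, a curated table defined entirely by a SQL SELECT over raw data, can be imitated in plain Python. This is a rough sketch of the concept and not dbt itself: the `MODELS` dictionary, the table names, and `build_models` are invented for illustration, and sqlite3 stands in for the warehouse.

```python
import sqlite3

# A "model" here is just a named SELECT statement, loosely mirroring the idea
# of defining curated datasets in SQL. All names are hypothetical.
MODELS = {
    "daily_revenue": """
        SELECT order_date,
               SUM(amount) AS revenue,
               COUNT(*)    AS orders
        FROM raw_orders
        WHERE status = 'completed'
        GROUP BY order_date
    """,
}


def seed_raw(conn):
    """Create a small raw table, as an ingestion tool might have loaded it."""
    conn.execute(
        "CREATE TABLE raw_orders (order_date TEXT, amount REAL, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [
            ("2024-01-01", 10.0, "completed"),
            ("2024-01-01", 15.0, "completed"),
            ("2024-01-01", 99.0, "cancelled"),
            ("2024-01-02", 7.5, "completed"),
        ],
    )


def build_models(conn):
    """Rebuild each curated table from its SELECT; safe to rerun on a schedule."""
    for name, select_sql in MODELS.items():
        conn.execute(f"DROP TABLE IF EXISTS {name}")
        conn.execute(f"CREATE TABLE {name} AS {select_sql}")
    return sorted(MODELS)
```

Because each run drops and recreates the curated table from the raw data, calling `build_models` hourly or daily gives the same result as calling it once, which is what makes scheduled reruns safe.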

Spotify: Uses dbt to transform its music data into a format suitable for reporting and analysis.

Lyft: Uses dbt to transform its ride data into a format suitable for reporting and analysis.

Airbnb: Uses dbt to transform its guest data into a format suitable for reporting and analysis.

The extracted data is saved centrally in a data warehouse (Redshift, Snowflake, BigQuery), where it is then transformed with the data build tool, dbt.

Operational Analytics

Once you have your data in a data warehouse, it’s ready for use by BI tools or machine learning models. But some companies also use their data warehouses for operational analytics, which is the process of using real-time data to make decisions about things like product pricing or inventory levels.
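As a small illustration of an operational decision driven by warehouse data, the sketch below flags products that need restocking. The `inventory` table, its columns, and the reorder rule are assumptions made for this example, with sqlite3 standing in for the warehouse.

```python
import sqlite3


def seed_inventory(conn):
    """Create a hypothetical inventory table with current stock levels."""
    conn.execute(
        "CREATE TABLE inventory (sku TEXT, on_hand INTEGER, reorder_point INTEGER)"
    )
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?)",
        [("widget", 4, 10), ("gadget", 25, 10), ("gizmo", 10, 10)],
    )


def skus_to_reorder(conn):
    """Return SKUs whose stock is at or below the reorder point."""
    rows = conn.execute(
        "SELECT sku FROM inventory WHERE on_hand <= reorder_point ORDER BY sku"
    ).fetchall()
    return [sku for (sku,) in rows]
```

A job like this could run on a schedule against fresh data and feed its result to a purchasing system, so the decision comes straight from the query rather than from a report someone has to read first.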

Netflix: Uses their data warehouse for operational analytics to make decisions about their content library.

Pro Tip: Beyond just storing the data, one of the biggest benefits of a data warehouse is that it makes querying your data extremely easy — which means you shouldn’t be afraid to ask any question that could possibly come up!

The modern data stack continues with how data is accessed and used. With the shift from ETL to ELT, extraction and loading can be mostly automated and transformation happens inside the warehouse, which saves teams from doing all of the data preparation by hand.

Data ingestion tools like Fivetran or Airbyte help get the raw datasets into a central location, where they can be transformed by tools like dbt. Companies then use the data in their data warehouses for operational analytics (making decisions about product pricing and inventory levels) and feed it to BI tools or machine learning models to extract insights.

Key Takeaways: The modern data stack is made up of ingestion tools that extract data from multiple sources and load it centrally, a data warehouse that stores it, and transformation tools such as dbt that turn it into curated datasets for operational analytics, BI tools, or machine learning models. Using ELT instead of ETL lets extraction and loading be mostly automated, which saves teams from doing all of the data preparation by hand.

