Inside a DBT Project: Understanding Its File and Folder Structure
Introduction
In the world of modern data engineering, DBT (Data Build
Tool) has emerged as a transformative solution for managing data
transformations in a scalable, modular, and version-controlled way. Unlike
traditional ETL tools that often rely on opaque workflows and proprietary
interfaces, DBT embraces the principles of software engineering—treating data
transformations as code.
One of the key reasons DBT is so effective is its well-defined
project structure. Every DBT project is organized into a set of folders and
configuration files that serve specific purposes. This structure not only
promotes clarity and collaboration but also enables automation, testing,
documentation, and deployment.
In this blog, we’ll take a deep dive into the anatomy of a
DBT project, exploring the purpose and importance of each folder—such as models,
seeds, snapshots, and macros—and how they work together to create a robust data
transformation pipeline.
Why Project Structure Matters in DBT
Before diving into the folders themselves, it’s important to
understand why DBT’s structure is so valuable:
- Modularity: Each transformation is isolated in its own file, making it easier to manage, test, and reuse.
- Transparency: The structure makes it easy for new team members to understand the flow of data and dependencies.
- Version control: With Git integration, every change is tracked, reviewed, and documented.
- Automation: The structure supports CI/CD workflows, enabling automated testing and deployment.
- Documentation: DBT can auto-generate documentation based on the structure and metadata defined in the project.
Now, let’s explore the key folders that make up a DBT
project.
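For orientation, here is the skeleton of a typical DBT project. The folder names follow DBT's defaults; your dbt_project.yml can remap any of these paths:

```text
my_dbt_project/
├── dbt_project.yml     # project name, paths, and model configs
├── models/
│   ├── staging/
│   ├── intermediate/
│   └── marts/
├── seeds/
├── snapshots/
├── macros/
├── tests/
├── analysis/           # named analyses/ in recent DBT versions
└── docs/               # optional, for custom documentation
```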
1. The models/ Folder: The Heart of Your Transformations
The models folder is where the core of your data
transformation logic lives. Each file in this folder represents a DBT model—a
SQL query that transforms raw data into a clean, usable format.
Models are typically organized into subfolders that reflect
the layers of transformation. Common subfolders include:
- Staging: Contains models that clean and standardize raw source data.
- Intermediate: Holds models that join, filter, or enrich staging data.
- Marts: Includes final models used for reporting, dashboards, or business analysis.
This layered approach helps teams maintain clarity and
control over complex transformations. It also supports dependency tracking,
allowing DBT to build models in the correct order based on their relationships.
Each model can be documented, tested, and materialized as a table, a view, or an ephemeral model, depending on performance and use-case requirements.
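As a sketch, here is what a minimal staging model might look like. The source name raw, the customers table, and the column names are all hypothetical; the config block shows how a model selects its materialization:

```sql
-- models/staging/stg_customers.sql
-- Staging models commonly materialize as views, since they are cheap to rebuild.
{{ config(materialized='view') }}

select
    id           as customer_id,
    lower(email) as email,
    created_at
from {{ source('raw', 'customers') }}
```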
2. The seeds/ Folder: Static Reference Data
The seeds folder is used to store static datasets in CSV
format. These datasets are typically small, stable, and used for reference
purposes—such as country codes, currency mappings, or business rules.
When you run the dbt seed command, DBT loads these CSV files into the data warehouse as tables, where they can be referenced in transformations just like any other model.
Seeds are especially useful for:
- Lookup tables that enrich transactional data.
- Configuration tables that drive dynamic logic.
- Testing datasets used to validate transformations.
Because seeds are version-controlled alongside the rest of
the project, they offer a reliable and auditable way to manage reference data.
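For example, a hypothetical seeds/country_codes.csv might contain:

```text
country_code,country_name
US,United States
DE,Germany
IN,India
```

After dbt seed loads it, any model can join to it with {{ ref('country_codes') }}, exactly as it would reference another model.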
3. The snapshots/ Folder: Tracking Historical Changes
The snapshots folder is where DBT stores logic for capturing
historical versions of data. This is essential for implementing Slowly Changing
Dimensions (SCDs), where you need to track how records evolve over time.
Snapshots work by comparing the current state of a record to
its previous state and storing changes in a dedicated snapshot table. This
allows analysts to answer questions like:
- What was a customer’s status last month?
- How did product pricing change over time?
- When did a user upgrade their subscription?
Snapshots are configured with metadata that defines the
unique key, update strategy, and timestamp fields. They are particularly
valuable in environments where source systems overwrite data and historical
context is lost.
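A minimal snapshot sketch, assuming a hypothetical raw.customers source with a reliable updated_at column (tables without one can use the check strategy instead):

```sql
-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

select * from {{ source('raw', 'customers') }}

{% endsnapshot %}
```

Each dbt snapshot run compares incoming rows to the stored ones and tracks changed records with validity timestamps, so the snapshot table accumulates the full history.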
4. The macros/ Folder: Reusable SQL Logic
The macros folder is where you define reusable SQL functions
using Jinja templating. Macros help you write cleaner, more maintainable code
by abstracting repetitive logic into callable functions.
For example, if you frequently calculate a specific metric
across multiple models, you can define that logic once in a macro and reuse it
wherever needed.
Macros are powerful because they:
- Reduce duplication across models.
- Encapsulate complex logic into readable functions.
- Support dynamic SQL generation based on parameters.
They can be used in models, tests, snapshots, and even
documentation—making them one of the most versatile tools in the DBT ecosystem.
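As an illustration, here is a small macro; the name cents_to_dollars and its parameters are hypothetical:

```sql
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}
```

A model then calls it like a function (stg_payments and amount_cents are likewise made up):

```sql
select
    order_id,
    {{ cents_to_dollars('amount_cents') }} as amount_usd
from {{ ref('stg_payments') }}
```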
5. The tests/ Folder: Custom Data Validations
While DBT supports built-in generic tests (like not_null, unique, and relationships), the tests folder allows you to define custom validations, known as singular tests, written in SQL.
These tests are designed to catch business-specific
anomalies that generic tests might miss. For example:
- Ensuring that revenue values are always positive.
- Verifying that event timestamps occur after user signup dates.
- Detecting duplicate records based on composite keys.
Custom tests return rows that violate expectations. If any
rows are returned, the test fails—alerting the team to investigate.
By storing these tests in a dedicated folder, DBT promotes a
culture of proactive data quality assurance.
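A sketch of such a singular test, assuming a hypothetical fct_orders model:

```sql
-- tests/assert_revenue_is_positive.sql
-- Each returned row is a violation; the test passes only when
-- this query returns zero rows.
select
    order_id,
    revenue
from {{ ref('fct_orders') }}
where revenue < 0
```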
6. The analysis/ Folder: Exploratory Queries
The analysis folder (named analyses/ in recent DBT versions) is used for ad hoc queries and exploratory analysis that aren’t part of the transformation pipeline. These queries can be compiled with dbt compile and run manually, but they are never materialized in the warehouse or included in the DAG.
This folder is useful for:
- Prototyping new models.
- Investigating data anomalies.
- Performing one-off analyses for stakeholders.
While not essential to every project, the analysis folder
provides a sandbox for experimentation without cluttering the production
workflow.
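An analysis file is just templated SQL that DBT compiles but never runs as part of the pipeline. A hypothetical example:

```sql
-- analysis/monthly_signups.sql
-- `dbt compile` resolves the ref() and writes runnable SQL to target/compiled/.
select
    date_trunc('month', signup_date) as signup_month,
    count(*)                         as signups
from {{ ref('dim_users') }}
group by 1
order by 1
```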
7. The docs/ Folder: Enhancing Documentation
DBT automatically generates documentation based on model metadata, tests, and sources. The docs folder allows you to enhance this documentation with custom content, such as markdown files containing docs blocks, images, or diagrams.
This is especially helpful for:
- Explaining complex business logic.
- Providing onboarding guides for new team members.
- Sharing context about data sources and usage.
By combining auto-generated and custom documentation, DBT
helps teams build a comprehensive knowledge base around their data.
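Custom documentation lives in markdown files as docs blocks, which model YAML can then reference. A minimal sketch with hypothetical names:

```markdown
{% docs revenue %}
Revenue is recognized at shipment time, reported in USD,
and excludes taxes and refunds.
{% enddocs %}
```

```yaml
models:
  - name: fct_orders
    columns:
      - name: revenue
        description: '{{ doc("revenue") }}'
```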
8. The data/ or sources/ Folder: External Source Definitions
Some teams create a dedicated folder for defining external
sources in YAML files. These definitions include metadata about raw tables—such
as descriptions, column names, and freshness expectations.
Source definitions improve:
- Data lineage tracking, showing how raw data flows into models.
- Documentation, making it easier to understand upstream systems.
- Testing, by enabling freshness checks and schema validations.
While not required, organizing source definitions in a
separate folder can improve clarity and maintainability.
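A sketch of a source definition; the raw source name, schema, and tables are hypothetical, and the freshness block is what powers dbt source freshness checks:

```yaml
# models/staging/sources.yml (the path is a team convention, not a requirement)
version: 2

sources:
  - name: raw
    schema: raw_data              # schema written to by the ingestion tool
    loaded_at_field: updated_at   # column used for freshness checks
    freshness:
      warn_after: {count: 12, period: hour}
    tables:
      - name: customers
        description: One row per customer record in the source system.
      - name: orders
        description: One row per order placed.
```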
Conclusion
DBT’s file and folder structure is more than just a way to
organize code—it’s a framework for building reliable, scalable, and
collaborative data transformation workflows. Each folder serves a specific
purpose, from core modeling to testing, documentation, and historical tracking.
By embracing this structure, teams can:
- Write modular and reusable SQL.
- Validate data quality automatically.
- Track changes and collaborate effectively.
- Document their work for transparency and governance.
- Scale their pipelines with confidence.
Whether you're just starting with DBT or looking to optimize your existing project, understanding and leveraging its folder structure is a foundational step toward building a modern, trustworthy data stack.