
Unlike most OLAP DBMS platforms, it’s not meant to be used as a data warehouse — although it can be! So what is DuckDB good for? We’ll go through 8 DuckDB use cases below.
DuckDB use case 1️⃣: Crunch big CSVs
We’ve all been there: you’ve been sent a giant CSV file with hundreds of columns and too many rows to fit in Excel.
How would you usually interrogate this file? A Jupyter notebook with Pandas? Bulk loading the CSV into your existing data warehouse?
DuckDB’s SQL dialect lets you query CSV files directly, without any Python and without bulk loading the file first. It also has arguably the best CSV parser in the world (it ranks first on the Pollock benchmark for robustness) and can read huge CSVs fast.
select *
from 'path/to/file.csv'
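If the automatic detection ever needs a nudge, you can also call the read_csv function explicitly and pass parsing options yourself. A minimal sketch (the file path, delimiter, and header settings here are just for illustration):
select *
from read_csv('path/to/file.csv', delim=';', header=false)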
Not sure where to run the DuckDB SQL? You can just install the command line version and then use the built-in notebook-style UI with the duckdb -ui command!
Source: https://duckdb.org/2025/03/12/duckdb-ui.html
DuckDB use case 2️⃣: Crunch JSON files
CSV files aren’t the only type of file that you can query directly with DuckDB.
In addition to tabular data sources (like CSV files, Parquet files, and even other RDBMSs), DuckDB also understands JSON and can easily read JSON files:
select property_1, property_2, unnest(property_3)
from 'path/to/file.json'
DuckDB will parse the top-level attributes into columns and a top-level array into rows, but you can easily parse nested JSON with the unnest function.
Pro tip: use the recursive := true argument in the unnest function to recursively unpack nested JSON!
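As a minimal sketch (reusing the hypothetical property_3 column from the query above):
select unnest(property_3, recursive := true)
from 'path/to/file.json'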
Check out more tips and tricks in DuckDB’s own blog post on shredding JSON here:
DuckDB use case 3️⃣: Local analysis
Since DuckDB makes it extremely easy to interrogate local files, it’s a great engine to use for any local analysis — replacing what you might do with Pandas.
When you have a set of input files, some transformations to create intermediate files, and some clean and tidy output files, it’s probably a good time to use a database!
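As a rough sketch of what one transformation step might look like (all file, table, and column names here are invented for illustration):
-- read the raw input and create a cleaned intermediate table
create or replace table staging_orders as
    select
        cast(order_date as date) as order_date,
        upper(country) as country
    from 'input/raw_orders.csv';

-- write the clean and tidy output file
copy staging_orders to 'output/clean_orders.parquet';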
Mehdi Ouazza articulated this well in one of his LinkedIn posts (whose image is taken from Max Gabrielsson’s talk at GeoPython):
But the natural question is: if you pull your transformation logic into separate SQL files, how do you easily run them in order?
Answer: there are several open-source SQL orchestrators available, such as:
- dbt-core, using the dbt-duckdb adapter (docs)
- SQLMesh, which is a strong contender to dbt (docs)
- yato, the world’s smallest DuckDB orchestrator (docs)
In fact, this type of DuckDB use case is one of the main reasons the creators built it!
DuckDB use case 4️⃣: Data format parser
If SQL isn’t your main language, you can still reap the benefits of DuckDB provided there’s a client for your language.
For example, it’s fairly common to manipulate Parquet files as part of a data pipeline written with Python. Historically, you’d need to install PyArrow for Parquet support, which is not a trivial dependency to add to your project.
The Python DuckDB client, however, has no dependencies (just Python 3.9 or newer!) so it’s very easy to add to a Python project. The function below is a simple example of how DuckDB can be used inside Python to copy data in any format readable by DuckDB into a Parquet file.
import pathlib

import duckdb


def to_parquet(file: pathlib.Path, compression: str = "zstd"):
    """
    Copy the file to a Parquet file.
    """
    file_parquet = file.with_suffix(".parquet")
    # delegate the conversion to DuckDB's COPY statement
    duckdb.sql(
        f"""
        copy (from '{file.absolute()}')
        to '{file_parquet.absolute()}' (
            format parquet,
            compression '{compression}'
        )
        """
    )
DuckDB use case 5️⃣: Data warehouse (MotherDuck)
Although it’s not one of the intended DuckDB use cases, you can use it as a data warehouse.
MotherDuck is a separate company from DuckDB Labs (the creators of DuckDB), and their product, also called MotherDuck, is a managed DuckDB service. Despite being a separate product, DuckDB has first-class support for connecting to MotherDuck, making it extremely easy to work in the cloud.
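For instance, once you’ve signed up and set a MotherDuck token, connecting from any DuckDB client is a one-liner. A minimal sketch (the database and table names are made up):
attach 'md:my_db';
use my_db;
select count(*) from my_table;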
MotherDuck also has an extremely generous free tier, making it perfect for individuals and small businesses who are after a managed data warehouse without much hassle or overhead cost.
Just be aware of your security and governance requirements before signing your company up!

Source: https://motherduck.com/product/pricing/
DuckDB use case 6️⃣: Multi-engine data stack
Tools like SQLMesh are starting to support a “multi-engine” approach, which takes the idea of separating storage and compute to another level.
In a multi-engine approach, you could use one database technology for your computation, and another for your storage. Since DuckDB is free and open-source, using it for computation is a method for reducing costs — as well as potentially improving performance, depending on the alternative engines available.
The SQLMesh docs illustrate how you could use SQLMesh to store your data in a PostgreSQL database, but use DuckDB to transform data from an S3 bucket before inserting it into PostgreSQL:
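To make the pattern concrete, here’s an illustration-only sketch in plain DuckDB SQL (outside SQLMesh); the connection string, bucket, and table names are invented, and reading from S3 assumes the httpfs extension and credentials are set up:
-- attach the PostgreSQL database that acts as the storage layer
attach 'dbname=warehouse host=localhost' as pg (type postgres);

-- let DuckDB do the aggregation over S3, then land the result in PostgreSQL
create table pg.public.daily_sales as
    select order_date, sum(amount) as total_amount
    from 's3://my-bucket/raw/sales/*.parquet'
    group by order_date;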
There are also some great articles from Julien Hurault describing how to implement a multi-engine data stack:
- Julien Hurault – Multi-engine data stack – v0
- Julien Hurault – Multi-engine data stack v1
- Julien Hurault – SQLMesh & Multi-engine Data Stack
- Julien Hurault – Multi-engine stacks deserves to be first class
DuckDB use case 7️⃣: Cross-data-source queries
The first two use cases above showed that we can use DuckDB to query local files directly. DuckDB can also read from some other RDBMSs (such as PostgreSQL, MySQL, and SQLite), as well as from remote URLs over HTTP(S).
DuckDB isn’t limited to reading from a single source per query: we can query multiple data sources in a single query, and DuckDB will read from each of them at run time — meaning that DuckDB is always operating on the freshest data available.
Since DuckDB doesn’t need a server to run, this makes DuckDB an ideal engine for in-process transformation and is perfect for anyone comfortable talking SQL.
Below is an example (illustration only) of a DuckDB query which reads from multiple data sources at the same time.
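All of the source names below are invented, and the PostgreSQL attachment assumes the postgres extension is installed:
-- attach a PostgreSQL database so its tables can be queried alongside files
attach 'dbname=shop host=localhost' as pg (type postgres);

-- join a local CSV, a remote Parquet file, and a PostgreSQL table in one query
select o.order_id, c.customer_name, p.product_name
from 'orders.csv' as o
join 's3://my-bucket/products.parquet' as p on o.product_id = p.product_id
join pg.public.customers as c on o.customer_id = c.customer_id;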
DuckDB use case 8️⃣: SQL-first lakehouse (DuckLake)
2024 was an important year for data lakes and cloud data platforms:
- Databricks, the creators of Delta Lake, acquired Tabular, the company founded by the creators of Apache Iceberg
- Snowflake released support for Iceberg tables (Polaris) shortly after
DuckDB can read from both Delta Lake and Iceberg tables, but in 2025, DuckDB Labs released DuckLake as an alternative lakehouse format that aims to make data lake catalogue management easier.
The DuckLake extension is available as part of DuckDB, giving you full data lake support at no extra cost, and without needing any additional infrastructure.
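A minimal, hedged sketch of getting started (the catalogue file and table names are invented):
install ducklake;

-- create or attach a DuckLake catalogue backed by a local metadata file
attach 'ducklake:my_catalogue.ducklake' as lake;
use lake;

-- tables created here become Parquet files managed by the DuckLake catalogue
create table events as from 'path/to/events.parquet';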

Source: https://ducklake.select/
DuckDB is an essential part of the modern data toolkit
DuckDB isn’t just a database: it’s a transformation engine that quacks SQL.
These DuckDB use cases represent just a fraction of its capabilities: it fits into the modern data toolkit seamlessly and has countless other applications. There are already loads of tools now powered by DuckDB; many can be found in the list linked below:
Whether you’re exploring DuckDB use cases for local analysis or enterprise applications, get in touch via the form below and we can have a chat about how Tasman can help enrich your platform with DuckDB!