· Hakan Lofcali · Data Analytics  · 4 min read

Welcome to DuckCon #6

What have we been quacking about?

On January 31st, DuckDB Labs hosted DuckCon #6. DuckCon brings together the ever-growing community of users and enthusiasts running ETL and OLAP workloads on DuckDB. A 15-minute walk from Amsterdam Central Station, we were greeted by a flashy yellow banner announcing DuckCon #6 in black lettering. The colors make you feel right at home if you visit duckdb.org regularly.

Arriving at 2:45 PM, we had the luxury of being welcomed by a buzzing crowd: roughly a hundred developers chatting about the solutions they have found, or hope to find, with DuckDB. Excitement peaked when Hannes from DuckDB Labs greeted the audience.

Initiating the conference and igniting the whole community.

DuckDB 1.2.0 and Lake House on Roadmap

Hannes kicked off by referencing the Data Singularity envisioned by DuckDB's creators: the point at which the majority of data scans can be performed on a single node. Citing papers from Redshift and Snowflake, we can safely assume the singularity is close, if not already here.

DuckDB 1.2.0 “Histrionicus” was announced and released over the course of last week.

Growth is often just numbers: 10M downloads per month, 32M extension installations, and 1.8M unique web visitors. Being in the room, however, you could also feel the excitement of developers craving better tools, hoping for insights into the future of an emerging go-to in their toolbelt. The data engineer's hammer, so to speak.

For Hannes and Mark (co-creators of DuckDB), this must be a new experience: speaking in front of excited crowds about their creation, going from database research to entertaining audiences.

Mark delivered, answering the hopes of the community: one of the main roadmap items will be Lake House integration. As more data lands in Lake House formats such as Delta Lake, more convenient ways to process it are needed. Native Lake House integration in DuckDB will accelerate the path from raw data to insight.

Two roadmap items stood out for us at ChillyBytes: the C extension API and logging with stack traces. Better observability in DuckDB will give ChillyBytes better tools to build more resilient applications embedding DuckDB.

Community Talks: From Airports with Flying Birds and Local Data Analytics

Community talks kicked off with an introduction to the Airport extension, which lets you query data from Apache Arrow Flight servers. Thanks, Rusty Conover, for this whirlwind tour through processing data from various sources.

Apache Arrow Flight is a general-purpose client-server framework that simplifies high-performance transport of large datasets over gRPC. Anything you hold in memory as Apache Arrow data can be queried from DuckDB.
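As a rough sketch of how this looks in practice (the server location and table names below are illustrative placeholders, not from the talk, and the exact ATTACH options should be checked against the extension's documentation), the Airport extension installs like any DuckDB community extension:

```sql
-- Install and load the Airport community extension
INSTALL airport FROM COMMUNITY;
LOAD airport;

-- Attach an Arrow Flight server (location is a placeholder)
ATTACH 'demo' (TYPE AIRPORT, location 'grpc://localhost:50312/');

-- Tables exposed by the Flight server can then be queried like any other
SELECT count(*) FROM demo.main.flights;
```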

Next up was Apache Ibis, which lets you interface with almost any data storage solution through a Python DataFrame API. Naty Clementi presented how to combine Apache Ibis and DuckDB to efficiently query GeoParquet, and make the results beautiful along the way.

Seeing these talks raises the question of how Apache Arrow and DuckDB are related, beyond the obvious in-memory data exchange and DuckDB exposing an Arrow handle struct for result RecordBatches. In my perception, they have been neighbors rather than integrated solutions so far; let's see what the future brings. In this regard, Apache Ibis and Apache DataFusion (an Apache Arrow subproject) serve similar audiences with different purposes, I guess.

Ryan Hamilton presented how DuckDB solves developer and user experience problems today. The users in question are financial analysts: data engineers with more math skills 😉. Ryan integrated DuckDB into qStudio, his application for analysts, giving his users lower latency and more responsiveness than ever before for windowed aggregations.

Lightning Talks: One Honorable Mention

One honorable mention among the informative lightning talks was the SQL/PGQ implementation for DuckDB. SQL:2023 introduced Property Graph Queries; have fun reading up on how SQL/PGQ makes graph-based queries more lightweight in standard SQL.

Daniel ten Wolde shared some ideas on how he implemented his extension for this. He ended his talk by comparing SQL/PGQ with SQL:1999 for graph-based relationship queries.

Instead of recursively building graph-like relationships at query time, you define VERTEX and EDGE tables over existing relations and gain access to new operations. Query operators that look familiar enough will greet you with their functionality. The query below finds the SHORTEST path between two Person vertices via the knows edges.

-- See https://duckdb.org/community_extensions/extensions/duckpgq.html for full query
-- ...
   MATCH p = ANY SHORTEST (a:Person)-[k:knows]->{1,3}(b:Person)
-- ...
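To show how the MATCH clause fits into a full statement, here is a sketch following the shape of the duckpgq documentation linked above; the table and column names (Person, knows, from_id, to_id) are illustrative, and the exact syntax should be verified against those docs:

```sql
INSTALL duckpgq FROM COMMUNITY;
LOAD duckpgq;

-- Declare which relational tables act as vertices and edges
CREATE PROPERTY GRAPH social
  VERTEX TABLES (Person)
  EDGE TABLES (
    knows SOURCE KEY (from_id) REFERENCES Person (id)
          DESTINATION KEY (to_id) REFERENCES Person (id)
  );

-- Find the shortest knows-path (1 to 3 hops) between pairs of persons
FROM GRAPH_TABLE (social
  MATCH p = ANY SHORTEST (a:Person)-[k:knows]->{1,3}(b:Person)
  COLUMNS (a.id AS from_person, b.id AS to_person)
);
```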

We Leave in Style

The quacking swag was on point.

The atmosphere: energetic.

The talks: valuable use cases for DuckDB.

The future outlook: Single Node Optimized.
