DuckDB course #48

Open
akshayka opened this issue Mar 4, 2025 · 21 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)


@akshayka
Contributor

akshayka commented Mar 4, 2025

DuckDB

We're seeking contributors for a course on DuckDB! Contributors will be credited as authors in the course's directory, and in the notebooks they contribute.

marimo has built-in support for DuckDB through its SQL cells — making it the best interactive environment to experiment with DuckDB and Python.

Claiming a notebook.

Any notebook without an assigned author needs a contributor. To get started, leave a comment to claim a notebook you'd like to contribute, and then create a pull request with your draft. Tutorials with a "🍃" are meant to highlight marimo features that give DuckDB superpowers.

Please let us know if you have feedback on the proposed notebooks; we are open to changing the course structure as well.

| Notebook | Description | Status | Author |
| --- | --- | --- | --- |
| Why DuckDB | An overview of DuckDB and why it matters for analytics | 🚧 | @prrao87 |
| Querying Dataframes 🍃 | Using SQL cells (and AI!) to query Dataframes | 🚧 | @salimmj |
| Referencing Python values in queries 🍃 | How to parametrize queries with Python values, using SQL cells | 🚧 | |
| Datasources panel 🍃 | Using marimo's data sources panel to introspect tables | 🚧 | |
| DuckDB functions | A tour of some helpful DuckDB functions | 🚧 | @dhonysilva |
| DuckDB for Dataframe Users | DuckDB for users coming from Pandas or Polars | 🚧 | @chunhouuu |
| Loading CSVs | Loading CSVs into tables | 🚧 | @Mustjaab |
| Loading Parquet files | Loading from Parquet files | 🚧 | |
| Loading JSON | Loading from JSON files | 🚧 | @julius383 |
| Loading from HTTPS and buckets | Loading from cloud buckets | 🚧 | |
| Arrow | Working with Apache Arrow | 🚧 | |
| DuckDB extensions | Going Further with DuckDB extensions | 🚧 | @julius383 |

Subscribe to this issue to get notified when new notebooks drop.

akshayka changed the title from "Duckdb course" to "DuckDB course" on Mar 4, 2025
@dhonysilva

Hi. I'll stick to the DuckDB functions 😀

@akshayka
Contributor Author

akshayka commented Mar 4, 2025

> Hi. I'll stick to the DuckDB functions 😀

Thank you @dhonysilva! I've assigned the notebook to you :)

Please make a PR adding a notebook to the duckdb folder, and @Haleshot will help review it. Thank you!

Haleshot added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels on Mar 5, 2025
@salimmj

salimmj commented Mar 6, 2025

I can do "Querying Dataframes".

@Mustjaab

Mustjaab commented Mar 8, 2025

I would love to do Loading CSVs!

@Haleshot
Collaborator

Haleshot commented Mar 8, 2025

> I would love to do Loading CSVs!

Assigned it in the table above!

@prrao87

prrao87 commented Mar 10, 2025

Hi, this is a great initiative! Would love to contribute to the DuckDB, Polars and an upcoming Kuzu learning series in Marimo. I wonder if for the first, foundational lesson "Why DuckDB", it makes sense for a core member of DuckDB like @szarnyasg to suggest what points can be covered? Happy to help with creating the lesson once the general message of "Why DuckDB" is explained the way they (the DuckDB team) themselves intend.

@Haleshot
Collaborator

Haleshot commented Mar 10, 2025

> I wonder if for the first, foundational lesson "Why DuckDB", it makes sense for a core member of DuckDB like @szarnyasg to suggest what points can be covered?

That would be great! We posted a message on the DuckDB Discord server a few days back; we hope to get the series kickstarted soon (and it would be great if contributors/devs from DuckDB helped with that). I can help assign people in the table above according to the topics they want to contribute. The table is also tentative: if DuckDB power users/devs feel the need to expand it with more relevant topics that better showcase its features, that works too ❤

@prrao87

prrao87 commented Mar 10, 2025

> That would be great! We had posted a message on the duckdb discord server a while back; hope to get the series kickstarted soon (and it would be great if contributors/devs over from duckdb help w/ that). Can help assign people in the table above in relevance to what topics they want to contribute (which is also tentative; if duckdb power users/devs feel the need to expand the above table to more relevant topics to help showcase its features better, that works too) ❤

I'd be happy to take on the "Why DuckDB" lesson assuming there hasn't been any traction from the DuckDB Labs team. I just didn't want to step on Gabor's toes in case he was interested in taking it on. It would be great to have the initial lesson out so that the rest of the series is more useful for people who are just getting started with DuckDB and embedded databases.

If the Marimo team here thinks it's apt, I can take it on. I have plenty of experience with embedded databases and I work at Kuzu, an embedded graph database company that's very similar to DuckDB in philosophy, but for graphs instead of relational tables. My eventual goal is to add a graph database tutorial along the lines of the DuckDB one, as a lot of people who are new to graphs and graph databases can benefit by learning in a Marimo environment.

@Haleshot
Collaborator

Haleshot commented Mar 10, 2025

> It would be great to have the initial lesson out so that the rest of the series is more useful for people who are just getting started with DuckDB and embedded databases.

Makes sense. We also have a similar notebook (Why Polars) for our Polars course and that really helped kickstart the series of notebooks there.

> If the Marimo team here thinks it's apt, I can take it on.

Would be great; looking forward to it. Thanks again for taking the initiative; I hope to see more people follow your lead, and that this results in high-quality, informative notebooks that can serve as a nice resource to point to (as a reference for beginners and others).

> but for graphs instead of relational tables. My eventual goal is to add a graph database tutorial along the lines of the DuckDB one, as a lot of people who are new to graphs and graph databases can benefit by learning in a Marimo environment.

Ooh, this is interesting. Would be glad to see how this turns out! We also encourage maintainers/contributors/devs of data-viz libraries (as is evident from our root README description) to contribute to learning the functionalities offered by their plotting packages.

@szarnyasg

Hi everyone, Gabor here from DuckDB Labs. Thanks a lot for this initiative! While I'm really supportive of this, and happy to occasionally review material, I do not have the cycles to become a main contributor. So, @prrao87, feel free to take on the “Why DuckDB” section!

@prrao87

prrao87 commented Mar 10, 2025

> Ooh, this is interesting. Would be glad to see how this turns out! We also encourage maintainers/contributors/devs of data-viz libraries to contribute to learning the functionalities offered by their plotting packages.

I did take a brief look, and I think I can get away with this in the beginning by transforming the Kuzu graph into a NetworkX graph and using Plotly's NetworkX visualizer in marimo. The remaining visualization libraries (altair, hvplot, etc.) offer slim pickings for graph visualization, so I think I'll just stick to Plotly for now. Will see if I can explore other options (e.g., Cytoscape or vis.js) for marimo as we go on.

@prrao87

prrao87 commented Mar 10, 2025

> Hi everyone, Gabor here from DuckDB Labs. Thanks a lot for this initiative! While I'm really supportive of this, and happy to occasionally review material, I do not have the cycles to become a main contributor. So, @prrao87, feel free to take on the “Why DuckDB” section!

Thanks @szarnyasg, will get to it soon. It would be great if you could give the PR a quick look once I submit it (will tag you), just so you can confirm that I captured the key elements. I think I have them, but there's nothing like having your input. Cheers :)

@julius383

Hi, I would like to work on "Loading JSON".

I also think an additional chapter on some DuckDB extensions like spatial, http_client and gsheets would be useful.

@Haleshot
Collaborator

Haleshot commented Mar 17, 2025

> Hi, I would like to work on "Loading JSON".

Assigned it to you in the table above!

> I also think an additional chapter on some DuckDB extensions like spatial, http_client and gsheets would be useful

Great! The list of topics we propose initially is a tentative outline of important features of the respective libraries. Appreciate you coming forward with more topics 🎉 Care to list a Chapter name for it? Something like "Integrations/Extensions w/ Duckdb" works?

@julius383

> Care to list a Chapter name for it? Something like "Integrations/Extensions w/ Duckdb" works?

How about "Going Further with DuckDB extensions"?

@Haleshot
Collaborator

Haleshot commented Mar 17, 2025

> How about "Going Further with DuckDB extensions"?

That works! Assigned it above.

@Haleshot
Collaborator

@prrao87 Any updates in regard to the notebooks? Totally understand if things have been busy, no rush at all! Let me know if you need anything from my side.

@Haleshot
Collaborator

Also, CC'ing: @dhonysilva, @salimmj

@prrao87

prrao87 commented Mar 21, 2025

> @prrao87 Any updates in regard to the notebooks? Totally understand if things have been busy, no rush at all! Let me know if you need anything from my side.

Hi @Haleshot, yes, it's been busy! But I should be able to put in some time next week; I hope to have a PR ready soon :)

@chunhouuu

Hi, I am happy to create a notebook for DuckDB for Dataframe Users.

@Haleshot
Collaborator

> Hi, I am happy to create a notebook for DuckDB for Dataframe Users.

Great! Assigned it above.

9 participants