@@ -25,3 +25,77 @@ possible. You can find the most up to date version in the [source code].

[crates.io documentation]: https://docs.rs/datafusion/latest/datafusion/index.html#architecture
[source code]: https://github.com/apache/datafusion/blob/main/datafusion/core/src/lib.rs

## Forks vs Extension APIs

DataFusion is a fast-moving project, which results in frequent internal changes.
This benefits DataFusion by allowing it to evolve and respond quickly to
requests, but it also means that maintaining a fork with major modifications
sometimes requires non-trivial work.

The public API (what is accessible if you use the DataFusion releases from
crates.io) is typically much more stable, though it also changes from release
to release.

Thus, rather than forking, we recommend using one of the many extension APIs (such
as `TableProvider`, `OptimizerRule`, or `ExecutionPlan`) to customize
DataFusion. If you cannot do what you want with the existing APIs, we would
welcome you working with us to add new APIs to enable your use case, as
described in the "Creating new Extension APIs" section below.

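To make the recommendation concrete, here is a minimal, self-contained sketch of the general pattern: the engine depends only on a trait, and downstream code plugs in its own implementation instead of patching the engine. All names in it (`RowSource`, `Engine`, `InMemory`, `total_value`) are hypothetical stand-ins chosen for illustration. They are not DataFusion's real `TableProvider` or related APIs, whose actual signatures are documented on docs.rs. Because the custom type touches only the trait, internal changes to the engine should not require changes to it as long as the trait stays stable.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical extension trait (NOT DataFusion's real `TableProvider`):
/// downstream code implements it instead of modifying the engine.
trait RowSource {
    fn column_names(&self) -> Vec<String>;
    fn rows(&self) -> Vec<Vec<i64>>;
}

/// Stand-in for the engine: it only knows about the trait object.
#[derive(Default)]
struct Engine {
    tables: HashMap<String, Arc<dyn RowSource>>,
}

impl Engine {
    /// Register a user-provided source under a table name.
    fn register(&mut self, name: &str, source: Arc<dyn RowSource>) {
        self.tables.insert(name.to_string(), source);
    }

    /// Sum one column of a registered table.
    /// Returns `None` if the table or the column is unknown.
    fn sum(&self, table: &str, column: &str) -> Option<i64> {
        let source = self.tables.get(table)?;
        let idx = source.column_names().iter().position(|c| c == column)?;
        Some(source.rows().iter().map(|row| row[idx]).sum())
    }
}

/// Downstream-only code: a custom source that lives outside the engine.
struct InMemory;

impl RowSource for InMemory {
    fn column_names(&self) -> Vec<String> {
        vec!["id".to_string(), "value".to_string()]
    }

    fn rows(&self) -> Vec<Vec<i64>> {
        vec![vec![1, 10], vec![2, 32]]
    }
}

/// Wire the custom source into the engine and query it.
fn total_value() -> Option<i64> {
    let mut engine = Engine::default();
    engine.register("t", Arc::new(InMemory));
    engine.sum("t", "value")
}
```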
## `datafusion-contrib`

While DataFusion comes with enough features "out of the box" to quickly start
with a working system, it can't include every useful feature (e.g.
`TableProvider`s for all data formats). The [`datafusion-contrib`] project
contains a collection of community-maintained extensions that are not part of
the core DataFusion project and are not under Apache Software Foundation
governance, but may be useful to others in the community. If you are interested
in adding a feature to DataFusion, a new extension in `datafusion-contrib` is
likely a good place to start. Please [contact] us via GitHub issue, Slack, or
Discord and we'll gladly set up a new repository for your extension.

[`datafusion-contrib`]: https://github.com/datafusion-contrib
[contact]: ../contributor-guide/communication.md

## Creating new Extension APIs

DataFusion aims to be a general-purpose query engine, and thus the core crates
contain features that are useful for a wide range of use cases. Use-case-specific
functionality (such as very specific time series or stream processing features)
is typically implemented using the extension APIs.

If you have a use case that is not covered by the existing APIs, we would love to
work with you to design a new general-purpose API. There are often others who are
interested in similar extensions, and the act of defining the API often improves
the code overall for everyone.

Extension APIs that provide "safe" default behaviors are more likely to be
suitable for inclusion in DataFusion, while APIs that require major changes to
built-in operators are less likely. For example, it might make less sense
to add an API to support a stream processing feature if that would result in
slower performance for built-in operators. It may still make sense to add
extension APIs for such features, but leave the implementation of such operators
to downstream projects.

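One way to picture a "safe" default is a new trait method that ships with a default implementation preserving the existing behavior, so built-in operators need no changes and only downstream operators that want the feature override it. The sketch below is illustrative only: `Operator`, `supports_unbounded_input`, `Sort`, `StreamingFilter`, and `can_run_on_stream` are hypothetical names, not part of DataFusion, and a real proposal would need to be worked out with the community.

```rust
/// Hypothetical operator trait; the names are illustrative, not DataFusion's API.
trait Operator {
    fn name(&self) -> &str;

    /// Extension hook added after the fact. The default keeps the
    /// pre-existing behavior, so operators that ignore it are unaffected.
    fn supports_unbounded_input(&self) -> bool {
        false
    }
}

/// Stand-in for a built-in operator: unchanged, it inherits the default.
struct Sort;

impl Operator for Sort {
    fn name(&self) -> &str {
        "Sort"
    }
}

/// Stand-in for a downstream operator: opts in by overriding the hook.
struct StreamingFilter;

impl Operator for StreamingFilter {
    fn name(&self) -> &str {
        "StreamingFilter"
    }

    fn supports_unbounded_input(&self) -> bool {
        true
    }
}

/// A planner-side check that relies only on the trait.
/// Returns an error naming the first operator that needs bounded input.
fn can_run_on_stream(plan: &[Box<dyn Operator>]) -> Result<(), String> {
    for op in plan {
        if !op.supports_unbounded_input() {
            return Err(format!("{} requires bounded input", op.name()));
        }
    }
    Ok(())
}
```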
The process to create a new extension API is typically:

- Look for an existing issue describing what you want to do, and file one if it
  doesn't yet exist.
- Discuss what the API would look like. Feel free to ask contributors (via `@`
  mentions) for feedback (you can find such people by looking at the most
  recently changed PRs and issues).
- Prototype the new API, typically by adding an example (in
  `datafusion-examples` or by refactoring existing code) to show how it would work.
- Create a PR with the new API, and work with the community to get it merged.

Some benefits of using an example-based approach are:

- Any future API changes will also keep your example working, ensuring no
  regression in functionality.
- There will be a blueprint of any needed changes to your code if the APIs do change
  (just look at what changed in your example).

An example of this process was [creating a SQL Extension Planning API].

[creating a sql extension planning api]: https://github.com/apache/datafusion/issues/11207