WRT the catalogs, @universalmind303 what do you think of starting to unify around the DaftMetaCatalog that I introduced in #3036?
I think we have a few competing standards at the moment (including the SQLCatalog). It could be good to start having a catalog abstraction that can be shared across our different frontends.
Yes, that is something I want to do and have been thinking about. I'll open an issue to unify `daft.catalog` and `daft.sql.catalog` as well.
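A shared catalog abstraction could be as small as a protocol that every frontend accepts. This is a hypothetical sketch: the `Catalog` protocol, its method names, and `InMemoryCatalog` are illustrative, not Daft's actual API.

```python
from typing import Dict, List, Protocol


class Catalog(Protocol):
    """Minimal interface a frontend-agnostic catalog might expose."""

    def register_table(self, name: str, table: object) -> None: ...
    def load_table(self, name: str) -> object: ...
    def list_tables(self) -> List[str]: ...


class InMemoryCatalog:
    """Toy implementation that both a SQL and a DataFrame frontend could share."""

    def __init__(self) -> None:
        self._tables: Dict[str, object] = {}

    def register_table(self, name: str, table: object) -> None:
        self._tables[name] = table

    def load_table(self, name: str) -> object:
        return self._tables[name]

    def list_tables(self) -> List[str]:
        return sorted(self._tables)
```

With a protocol like this, the SQL frontend and the Spark Connect frontend could each accept any `Catalog` rather than hard-coding a concrete class.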
# spark connect
## distributed execution
For distributed execution we need a Ray runner that we can call from Rust.
We might need this?
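A Rust host (e.g. via pyo3) can only call plain, importable Python entry points, so one option is a module-level function that takes a serialized plan. This is a sketch under assumptions: the name `run_plan` and the payload format are invented for illustration, and the "plan" here is just a pickled `(callable, args)` pair to keep the example self-contained.

```python
import pickle


def run_plan(serialized_plan: bytes) -> object:
    """Hypothetical entry point a Rust caller could import and invoke.

    Deserializes the payload and dispatches it; a real runner would hand
    the plan to Ray instead of executing it inline.
    """
    func, args = pickle.loads(serialized_plan)
    return func(*args)
```

The Rust side would acquire the GIL, import this module, and call `run_plan` with the bytes it received over Spark Connect.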
## compatibility/interop
Some of the text-based methods (printSchema, show, explain) should have a Spark-compatible output:

- `to_comfy_table` to be able to output a Spark-compatible df output
- `Schema` that matches Spark's
- `TreeDisplay` implementation that somewhat matches Spark's plans
- `pyspark.sql.DataFrame`
- `pyspark.sql.Catalog`
  - TODO (I don't think this is stabilized in spark connect yet)
- `pyspark.sql.functions`
  - see https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/functions.html for the list of functions
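As a concrete target for the text-based methods, Spark's `printSchema` renders a schema as a `root` line followed by one ` |-- name: type (nullable = ...)` line per field. A minimal sketch of that top-level format (nested struct indentation omitted; the function name and the `(name, type, nullable)` input shape are assumptions, not Daft's API):

```python
from typing import List, Tuple


def spark_print_schema(fields: List[Tuple[str, str, bool]]) -> str:
    """Render (name, type, nullable) fields in Spark's printSchema style."""
    lines = ["root"]
    for name, dtype, nullable in fields:
        lines.append(f" |-- {name}: {dtype} (nullable = {str(nullable).lower()})")
    return "\n".join(lines)


print(spark_print_schema([("age", "integer", True), ("name", "string", True)]))
```

Matching this byte-for-byte matters if users diff output or rely on doctests written against PySpark.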
## UDFs
Spark UDFs should be mappable to our UDFs. They use a very similar pickling approach, and we'll likely just need to use their deserializer to turn them back into Python functions. A bit more discovery is likely needed.
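The round-trip itself is just pickling. A minimal sketch using stdlib `pickle` (Spark Connect actually ships UDFs with cloudpickle, which can additionally serialize lambdas and closures by value, but the deserialize-then-call shape is the same):

```python
import pickle


def add_one(x: int) -> int:
    """A module-level function; stdlib pickle serializes it by reference."""
    return x + 1


# Client side: serialize the UDF to bytes to ship over the wire.
payload = pickle.dumps(add_one)

# Server side: deserialize the bytes back into a callable and invoke it.
udf = pickle.loads(payload)
assert udf(41) == 42
```

The discovery work is mostly about unwrapping Spark's envelope around those bytes (eval type, return type, chained functions) before handing the callable to our own UDF machinery.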
## UX/DX

- `daft.daft.spark_connect` into a dedicated module (#3497)

## Documentation
## Issue Tracking
- `df.explain()` (#3577)
- `read.rs`, `write.rs` (#3550)
- `.show()` (#3498)
- `daft.daft.spark_connect` into a dedicated module (#3497)

## Upstream Spark issues