
Lance-Python-Dev/batch_vs_stream-project


Batch Processing

  • Simulates a daily ETL job:
    I created a DataFrame of transactions, saved it as a CSV file, cleaned and transformed the data, aggregated the totals per customer, and saved the results as a Parquet file.
  • SQL queries:
    I used DuckDB to run SQL queries directly on the aggregated Parquet data.
  • Result:
    The final customer totals are saved in data/batch_customer_totals.parquet.
    Example output (from my run), sorted by total:
    • David: 300
    • Bob: 200
    • Alice: 100
    • Charlie: 50

Stream Processing

  • Simulates real-time transaction events:
    I used the Streamz library to emit random transaction events one by one.
  • Real-time aggregation:
    Each event updates its customer's running total as soon as the event arrives.
  • Result:
    After all events, the final totals are saved in data/stream_totals.parquet.
