Design and implement a Medallion Architecture for stock data, from the source systems through to data analytics.
Stock-related data* is generated every second from multiple sources, including users, the market, and stock exchanges. The Bronze Layer captures raw data, consisting of user transactions, market information, and stock and company registration details from the stock exchange.
After aggregation and cleaning, the Silver Layer contains structured and detailed data at a granular level, making it suitable for further processing.
In the Gold Layer, data is aggregated into daily, monthly, and yearly summaries, optimizing it for reporting, dashboards, and business intelligence applications.
*The data used in this project is artificially generated and does not represent real financial data. However, it is designed to closely mimic real-world scenarios, ensuring that stock prices fluctuate within a reasonable range over time.
Load data/stock_market.sql into the MySQL database (5 points)
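One way to script this step is to pipe the dump into the `mysql` command-line client from Python. This is a minimal sketch, assuming the client is installed and on the PATH. The user name, host, and the optional database name are placeholders that the project description does not specify, and the dump may already create and select its own database.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_mysql_command(user: str, host: str = "localhost",
                        database: Optional[str] = None) -> List[str]:
    """Build the argument list for the mysql command-line client."""
    # --password with no value makes the client prompt for it,
    # so no secret is stored in the script.
    cmd = ["mysql", f"--host={host}", f"--user={user}", "--password"]
    if database:
        # Assumption: only needed if the dump does not select a database itself.
        cmd.append(f"--database={database}")
    return cmd


def load_sql_dump(dump_path: str, user: str, host: str = "localhost",
                  database: Optional[str] = None) -> None:
    """Feed a .sql dump to the mysql client on standard input."""
    with Path(dump_path).open("rb") as dump:
        # check=True raises CalledProcessError if the client exits with an error.
        subprocess.run(build_mysql_command(user, host, database),
                       stdin=dump, check=True)
```

A call such as `load_sql_dump("data/stock_market.sql", user="root")` would then run the import. If a database name is passed, that database has to exist before the call.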
Load data/transactions.json into MongoDB (5 points)
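The load into MongoDB could be sketched with PyMongo as below. The layout of `transactions.json` is not stated here, so the reader accepts either a single JSON array or one document per line. The connection URI, the database name `stock_market`, and the collection name `transactions` are assumptions for illustration only.

```python
import json
from pathlib import Path
from typing import List


def read_json_documents(path: str) -> List[dict]:
    """Read documents from a JSON array file or a JSON Lines file."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON value: treat the file as one document per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return parsed if isinstance(parsed, list) else [parsed]


def load_into_mongodb(path: str,
                      uri: str = "mongodb://localhost:27017",  # assumed local server
                      database: str = "stock_market",          # hypothetical name
                      collection: str = "transactions") -> int:  # hypothetical name
    """Insert every document from the file and return how many were inserted."""
    # Imported here so the reader above can be used without PyMongo installed.
    from pymongo import MongoClient

    documents = read_json_documents(path)
    if not documents:
        return 0  # insert_many rejects an empty list
    client = MongoClient(uri)
    try:
        result = client[database][collection].insert_many(documents)
        return len(result.inserted_ids)
    finally:
        client.close()
```

If the file turns out to be a `mongoexport` dump in Extended JSON (for example with `{"$oid": ...}` values), `bson.json_util.loads` from PyMongo is the usual replacement for `json.loads`.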
Extract data from the MySQL and MongoDB databases (10 points)
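The extraction step might look like the following with PySpark. It is a sketch under several assumptions: the MySQL Connector/J driver and the MongoDB Spark Connector (10.x, which registers the `mongodb` format) are available to the Spark session, and the host, credentials, table, database, and collection names passed in are placeholders rather than names taken from this project.

```python
from typing import Dict, Tuple


def mysql_jdbc_options(host: str, port: int, database: str, table: str,
                       user: str, password: str) -> Dict[str, str]:
    """Options for Spark's built-in JDBC source pointing at a MySQL table."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.mysql.cj.jdbc.Driver",  # class name in MySQL Connector/J 8+
    }


def mongodb_options(uri: str, database: str, collection: str) -> Dict[str, str]:
    """Options for the MongoDB Spark Connector 10.x read path."""
    return {"connection.uri": uri, "database": database, "collection": collection}


def extract_bronze(spark, jdbc_opts: Dict[str, str],
                   mongo_opts: Dict[str, str]) -> Tuple[object, object]:
    """Read one MySQL table and one MongoDB collection as Spark DataFrames.

    `spark` is an existing SparkSession created with both connectors available.
    """
    stocks = spark.read.format("jdbc").options(**jdbc_opts).load()
    transactions = spark.read.format("mongodb").options(**mongo_opts).load()
    return stocks, transactions
```

Keeping the option dictionaries separate from the read calls makes it easy to reuse `mysql_jdbc_options` for each table in the MySQL schema by changing only the `table` argument.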
Aggregate the data for further analysis (15 points). Note: only PySpark functions may be used for data aggregation; other Python packages such as pandas are not allowed.
Hourly data (4)
Daily data (4)
Monthly data (4)
Quarterly data (Oct - Dec summary) (3)
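The four summary levels above can be produced by one PySpark routine that truncates each timestamp to the start of its period and groups on the result. This is a sketch, not the required solution: the column names `timestamp`, `ticker`, `price`, and `shares` are hypothetical and should be replaced with the actual fields, and the chosen metrics are only examples.

```python
# date_trunc units for each required summary level.
LEVEL_UNITS = {
    "hourly": "hour",
    "daily": "day",
    "monthly": "month",
    "quarterly": "quarter",
}


def aggregate_by_level(transactions, level: str,
                       ts_col: str = "timestamp",   # hypothetical column names
                       key_col: str = "ticker",
                       price_col: str = "price",
                       qty_col: str = "shares"):
    """Summarize a transactions DataFrame per key and per period."""
    unit = LEVEL_UNITS[level]  # raises KeyError for an unsupported level
    # Imported here so the module loads even where PySpark is not installed.
    from pyspark.sql import functions as F

    # Truncate each timestamp to the start of its hour, day, month, or quarter.
    period = F.date_trunc(unit, F.col(ts_col)).alias("period_start")
    return (
        transactions.groupBy(key_col, period)
        .agg(
            F.count("*").alias("trade_count"),
            F.sum(qty_col).alias("total_shares"),
            F.avg(price_col).alias("avg_price"),
            F.min(price_col).alias("min_price"),
            F.max(price_col).alias("max_price"),
        )
        .orderBy(key_col, "period_start")
    )


def october_to_december(transactions, ts_col: str = "timestamp"):
    """Keep only rows whose timestamp falls in October, November, or December."""
    from pyspark.sql import functions as F

    return transactions.filter(F.month(F.col(ts_col)).between(10, 12))
```

For the quarterly summary, `aggregate_by_level(october_to_december(df), "quarterly")` restricts the input to Oct - Dec before grouping. Only `pyspark.sql.functions` is used for the aggregation itself, in line with the note above.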
Project Demo (15 points)