Objective

Design and implement a Medallion Architecture for stock data, from the source systems through to data analytics.

Stock data introduction

Stock-related data* is generated every second from multiple sources, including users, the market, and stock exchanges. The Bronze Layer captures raw data, consisting of user transactions, market information, and stock and company registration details from the stock exchange.

After aggregation and cleaning, the Silver Layer contains structured and detailed data at a granular level, making it suitable for further processing.

In the Gold Layer, data is aggregated into daily, monthly, and yearly summaries, optimizing it for reporting, dashboards, and business intelligence applications.

*The data used in this project is artificially generated and does not represent real financial data. However, it is designed to closely mimic real-world scenarios, ensuring that stock prices fluctuate within a reasonable range over time.

Requirements and Grading Rubric (50 points)

  1. Load data/stock_market.sql into a MySQL database (5 points)

    1. Instructions can be found in the project README
  2. Load data/transactions.json into MongoDB (5 points)

    1. Instructions can be found in the project README
  3. Extract data from the MySQL and MongoDB databases (10 points)

  4. Aggregate the extracted data for further analysis (15 points). Note: only PySpark functions are allowed for data aggregation; other Python packages such as pandas may not be used.

    1. The final table should contain all the information shown below. Because stock prices fluctuate, the table requires the average stock price used for transactions, the transaction volume (sum of buys and sells), and the market index. Column order and names should follow the example given below.
    2. For further data analysis, aggregate the data at different levels of granularity (see examples below)
      1. Hourly data (4 points)

        image.png

      2. Daily data (4 points)

        image.png

      3. Monthly data (4 points)

        image.png

      4. Quarterly data (Oct - Dec summary) (3 points)

        image.png

  5. Project Demo (15 points)
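
For requirement 3, the extraction step could be sketched roughly as follows. This is a minimal sketch, not the required solution: it assumes Spark's built-in JDBC source with the MySQL Connector/J driver, and the MongoDB Spark Connector 10.x (which registers the "mongodb" format); both JARs would need to be on the Spark classpath. The helper names and every host, credential, database, table, and collection value are hypothetical placeholders; the real values come from the project README.

```python
# Sketch only: all connection values passed to these helpers are placeholders,
# not values taken from the project README.


def mysql_jdbc_options(host, port, database, table, user, password):
    """Build the option map for Spark's built-in JDBC data source."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Driver class name used by MySQL Connector/J 8.x (assumed version).
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def mongo_read_options(uri, database, collection):
    """Build the option map for the MongoDB Spark Connector (10.x assumed)."""
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
    }


def read_mysql_table(spark, options):
    """Read one MySQL table into a Spark DataFrame via JDBC."""
    return spark.read.format("jdbc").options(**options).load()


def read_mongo_collection(spark, options):
    """Read one MongoDB collection into a Spark DataFrame."""
    return spark.read.format("mongodb").options(**options).load()
```

With an existing SparkSession named spark, a call such as read_mysql_table(spark, mysql_jdbc_options(...)) would return a DataFrame for one table. Keeping the option maps in separate functions makes the connection settings easy to check without a running database.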

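For requirement 4, the four granularity levels can share one aggregation routine that only changes the truncation unit. The sketch below uses PySpark functions only, as the rubric requires. The column names (timestamp, ticker, price, shares) and the output names (period, avg_price, volume) are assumptions for illustration; the actual names and order must follow the example tables. Joining the market index is left out here.

```python
# Map each required granularity to a unit accepted by Spark's date_trunc.
TRUNC_UNIT = {
    "hourly": "hour",
    "daily": "day",
    "monthly": "month",
    "quarterly": "quarter",
}


def aggregate_by_level(df, level, ts_col="timestamp", key_col="ticker",
                       price_col="price", shares_col="shares"):
    """Aggregate transactions to one row per stock and time period.

    All column names are hypothetical defaults, not the assignment's schema.
    """
    if level not in TRUNC_UNIT:
        raise ValueError(f"unknown granularity: {level!r}")

    # Imported lazily so the mapping above can be inspected without Spark.
    from pyspark.sql import functions as F

    # Truncate each timestamp to the start of its hour/day/month/quarter.
    period = F.date_trunc(TRUNC_UNIT[level], F.col(ts_col)).alias("period")

    return (
        df.groupBy(key_col, period)
        .agg(
            # Average price across the transactions in the period.
            F.avg(price_col).alias("avg_price"),
            # Volume as the sum over buys and sells, assuming the share
            # count is stored as a positive number for both actions.
            F.sum(shares_col).alias("volume"),
        )
        .orderBy(key_col, "period")
    )
```

For the Oct - Dec summary, one option under these assumptions is to call aggregate_by_level(df, "quarterly") and keep the rows whose period falls in the fourth quarter.
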
The project will also be evaluated on the following (details will be released soon):

Important Notes