Introduction

In this course, we will use GitLab’s merge request (MR) feature for both project submissions and grading. This workflow simulates real-world software development practices, giving you hands-on experience with version control and code review processes.

For Project 1 Part 1, the main objectives are:

  1. Set up your local computing environment
  2. Become familiar with the GitLab submission process

Learning to configure your programming environment independently is a crucial software engineering skill. In this project, you are responsible for installing and configuring the required software and for submitting your work on GitLab.

All work will be performed on your personal laptop. We have chosen packages commonly used for local development, including PySpark (for loading data from MySQL and MongoDB), Python (required by PySpark), and Java 17 (for JDBC connections).

Why Do We Need These Packages?

Below is a brief overview of why the listed software and connectors are required:

Python 3.12

Projects will primarily use PySpark, the Python API for Apache Spark. Python is easy to learn, which leaves you time to focus on things other than programming.

Java (OpenJDK 17)

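Although you will primarily code in Python, PySpark runs Spark on the Java Virtual Machine (JVM), so a Java installation is required. PySpark’s JDBC features, which connect to relational databases such as MySQL, also rely on Java database drivers.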
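
As a quick sanity check after installation, a short script along these lines can report which Python and Java versions are visible on your PATH. This is a minimal sketch and not part of the official submission: the helper names `parse_java_major` and `installed_java_major` are made up for illustration, and it assumes `java -version` prints a line containing `version "…"`, as OpenJDK builds typically do.

```python
import re
import shutil
import subprocess
import sys


def parse_java_major(version_output):
    """Return the Java major version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Releases before Java 9 report themselves as 1.x (e.g. "1.8.0_392" is Java 8).
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Return the major version of the `java` found on PATH, or None if absent."""
    java = shutil.which("java")
    if java is None:
        return None
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    # `java -version` normally writes to stderr; stdout is included just in case.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    print("Python version:", ".".join(str(part) for part in sys.version_info[:3]))
    print("Java major version:", installed_java_major())
```

If the script reports a Python version other than 3.12 or a Java major version other than 17, your PATH may be pointing at a different installation than the one you intended.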

MySQL

A popular relational database management system.

MongoDB

A leading NoSQL database solution, helpful for unstructured or semi-structured data storage.

Database Connectors