# The Jupyter Notebooks
The Jupyter Notebooks are the core of the repository. They are responsible for cleaning data and generating results. They should be executed in the following order:
```mermaid
graph TD
    A[cleanup]
    B[old-report]
    C[pipeline]

    A --> B
    A --> C
```
Please note that this file only explains the parts of the code that are intended to be extended. To learn more about the full pipeline, see the markdown cells in the notebooks.

[TOC]

## cleanup.ipynb
This notebook reads the files in `raw_data` and unifies them. It contains every step that changes the data without interpreting it.
### Removing unnecessary columns
Not every column in the raw data is needed. To shrink the file, the pipeline removes columns deemed useless. It does so in two steps. First, it takes an array of fixed column names and drops them using `df.drop(columns=array, inplace=True, axis=1)`.

Second, it can drop columns that occur multiple times in the raw data and therefore carry the `.NUMBER` suffix that pandas appends to duplicate column names. It generates the suffixed column names from `.1` up to `.7` and appends them to the array of columns that will be removed.

If a column has more than seven duplicates, increase the last number of the range.
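The two steps above can be sketched as follows. This is a minimal illustration and not the notebook's actual code: the helper names `build_drop_list` and `drop_unneeded` are hypothetical, only the suffixed copies of a duplicated column are generated (the base name would have to be in the fixed list), and passing `errors="ignore"` so that generated names missing from the data are skipped is an assumption.

```python
def build_drop_list(fixed_columns, duplicated_columns, max_suffix=7):
    """Return the fixed names plus `name.1` ... `name.<max_suffix>` per duplicated column."""
    to_drop = list(fixed_columns)
    for name in duplicated_columns:
        # range() excludes its upper bound, hence max_suffix + 1
        to_drop.extend(f"{name}.{i}" for i in range(1, max_suffix + 1))
    return to_drop


def drop_unneeded(df, fixed_columns, duplicated_columns, max_suffix=7):
    """Drop the listed columns from a pandas DataFrame and return the result."""
    to_drop = build_drop_list(fixed_columns, duplicated_columns, max_suffix)
    # Assumption: errors="ignore" skips generated names that do not exist in df
    return df.drop(columns=to_drop, errors="ignore")
```

In this sketch, raising `max_suffix` corresponds to increasing the last number of the range.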
### Normalizing Course Types
Course types should be `VL`, `S`, `Ü` and `LeKo`. However, old data may contain `Vl`, `V`, `VL+Ü`, etc. The cleanup script applies a function to the course type column, which uses a dictionary to map these values to one of the four accepted types. If you encounter more types, you can simply add them to the `normalized_identifiers` dictionary in the `normalize_type` function.
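A dictionary-based lookup of this kind might look like the sketch below. The entries and their targets are illustrative assumptions, not the notebook's real mapping (the target for `VL+Ü`, for instance, is not stated here and is left out), and returning unknown values unchanged is also an assumption.

```python
def normalize_type(raw_type):
    """Map a legacy course type to one of the accepted types."""
    # Illustrative entries only; extend this dictionary for new variants.
    normalized_identifiers = {
        "Vl": "VL",  # assumed target
        "V": "VL",   # assumed target
    }
    # Assumption: values without an entry are passed through unchanged
    return normalized_identifiers.get(raw_type, raw_type)
```

Such a function can be applied to a pandas column with `Series.apply`, for example `df["type"].apply(normalize_type)`, where the column name `type` is hypothetical.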
### Normalizing Course Category
Using the same strategy as for course types, this step turns singular nouns into plurals and converts course type short codes into the corresponding noun. If you find more values to clean, add them to the `make_plural` or `normalized_categories` dictionary.
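As a rough sketch, the two lookups can be chained. Everything concrete here is an assumption for illustration: the function name `normalize_category`, the example entries, treating both names as plain dictionaries, and the order of the two lookups.

```python
def normalize_category(raw_category):
    """Turn a short code or singular noun into the plural category name."""
    # Hypothetical entries; the real dictionaries live in the notebook.
    normalized_categories = {"VL": "Vorlesung", "S": "Seminar"}
    make_plural = {"Vorlesung": "Vorlesungen", "Seminar": "Seminare"}

    # Step 1: short code -> noun (unknown values pass through)
    noun = normalized_categories.get(raw_category, raw_category)
    # Step 2: singular -> plural (values already plural pass through)
    return make_plural.get(noun, noun)
```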
## old-report.ipynb
This notebook recreates the old report. It takes the files from `clean_data`, generates the report for the specified semester, and then puts the results into `outputs`. To change the semester, update the `SEMESTER` variable in the first code cell. Please keep in mind that the value needs to be in German and fully written out, like "Sommersemester 2021" for summer terms or "Wintersemester 2022/23" for winter terms.
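To catch a misspelled value early, a check like the following could be run before executing the notebook. It is a hypothetical helper, not part of the repository, and it assumes the only valid shapes are the two examples given above.

```python
import re

# "Sommersemester YYYY" or "Wintersemester YYYY/YY"
_SEMESTER_PATTERN = re.compile(
    r"(Sommersemester) (\d{4})|(Wintersemester) (\d{4})/(\d{2})"
)


def is_valid_semester(label):
    """Return True if label looks like a fully written-out German semester name."""
    match = _SEMESTER_PATTERN.fullmatch(label)
    if match is None:
        return False
    if match.group(3):  # winter term: second year must follow the first
        start_year = int(match.group(4))
        return int(match.group(5)) == (start_year + 1) % 100
    return True
```

For example, `is_valid_semester("Wintersemester 2022/23")` is accepted, while an abbreviated form such as `"SoSe 2021"` is rejected.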
## pipeline.ipynb
The Evaluation Report Pipeline notebook takes files from `clean_data` and processes them into a format for the API. To do this, it converts Likert scale values from 1 to 7 into values that reflect the weight of an answer. Neutral answers weigh 0, negative answers tend to weigh from -1 to -3 and positive answers from 1 to 3. These values are then inserted into the API.

Please note that the pipeline only calculates the score for columns that have been turned into UUIDs by the cleanup script. If you need an additional question to be considered, you can add it to the API by following the steps provided in the API documentation //TODO. The cleanup and pipeline scripts will then pick it up automatically.
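The weighting and the UUID filter described above can be illustrated with a small sketch. It is not the pipeline's actual code: the function names are hypothetical, the linear centering around 4 is an assumption (the text only says the weights "tend to" fall in these ranges), and which end of the scale counts as positive is left as a parameter because it is not stated here.

```python
import uuid


def is_uuid(column_name):
    """Return True if the column name parses as a UUID."""
    try:
        uuid.UUID(str(column_name))
    except ValueError:
        return False
    return True


def likert_to_weight(value, positive_is_high=True):
    """Map a Likert answer in 1..7 to a weight in -3..3, with 4 as neutral."""
    if value not in range(1, 8):
        raise ValueError(f"expected a Likert value from 1 to 7, got {value!r}")
    centered = value - 4
    # Assumption: flip the sign when low values are the positive answers
    return centered if positive_is_high else -centered


def weigh_row(row):
    """Weigh only the answers whose column name is a UUID; skip all others."""
    return {
        column: likert_to_weight(value)
        for column, value in row.items()
        if is_uuid(column)
    }
```

Columns whose names are not UUIDs are skipped before any conversion, which mirrors the note that only questions registered in the API are scored.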