TDM 10200: Project 10 — Spring 2024
Motivation: NumPy is the foundation that Pandas is built on. Mastering NumPy’s numerical operations will deepen your understanding of numerical computing and data analysis.
Context: Hopefully you have a solid foundation in data analysis with Python and a good introduction to Pandas. In this project we will delve into NumPy, enhancing your skill set for high-performance computing in situations where you do not use Pandas.
Scope: Python, pandas, NumPy
Reading and Resources
Dataset
/anvil/projects/tdm/data/flights/2014.csv
You need to use 2 cores for your Jupyter Lab session for Project 10 this week.
You can use
We added six new videos to help you with Project 10. BUT the example videos are about a data set with beer reviews. You need to (instead) work on the flight data given here:
Questions
Question 1 (2 points)
- The dataset is 2.5 GB. For this project, we will only need the following columns, so let us create a DataFrame with only those columns and the corresponding data types.
cols = [
'DepDelay', 'ArrDelay', 'Distance',
'CarrierDelay', 'WeatherDelay',
'DepTime', 'ArrTime', 'Diverted', 'AirTime'
]
col_types = {
'DepDelay': 'float64',
'ArrDelay': 'float64',
'Distance': 'float64',
'CarrierDelay': 'float64',
'WeatherDelay': 'float64',
'DepTime': 'float64',
'ArrTime': 'float64',
'Diverted': 'int64',
'AirTime': 'float64'
}
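As a rough sketch of how a column list and a dtype dictionary like the ones above can be passed to pandas, the example below reads a tiny in-memory CSV. The rows in `sample_csv` and the names `demo_cols`, `demo_types`, and `demo_df` are made up for illustration; only `usecols` and `dtype` are the actual `read_csv` parameters being shown.

```python
import io
import pandas as pd

# Hypothetical miniature stand-in for the flights file (made-up rows).
# It carries an extra "Year" column that we do not want to load.
sample_csv = io.StringIO(
    "Year,DepDelay,ArrDelay,Distance,Diverted\n"
    "2014,5,,300,0\n"
    "2014,-2,10,1200,0\n"
)

# Shortened versions of the column list and type mapping, for the demo only.
demo_cols = ['DepDelay', 'ArrDelay', 'Distance', 'Diverted']
demo_types = {
    'DepDelay': 'float64',
    'ArrDelay': 'float64',
    'Distance': 'float64',
    'Diverted': 'int64',
}

# usecols limits which columns are parsed; dtype fixes each column's type.
demo_df = pd.read_csv(sample_csv, usecols=demo_cols, dtype=demo_types)
print(demo_df.dtypes)
print(demo_df.shape)
```

With the real file, the same call would presumably take the dataset path together with the full `cols` and `col_types` defined above. Loading only the needed columns keeps memory use well below what the whole file would require.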
Question 2 (2 points)
- Use to_numpy() to create a numpy array called mydelays, containing the information from the column DepDelay.
- Display the shape and data type of mydelays.
- Use nan_to_num() to replace all null values in mydelays with 0.
- It can be helpful to know how to manipulate the values in an array! Find the average time in mydelays by calculating the numpy mean() of this array. Afterwards, add 15 minutes to all of the departure delay times stored in mydelays. Finally, use the numpy mean() method again to calculate and display the average of the updated values in mydelays. How do these two averages compare?
output
The average Departure Delay before adding 15 minutes is: .......
The average Departure Delay after adding 15 minutes is: .......
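The steps in this question can be tried out on a small made-up array before running them on the full column. The sketch below uses a hypothetical four-row data frame named `toy` (not the flight data) to show `to_numpy()`, `nan_to_num()`, `mean()`, and element-wise addition.

```python
import numpy as np
import pandas as pd

# Made-up delays (in minutes), including one missing value.
toy = pd.DataFrame({'DepDelay': [10.0, np.nan, -5.0, 25.0]})

# Convert the pandas column to a plain NumPy array.
toy_delays = toy['DepDelay'].to_numpy()
print(toy_delays.shape, toy_delays.dtype)

# nan_to_num replaces NaN with 0.0 by default.
toy_delays = np.nan_to_num(toy_delays)

mean_before = toy_delays.mean()   # (10 + 0 - 5 + 25) / 4 = 7.5

# Adding a scalar applies to every element (broadcasting).
toy_delays = toy_delays + 15
mean_after = toy_delays.mean()    # 7.5 + 15 = 22.5

print(f"Before: {mean_before}, after: {mean_after}")
```

Note that replacing missing values with 0 changes the mean compared with skipping them, since the zeros still count toward the number of elements.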
Question 3 (2 points)
- Calculate and display the maximum arrival delay and the minimum arrival delay.
output
Max Arrival Delay: ...... minutes
Min Arrival Delay: ...... minutes
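One thing to watch for when taking a maximum or minimum: if an array still contains NaN, `np.max` and `np.min` return NaN. The sketch below, on a made-up array named `toy_arr`, shows that behavior alongside `np.nanmax` and `np.nanmin`, which skip missing values. Whether you handle NaN this way or with `nan_to_num()` first is a choice left to you.

```python
import numpy as np

# Made-up arrival delays (in minutes) with one missing value.
toy_arr = np.array([12.0, np.nan, -30.0, 95.0])

# Plain max propagates NaN.
plain_max = np.max(toy_arr)

# The nan-aware versions ignore missing values.
max_delay = np.nanmax(toy_arr)
min_delay = np.nanmin(toy_arr)

print(f"Plain max: {plain_max}")
print(f"Max Arrival Delay: {max_delay} minutes")
print(f"Min Arrival Delay: {min_delay} minutes")
```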
Question 4 (2 points)
The motivation for questions 4 and 5 is to compare the times needed for calculations in pandas vs. numpy.
In this question, first solve the following 3 questions using pandas (only).
- Create a data frame named delayed_flights that contains the information about the flights that satisfy the condition 'departure delay > 60 minutes or arrival delay > 60 minutes' (the condition restated in Question 5).
- Calculate the average distance for the flights that you found in question 4a, by taking the mean of the Distance column from the pandas data frame.
- Display the time needed for this calculation.
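A minimal sketch of timing a pandas filter and mean, assuming `time.perf_counter()` as the timer (the project does not say which timing tool to use). The data frame `toy` is filled with random made-up numbers, so the resulting average is not meaningful for the flight data.

```python
import time
import numpy as np
import pandas as pd

# Random stand-in data; the seed only makes the demo repeatable.
rng = np.random.default_rng(0)
n = 100_000
toy = pd.DataFrame({
    'DepDelay': rng.normal(10, 40, n),
    'ArrDelay': rng.normal(5, 45, n),
    'Distance': rng.uniform(100, 3000, n),
})

start = time.perf_counter()

# Boolean conditions are combined with | (or); each needs its own parentheses.
late = toy[(toy['DepDelay'] > 60) | (toy['ArrDelay'] > 60)]
avg_distance = late['Distance'].mean()

elapsed = time.perf_counter() - start

print(f"Average distance: {avg_distance:.2f}")
print(f"Elapsed: {elapsed:.6f} seconds")
```

Timings of a single run vary from one execution to the next, so it can help to run the cell a few times before comparing results.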
Question 5 (2 points)
Please use numpy methods to re-create your work from Question 4, as follows:
- Create 3 numpy arrays for the DepDelay, ArrDelay, and Distance data.
- Filter the numpy array with the Distance stored in it, so that you have only the Distances that satisfy the condition 'departure delay > 60 minutes or arrival delay > 60 minutes'.
- Use numpy mean() to calculate the average distance from question 5b. (Your solution should be the same as the average you obtained in question 4b.)
- How long does the program take to get the average?
- Please state your understanding of pandas vs. numpy from Questions 4 and 5 in one or two sentences.
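The filtering step can be sketched with a NumPy boolean mask. The arrays `dep`, `arr`, and `dist` below are random made-up stand-ins rather than the flight columns, and `time.perf_counter()` is again an assumed choice of timer.

```python
import time
import numpy as np

# Random stand-in arrays; the seed only makes the demo repeatable.
rng = np.random.default_rng(0)
n = 100_000
dep = rng.normal(10, 40, n)
arr = rng.normal(5, 45, n)
dist = rng.uniform(100, 3000, n)

start = time.perf_counter()

# Element-wise "or" of two boolean arrays gives a mask of the same length.
mask = (dep > 60) | (arr > 60)

# Indexing with the mask keeps only the distances where the mask is True.
avg = dist[mask].mean()

elapsed = time.perf_counter() - start

print(f"Average distance: {avg:.2f}")
print(f"Elapsed: {elapsed:.6f} seconds")
```

Comparisons against NaN evaluate to False, so rows with a missing delay do not pass a `> 60` test in either pandas or NumPy. This is one reason the two approaches can be expected to give the same average.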
Project 10 Assignment Checklist
- Jupyter Lab notebook with your code, comments and output for the assignment
  - firstname-lastname-project10.ipynb
- Python file with code and comments for the assignment
  - firstname-lastname-project10.py
- Submit files through Gradescope
Please make sure to double check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it, to make sure that what you think you submitted is what you actually submitted. In addition, please review our submission guidelines before submitting your project.