CSC108

Assignment 2 | Bridges

In Assignment 2, you will work with an open dataset that contains information about bridges in Ontario. You can complete the whole assignment using only the concepts from Weeks 1 through 6 of the course. This handout explains the problem being solved and the tasks you need to complete. Please read it carefully and in its entirety.

Logistics
  • Due Date: Thursday, March 11th at 11:59pm (Toronto Time)
  • Submission: You will submit your assignment solution on MarkUs.
  • Late Policy: There is a 1-hour grace period after the due date. Any submissions past the grace period will NOT be accepted.
  • No Remark Requests: No remark requests will be accepted. A syntax error could result in a grade of 0 on the assignment. Before the deadline, you are responsible for running your code and the checker program to identify and resolve any errors that will prevent our tests from running.

Goals of this assignment

The main goal of this assignment is for students to continue using the Function Design Recipe, with an emphasis on its last two steps (Body and Test Your Function). Assignment 2 also lets you practice more programming concepts than before; these are some of the goals for Assignment 2:

  • Students will be able to write loops (i.e., while, for) in the bodies of functions to implement their descriptions
  • Students will be able to appropriately use a variety of data types (including lists and nested lists) through indexing, methods, etc.
  • Students will be able to apply what they’ve learned about mutability and mutate function inputs only when it is appropriate to do so
  • Students will be able to reuse functions to help them implement other functions according to their docstring description
  • Students will be able to read a real dataset from a file into Python and then use that data to test their functions
  • Students will learn to debug their code and use doctests for testing

Ontario Bridges

The Government of Ontario collects a huge amount of data on provincial programs and infrastructure, and much of it is provided as open data sets for public use. In this assignment, we’ll work with a particular dataset that contains information about provincially owned and maintained bridges in Ontario. All bridges in Ontario are reviewed every 2 years, and their information is collected to help determine when bridges need inspection and maintenance. The data you’ll be working with contains information about all bridges in the Ontario highway network, such as the length of the bridges, their condition over various years, and historical information.

We have included two datasets in your starter files: bridge_data_small.csv and bridge_data_large.csv (do not download the files from the Government of Ontario - use only the datasets in the starter files). These are Comma Separated Value (CSV) files, which contain the data in a table format similar to a spreadsheet. Each row of the table contains information about one bridge, and each column of the table represents a “feature” of that bridge. For example, the latitude and longitude columns tell us the precise location of the bridge. If you would like to take a peek at the data, we recommend opening the files with a spreadsheet program (e.g., Excel) rather than using a Python IDE to open them.

Here is a screenshot of the data opened in Microsoft Excel (your computer may have a different spreadsheet program installed):

We can see that the first bridge is described on row 3. From row 3, column B, we can see that the bridge is named: Highway 24 Underpass at Highway 403. Subsequent columns include even more information about the bridge.

Inspecting Bridges

Ontario sends inspectors to check the condition of a bridge. The dataset contains a column (LAST INSPECTION DATE) showing the last date a bridge was inspected. When a bridge is inspected, it receives a score based on its condition. This score is called the Bridge Condition Index (BCI). The BCI is a number between 0 and 100, inclusive. You can see the most recent score in the dataset (CURRENT BCI), as well as past scores (the columns with years ranging from 2013 to 2000).

Fixing Bridges

If a bridge is in poor condition, it can be fixed (i.e., “rehabilitated”). These can be major or minor fixes. The dataset includes the year the last major (LAST MAJOR REHAB) or minor (LAST MINOR REHAB) rehabilitation was performed on the bridge.

Bridge Spans

A bridge is made up of one or more spans (# OF SPANS). A span “is the distance between two intermediate supports for a structure, e.g. a beam or a bridge. A span can be closed by a solid beam or by a rope” (Source: Wikipedia). Each span has a length associated with it (SPAN DETAILS). For example, if a bridge has two spans, the SPAN DETAILS data has this format:

Total=60.6  (1)=30.3;(2)=30.3;

More generally, the format is:

Total=[total length of all spans] (1)=[the length of the first span];(2)=[the length of the second span]; and so on for each span of the bridge;

Some things to notice about this format:

  • There is no semicolon after the total length
  • There is a semicolon after every span length
  • Each span length has a prefix of the form (x)= where x is a number starting from 1 and increasing by 1 for every span.
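As an illustration of this format, the sketch below splits a SPAN DETAILS string into its individual span lengths, dropping the total. The helper name parse_span_lengths is hypothetical, and it assumes every string follows the format above exactly; the required behaviour of the assignment's clean_span_data is defined by its own docstring in bridge_functions.py.

```python
from typing import List


def parse_span_lengths(span_details: str) -> List[float]:
    """Return the individual span lengths in span_details, omitting the total.

    Hypothetical helper for illustration only; assumes the
    'Total=... (1)=...;(2)=...;' format described above.
    """
    lengths = []
    # Every span length ends with ';', so splitting on ';' gives one piece
    # per span (the total is glued to the front of the first piece), plus
    # an empty piece after the final ';'.
    for piece in span_details.split(';'):
        if piece.strip() != '':
            # The span length is whatever follows the last '=' in the piece.
            lengths.append(float(piece.split('=')[-1]))
    return lengths


print(parse_span_lengths('Total=60.6  (1)=30.3;(2)=30.3;'))  # prints [30.3, 30.3]
```

Splitting on ';' works because every span length ends with a semicolon while the total does not, so the total always stays attached to the first piece and is discarded when only the text after the last '=' is kept.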

Ontario Bridges in Python

We will represent the dataset as a list of lists in Python (i.e., List[list]). The outer list has the same length as the number of bridges in the dataset. Each inner list (i.e., list) corresponds to one row (i.e., bridge) of the dataset. For example, here is what the first bridge of our dataset will look like in our Python program:

>>> MISSING_BCI = -1.0
>>> first_bridge = [
...     1, 'Highway 24 Underpass at Highway 403',
...     '403', 43.167233, -80.275567, '1965', '2014', '2009', 4,
...     [12.0, 19.0, 21.0, 12.0], 65.0, '04/13/2012',
...     [['2013', '2012', '2011', '2010', '2009', '2008', '2007',
...       '2006', '2005', '2004', '2003', '2002', '2001', '2000'],
...      [MISSING_BCI, 72.3, MISSING_BCI, 69.5, MISSING_BCI, 70.0, MISSING_BCI,
...       70.3, MISSING_BCI, 70.5, MISSING_BCI, 70.7, 72.9, MISSING_BCI]]
... ]

The variable first_bridge has the general type list. Notice how the elements inside the list are not all the same type. The list includes:

  • integers (e.g., 1, 4)
  • strings (e.g., '403', '04/13/2012')
  • floats (e.g., 65.0)
  • lists (e.g., [12.0, 19.0, 21.0, 12.0])

You may also notice that first_bridge is different from the first bridge in the dataset file itself. This is because the data has been cleaned to suit our needs. For example:

  • we have replaced the ID with an integer value.
  • we have replaced the spans with a list of floats, omitting the total ([12.0, 19.0, 21.0, 12.0])
  • we have replaced the BCI scores with two lists. The first is a list of strings containing the dates. The second is a parallel list (i.e., has the same length as the list of strings) of floats containing the scores, where empty scores are assigned the value MISSING_BCI
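The last point can be sketched in a few lines. The helper name scores_from_cells and the sample cells are hypothetical; we assume each BCI cell arrives from the CSV file as a string, with an empty string where the dataset has no score.

```python
from typing import List

MISSING_BCI = -1.0


def scores_from_cells(cells: List[str]) -> List[float]:
    """Return a new list of floats parallel to cells, using MISSING_BCI
    wherever a cell is empty.
    """
    scores = []
    for cell in cells:
        if cell == '':
            scores.append(MISSING_BCI)  # no score recorded for that year
        else:
            scores.append(float(cell))
    return scores


print(scores_from_cells(['', '72.3', '', '69.5']))  # prints [-1.0, 72.3, -1.0, 69.5]
```

This sketch builds and returns a new list. The clean_bci_data function you write in Part 4 has return type None, so it is expected to work by mutating its argument instead; check its docstring for the details.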

The data is not magically converted into this “clean” format - you will be implementing functions that transform the text data found in the files into a format that is more useful to our program.

Indexing with Constants

The bridge_functions.py file includes many constants to use when indexing the nested data. Much like Assignment 1, you should be using these constants in the bodies of your functions. For example, what should we write if we wanted to access the year a bridge was built?

Consider the following code that does not use the constants:

>>> # Assume that first_bridge is a list containing data
>>> first_bridge[5]
'1965'

How did I know to use index 5? Am I expected to memorize all the indexes? The answer is no; you should not write code like above. Instead, use the constants to provide context into which data feature you are accessing:

>>> # Assume that first_bridge is a list containing data
>>> first_bridge[COLUMN_YEAR_BUILT]
'1965'

The following table shows how the dataset file and Python data are related through constants that begin with the prefix COLUMN_. The table also includes the data type that you should expect to find if you were to index a bridge list (like first_bridge) using that constant.

Column Name           Constant to use as Index  Data Type
--------------------  ------------------------  ---------
ID                    COLUMN_ID                 int
STRUCTURE             COLUMN_NAME               str
HWY NAME              COLUMN_HIGHWAY            str
LATITUDE              COLUMN_LAT                float
LONGITUDE             COLUMN_LON                float
YEAR BUILT            COLUMN_YEAR_BUILT         str
LAST MAJOR REHAB      COLUMN_LAST_MAJOR_REHAB   str
LAST MINOR REHAB      COLUMN_LAST_MINOR_REHAB   str
# OF SPANS            COLUMN_NUM_SPANS          int
SPAN DETAILS          COLUMN_SPAN_DETAILS       List[float]
DECK LENGTH           COLUMN_DECK_LENGTH        float
LAST INSPECTION DATE  COLUMN_LAST_INSPECTED     str
CURRENT BCI           N/A                       N/A
Remaining Columns     COLUMN_BCI                List[list]

Note that the value at index COLUMN_ID in our inner list is an integer, which is very different from the ID column in the dataset.
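To see how the constants line up with the positions in first_bridge, here is a sketch that defines them with the values implied by first_bridge above (COLUMN_YEAR_BUILT = 5 matches the earlier index-5 example). These values are an assumption for illustration: in your solution, always use the constants already defined in bridge_functions.py rather than redefining them.

```python
# Assumed values, inferred from where each feature sits in first_bridge.
# The real definitions are in bridge_functions.py; use those instead.
COLUMN_ID = 0
COLUMN_NAME = 1
COLUMN_HIGHWAY = 2
COLUMN_LAT = 3
COLUMN_LON = 4
COLUMN_YEAR_BUILT = 5
COLUMN_LAST_MAJOR_REHAB = 6
COLUMN_LAST_MINOR_REHAB = 7
COLUMN_NUM_SPANS = 8
COLUMN_SPAN_DETAILS = 9
COLUMN_DECK_LENGTH = 10
COLUMN_LAST_INSPECTED = 11
COLUMN_BCI = 12
INDEX_BCI_YEARS = 0
INDEX_BCI_SCORES = 1
MISSING_BCI = -1.0

# first_bridge from above, with its BCI lists shortened to three years.
first_bridge = [
    1, 'Highway 24 Underpass at Highway 403',
    '403', 43.167233, -80.275567, '1965', '2014', '2009', 4,
    [12.0, 19.0, 21.0, 12.0], 65.0, '04/13/2012',
    [['2013', '2012', '2011'], [MISSING_BCI, 72.3, MISSING_BCI]]
]

year_built = first_bridge[COLUMN_YEAR_BUILT]  # '1965', a str
num_spans = first_bridge[COLUMN_NUM_SPANS]    # 4, an int
spans = first_bridge[COLUMN_SPAN_DETAILS]     # [12.0, 19.0, 21.0, 12.0]
# Two levels of indexing reach the BCI score recorded for 2012.
score_2012 = first_bridge[COLUMN_BCI][INDEX_BCI_SCORES][1]  # 72.3
print(year_built, num_spans, spans, score_2012)
```

Each constant names the feature being accessed, so a reader never has to count positions in the list to understand the code.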

Storing BCI Scores

Our inner list does not contain the CURRENT BCI column from the dataset (instead, you will implement a function to find the most recent BCI score). Moreover, the remaining columns in the dataset that contain BCI scores are stored in another list (at index COLUMN_BCI) with type List[list]. This list contains exactly two lists:

  • The first list is at INDEX_BCI_YEARS with type List[str] and includes the years in decreasing order.
  • The second list is at INDEX_BCI_SCORES with type List[float] and includes the BCI scores. Empty scores in the dataset have a value of MISSING_BCI.

These two lists are parallel lists and should have the same length. Consider the following example:

>>> # Assume that first_bridge is a list containing data
>>> # Assume that MISSING_BCI refers to the value -1.0
>>> first_bridge[COLUMN_BCI][INDEX_BCI_YEARS]
['2013', '2012', '2011', '2010', '2009', '2008', '2007', '2006', '2005', '2004', '2003', '2002', '2001', '2000']
>>> first_bridge[COLUMN_BCI][INDEX_BCI_SCORES]
[-1.0, 72.3, -1.0, 69.5, -1.0, 70.0, -1.0, 70.3, -1.0, 70.5, -1.0, 70.7, 72.9, -1.0]
>>> len(first_bridge[COLUMN_BCI][INDEX_BCI_YEARS])
14
>>> len(first_bridge[COLUMN_BCI][INDEX_BCI_SCORES])
14

From the example above, we can see that first_bridge has no BCI score in the year 2013 (see index 0 of both lists). But it does have a BCI score of 72.3 in the year 2012 (see index 1 of both lists). Therefore, first_bridge’s most recent BCI score is 72.3.
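The reasoning above (skip the missing scores, and take the first real one because the years are in decreasing order) can be sketched with a loop over indices, which lets a single index select matching elements from both parallel lists. The helper most_recent_index is hypothetical and is not one of the assignment's required functions.

```python
from typing import List

MISSING_BCI = -1.0


def most_recent_index(scores: List[float]) -> int:
    """Return the index of the first score that is not MISSING_BCI, or -1
    if every score is missing. Because the parallel years list is in
    decreasing order, this index points at the most recent recorded score.
    """
    for i in range(len(scores)):
        if scores[i] != MISSING_BCI:
            return i
    return -1


years = ['2013', '2012', '2011', '2010']
scores = [MISSING_BCI, 72.3, MISSING_BCI, 69.5]
i = most_recent_index(scores)
# The same index works in both lists because they are parallel.
print(years[i], scores[i])  # prints 2012 72.3
```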

Locations and Calculating Distance

The bridges have their locations represented as a latitude and longitude, which we typically refer to as (lat, lon) for short. If you are curious, you can always search for a specific location online (e.g., with Google Maps).

It is very convenient to be able to calculate the straight-line distance between two locations. But this is actually a little tricky due to the curvature of the earth. We are providing you with the full implementation of a function, calculate_distance, that will accurately return the distance between two (lat, lon) points. You do not need to know how this function works - you only need to know how to use it.
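You will only ever call the provided calculate_distance, but if you are curious what accounting for the curvature of the earth involves, the standard haversine formula is one common approach. This sketch is not the course's implementation: the name haversine_km and the mean earth radius of 6371 km are assumptions, and calculate_distance may use a different method, units, or parameter order.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of the earth


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Return the great-circle distance in kilometres between the points
    (lat1, lon1) and (lat2, lon2), all given in degrees.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    # a is the squared half-chord length between the two points.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of latitude due north of the first bridge is roughly 111 km.
print(round(haversine_km(43.167233, -80.275567, 44.167233, -80.275567)))  # prints 111
```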

What to do

At a high-level, your next steps are to:

  1. Open the file bridge_functions.py.
  2. Make sure that the file is in the same directory as a2_checker.py and the pyta directory.
  3. Complete the function definitions in bridge_functions.py.
  4. Test your Python file by using the Python shell, running the doctest examples, and running the a2_checker.py.

Unlike Assignment 1, we have provided you with the function headers, docstring description, and doctest examples for the functions you need to implement; you do not need to add or change them. The focus of Assignment 2 is implementing the bodies of these functions and testing them.

This assignment is divided into four parts. In Parts 1 to 3, you will implement functions and test them using sample data that is already in bridge_functions.py. In Part 4, you will implement functions that clean the data from the original dataset files so that we can use it in Python. Once you are done with Part 4, you will be able to test your functions from Parts 1 to 3 with real data!

Starter Files

Please download the Assignment 2 Files and extract the zip archive. After you extract the zip file, you should see a directory structure similar to:

    a2/
    ├── pyta/
    │   └── many pyta files...
    ├── a2_checker.py
    ├── bridge_functions.py
    ├── bridge_data_small.csv
    └── bridge_data_large.csv

In total, we have provided you with four files and a directory called pyta. We briefly discussed bridge_data_small.csv and bridge_data_large.csv above. The other two files are Python files:

  • bridge_functions.py

    This is the file where you will write your solution. Your job is to complete the file by implementing all the required functions. See below for more details.

  • a2_checker.py

    This is a checker program that you should use to check your code. You will not modify this file. See below for more information about a2_checker.py.

Part 1

In this part, you will focus on working with the data by searching through it. You should not mutate any of the list inputs to these functions. You should refer to the section Indexing with Constants for help.

  1. find_bridge_by_id(List[list], int) -> list

    Notes:

    • This function will be very useful in the rest of your program to get the data about one specific bridge. You should complete it first, then practice using it when you need a specific bridge in the other functions.
    • You must implement the body of this function according to its description.
    • You must not mutate the list argument(s).
  2. find_bridges_in_radius(List[list], float, float, int, List[int]) -> List[int]

    Notes:

    • This function helps us find all the bridges within a certain area. It becomes very useful when you reach Part 3.
    • You must implement the body of this function according to its description.
    • You must not mutate the list argument(s).
    • You should use the calculate_distance function in the body. See the Locations and Calculating Distance section for help.
  3. get_bridge_condition(List[list], int) -> float

    Notes:

    • You must implement the body of this function according to its description.
    • You must not mutate the list argument(s).
    • See the Storing BCI Scores section for help.
  4. calculate_average_condition(list, int, int) -> float

    Notes:

    • Be careful; the years are stored as strings, not integers, in our Python lists.
    • You must implement the body of this function according to its description.
    • You must not mutate the list argument(s).
    • See the Storing BCI Scores section for help.
    • Review the Week 6 PCRS module on “Parallel Lists and Strings” for help.
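If you want to check the "must not mutate" requirement yourself in the shell, one approach is to take a deep copy of the input, call the function, and compare. The helper leaves_input_unchanged and the two sorting functions below are hypothetical examples; a2_checker.py has its own mutation checks, which may work differently.

```python
import copy
from typing import Callable, List


def leaves_input_unchanged(func: Callable[[list], object], data: list) -> bool:
    """Return True if calling func(data) leaves data (including any nested
    lists) exactly as it was.
    """
    # deepcopy also copies the inner lists; a shallow copy would share them
    # and miss mutations such as data[0][1] = 0.0
    before = copy.deepcopy(data)
    func(data)
    return data == before


def sorted_copy(ids: List[int]) -> List[int]:
    return sorted(ids)  # builds and returns a new list


def sort_in_place(ids: List[int]) -> List[int]:
    ids.sort()  # reorders the caller's list
    return ids


print(leaves_input_unchanged(sorted_copy, [3, 1, 2]))    # prints True
print(leaves_input_unchanged(sort_in_place, [3, 1, 2]))  # prints False
```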

Part 2

In this part, you will focus on mutating the data. Notice how the wording of the docstring descriptions has changed (the descriptions do not begin with the word “Return”). Notice also that the return type for these functions is None, so nothing is returned.

  1. inspect_bridge(List[list], int, str, float) -> None

    Notes:

    • You must implement the body of this function according to its description.
    • Remember that the years a bridge has been inspected are stored in decreasing order. You should add an element to this list rather than change the value of an existing element. See the Storing BCI Scores section for help.
    • Review the Week 6 PCRS modules on "List Methods" and “Mutability and Aliasing” for help.
  2. rehabilitate_bridge(List[list], List[int], str, bool) -> None

    Notes:

    • You must implement the body of this function according to its description.
    • See the Fixing Bridges section for help.
    • Review the Week 6 PCRS module on “Mutability and Aliasing” for help.
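Here is a sketch of the mutate-and-return-None pattern on a bridge's parallel BCI lists. The helper name add_inspection is hypothetical, and it assumes the new year is more recent than every year already stored, so inserting at index 0 keeps the years in decreasing order. It is not a substitute for inspect_bridge, whose full behaviour is defined by its docstring.

```python
from typing import List

# Assumed to match the constants in bridge_functions.py.
INDEX_BCI_YEARS = 0
INDEX_BCI_SCORES = 1


def add_inspection(bci_data: List[list], year: str, score: float) -> None:
    """Modify bci_data (a [years, scores] pair of parallel lists) by adding
    year and score as the most recent inspection.
    """
    # Insert at the front of both lists so they stay parallel and the years
    # stay in decreasing order (assuming year is the newest).
    bci_data[INDEX_BCI_YEARS].insert(0, year)
    bci_data[INDEX_BCI_SCORES].insert(0, score)


bci = [['2013', '2012'], [-1.0, 72.3]]
alias = bci  # a second name for the same list object
result = add_inspection(bci, '2014', 68.0)
print(result)  # prints None: the function returns nothing
print(alias)   # prints [['2014', '2013', '2012'], [68.0, -1.0, 72.3]]
```

Because alias and bci refer to the same list, the change made inside the function is visible through both names, which is exactly why functions like this do not need to return anything.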

Part 3

In this part, you will implement an algorithm to help inspectors pick the sequence of bridges they should inspect. You will do this in two steps. First, you will implement a function that finds the bridge (from a subset of bridges) that is in the worst condition. Second, you will implement a function that targets the worst bridge within a certain radius, then moves on to the next bridge, until the desired number of bridges have been inspected. These functions will take time - make sure you start early and, if you are stuck, visit us in office hours.

  1. find_worst_bci(List[list], List[int]) -> int

    Notes:

    • You must implement the body of this function according to its description.
    • You must not mutate the list argument(s).
    • See the Storing BCI Scores section for help.
  2. map_route(List[list], float, float, int, int) -> List[int]

    Notes:

    • You must implement the body of this function according to its description.
    • You must not mutate the list argument(s).
    • Hint: use a while loop
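The greedy idea described at the start of Part 3 can be sketched with a while loop on simplified data. Everything here is an assumption for illustration: plan_route is a hypothetical name, locations are plain (x, y) pairs compared with math.dist (Python 3.8+) instead of (lat, lon) and calculate_distance, the radius is inclusive, ties go to the lower index, missing scores are ignored, and a location is never visited twice. The docstring of map_route is the authority on what your function must do.

```python
import math
from typing import List


def plan_route(locations: List[List[float]], scores: List[float],
               start: List[float], radius: float, count: int) -> List[int]:
    """Return the indices of up to count locations, chosen greedily: from the
    current position, go to the lowest-scoring unvisited location within
    radius, then repeat from there.
    """
    route = []
    current = start
    while len(route) < count:
        worst = -1  # index of the worst candidate so far; -1 means none yet
        for i in range(len(locations)):
            in_range = math.dist(current, locations[i]) <= radius
            if i not in route and in_range:
                # Strict < keeps the lower index when two scores tie.
                if worst == -1 or scores[i] < scores[worst]:
                    worst = i
        if worst == -1:
            break  # no unvisited location is within radius, so stop early
        route.append(worst)
        current = locations[worst]
    return route


locations = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [2.0, 0.0]]
scores = [50.0, 40.0, 10.0, 60.0]
print(plan_route(locations, scores, [0.0, 0.0], 1.5, 3))  # prints [1, 0]
```

The example route stops after two stops even though count is 3, because from location 0 no unvisited location lies within 1.5 units.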

Part 4

In this part, we will finally start working with the real data files. The clean_data function is already implemented for you; however, it won’t work correctly until you implement the functions below. Once you are done with Part 4, you can start loading the dataset files we have provided. After that, test your functions using real data instead of just the docstring examples.

  1. clean_length_data(str) -> float

    Notes:

    • You must implement the body of this function according to its description.
  2. trim_from_end(list, int) -> None

    Notes:

    • You must implement the body of this function according to its description.
  3. clean_span_data(str) -> List[float]

    Notes:

    • You must implement the body of this function according to its description.
    • See the Bridge Spans section for help.
  4. clean_bci_data(list) -> None

    Notes:

    • You must implement the body of this function according to its description.
    • See the Storing BCI Scores section for help.

Testing Your Solutions

The last step in the Function Design Recipe is to test your function. You can use the a2_checker.py file to check for style and type contracts. You can use the doctest module to test the examples we have provided you in the docstrings. If you pass all of these tests, it does not mean that your function is 100% correct! You must do your own additional testing (e.g., by calling your functions with different arguments in the shell).

Using a2_checker.py

We are providing a checker module (a2_checker.py) that tests three things:

  1. Whether your code follows the Python style guidelines
  2. Whether your functions are named correctly, have the correct number of parameters, and return the correct types
  3. Whether your functions are appropriately mutating their inputs (some functions should mutate, others should not)

To run the checker, open a2_checker.py and run it. Note: the checker file should be in the same directory as bridge_functions.py and pyta, as provided in the starter code zip file. After running the checker, be sure to scroll up to the top of the shell and read all the messages!

Using doctests

In this assignment, we have provided you with several doctest examples and sample data. These can be used as a quick test to see if your function works for a specific example. However, please note that being correct for one example does NOT mean your function is 100% correct. Be sure to test your code in other ways – as with Assignment 1, our own tests that evaluate your solution are hidden.

A quick way to run all the doctest examples automatically is by importing the doctest module and calling one of its functions. We have already included the code to do this for you at the bottom of the starter file:

if __name__ == '__main__':
    # Automatically run all doctest examples to see if any fail
    import doctest
    doctest.testmod()
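To see what doctest does with an example, consider the hypothetical function total_length below. Each line that starts with >>> is run, and the line after it is the expected output, compared as text. By default, testmod prints nothing when every example passes; passing verbose=True makes it report each example as it runs.

```python
from typing import List


def total_length(spans: List[float]) -> float:
    """Return the sum of the span lengths in spans.

    >>> total_length([12.0, 19.0, 21.0, 12.0])
    64.0
    >>> total_length([])
    0.0
    """
    total = 0.0
    for span in spans:
        total = total + span
    return total


if __name__ == '__main__':
    import doctest
    # verbose=True prints every example as it runs, not just failures.
    doctest.testmod(verbose=True)
```

Starting the accumulator at 0.0 rather than 0 matters here: doctest compares the printed text, so an int result of 0 would not match the expected 0.0.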

Marking Scheme

This section describes the aspects of your work that may be marked for Assignment 2.

Coding Style (20%)

Make sure that you follow Python style guidelines that we have introduced and the Python coding conventions that we have been using throughout the semester. Although we don’t provide an exhaustive list of style rules, the checker tests for style are complete. So if your code passes the checker, then it will earn full marks for coding style with one exception: function reuse may be evaluated separately. For each occurrence of a PyTA error, a mark deduction will be applied. Make sure you review the CSC108 Python Style Guidelines for the rules on how to write a docstring description.

Correctness (80%)

Your functions should perform as specified. Correctness, as measured by our tests, will count for the largest single portion of your marks. Once your assignment is submitted, we will run additional tests not provided in the checker. Passing the checker does not mean that your code will earn full marks for correctness.

No Remark Requests

As mentioned earlier: No remark requests will be accepted for code that didn't run and thus failed all the tests. This means a simple syntax error could result in a grade of 0 on the assignment. Before the deadline, you are responsible for running your code and the checker program to identify and resolve any errors that will prevent our tests from running.

What to Hand In

The very last thing you do before submitting should be to run the checker program one last time. Otherwise, a small error introduced in your final changes could cause your code to receive zero for correctness. Submit bridge_functions.py to Assignment 2 (i.e., A2) on MarkUs. Remember that the spelling of filenames, including case, counts: your file must be named exactly as above.