TDM 30100: Project 2 — 2022
Motivation: Documentation is one of the most critical parts of a project. Many tools are designed specifically to help document a project, and each has its own set of pros and cons. Depending on the scope and scale of the project, different tools will be more or less appropriate. For documenting Python code, however, you can’t go wrong with tools like Sphinx or pdoc.
Context: This is the first project in a 3-project series where we explore thoroughly documenting Python code, while solving data-driven problems.
Scope: Python, documentation
Dataset(s)
The following questions will use the following dataset(s):
-
/anvil/projects/tdm/data/apple/health/watch_dump.xml
Questions
In this project we will work with pdoc to build some simple documentation, review some Python skills that may be rusty, and learn about serialization and deserialization of data, a common component of many data science and computer science projects and a key topic to understand when working with APIs.
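As a minimal sketch of the serialization and deserialization idea mentioned above, the round trip below uses Python's standard `json` module. The `record` dictionary and its field names are made up for illustration and are not taken from the watch dataset.

```python
import json

# Hypothetical in-memory object; the keys and values are illustrative only.
record = {"type": "HeartRate", "value": 72, "unit": "count/min"}

# Serialization: turn the Python object into a string that can be stored or sent.
serialized = json.dumps(record)

# Deserialization: rebuild an equivalent Python object from that string.
restored = json.loads(serialized)

print(type(serialized).__name__)  # the serialized form is a str
print(restored == record)         # the round trip preserves the data
```

JSON is only one of many formats; the same two-step pattern applies to the others you will describe in Question 1.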
For the sake of clarity, this project will have more deliverables than the "standard" .ipynb notebook, .py file containing Python code, and PDF. In this project, we will ask you to submit an additional PDF showing the documentation webpage that you will have built by the end of the project. How to do this will be made clear in the given question.
Make sure to select 4096 MB of RAM for this project. Otherwise, you may run into an issue reading the dataset in Question 3.
Question 1
Let’s start by navigating to ondemand.anvil.rcac.purdue.edu and launching a Jupyter Lab instance. In the previous project, you learned how to run various types of code in a Jupyter notebook (the .ipynb file). Jupyter Lab can do much more than that. You can open terminals on Anvil (the cluster), as well as open an editor for .R files, .py files, or any other text-based file.
Give it a try. In the "Other" category in the Jupyter Lab home page, where you would normally select the "f2022-s2023" kernel, instead select the "Python File" option. Upon clicking the square, you will be presented with a file called untitled.py. Rename this file to firstname-lastname-project02.py (where firstname and lastname are your first and last name, respectively).
Make sure you are in your |
Read the "3.8.2 Modules" section of Google’s Python Style Guide. Each individual .py file is called a Python "module". It is good practice to include a module-level docstring at the top of each module. Create a module-level docstring for your new module. Rather than giving an explanation of the module, and usage examples, instead include a short description (in your own words, 3-4 sentences) of the terms "serialization" and "deserialization". In addition, list a few (at least 2) examples of different serialization formats, and include a brief description of the format, and some advantages and disadvantages of each. Lastly, if you could break all serialization formats into 2 broad categories, what would those categories be, and why?
Any good answer for the "2 broad categories" will be accepted. With that being said, a hint would be to think of what the serialized data looks like (if you tried to open it in a text editor, for example), or how it is read.
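A module-level docstring is simply a string literal placed as the first statement of the file. The skeleton below is only a sketch of the layout; the bracketed placeholders and the `main` function are illustrative and stand in for your own text and code.

```python
"""One-line summary of what this module is about.

[Your 3-4 sentence description of serialization and deserialization.]

[Your examples of serialization formats, with advantages and disadvantages.]

[Your two broad categories of formats, and the reasoning behind them.]
"""


def main() -> None:
    """Placeholder function so the module has something to document."""


main()
```

Keeping the summary on the first line, followed by a blank line, matches the layout the Google Python Style Guide describes for docstrings.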
Save your module.
Relevant topics: pdoc, Sphinx, Docstrings & Comments
-
Code used to solve this problem.
-
Output from running the code.
Question 2
Now, in Jupyter Lab, open a new notebook using the "f2022-s2023" kernel.
You can have both the Python file and the notebook open in separate Jupyter Lab tabs for easier navigation.
Fill in a code cell for question 1 with a Python comment.
# See firstname-lastname-project02.py
For this question, read the pdoc section, and run a bash command to generate the documentation for the module that you created in the previous question, firstname-lastname-project02.py. To do this, look at the example provided in the book. Everywhere in the example in the pdoc section of the book where you see "mymodule.py", replace it with your module’s name, firstname-lastname-project02.py.
Use We are expecting you to run the command in a
Then you can run your command.
|
Use the For example, I used |
Once complete, on the left-hand side of the Jupyter Lab interface, navigate to your output directory. You should see something called firstname-lastname-project02.html. To view this file in your browser, right-click on the file, and select Open in New Browser Tab. A new browser tab should open with your freshly made documentation. Pretty cool!
Ignore the |
You may have noticed that the docstrings are (partially) markdown-friendly. Try introducing some markdown formatting in your docstring for more appealing documentation.
Relevant topics: pdoc, Sphinx, Docstrings & Comments
-
Code used to solve this problem.
-
Output from running the code.
Question 3
When I refer to "watch data" I just mean the dataset for this project.
Write a function called get_records_for_date that accepts an lxml etree (of our watch data, via etree.parse) and a datetime.date, and returns a list of Record Elements for the given date. Raise a TypeError if the date is not a datetime.date, or if the etree is not an lxml.etree.
Use the Google Python Style Guide’s "Functions and Methods" section to write the docstring for this function. Be sure to include type annotations for the parameters and return value.
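To illustrate the docstring layout, here is a small sketch of a Google-style docstring with type annotations. The `parse_day` function is a hypothetical helper invented for this example; it is not part of the project and is not the function you are asked to write.

```python
from datetime import date, datetime


def parse_day(text: str) -> date:
    """Converts a 'YYYY/MM/DD' string into a date.

    Args:
        text: A calendar day written as 'YYYY/MM/DD', for example '2019/01/01'.

    Returns:
        The corresponding datetime.date object.

    Raises:
        TypeError: If text is not a str.
        ValueError: If text does not match the 'YYYY/MM/DD' format.
    """
    if not isinstance(text, str):
        raise TypeError(f"text must be a str, got {type(text).__name__}")
    return datetime.strptime(text, "%Y/%m/%d").date()


print(parse_day("2019/01/01"))
```

The "Args", "Returns", and "Raises" headings are the sections that the documentation generator needs to recognize in order to format them nicely.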
Re-generate your documentation. How does the updated documentation look? You may notice that the formatting is pretty ugly and things like "Args" or "Returns" are not really formatted in a way that makes it easy to read.
Use the -d flag to specify the format as "google", and re-generate your documentation. How does the updated documentation look?
The following code should help get you started.
output
[<Element Record at 0x7ffb7c27a440>, <Element Record at 0x7ffb7c27a480>, <Element Record at 0x7ffb7c27a4c0>, <Element Record at 0x7ffb7c27a500>, <Element Record at 0x7ffb7c27a540>, <Element Record at 0x7ffb7c27a580>, <Element Record at 0x7ffb7c27a5c0>, <Element Record at 0x7ffb7c27a600>, <Element Record at 0x7ffb7764e3c0>, <Element Record at 0x7ffb7764e400>, <Element Record at 0x7ffb7764e440>, <Element Record at 0x7ffb7764e480>, ....
The following is some code that will be helpful to test the types.
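A sketch of how such type checks might look with `isinstance`. To keep it runnable without extra packages, it uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml (an assumption of this example). With lxml, the object returned by `lxml.etree.parse` is an `lxml.etree._ElementTree`, so that class would take the place of `ET.ElementTree`.

```python
import io
import xml.etree.ElementTree as ET  # stand-in for lxml in this sketch
from datetime import date, datetime

# Build a tiny tree from an in-memory document (illustrative data only).
tree = ET.parse(io.StringIO("<HealthData><Record/></HealthData>"))

# Checking the tree argument.
tree_ok = isinstance(tree, ET.ElementTree)
string_is_tree = isinstance("not a tree", ET.ElementTree)

# Checking the date argument.
date_ok = isinstance(date(2019, 1, 1), date)
string_is_date = isinstance("2019/01/01", date)

# datetime is a subclass of date, so a datetime also passes a date check.
datetime_is_date = isinstance(datetime(2019, 1, 1, 8, 30), date)

print(tree_ok, string_is_tree, date_ok, string_is_date, datetime_is_date)
```

Because a `datetime` passes an `isinstance(..., date)` check, decide deliberately whether your function should accept one.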
To loop through records, you can use the
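One way to loop through records is the `iter` method, which both lxml and the standard library's `ElementTree` provide; it is one option among several (XPath queries are another). The sketch below runs on a small made-up document. The `startDate` attribute and its timestamp layout are assumptions about how the watch data is structured, so check them against the real file.

```python
import io
import xml.etree.ElementTree as ET  # stand-in for lxml; iter() works the same way
from datetime import date, datetime

# Made-up sample; attribute names and date layout are assumptions.
sample = """<HealthData>
  <Record type="HeartRate" startDate="2019-01-01 08:00:00 -0500" value="72"/>
  <Record type="HeartRate" startDate="2019-01-02 09:15:00 -0500" value="80"/>
  <Record type="StepCount" startDate="2019-01-01 10:30:00 -0500" value="250"/>
</HealthData>"""

tree = ET.parse(io.StringIO(sample))
chosen_date = date(2019, 1, 1)

matches = []
# iter("Record") walks the tree and yields every element tagged Record.
for record in tree.iter("Record"):
    # The first 10 characters of the timestamp hold the YYYY-MM-DD part.
    started = datetime.strptime(record.attrib["startDate"][:10], "%Y-%m-%d").date()
    if started == chosen_date:
        matches.append(record)

print([r.attrib["type"] for r in matches])
```

Comparing `date` objects rather than raw strings keeps the filter independent of how the timestamp happens to be formatted elsewhere in the record.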
Relevant topics: pdoc, Sphinx, Docstrings & Comments
-
Code used to solve this problem.
-
Output from running the code.
Question 4
This was hopefully a not-too-difficult project that gave you some exposure to tools in the Python ecosystem, as well as chipped away at any rust you may have had with writing Python code.
Finally, investigate the official pdoc documentation, and make at least 2 changes/customizations to your module. Some examples are below — feel free to get creative and do something with pdoc outside of this list of options:
-
Modify the module so you do not need to pass the -d flag in order to let pdoc know that you are using Google-style docstrings.
-
Change the logo of the documentation to your own logo (or any logo you’d like).
-
Add some math formulas and change the output accordingly.
-
Edit and customize pdoc’s jinja2 template (or CSS).
For this project, please submit the following files:
import lxml.etree
from datetime import datetime

# read in the watch data
tree = lxml.etree.parse('/anvil/projects/tdm/data/apple/health/watch_dump.xml')
chosen_date = datetime.strptime('2019/01/01', '%Y/%m/%d').date()
my_records = get_records_for_date(tree, chosen_date)
my_records
Relevant topics: pdoc, Sphinx, Docstrings & Comments
-
Code used to solve this problem.
-
Output from running the code.
Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure that what you think you submitted is what you actually submitted. In addition, please review our submission guidelines before submitting your project.