TDM 10200: Project 1 — 2024

Motivation: In this project we are going to jump head first into The Data Mine. We will load datasets into our environment, and introduce some core programming concepts like variables, vectors, types, etc. As we will be "living" primarily in an IDE called Jupyter Lab, we will take some time to learn how to connect to it, configure it, and run code.

Context: This is our first project this spring semester. We will get situated, configure the environment we will be using throughout our time with The Data Mine, and jump straight into working with data!

Scope: Python, Jupyter Lab, Anvil

Learning Objectives
  • Learn how to run Python code in Jupyter Lab on Anvil.

  • Read and write basic (csv) data using Python.

Dataset(s)

The questions in this project will use the following dataset(s):

  • /anvil/projects/tdm/data/forest/ENTIRE_COUNTY.csv

Readings and Resources

Login to Anvil. Navigate and login to ondemand.anvil.rcac.purdue.edu using your ACCESS credentials (and Duo Mobile). You may refer to Anvil Introduction and Access Setup to find out how to Login to Anvil.

Questions

Question 1 (2 pts)

  1. Copy the template into your home directory and save it, to start your Project 1.

This year, the first step to starting any project should be to get a copy of our project template using File / Open from URL and downloading the template from: the-examples-book.com/projects/current-projects/_attachments/project_template.ipynb

Open the project template and then save it into your home directory, in a new notebook named firstname-lastname-project01.ipynb.

Fill out the project template, replacing the default text with your own information. If a category is not applicable to you (for example, if you did not work on this project with someone else), put N/A.

To learn more about how to run various types of code using this kernel, see our template page.

Question 2 (2 pts)

In the upper right-hand corner of your Jupyter Lab notebook, you will see the current kernel for the notebook, seminar (do not use seminar-r for this project). If you click on the name of the kernel, then you will have the option to change kernels.

  1. In the first cell, write: print("Hello World!") and then type Control-Return to run the Python code in the cell. The output: Hello World! should appear below the cell after you run it.

print("Hello World!")

Question 3 (2 pts)

  1. Ask the user to input an integer, and assign it to a variable named num1 like this: num1 = int(input("Enter an integer: "))

  2. Ask the user to input a second integer, using a second prompt, and store this second result into a variable named num2.

  3. Add the values of num1 and num2 and print a string that says: The sum of the two numbers is: [result here]

There are different data types in Python. Some of the built in types include:

  • Integer (int)

  • Float (float)

  • string (str)

  • types can include list, tuple, range

  • Mapping data type (dict)

  • Boolean type (bool)

Numeric

  1. int - holds signed integers of non-limited length.

  2. long- holds long integers (exists in Python 2.x, deprecated in Python 3.x).

  3. float- holds floating precision numbers and it is accurate up to 15 decimal places.

  4. complex- holds complex numbers.

String - a sequence of characters, generally strings are represented by single or double-quotes

List - ordered sequence of data written using square brackets [] and commas (,).

Tuple - similar to a list but immutable. Data is written using a parenthesis () and commas (,).

Dictionary - an unordered sequence of key-value pairs.

Question 4 (2 pts)

Read this StackOverflow page:

  1. Now declare a list named myfruits and make a loop that asks the user for the names of 5 fruits, and adds each fruit to the list.

  2. Print the list of the 5 fruits. (Any format of output is OK, as long as you print all 5 fruits.)

When you have a range in Python, the last number in the range is not used. So, for instance, if you use for i in range(1,5): then Python will only ask for 4 fruits. In Python, you would need to use for i in range(1,6): since the last number in the range is ignored!

Question 5 (2 pts)

  1. Read "/anvil/projects/tdm/data/forest/ENTIRE_COUNTY.csv" dataset in Python and assign it to a variable named forest

  2. Use different methods to display the dataset information, for instance:

    1. info, info(), shape, size, columns, len

It is OK to use Google to find webpages to help with our projects. Please document any webpages that you use for help, when you are working on the project. You need to list all such webpages in the project, either at the start of the template or within the question directly. For instance, when Dr Ward used Google to help with this question, this webpage was useful:

To import the dataset for this question, this code should work:

import pandas as pd
forest = pd.read_csv("/anvil/projects/tdm/data/forest/ENTIRE_COUNTY.csv")

Submit your completed Project 1: one Jupyter notebook and one Python script file.

Now that you are done with the project. For this course, we will turn in a variety of files, depending on the project.

We will always require a Jupyter Notebook file built from the template described above. Jupyter Notebook files always end in an extension .ipynb. This file is our "source of truth", and it is what the graders will look at first, when grading the projects.

If we are working Python, we will also need you to build a Python file (ending with a .py extension too). Please see the note below.

An .ipynb file is generated by first running every cell in the notebook, and then clicking the "Download" button from File  Download.

In addition to the .ipynb, if a project uses Python code., you will need to also submit a Python script. A Python script is just a text file with the extension .py.

Let’s practice. Take the Python code from this project and copy and paste it into a text file with the .py extension. Call it firstname-lastname-project01.py. Download your .ipynb file — making sure that the output from all of your code is present and in the notebook. (The .ipynb file will also be referred to as "your notebook" or "Jupyter notebook".)

Once complete, submit your notebook, and submit your Python script. You need to submit them to Gradescope together, as one submission, at the same time, because Gradescope only keeps track of the last submission to each project.

You must double check your .ipynb after submitting it in gradescope. A very common mistake is to assume that your .ipynb file has been rendered properly and contains your code, markdown, and code output, when in fact it does not.

You will not receive full credit if your .ipynb file does not contain all of the information you expect it to, or it does not render properly in gradescope. Please ask a TA if you need help with this.

Project 01 Assignment Checklist

  • Jupyter Lab notebook with your code, comments, and output for the assignment

    • firstname-lastname-project01.ipynb.

  • Python file for the assignment

    • firstname-lastname-project01.py.

  • Submit your files through Gradescope

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.

In addition, please review our submission guidelines before submitting your project.