Data Exploration in Python
Abstract
This course provides an introduction to the fundamentals of programming in Python. Students will gain experience designing, implementing, and testing their Python code, as well as using Jupyter Notebooks and IPython for statistics and data analysis. Multiple programming paradigms will be explored. The course covers Python data types, input and output, and control flow in the context of preparing, cleaning, transforming, and manipulating data. In addition, students will use Python to conduct exploratory data analyses, including computing descriptive statistics.
Coordinates and Contact
- Instructor: Tom McAndrew
- Email: mcandrew@lehigh.edu
- Office coordinates: HST 175
- Office hours: To be voted on by students | By appointment
Class Logistics and Resources
Class Time and Location
- Tuesdays and Thursdays, 1:35pm - 2:50pm in Maginnes 103
Tentative Timeline
Topic | Timeline | Optional/Extra Materials | Data Camp Lesson | Homework |
---|---|---|---|---|
Exploratory Data Analysis | ||||
Exploratory Data Analysis Part I | Week 1 | Refer to Notes | Homework 01 | |
Data, Data Frames, and Pandas | Week 2 | Refer to Notes | Homework 02 | |
Data visualization | Week 3 | Refer to Notes | Homework 03 | |
Split-apply-combine | Week 4 | Refer to Notes | Homework 04 | |
Probability and Simulation | ||||
Set theory and properties of probability; conditional probability and discrete random variables | Week 5 and 6 | Refer to Notes | XX | Homework 05 |
Random variables | Week 7 and 8 | Refer to Notes | Hogg, McKean, Craig 1.7-1.10 | |
Bernoulli, Binomial, Poisson, Exponential, Gamma, Normal Densities | Week 9 | Refer to Notes | ||
Multivariate Normal Density | Week 10 | Refer to Notes | ||
Exam | Week 11 | |||
Simulation of Statistical Processes (Inverse CDF, Monte Carlo, and Accept/Reject) | Week 12 | Refer to Notes | ||
The Forward problem for several processes | Week 13 | Refer to Notes | ||
Web application building | ||||
Web-application and data viz | Week 14 | Refer to Notes | Data: FluView For Subtypes |
Textbook
The course does not require a textbook. Suggested texts include:
- Introduction to Mathematical Statistics by Hogg, McKean, Craig, 8th Edition.
- Introduction to Probability Models by Sheldon Ross, 10th Edition.
- An Introduction to Probability Theory and Its Applications by William Feller.
The course may draw on several books, all of which are available free of charge to Lehigh University students through the Lehigh University Library. The relevant textbooks and materials are listed above in the Tentative Timeline.
Policies
Attendance
Attendance is crucial. If sick, let the instructor know and stay home. An excused absence allows for makeup evaluations.
Collaboration
Work together, but submit your own answers. Collaboration on quizzes, midterms, and finals is not allowed.
Frustration
If frustrated, take a break, return later, and ask for help if needed.
Technology
Computing
We will use Python 3 for simulations and statistical training. There are several options for an integrated development environment to support Python programming. Popular choices are:
- Jupyter Cloud provided by Lehigh: Lehigh University provides a Jupyter Cloud, an easy-to-use platform for programming in Python, here (Link). Note that to use Lehigh's Jupyter platform while not on campus Wi-Fi, you will need to log on to the University VPN. Instructions for the VPN are here (Link).
- DataCamp: Most, if not all, weeks of this course will ask students to use DataCamp for Python programming training. DataCamp is provided by the University free of charge.
Assignments
Grading Breakdown
Item | Weight |
---|---|
Quizzes | 12.5% |
Homework | 65.0% |
Exam | 10.0% |
Final Project | 12.5% |
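As a rough illustration of how the weights above combine, the sketch below computes an overall course grade as a weighted sum. It assumes each category score is already averaged onto a 0 to 100 scale; the names `WEIGHTS` and `course_grade` are hypothetical and not part of any course tooling, and the instructor's actual calculation may differ.

```python
# Weights taken from the Grading Breakdown table (as fractions of 1).
WEIGHTS = {
    "Quizzes": 0.125,
    "Homework": 0.650,
    "Exam": 0.100,
    "Final Project": 0.125,
}


def course_grade(scores: dict) -> float:
    """Return the weighted course grade.

    Assumption: `scores` maps each category name to a 0-100 score.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"Missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[item] * scores[item] for item in WEIGHTS)


if __name__ == "__main__":
    example = {"Quizzes": 80, "Homework": 90, "Exam": 70, "Final Project": 100}
    print(round(course_grade(example), 2))
```

Because homework carries 65% of the weight, a 10-point change in the homework average moves the overall grade by 6.5 points under this assumption.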
Homework
Homework is due in person one week after it is assigned. Late homework grades are reduced as follows:
\begin{align*}
f(\text{grade}, \text{days late}) = \text{grade} \times e^{-0.35 \cdot \text{days late}}
\end{align*}
Assignments more than three days late receive a zero.
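The late penalty can be sketched in Python as follows. This is a minimal illustration, not an official grading script: the function name `late_grade` is hypothetical, and it assumes that `days_late` is a non-negative number of days and that "beyond three days" means strictly more than three.

```python
import math


def late_grade(grade: float, days_late: float) -> float:
    """Apply the late-homework penalty grade * exp(-0.35 * days_late).

    Assumption: anything more than three days late scores zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 3:
        return 0.0
    return grade * math.exp(-0.35 * days_late)


if __name__ == "__main__":
    for days in range(5):
        print(days, round(late_grade(100, days), 1))
```

Under these assumptions, a homework that would have earned 100 points keeps roughly 70.5 points after one day, 49.7 after two, and 35.0 after three.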
Exams
The Exam is held in class and covers content up to that point. The Final is a take-home project that asks students to build a fully functional web application in Python that draws on all aspects of the course. Quizzes are due by midnight on class days. Quizzes are quick and meant to test class engagement.
DataCamp
DataCamp assignments may accompany homework. For now, DataCamp is free for Lehigh students.
Extra Credit
Extra credit is earned by attending seminars and writing reflections, and it counts as an additional quiz score. These opportunities will be announced during the course.
Accommodations for Students with Disabilities
Lehigh University provides accommodations through Disability Support Services. More details are available here.
Principles of Our Equitable Community
Lehigh University endorses The Principles of Our Equitable Community.