Final Project Guidelines
Course: Thinking Machines
Duration: 6 Weeks
Project Overview
Create a comprehensive data analysis project in Python, culminating in a published webpage that presents your data, visualizations, and analytical techniques.
Objectives
- Data Analysis: Find a relevant data set and convert it into a machine-readable dataframe.
- Visualization: Visualize the data set using graphics and/or plotting libraries.
- Communication: Compile findings into a structured report.
Project Requirements
1. Topic Selection
- Choose a research question that allows meaningful analysis using public data. Example questions:
- Does the electoral college favor large, small, rich, poor, white, or diverse states?
- Are the gender demographics of a college major associated with higher expected lifetime earnings?
- Does raising the minimum wage raise the unemployment rate?
- Why did East Germany win so many gold medals?
- Get instructor approval by Friday, 18 Oct
- Shouvik recommends the following places to find datasets: Click!
2. Data Analysis & Modeling
- Use NumPy and pandas to make the data set machine-readable.
- Use SciPy and scikit-learn to uncover patterns and insights.
- Apply at least one statistical or machine learning model to answer research questions.
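As a rough illustration of the steps above, the sketch below loads a small table into a pandas dataframe, prints summary statistics, and fits a simple linear regression with SciPy. The column names, the helper functions, and the numbers are made up for illustration only; they are not real data, and a real project would load its own public data set and pick a model suited to its research question.

```python
import io

import pandas as pd
from scipy import stats

# Hypothetical, made-up values used only to show the workflow. Not real data.
RAW_CSV = """state,min_wage,unemployment_rate
A,7.25,4.1
B,8.50,3.9
C,10.00,4.4
D,12.00,4.0
E,15.00,4.6
"""


def load_dataframe(csv_text):
    """Parse CSV text into a dataframe and drop rows with missing values."""
    df = pd.read_csv(io.StringIO(csv_text))
    return df.dropna()


def fit_line(df, x_col, y_col):
    """Fit y = slope * x + intercept by ordinary least squares."""
    result = stats.linregress(df[x_col], df[y_col])
    return {
        "slope": result.slope,
        "intercept": result.intercept,
        "r_value": result.rvalue,
        "p_value": result.pvalue,
    }


df = load_dataframe(RAW_CSV)
print(df.describe())  # summary statistics for the numeric columns

fit = fit_line(df, "min_wage", "unemployment_rate")
print(fit)
```

With a real data set, `pd.read_csv` can also take a file path or URL in place of the in-memory text. A five-row toy table says nothing about the actual question; the point is only the shape of the workflow.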
3. Visualization
- Use tools like Matplotlib or Plotly to create at least three visualizations.
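One possible way to meet the three-visualization requirement with Matplotlib is sketched below: a scatter plot, a histogram, and a bar chart drawn side by side. The dataframe contents, column names, and the `make_figures` helper are hypothetical placeholders, not real data or a required structure.

```python
import matplotlib

# Non-interactive backend so the script runs without a display.
# In Colab, omit this line so figures render inline.
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical, made-up values used only to show the plotting calls.
df = pd.DataFrame(
    {
        "state": ["A", "B", "C", "D", "E"],
        "min_wage": [7.25, 8.50, 10.00, 12.00, 15.00],
        "unemployment_rate": [4.1, 3.9, 4.4, 4.0, 4.6],
    }
)


def make_figures(df):
    """Draw three different views of the same data on one figure."""
    fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

    # 1. Scatter plot: relationship between two numeric variables.
    axes[0].scatter(df["min_wage"], df["unemployment_rate"])
    axes[0].set_xlabel("Minimum wage")
    axes[0].set_ylabel("Unemployment rate")
    axes[0].set_title("Wage vs. unemployment")

    # 2. Histogram: distribution of a single variable.
    axes[1].hist(df["unemployment_rate"], bins=5)
    axes[1].set_xlabel("Unemployment rate")
    axes[1].set_title("Distribution")

    # 3. Bar chart: one value per category.
    axes[2].bar(df["state"], df["min_wage"])
    axes[2].set_xlabel("State")
    axes[2].set_ylabel("Minimum wage")
    axes[2].set_title("Minimum wage by state")

    fig.tight_layout()
    return fig, axes


fig, axes = make_figures(df)
fig.savefig("figures.png", dpi=150)  # hypothetical output file name
```

Labeling every axis and giving each panel a title makes the figures readable on their own when they appear in the HTML report.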
4. Report Creation
- Document work in Colab and convert the Colab documentation to an HTML report.
- Include prose explanations, code snippets, outputs, and visualizations.
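The notebook-to-HTML conversion can be done in several ways. The sketch below uses the nbformat and nbconvert Python packages, assuming both are installed and that the Colab notebook has been downloaded as an .ipynb file. To stay self-contained it builds a tiny notebook in memory; the cell contents and the output file name `index.html` are illustrative assumptions.

```python
from pathlib import Path

import nbformat
from nbconvert import HTMLExporter


def notebook_to_html(nb):
    """Render a notebook object to a standalone HTML string."""
    exporter = HTMLExporter()
    body, _resources = exporter.from_notebook_node(nb)
    return body


# Build a minimal in-memory notebook so this example runs on its own.
# For a real report, load the downloaded file instead, for example:
#   nb = nbformat.read("my_project.ipynb", as_version=4)  # hypothetical file name
nb = nbformat.v4.new_notebook()
nb.cells.append(nbformat.v4.new_markdown_cell("# Final Report\nProse explanation goes here."))
nb.cells.append(nbformat.v4.new_code_cell("print('analysis code goes here')"))

html_body = notebook_to_html(nb)

# Hypothetical output name; a file called index.html is commonly used
# as the landing page of a static site.
Path("index.html").write_text(html_body, encoding="utf-8")
```

Note that this renders whatever outputs are already stored in the notebook; it does not re-run the code cells, so run all cells before downloading the notebook from Colab.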
5. Technical Implementation
- Use Git and GitHub for version control.
- Host the final report using GitHub Pages.
- Ensure well-documented and clean code.
Timeline and Milestones
Week 1: Project Proposal (Due: Friday, 18 Oct)
Deliverable: One-page proposal covering topic, significance, data source, research question, and the help you will need from instructors.
Shouvik provides the following sample proposal: Click!
Week 2: Data Acquisition & Preliminary Analysis (Due: Friday, 25 Oct)
Deliverable: Colab document containing preliminary dataset and initial results (summary stats, visualizations).
Week 3: In-Depth Analysis & Visualization (Due: Friday, 1 Nov)
Deliverable: Colab document containing a draft of the analysis with models, findings, and supporting visualizations.
Week 4: Draft Report & Peer Review (Due: Wednesday, 6 Nov)
Deliverable: Full draft HTML report for peer review and feedback.
Week 5: Final Report (Due: Friday, 15 Nov)
Deliverables: Final HTML report on GitHub Pages.
Week 6: Final Presentation (Week of 18 Nov)
Deliverables: 10-minute presentation.
Grading Rubric (100 Points Total)
Criteria | Points
Project Proposal | 10
Data Analysis & Modeling | 10
Visualizations | 10
Report Quality | 20
Technical Implementation | 30
Presentation | 20
Collaboration and Tools
- Schedule regular team meetings, divide responsibilities clearly, and resolve conflicts promptly.
- Use Python libraries for data manipulation (pandas, NumPy), visualization (Matplotlib, Seaborn, Plotly), and modeling (scikit-learn, statsmodels).
- Use Jupyter Notebooks for code integration and nbconvert/Jupyter Book for HTML reports.
- Host the report on GitHub Pages, Netlify, or similar.
Suggested Data Sources
- Open data repositories: Kaggle, UCI Machine Learning Repository, Data.gov.
- APIs for data collection: Twitter API, OpenWeatherMap API.
The final project should demonstrate your ability to answer a meaningful question or hypothesis about your dataset through analysis, modeling, and visualization.