Training Research Seminar - Jason Anastasopoulos - Friday, February 5th, 2021

Data Visualization - Light Blue Seminar.

  • Date: 05 February 2021 from 14:00 to 17:00

  • Event location: Online via Microsoft Teams

  • Access Details: Free admission

INTRODUCTION

Python is rapidly becoming the preferred language of data scientists in both industry and academia. It is used by Google, Facebook and other tech giants to perform data analysis and to run machine learning algorithms on the very large volumes of data they process every day.

Python can be used for:

  • Storing and analyzing large and small datasets. 
  • Web scraping and data collection using APIs.
  • Beautiful data visualization.
  • Natural language processing and text analysis. 
  • General machine learning.
  • Deep learning.
  • Image analysis, and much more.

HOW YOU WILL BENEFIT FROM THIS SEMINAR

This seminar is an intermediate course in statistical computing with Python. Its goal is to introduce participants to advanced data analysis and visualization applications of the Python language.

By the end of this seminar you will be able to:

  • Produce advanced data visualizations: use Python's advanced plotting functionality, including plotting geospatial data and text data.
  • Perform statistical inference: carry out data analysis and basic statistical inference with Python, including generalized linear models (GLMs), ANOVA and hypothesis testing.


WHO SHOULD ATTEND

This seminar is designed for students who already have basic programming skills in Python and want to learn more advanced applications typically used by data scientists and academic researchers.

This course assumes that you have already completed Python for Data Analysis or a similar introductory Python course.

COMPUTING

This is a hands-on class that will involve at least two hours of structured and supervised assignments. To ensure that you are prepared, you must do the following BEFORE the first class:

You should also know how to access the command prompt (Windows users) or the terminal (Mac users). We will briefly review how to access these in class, but you will save time and effort if you already know these basics when you arrive. Online resources can help you get started with the Windows Command Prompt or the Mac Terminal.

MATERIALS

Participants receive access to a private repository containing all of the lecture notes, code and data needed for the class.


SEMINAR OUTLINE

Advanced Data Analysis and Visualization

   I. General statistical inference and model visualization

       °  Linear regression.

       °  Generalized Linear Models.

       °  Time series analysis.

  II. Data visualization

       °  Data visualization in pandas.

       °  Basic plots: scatterplots, line plots, heatmaps.

       °  Distributions: densities, box plots, histograms.

 III. Advanced topics in data visualization

       °  Making beautiful plots with Seaborn.

       °  Geospatial data visualization.