Welcome to BANA 4373 :: Advanced Analytics (BANA 4373)

This will be a fast-paced class that requires a substantial amount of time outside of class to master basic coding and data analysis skills through simple but progressively more demanding exercises. The class will introduce the application of data visualization, statistical, econometric, and machine learning methods to real-world data, by creating demonstrations and simulations using the Python software platform and its data science-oriented visualization and analysis packages. The learning goal is for each student to have a working knowledge of both the key concepts and the basic software tools needed to apply advanced visualization and analysis methods to real-world data, to better inform real-world decision making.

The Python software tools in which all students will gain some proficiency are components in the most widely used, non-proprietary, open data science software platform, and readily allow access to excellent visualization, statistical, and econometric analysis tools capable of handling even the largest public datasets. The same software platform can also be integrated with, and run, R code for statistical analysis.

The intention is to make this class doubly valuable to a student interested in financial analysis. First, the class will introduce you to cutting-edge software tools that can be applied to real data for practical purposes. Second, the class is designed to motivate further learning of statistical, econometric, and machine learning concepts by showing that they can be simply and practically applied to real-world data, and to give you some first-hand experience in doing this.

Much of the learning will be structured as completion of data analysis exercises.

The Python data science software platform is increasingly used by organizations and businesses to undertake finance-relevant analysis. For some interesting and useful Jupyter notebooks documenting policy-relevant data analysis reported by online journalists, see BuzzFeed.

The Jupyter notebooks in these archives can also give you valuable insights into how to do useful things when analyzing and visualizing large-scale data.

The class assumes you have previously taken an introductory statistics course as a prerequisite. Lectures will be based on interactive Python notebooks (also known as Jupyter notebooks). Students will follow along with class lectures using a training environment in AWS. All students must complete all assigned reading, since it will be assumed as background to all the Jupyter notebook content on statistical, econometric, and machine learning concepts we go through in class. There are no computer programming prerequisites, but you will need to bring to class a personal computer with the Anaconda distribution of Python installed (more specific instructions will be distributed prior to the first class).

This class will introduce the application of data visualization, statistical, econometric, and machine learning methods to real world data, using the Python software platform and its data science-oriented data visualization and analysis packages. The learning goal is for each student to finish the class with a working knowledge of basic financial analysis concepts and data science software tools, and their application to real world data, to better inform real world decision making.

The Python software tools we will use are components in the most widely used, non-proprietary, open data science software platform, and readily allow access to excellent visualization, statistical, and econometric analysis tools capable of handling even the largest datasets. The same software platform can also be integrated with, and run, R code for statistical analysis.

This class is intended to be doubly valuable to a student interested in public policy. First, the class will introduce you to cutting-edge software tools that can be applied to real data for practical policy purposes. We hope this will both give you some advantages in post-graduation job markets and make it easier to acquire even more advanced skills over the rest of your career. Second, the class is designed to motivate learning statistical and econometric concepts by showing that they can be simply and practically applied to real-world data, and to give you some first-hand experience in doing so.

Much of the learning will be structured as completion of data analysis exercises. In addition to these exercises, every student will undertake two small group analysis projects, with in-class presentation, discussion, and critique. Every student will also present and submit an individual final empirical data analysis project, in the form of a Jupyter (interactive Python) notebook.

The class assumes you have previously taken an introductory statistics course as a prerequisite. Lectures will be based on interactive Python notebooks (also known as Jupyter notebooks). Students will follow along with class lectures using open source data science software installed on a personal laptop computer (Windows, Mac, or Linux). All students must complete all assigned reading, since it will be assumed as background to all the Jupyter notebook content on statistical, econometric, and machine learning concepts we go through in class.

There are no computer programming prerequisites, but you will need to bring to class a personal computer with the Anaconda distribution of Python installed (more specific instructions are given below). In addition to other course requirements, this class will require completion of approximately 12 hours of introductory online courses covering basic skills in the Python analysis and visualization software we will be using. The online course modules assume no software experience or other prerequisites.
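As a rough illustration only, and not a substitute for the installation instructions, the sketch below shows one way to confirm that a Python installation can find the packages you expect. The package list (numpy, pandas, matplotlib) and the helper name check_packages are assumptions made for this example; the actual packages required will be stated in the course instructions.

```python
import importlib.util
import sys

# Hypothetical package list for illustration; the course instructions
# will state which packages are actually required.
PACKAGES = ["numpy", "pandas", "matplotlib"]


def check_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Report the Python version in use, then the status of each package.
print("Python version:", sys.version.split()[0])
for name, found in check_packages(PACKAGES).items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Running this in a Jupyter notebook cell or at a Python prompt prints one line per package. A package reported as MISSING would usually mean the installation is incomplete or that a different Python interpreter is being used.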

Each class session will be structured as a group walkthrough of a Jupyter notebook containing the key concepts and examples that serve as the foundation for that week’s material. Time will be reserved for in-class laptop “data analysis lab” exercises, to provide you with real-time feedback on conceptual or programming questions you may have.

A warning

This class is likely to be a substantial amount of work if this is the first time you have ever tried to do simple programming on a computer. In addition, working through and understanding statistical and econometric concepts can be demanding. But there should be a real payoff in useful things you know about, and know how to do, by the end of the semester.

Every student will be asked to participate in three project presentations to the class. These exercises will be oral presentations of analyses and solutions to a real world data analysis problem. Two of these will be group projects, and one an individual project. In addition, each student will be asked to complete problem sets, and to submit their final empirical project. The two group presentations will count for 30% of the final grade, the problem sets for 20%, the individual project presentation for 20%, and the final project submission for 30%.
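The weighting above can be sketched as a short calculation. This is a minimal illustration that assumes each component is scored on a 0 to 100 scale and that the two group presentations are combined into a single score; neither assumption comes from the syllabus, and the component names and example scores are hypothetical.

```python
# Weights taken from the grading breakdown above.
# Component names are illustrative, not official.
WEIGHTS = {
    "group_presentations": 0.30,      # two group presentations combined (assumed)
    "problem_sets": 0.20,
    "individual_presentation": 0.20,
    "final_project": 0.30,
}


def final_grade(scores):
    """Return the weighted average of component scores (assumed 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student.
example_scores = {
    "group_presentations": 90,
    "problem_sets": 80,
    "individual_presentation": 85,
    "final_project": 88,
}

print(round(final_grade(example_scores), 1))  # 86.4
```

The weights sum to 100%, so the result stays on the same scale as the component scores.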