pre-release: DePy meeting announcement

Please take a moment to review your details and reply with OK or edits.
The Subject line and everything below it is what will go out; it will also be used to title the videos.

Subject: 
ANN: DePy at Daley 505 Fri May 6, 9:30a


DePy
=========================
When: 9:30 AM Friday May 6, 2016
Where: Daley 505


http://mdp.cdm.depaul.edu/DePy2016/default/

Topics
------
1. Introduction to Data Science in Python
Jonathan Gemmell

In recent years, many data science tools have been built for Python users. This hands-on tutorial will introduce data structures such as Python lists, NumPy arrays and Pandas DataFrames. Common data mining algorithms will be presented including regression, naive Bayes, random forests and more. We will conclude the tutorial by exploring common data visualization techniques.
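As a flavor of the workflow the tutorial covers, here is a minimal, hypothetical sketch with made-up data (not the tutorial's actual materials):

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    # A Python list becomes a NumPy array becomes a Pandas DataFrame
    rows = [[5.1, 3.5, 0], [4.9, 3.0, 0], [6.2, 2.9, 1], [5.9, 3.0, 1]]
    df = pd.DataFrame(np.array(rows), columns=["length", "width", "label"])

    # Fit a random forest and predict on the same toy data (demo only)
    model = RandomForestClassifier(n_estimators=10)
    model.fit(df[["length", "width"]], df["label"])
    print(model.predict(df[["length", "width"]]))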
 recording release: yes license:   

2. From 0 to 60 in Web2py
Massimo Di Pierro

Web2py is a web framework for rapid development of database-driven web applications. In this tutorial we will dive into some of the main web2py features, including the web-based IDE, the Model-View-Controller architecture, the Authentication/Authorization API, the Database Abstraction Layer, and some more advanced features, including components, CAS, and integration with Ractive JS.
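To give a sense of the Model-View-Controller split and the Database Abstraction Layer, here is a hypothetical two-file sketch (inside a web2py application, where DAL and Field are provided by the framework):

    # models/db.py -- define a table with the Database Abstraction Layer
    db = DAL('sqlite://storage.sqlite')
    db.define_table('post', Field('title'), Field('body', 'text'))

    # controllers/default.py -- a controller action rendered by a view
    def index():
        posts = db(db.post).select(orderby=~db.post.id)
        return dict(posts=posts)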
 recording release: yes license:   

3. Introduction to NetworkX: Social Network Analysis in Python
Robin Burke

NetworkX is a well-known Python package for manipulating and computing with network structures. In this tutorial, we will explore NetworkX's capabilities (and some limitations) for social network analysis.
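As a tiny illustration of the kind of computation NetworkX makes easy (hypothetical data, not from the tutorial):

    import networkx as nx

    # Build a small friendship graph and ask who is most connected
    G = nx.Graph()
    G.add_edges_from([("Ann", "Bob"), ("Bob", "Cat"), ("Ann", "Cat"), ("Cat", "Dan")])
    print(nx.degree_centrality(G))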
 recording release: yes license:   

4. A First Course in Deep Learning
Jeremy Watt, Reza Borhani

Due to their wide applicability, the tools of Deep Learning have quickly become some of the most important for today's researchers in computer vision, machine learning, robotics, and related fields. In particular, over the past several years the tools of Deep Learning have been used to great effect in both academia and industry, producing state-of-the-art results on a variety of challenging computer vision and speech recognition problems. However, understanding the foundations of deep learning can be intimidating to the uninitiated, as can comprehending the details of their implementation, which demands an understanding of numerical optimization often unfamiliar to those with a traditional computer science or engineering background. In this tutorial we provide a user-friendly introduction to the basic tools of Deep Learning, describe their many applications, discuss how they relate to more traditional ideas in machine learning, and provide an introduction to the most useful techniques from numerical optimization crucial to their implementation. To make full use of this tutorial one only needs a basic understanding of linear algebra and vector calculus. No prior knowledge of numerical optimization or machine learning is expected. Additionally, we intend to provide Python code for all demos presented in this talk.
 recording release: yes license:   

5. Breakfast


(Needs description.) 
 recording release: yes license:   

6. Opening Remarks


(Needs description.) 
 recording release: yes license:   

7. Mathematical Optimization for Machine Learning
Jeremy Watt, Reza Borhani

In this talk we provide a user-friendly introduction to mathematical optimization for machine learning by essentially answering three important questions: (i) what is mathematical optimization, (ii) why should a machine learning researcher/practitioner learn it, and (iii) how does it actually work?

Every machine learning problem has parameters that must be tuned properly to ensure optimal learning. As a simple example, consider the case of linear regression with one-dimensional input, where the two parameters of the linear model, slope and intercept, are tuned by forming a 'cost function' - a continuous function of both parameters - that measures how well the linear model fits a dataset given a value for its slope and intercept. The proper tuning of these parameters via the cost function corresponds geometrically to finding the values for the parameters that make the cost function as small as possible or, in other words, 'minimize' the cost function. The tuning of these parameters is accomplished by a set of tools known collectively as mathematical optimization.
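To make that concrete, here is a tiny hypothetical sketch of such a cost function on made-up data (not from the talk itself):

    import numpy as np

    # Made-up one-dimensional dataset
    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.1, 1.9, 3.2, 3.9])

    def cost(slope, intercept):
        # Least-squares cost: average squared error of the line on the data
        residuals = (slope * x + intercept) - y
        return np.mean(residuals ** 2)

    print(cost(1.0, 1.0))   # how well the line y = x + 1 fits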

Mathematical optimization, as the formal study of how to properly minimize cost functions, is used not only in solving virtually every machine learning problem (regression, classification, clustering, etc.), but also in a variety of other fields including operations research, logistics, and physics. As a result, a mere working knowledge of how to use existing pre-packaged solvers will not be adequate for any serious machine learning developer who wants to code up their own implementation or tailor existing algorithms to a specific application.

The lion’s share of this talk is dedicated to showing how to implement widely-used optimization schemes in Python. We plan to do so by introducing the concept of iterative methods and presenting two extremely popular iterative schemes: gradient descent and Newton’s method. This will be followed by a discussion of stochastic gradient descent – a variant of gradient descent often referred to as the Backpropagation algorithm, most suitable for today’s large datasets. Live Python demos will be run for all algorithms discussed here.
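As a purely illustrative sketch (not the speakers' demo code), a bare-bones gradient descent loop on the least-squares cost described above could look like this:

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.1, 1.9, 3.2, 3.9])

    w = np.zeros(2)     # w[0] = slope, w[1] = intercept
    step = 0.05         # fixed step length (learning rate)
    for _ in range(500):
        residuals = (w[0] * x + w[1]) - y
        grad = np.array([np.mean(2 * residuals * x),   # d(cost)/d(slope)
                         np.mean(2 * residuals)])      # d(cost)/d(intercept)
        w -= step * grad

    print(w)   # approaches the best-fit slope and intercept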

This talk is based on a forthcoming machine learning textbook (Machine Learning Refined; Cambridge University Press, 2016) co-authored by the speakers: Reza Borhani and Jeremy Watt (PhD, Computer Science, Northwestern University). This text has also been the source for a number of quarter-length university courses on machine learning, deep learning, and numerical optimization for graduate and senior-level undergraduate students. The speakers have also given/plan to give a number of tutorials on deep learning at major computer vision and AI conferences including CVPR, AAAI, ICIP, WACV, and more.
 recording release: yes license:   

8. Pre-Modeling: Data Preprocessing and Feature Exploration in Python
April Chen

Data preprocessing and feature exploration are crucial steps in a modeling workflow. In this tutorial, I will demonstrate how to use Python libraries such as scikit-learn, statsmodels, and matplotlib to perform pre-modeling steps. Topics that will be covered include: missing values, variable types, outlier detection, multicollinearity, interaction terms, and visualizing variable distributions. Finally, I will show the impact of utilizing these techniques on model performance. Interactive Jupyter notebooks will be provided.
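As a small taste of these steps on hypothetical data (the tutorial's notebooks go much deeper):

    import pandas as pd

    # Hypothetical raw data with a missing value and a categorical variable
    df = pd.DataFrame({"age": [34, None, 52],
                       "city": ["Chicago", "Evanston", "Chicago"]})

    df["age"] = df["age"].fillna(df["age"].median())   # impute missing values
    df = pd.get_dummies(df, columns=["city"])          # encode the categorical variable
    print(df.describe())                               # inspect variable distributions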
 recording release: yes license:   

9. The UCSC IoT Python Platform
Luca De Alfaro

Prof. Luca de Alfaro works in the areas of reputation systems for ecommerce and collaboration, crowdsourcing, game theory, and formal methods for system design and verification. Luca is author of many tools including Crowdgrader and Similcheck. He is also the CTO of Camio.com. This talk is about a new tool for the Internet of Things that he is about to release.
 recording release: yes license:   

10. Lunch


(Needs description.) 
 recording release: yes license:   

11. PyStan: Bayesian Inference for Fun and Profit
Stephen Hoover

Probabilistic programming languages offer a flexible and expressive way to model data by treating random variables as first-class objects. Stan is a popular and well-supported library which allows users to write models in the Stan programming language and use MCMC methods to perform Bayesian inference. Stan itself is written in C++, and has a Python interface through the PyStan package. In this talk, I'll show off some of the capabilities of PyStan and go through a simple practical example of Bayesian inference in Python.
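For a sense of the workflow, here is a minimal hypothetical PyStan example that estimates the mean of a few data points (not the talk's example):

    import pystan

    model_code = """
    data { int<lower=0> N; vector[N] y; }
    parameters { real mu; real<lower=0> sigma; }
    model { y ~ normal(mu, sigma); }
    """

    # Compile the Stan model, then draw posterior samples with MCMC
    model = pystan.StanModel(model_code=model_code)
    fit = model.sampling(data={"N": 4, "y": [1.2, 0.8, 1.1, 0.9]}, iter=1000, chains=4)
    print(fit)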
 recording release: yes license:   

12. Mapping Collaborations and Knowledge in Scientific Research and Wikipedia: A How-to Approach
Angel Yanguas-Gil, Elsa Alvaro

While our primary job as researchers is to generate new knowledge, it is equally important for us to understand what is already out there. In this talk we present two examples in which Python was the enabling tool that allowed us to crunch that information in a time-efficient way. In the first example, our goal was to map a field of scientific research based on the analysis of bibliographic data, to understand who, where, when, and what is being published. In particular we focused on the field of Atomic Layer Deposition, a materials synthesis technique that, among other things, has become a key component of semiconductor manufacturing and is a key area of expertise of one of the authors (AY). The second example focuses on mapping the evolution of Wikipedia's published content around a particular scientific discipline.

In both cases we used a similar approach: we took the simplest possible approach that minimized development/learning time, prioritizing lightweight, native code and the use of standard libraries over performance. In this talk we will begin by emphasizing the methodology that we followed, addressing questions such as how to transform bibliographic data into dataframes and graphs, and the approach that we took to parse Wikipedia. We will then provide an overview of the results to exemplify Python's capabilities for this kind of data analysis.
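As a simple illustration of turning bibliographic records into a DataFrame and a co-authorship graph (hypothetical records, not our dataset):

    import pandas as pd
    import networkx as nx
    from itertools import combinations

    # Hypothetical bibliographic records
    papers = pd.DataFrame({"year": [2014, 2015],
                           "authors": [["Smith", "Lee"], ["Lee", "Garcia", "Smith"]]})

    # Co-authorship graph: an edge for every pair of co-authors on a paper
    G = nx.Graph()
    for names in papers["authors"]:
        G.add_edges_from(combinations(names, 2))
    print(G.number_of_nodes(), G.number_of_edges())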
 recording release: yes license:   

13. Business Data Processing Using Python
Gregory Dover

Businesses of all sizes can benefit from Python and its associated frameworks. The developer tools are extensive, but the community of software engineers, programmers, and analysts needs a systematic approach that will guarantee a successful deployment on behalf of the business.

Business Data Processing is a term that is not used regularly. It is relegated to the background while buzzwords like Apps, Full Stack Programming, Web Services, and Cloud Computing dominate the landscape. Emphasis needs to be placed back on Business Data Processing due to the ever-changing needs of business and the flexibility afforded to the technical community that utilizes Python and its associated frameworks and tools to solve business problems.

A worthy approach to business data processing using Python will be explored. A high-level discussion of business values, requirements, agile development, user experience, and support will also be included.
 recording release: yes license:   

14. A Universal Carving Approach for Database Forensic Analysis
James Wagner

Forensic tools assist analysts with recovery of data and understanding system events, even when working with corrupted data storage. These tools rely on "file carving" techniques to restore files with damaged metadata by analyzing raw file content. While much of the sensitive data is stored and processed by databases, file carving tools for databases are practically non-existent because most databases (particularly commercial ones) do not document their storage formats. Internally, database content is kept in individual "pages" and follows a unique, yet consistent, set of rules for storage and maintenance. By directly accessing raw database storage, we can recover corrupted contents and reveal user activities that are hidden even from database administrators.

There are a number of database-specific tools developed for recovery and monitoring purposes, but they are surprisingly limited in the face of corruption or "unintentional" side-effects caused by normal database execution. In this talk, we present a universal tool that seamlessly supports many different databases, rebuilding table and other data content from any remaining storage fragments on disk or in memory. We also demonstrate just how much activity takes place under the hood of a database and present an overview of some things that can be discovered by directly investigating database internals.
 recording release: yes license:   

15. Coffee Break


(Needs description.) 
 recording release: yes license:   

16. Python-Powered RESTful API Basics with Web2py
Mark Graves

Web2py's rapid prototyping capabilities can be leveraged to bootstrap quickly from initial prototypes into full production environments. RESTful API construction is a powerful developer tool for modularizing applications for maintenance and optimization. The basics of RESTful API design and implementation will be covered with a case study tutorial, and the concepts will be expanded to leverage built-in web2py functionality for rapid prototyping.
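As a rough sketch of the built-in pattern (a hypothetical controller inside a web2py app, assuming a db.item table defined in a model file; the case study will be more complete):

    # controllers/api.py -- web2py provides 'request' and 'db' to controllers
    @request.restful()
    def items():
        def GET(*args, **vars):
            return dict(items=db(db.item).select().as_list())
        def POST(*args, **vars):
            return dict(id=db.item.insert(**vars))
        return locals()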
 recording release: yes license:   

17. Building and Distributing Python Software with Conda
Jonathan Helmus

Conda is a cross-platform package management system widely used in the scientific and data science Python communities. Conda can be used to package and distribute software written in any language but has first-class support for Python packages. This talk will briefly cover how to use conda to install and manage data science packages as well as how conda can be used to create isolated computing environments. The main focus of the talk will be an in-depth look at how to easily and reproducibly create conda packages for your own Python software, and options for how to share these packages with others. Finally, combining a collection of conda packages into custom cross-platform installable conda-based Python distributions will be explored.
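Roughly speaking, packaging boils down to writing a small recipe and running conda-build; a hypothetical minimal recipe for a pure-Python package might look like this (details vary by project):

    # meta.yaml
    package:
      name: mypackage        # hypothetical package name
      version: "0.1.0"
    source:
      path: .
    build:
      script: python setup.py install
    requirements:
      build:
        - python
        - setuptools
      run:
        - python

Running "conda build ." against such a recipe produces a package that can then be installed locally or shared, for example through an Anaconda Cloud channel.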
 recording release: yes license:   

18. Deploying Django to Heroku
Joseph Jasinski

Deploying a Python web application can be an involved process. It often requires you to be familiar with systems administration, database administration, networking, security, and more. A platform as a service (PaaS) offering helps reduce this complexity by removing the need to manage the infrastructure on your own, letting you focus more on software development. This talk will cover what is needed to run a Django application on the popular PaaS Heroku, and how to do it. It will detail the project architecture and configuration changes needed to allow your app to run on Heroku, how to configure external services (such as a database and media file hosting), and the basic commands needed to deploy and manage your site. Throughout this talk, we will use a working codebase to demonstrate.
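At its simplest, the Heroku-specific pieces are a few settings and a one-line Procfile; a hypothetical sketch (assuming the dj-database-url and gunicorn packages, which the talk may or may not use):

    # settings.py additions often needed on Heroku
    import dj_database_url

    DATABASES = {"default": dj_database_url.config(default="sqlite:///db.sqlite3")}
    ALLOWED_HOSTS = ["my-app.herokuapp.com"]   # hypothetical app name

    # A one-line Procfile then tells Heroku how to start the site, e.g.:
    #   web: gunicorn myproject.wsgi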
 recording release: yes license:   

19. Building Alexa Skills w/ AWS Lambda & Python
Aleksandar Velkoski

In this talk, I'll demonstrate the process of building Alexa Skills with AWS Lambda and Python. I'll talk about an Alexa Skill that I built called "Fred", which leverages the FRB package in Python to query the Federal Reserve Bank's FRED API and return information about economic data series.
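The heart of such a skill is a Lambda handler that returns the Alexa response JSON; a bare-bones hypothetical sketch (not "Fred" itself):

    # AWS Lambda entry point for a toy Alexa skill
    def lambda_handler(event, context):
        intent = event["request"].get("intent", {}).get("name", "Welcome")
        speech = "Hello from a toy skill. You asked for %s." % intent
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "PlainText", "text": speech},
                "shouldEndSession": True,
            },
        }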
 recording release: yes license:   

20. To Catch a Cheat
Casey Schroeder

There are various game-theoretic models of cheating. I give results from simulations of various models of cheating-at-game-play, collected from my own open-source Python code (MIT licensed). These results are extended to a "society at large" model of the result of cheating-at-game-play. I discuss some overt and less overt countermeasures to cheating-at-game-play in the context of online test taking.
 recording release: yes license:   

21. A Quick Peek at NetLogo
Mike Tamillow

Mike fills in some spare time by showing off NetLogo at DePy 2016.
 recording release: yes license: CC BY-SA  

22. Closing Remarks


(Needs description.) 
 recording release: yes license:   



Location
--------
Daley 505


About the group
---------------