pre-release: DePy meeting announcement

Please take a moment to review your details and reply with OK or edits.
The subject line and everything below it is what will go out; it will also be used to title the videos.

Subject: 
ANN: DePy at Room LL104 Fri May 29, 10 AM


DePy
=========================
When: 10 AM Friday May 29, 2015
Where: Room LL104

http://mdp.cdm.depaul.edu/DePy2015/

Topics
------
1. Tuning Machine Learning Parameters using scikit-learn Gridsearch
Kevin Goetsch

At Braintree we rely on scikit-learn's Pipeline and Gridsearch concepts to build and tune our predictive models. I'll talk through how this looks, the benefits, and the custom classes we've built out to augment this process to integrate Pandas dataframes. 
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3536/tuning-machine-learning-parameters-using-scikit-l 
2. Speeding Up Python Data Analysis Using Cython
Jonathan Helmus

Python is an ideal language for developing software for the analysis of scientific data.  Although packages such as NumPy, SciPy, and pandas can offer execution speeds similar to those possible using statically typed, compiled languages, oftentimes Python code is too slow for the task at hand.  Cython is a static compiler for Python which, with the addition of a few type declarations, allows Python code to execute at significantly faster speeds.  This talk will detail how to use Cython to optimize the run time of Python code for analysis of scientific data with examples taken from the development of algorithms in the Python ARM Radar Toolkit (Py-ART), an open source library for working with weather radar data in Python.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3538/speeding-up-python-data-analysis-using-cython 
3. pyDAL: a pure python Database Abstraction Layer
Giovanni Barillari

This talk will provide an overview and quick demonstration of the features and the APIs of pyDAL, the Python Database Abstraction Layer.  It dynamically generates the SQL in real time using the specified dialect for the database back end, so that you do not have to write SQL code or learn different SQL dialects (the term SQL is used generically), and your code will be portable among different types of databases.  pyDAL derives from web2py's original DAL, with the aim of being widely compatible. pyDAL doesn't require web2py and can be used in any Python context.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3541/pydal-a-pure-python-database-abstraction-layer 
4. Python In The Enterprise
David Freedman

In recent years, Creative Artists Agency has gone open source on public cloud for all of its enterprise software development.  In this talk David discusses the power and flexibility of Python for rapid prototyping, SaaS integration, web app development, devops, predictive analytics and building smart APIs.  He also covers enterprise implications of leveraging open source tools vs their commercial counterparts.
 recording release: no  
 Video: http://www.pyvideo.org/video/3543/python-in-the-enterprise 
5. Teaching web app development with web2py and crowdsourced grading
Luca de Alfaro

In this talk, we describe how web2py together with crowdsourced grading enable teaching web applications at scale.  Web2py is unique in allowing the easy packaging and sharing of applications.  A student can write a web app, package it into a single file, then share it; other students can with the click of a mouse unpack and load the application, and use it. This allows teaching web development in a peer-driven fashion: students submit solutions to homework assignments, then review and grade each other's work, in the process learning from studying the solutions that other students submitted for the same problem. We have successfully used this approach at UC Santa Cruz for three years now, in the process building a tool, CrowdGrader, for the peer review and grading of the applications.  The talk will describe the peer-driven approach and the tool, and present results on tool accuracy and student satisfaction.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3544/teaching-web-app-development-with-web2py-and-crow 
6. Having Fun with Recommender Systems
Bamshad Mobasher

(Needs description.) 
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3559/having-fun-with-recommender-systems 
7. Python and Data Processing and Visualization
Eric Pershey

Argonne National Laboratory's Leadership Computing Facility uses Python in many ways, from developing tools and testing supercomputers to completing real science and much more.  We will dive into a few of these systems and how Python has benefited us.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3537/python-and-data-processing-and-visualization 
8. Cloudmesh Virtual Cluster Management for Data Intensive Applications
Gregor von Laszewski

Cloudmesh is an important component for delivering a software-defined system encompassing virtualized and bare-metal infrastructure, networks, application, systems and platform software with a unifying goal of providing Cloud Testbeds as a Service (CTaaS). Cloudmesh federates a number of resources from academia and industry. These include the existing FutureSystems, Amazon Web Services, Azure, HP Cloud, and Karlsruhe, using various technologies. A high-level architectural image is provided at http://cloudmesh.github.io/introduction_to_cloud_computing/_images/cloudmesh-arch-2013.png
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3539/cloudmesh-virtual-cluster-management-for-data-int 
9. PyHSPF: Data integration software for hydrologic and water quality modeling
David Lampert

The Hydrological Simulation Program in Fortran (HSPF) is used extensively for the assessment of water quantity and water quality issues. This presentation will discuss the development of an open-source Python package (PyHSPF) for gathering necessary data from the internet and using it to build HSPF input files, performing HSPF simulations, postprocessing results, and calibrating parameters. An example application of the software will be presented for a model of the North Skunk River Basin, Iowa, that simulates monthly flows over a 30-year time period with a high degree of explanatory power (R squared = 0.85).
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3540/pyhspf-data-integration-software-for-hydrologic 
10. Realtor Search: Elasticsearch and Python in Practice
Aleksandar Velkoski

Part of our Master Member Profile project, the REALTOR search is a web2py-based application, leveraging Elasticsearch, that aims to provide users (staff and members) with a means to query comprehensive member profiles. With relevant data gathered and presented via an easy-to-use centralized platform, staff can leverage the information to enhance services provided to members, and members can use it to enhance productivity.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3545/realtor-search-elasticsearch-and-python-practice 
11. Maximum Viable Integrity with web2py
Mark Graves

Creating an enterprise-grade application to meet static business specifications is an easy task for a developer using web2py.  Keeping up with the ever-changing needs of a business while maintaining core application logic can be more difficult. At Suits + Tables, we have found that a combination of Unit Testing, Behavior Driven Development, and Continuous Integration facilitates this process. Best practices for maintaining Maximum Viable Integrity during iteration and deployment of web2py applications will be discussed in the context of Suits + Tables and other case studies.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3546/maximum-viable-integrity-with-web2py 
12. Eight years of web2py
Massimo Di Pierro

This is a compressed tutorial about web2py taking a historical perspective, going from its older features to some of the most recent ones. No previous experience required.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3547/eight-years-of-web2py 
13. Classification using Pandas and Scikit-Learn
Skipper Seabold

This will be a tutorial-style talk demonstrating how to use pandas and scikit-learn to do classification tasks. We will do some data munging and visualization using pandas and matplotlib. Then we will introduce some of the different classifiers in scikit-learn and show how to include them into a classification pipeline to produce the best predictive model. Interactive IPython/Jupyter notebooks will be provided.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3548/classification-using-pandas-and-scikit-learn 
14. Getting Started with Cassandra and Python
Amber Doctor, Philip Doctor

This talk will cover why to use Cassandra, basic Cassandra architecture, setting up Cassandra, and basic CRUD operations with a Python web framework.  The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.  Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3549/getting-started-with-cassandra-and-python 
15. Work work money money
Tanya Schlusser

Public census and economics data are combined with job board postings to paint a portrait of industry growth and salary differences across the United States. Apache Spark's Python shell is used for analytics; Heroku hosts the visualizations.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3550/work-work-money-money 
16. Jupyter Project: Interacting between Python and R Libraries for Data Mining
Myles Gartland

This talk will cover using IPython (Jupyter Project) for Python and non-Python projects, and how to make Python and R interact through the rmagic (rpy2) package.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3542/jupyter-project 
17. Bringing Research to Real Life
Safia Abdalla

Machine learning is a fascinating field with numerous applications in data science. New research is always emerging in the field of machine learning but it can take years for this research to be converted into usable software libraries. In this talk, I'll be providing a technique for reading (actually reading) research papers and using the powerful syntax and data-focused libraries of Python to write up quick and dirty implementations of techniques written out in research papers. This talk is relevant to any practical professional who occasionally strays into the field of academics to discover new techniques that are applicable to the data problems that they are attempting to solve.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3552/bringing-research-to-real-life 
18. Learning by doing: Predicting West Nile Virus in Chicago.
Jonathan Gemmell

Challenging students with interesting projects is an effective way to motivate them.  In this talk we will first discuss Kaggle, a platform for predictive modelling and analytics competitions on which companies and researchers post their data and statisticians and data miners from all over the world compete to produce the best models.  In particular, we will discuss the West Nile Virus Prediction challenge and how it is being used in the classroom.  Finally, this talk will get interested competitors started with some sample code.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3553/learning-by-doing-predicting-west-nile-virus-in 
19. Extending versatility of Python to nonprogrammers
Mikhail Lakirovich, Armand Ruiz

The explosive availability of data is creating unprecedented opportunities in the analytics space. However, not everyone performing analytics is a data scientist and has the appropriate technical expertise for creating, debugging and running code.  This so-called business analyst brings a broad understanding of data analysis and the business problem at hand and does best with user-friendly tools.  What happens, however, if the business user needs to run a non-standard or unique analysis that is not available natively in their analytics software?  Join our session to learn how Python code can be integrated with IBM SPSS and how business users can employ the versatility and power of Python without the need to run code.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3554/extending-versatility-of-python-to-nonprogrammers 
20. Document Classification with Machine Learning
Jordan "Vladimir" Myers

The presentation will discuss how Python was used to implement a machine-learning algorithm that accepts a training set of documents and then classifies documents based on word vector similarity. This project, inspired by faculty at DePaul, is now a fully fledged start-up.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3555/document-classification-with-machine-learning 
21. Python-Powered Machine Learning in the Cloud
Stephen Hoover

Python is a powerful, easy-to-use language which now has a wide range of numerical and machine-learning open source libraries. At Civis Analytics, we've built a cloud-based platform for data science which empowers analysts to extract insights from their data with less effort. The platform itself runs on Amazon Web Services, and the machine learning workflows at the core of the platform are coded in Python. Open-source Python libraries such as pandas, numpy, statsmodels, and scikit-learn let our data scientists focus on high-level workflows and greatly accelerate our development process. In this talk, I'll give an overview of Civis's new data science platform, focusing on the machine-learning aspects. I'll talk about how we use Python open-source libraries to help with data analysis, and some of the challenges we've overcome along the way.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3556/python-powered-machine-learning-in-the-cloud 
22. The Use of Web Games to Train Data Analysts
Casey Schroeder

There has been recent work on the taxonomy of games which are based, one way or another, on real world data. Typically these games help people learn that data or how to cope with it. The traditional examples are simulation games (flight, driving, etc.), while other games incorporate data in such a way that it is beneficial to learn the real world data in the game play (trivia). These types of data-games commonly have a domain specific focus.  We intend to explore the possibility of interactive games which help people learn data analysis in general, implementing some such games in Python using web2py.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3557/the-use-of-web-games-to-train-data-analysts 
23. Python in Computer Security - Student's Perspective
Li-Wey Lu, Ryan Haley

As students of computer security, we are often the subject of academic discussion regarding the best topics to study. We hope to show how we have utilized Python within our area of study and why it is important for future students. Our goal is to encourage future classes of students to enjoy scripting. We will introduce how we as security students utilize Python as an everyday tool, with specific examples of how we used Python in various security competitions and how it gave us a noticeable advantage over other teams.  This will include scripts used as defenders against various red team attacks.
 recording release: yes license:   
 Video: http://www.pyvideo.org/video/3558/python-in-computer-security-students-perspecti 


Location
--------
Room LL104


About the group
---------------