Name:
Stream processing made easy with riko
Author(s):
Reuben Cummings
Location
Congo Room
Date
Thu 06 Oct
Days Raw Files
Start
16:15
First Raw Start
Duration
00:45:00
Offset
None
End
17:00
Last Raw End
Chapters
Total cuts_time
None min.
https://za.pycon.org/talks/35/
raw-playlist
raw-mp4-playlist
encoded-files-playlist
mp4
svg
png
assets
release.pdf
Stream_processing_made_easy_with_riko.json
logs
State:
borked
edit
encode
push to queue
post
richard
review 1
email
review 2
make public
tweet
to-miror
conf
done
Locked:
clear this to unlock
Locked by:
user/process that locked it.
Start:
initially scheduled time from master, adjusted to match reality
Duration:
length in hh:mm:ss
Name:
Video Title (shows in video search results)
Emails:
email(s) of the presenter(s)
Released:
Unknown
Yes
No
has someone authorised publication
Normalise:
Channelcopy:
m=mono, 01=copy left to right, 10=copy right to left, 00=ignore.
Thumbnail:
filename.png
Description:
# AUDIENCE

- data scientists (current and aspiring)
- those who want to know more about data processing
- those who are intimidated by "big data" (Java) frameworks and are interested in a simpler, pure Python alternative
- those interested in async and/or parallel programming

# DESCRIPTION

Big data processing is all the rage these days. Heavyweight frameworks such as Spark, Storm, Kafka, Samza, and Flink have taken the spotlight despite their complex setup, Java dependency, and intense computing resource usage. Those interested in simple, pure Python solutions have limited options. Most alternative software is synchronous, doesn't perform well on large data sets, or is poorly documented.

This talk aims to explain stream processing and its uses, and to introduce riko: a pure Python stream processing library built with simplicity in mind. Complete with various examples, you'll get to see how riko lazily processes streams via its synchronous, asynchronous, and parallel processing APIs.

# OBJECTIVES

Attendees will learn what streams are, how to process them, and the benefits of stream processing. They will also see that most data isn't "big data" and therefore doesn't require complex (Java) systems (\**cough\** Spark and Storm \**cough\**) to process it.

# DETAILED ABSTRACT

## Stream processing?

### What are streams?

A stream is a sequence of data. The sequence can be as simple as a list of integers or as complex as a generator of dictionaries.

### How do you process streams?

Stream processing is the act of taking a data stream through a series of operations that apply a (usually pure) function to each element in the stream. These operations are pipelined so that the output of one function is the input of the next one.
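The pipelining described above can be sketched with plain Python generators; the helper names (`keep_even`, `squares`) are illustrative, not part of any library:

```python
# A stream as a lazy pipeline of pure functions over a generator.

def square(n):
    return n * n

def keep_even(stream):
    # Pure filter stage: yields only even items.
    return (n for n in stream if n % 2 == 0)

def squares(stream):
    # Pure map stage: output of keep_even is the input here.
    return (square(n) for n in stream)

numbers = iter(range(10))               # source stream
pipeline = squares(keep_even(numbers))  # nothing computed yet (lazy)
print(list(pipeline))                   # [0, 4, 16, 36, 64]
```

Because each stage is a generator, items flow through one at a time and no intermediate list is ever built.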
By using pure functions, the processing becomes embarrassingly parallel: you can split the items of the stream into separate processes (or threads) which then perform the operations simultaneously (without the need for communication between processes/threads). [1-4]

### What can stream processing do?

Stream processing allows you to efficiently manipulate large data sets. Through the use of lazy evaluation, you can process data streams too large to fit into memory all at once. Additionally, stream processing has several real-world applications, including:

- parsing RSS feeds (RSS readers, think [feedly](http://feedly.com/))
- combining different types of data from multiple sources in innovative ways (mashups, think [trendsmap](http://trendsmap.com/))
- taking data from multiple sources, manipulating the data into a homogeneous structure, and storing the result in a database (extracting, transforming, and loading data; aka ETL, data wrangling...)
- aggregating similarly structured data from siloed sources and presenting it via a unified interface (aggregators, think [kayak](http://kayak.com)) [5, 6]

## Stream processing frameworks

If you've heard anything about stream processing, chances are you've also heard about frameworks such as Spark, Storm, Kafka, Samza, and Flink. While popular, these frameworks have a complex setup and installation process, and are usually overkill for the amount of data typical Python users deal with. Using a few examples, I will show basic Storm usage and how it stacks up against BASH.

## Introducing riko

Supporting both Python 2 and 3, riko is the first pure Python stream processing library to support synchronous, asynchronous, and parallel processing. It's built using functional programming methodology and lazy evaluation by default.

### Basic riko usage

Using a series of examples, I will show basic riko usage. Examples will include counting words, fetching streams, and RSS feed manipulation.
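To give a flavor of the word-counting example, here is a minimal plain-Python sketch of the same streaming idea (this is not riko's actual API; `fetch_lines` is a hypothetical stand-in for fetching a feed):

```python
from collections import Counter

def fetch_lines():
    # Stand-in for fetching a feed: a small in-memory stream of text.
    yield "streams are sequences of data"
    yield "riko processes streams of data"

def words(lines):
    # Map each line in the stream to a stream of its words.
    for line in lines:
        for word in line.split():
            yield word

# Counter consumes the stream lazily, one word at a time.
counts = Counter(words(fetch_lines()))
print(counts.most_common(3))
```

A real riko pipeline composes named pipes over feeds in the same spirit; see the project README for the actual API.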
I will highlight the key features which make riko a better stream processing alternative to Storm and the like.

### riko's many paradigms

Depending on the type of data being processed, a synchronous, asynchronous, or parallel processing method may be ideal. Fetching data from multiple sources is suited to asynchronous or thread-based parallel processing. Computationally intensive tasks are suited to processor-based parallel processing. And synchronous processing is best suited for debugging or low-latency environments.

riko is designed to support all of these paradigms using the same API. This means switching between paradigms requires trivial code changes, such as adding a yield statement or changing a keyword argument. Using a series of examples, I will show each of these paradigms in action.
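As a rough stdlib analogy for this kind of paradigm switching (not riko's API, just an illustration of how the same pure function can run under different execution models with a one-line change):

```python
from concurrent.futures import ThreadPoolExecutor

def expensive(n):
    # Stand-in for a CPU- or IO-bound task applied to each stream item.
    return n * n

items = range(5)

# Synchronous: ordinary map over the stream.
sync_result = list(map(expensive, items))

# Thread-parallel: swap in an executor's map, same function, same items.
with ThreadPoolExecutor() as pool:
    parallel_result = list(pool.map(expensive, items))

assert sync_result == parallel_result == [0, 1, 4, 9, 16]
```

The point is that pure, per-item functions leave the choice of execution model as a deployment detail rather than a rewrite.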
markdown
Comment:
production notes
Rf filename:
root is .../show/dv/location/, example: 2013-03-13/13:13:30.dv
Sequence:
get this:
check and save to add this
Veyepar
Video Eyeball Processor and Review