Name:
Stream processing made easy with riko
Author(s):
Reuben Cummings
Location
Congo Room
Date
Thu 06 Oct
Days Raw Files
Start
16:15
First Raw Start
Duration
00:45:00
Offset
None
End
17:00
Last Raw End
Chapters
Total cuts_time
None min.
https://za.pycon.org/talks/35/
raw-playlist
raw-mp4-playlist
encoded-files-playlist
mp4
svg
png
assets
release.pdf
Stream_processing_made_easy_with_riko.json
logs
State:
borked
edit
encode
push to queue
post
richard
review 1
email
review 2
make public
tweet
to-miror
conf
done
Locked:
clear this to unlock
Locked by:
user/process that locked it.
Start:
initially scheduled time from master, adjusted to match reality
Duration:
length in hh:mm:ss
Name:
Video Title (shows in video search results)
Emails:
email(s) of the presenter(s)
Released:
Unknown
Yes
No
has someone authorised publication
Normalise:
Channelcopy:
m=mono, 01=copy left to right, 10=copy right to left, 00=ignore.
Thumbnail:
filename.png
Description:
# AUDIENCE

- data scientists (current and aspiring)
- those who want to know more about data processing
- those who are intimidated by "big data" (Java) frameworks and are interested in a simpler, pure Python alternative
- those interested in async and/or parallel programming

# DESCRIPTION

Big data processing is all the rage these days. Heavyweight frameworks such as Spark, Storm, Kafka, Samza, and Flink have taken the spotlight despite their complex setup, Java dependency, and intense computing resource usage. Those interested in simple, pure Python solutions have limited options. Most alternative software is synchronous, doesn't perform well on large data sets, or is poorly documented.

This talk aims to explain stream processing and its uses, and to introduce riko: a pure Python stream processing library built with simplicity in mind. Complete with various examples, you'll get to see how riko lazily processes streams via its synchronous, asynchronous, and parallel processing APIs.

# OBJECTIVES

Attendees will learn what streams are, how to process them, and the benefits of stream processing. They will also see that most data isn't "big data" and therefore doesn't require complex (Java) systems (\**cough\** Spark and Storm \**cough\**) to process it.

# DETAILED ABSTRACT

## Stream processing?

### What are streams?

A stream is a sequence of data. The sequence can be as simple as a list of integers or as complex as a generator of dictionaries.

### How do you process streams?

Stream processing is the act of taking a data stream through a series of operations that apply a (usually pure) function to each element in the stream. These operations are pipelined so that the output of one function is the input of the next one.
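The pipelining described above can be sketched with plain Python generators; the helper names (`keep_even`, `squares`) are illustrative, not part of any library:

```python
# A stream as a lazy pipeline of pure functions over a generator.

def square(n):
    return n * n

def keep_even(stream):
    # Pure filter stage: yields only even items.
    return (n for n in stream if n % 2 == 0)

def squares(stream):
    # Pure map stage: output of keep_even is the input here.
    return (square(n) for n in stream)

numbers = iter(range(10))               # source stream
pipeline = squares(keep_even(numbers))  # nothing computed yet (lazy)
print(list(pipeline))                   # [0, 4, 16, 36, 64]
```

Because each stage is a generator, items flow through one at a time and no intermediate list is ever built.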
By using pure functions, the processing becomes embarrassingly parallel: you can split the items of the stream into separate processes (or threads) which then perform the operations simultaneously (without the need for communication between processes/threads). [1-4]

### What can stream processing do?

Stream processing allows you to efficiently manipulate large data sets. Through the use of lazy evaluation, you can process data streams too large to fit into memory all at once. Additionally, stream processing has several real-world applications, including:

- parsing RSS feeds (RSS readers, think [feedly](http://feedly.com/))
- combining different types of data from multiple sources in innovative ways (mashups, think [trendsmap](http://trendsmap.com/))
- taking data from multiple sources, manipulating the data into a homogeneous structure, and storing the result in a database (extracting, transforming, and loading data; aka ETL, data wrangling...)
- aggregating similarly structured data from siloed sources and presenting it via a unified interface (aggregators, think [kayak](http://kayak.com)) [5, 6]

## Stream processing frameworks

If you've heard anything about stream processing, chances are you've also heard about frameworks such as Spark, Storm, Kafka, Samza, and Flink. While popular, these frameworks have a complex setup and installation process, and are usually overkill for the amount of data typical Python users deal with. Using a few examples, I will show basic Storm usage and how it stacks up against BASH.

## Introducing riko

Supporting both Python 2 and 3, riko is the first pure Python stream processing library to support synchronous, asynchronous, and parallel processing. It's built using functional programming methodology and lazy evaluation by default.

### Basic riko usage

Using a series of examples, I will show basic riko usage. Examples will include counting words, fetching streams, and RSS feed manipulation.
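To give a flavor of the word-counting example, here is a minimal plain-Python sketch of the same streaming idea (this is not riko's actual API; `fetch_lines` is a hypothetical stand-in for fetching a feed):

```python
from collections import Counter

def fetch_lines():
    # Stand-in for fetching a feed: a small in-memory stream of text.
    yield "streams are sequences of data"
    yield "riko processes streams of data"

def words(lines):
    # Map each line in the stream to a stream of its words.
    for line in lines:
        for word in line.split():
            yield word

# Counter consumes the stream lazily, one word at a time.
counts = Counter(words(fetch_lines()))
print(counts.most_common(3))
```

A real riko pipeline composes named pipes over feeds in the same spirit; see the project README for the actual API.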
I will highlight the key features which make riko a better stream processing alternative to Storm and the like.

### riko's many paradigms

Depending on the type of data being processed, a synchronous, asynchronous, or parallel processing method may be ideal. Fetching data from multiple sources is suited to asynchronous or thread-based parallel processing. Computationally intensive tasks are suited to processor-based parallel processing. And synchronous processing is best suited for debugging or low-latency environments.

riko is designed to support all of these paradigms using the same API. This means switching between paradigms requires trivial code changes, such as adding a yield statement or changing a keyword argument. Using a series of examples, I will show each of these paradigms in action.
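As a rough stdlib analogy for this kind of paradigm switching (not riko's API, just an illustration of how the same pure function can run under different execution models with a one-line change):

```python
from concurrent.futures import ThreadPoolExecutor

def expensive(n):
    # Stand-in for a CPU- or IO-bound task applied to each stream item.
    return n * n

items = range(5)

# Synchronous: ordinary map over the stream.
sync_result = list(map(expensive, items))

# Thread-parallel: swap in an executor's map, same function, same items.
with ThreadPoolExecutor() as pool:
    parallel_result = list(pool.map(expensive, items))

assert sync_result == parallel_result == [0, 1, 4, 9, 16]
```

The point is that pure, per-item functions leave the choice of execution model as a deployment detail rather than a rewrite.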
markdown
Comment:
production notes
Rf filename:
root is .../show/dv/location/, example: 2013-03-13/13:13:30.dv
Sequence:
get this:
check and save to add this
Veyepar
Video Eyeball Processor and Review