An Intuitive Grasp of RegEx’s in Python
--client
pyohio
--show
pyohio_2018
--room cartoon2 14191 --force
Marks
Author(s):
Tom Fetherston
Location
Cartoon 2
Date
jul Sun 29
Days Raw Files
Start
12:45
First Raw Start
12:30
Duration
0:30:0
Offset
0:14:24
End
13:15
Last Raw End
13:30
Chapters
00:00
0:15:39
Total cuts_time
34 min.
https://pyohio.org/2018/schedule/presentation/33/
raw-playlist
raw-mp4-playlist
encoded-files-playlist
host
archive
tweet
mp4
svg
png
assets
release.pdf
An_Intuitive_Grasp_of_RegExs_in_Python.json
logs
Admin:
episode
episode list
cut list
raw files day
marks day
marks day
image_files
State:
---------
borked
edit
encode
push to queue
post
richard
review 1
email
review 2
make public
tweet
to-miror
conf
done
Locked:
clear this to unlock
Locked by:
user/process that locked.
Start:
initially scheduled time from master, adjusted to match reality
Duration:
length in hh:mm:ss
Name:
Video Title (shows in video search results)
Emails:
email(s) of the presenter(s)
Released:
Unknown
Yes
No
has someone authorised publication
Normalise:
Channelcopy:
m=mono, 01=copy left to right, 10=right to left, 00=ignore.
Thumbnail:
filename.png
Description:
## Overview & Purpose

Regular expressions define search patterns and are an important technique for validating, scraping, and wrangling (i.e., re-formatting) the content of strings. Additionally, they’re used to enable syntax highlighting in some applications. Python provides regular expressions via the built-in ‘re’ module, and there is a third-party ‘regex’ module with added functionality.

The problem is, writing regex patterns to do what you want is hard, and even when you’ve got one, figuring out what it is or isn’t going to match can be baffling. This talk will give you two tools to conquer regex’s: a mental model (demonstrated with props) of how they work, and a mini-language, “Simple Regex Language”, to create readable regex’s that easily translate into Python’s regex syntax.

## A Physical Model of RegEx’s

1. Picture the string we are searching as a line of tiles (like those in Scrabble), where the character each tile represents has been routed into its surface.
2. This lets us talk about the two categories of places where a regex can start or continue a match:
    1. At a character specified in the regex (modeled by a vacuum-formed sheet of plastic whose profile can nest in the character’s incised relief).
    2. At a position called an anchor, specified in the regex (represented by the insertion of a strip of plastic into the crack between tiles).
        1. Note: whether a given ‘crack’ matches the given anchor is determined by what is to its left and right; more specifically, by the categories they belong to, i.e. whitespace, printable, alphabetic, numeric, eol, buffer-wall, etc.
3. This model lets us illustrate how the regex engine goes about making a match; e.g., if our pattern wants to match ‘ABC’ and our string contains ‘ABD’, we slide a piece of plastic with an ‘A’ profile along from the start of the buffer to where we encounter the ‘A’ tile. The plastic will sink into the ‘A’ tile, allowing us to swing down the ‘B’ plastic that is taped to the right edge of the ‘A’ overlay, which also sinks down flush, matching the ‘B’ tile. When we try to swing down the next taped-on plastic overlay, ‘C’, it crashes into the surface of the ‘D’ tile and instead levers out the ‘B’ overlay, which levers out the ‘A’ overlay and gets us back to sliding along the ‘A’ overlay, looking for the next place to pause and try for a match.
4. At this point we introduce SRL (below), then show how its patterns translate into Python regex’s, then we return to this model and extend it to cover all the different regex ‘atoms’ we can now write.
5. This “Tile and Overlay” model provides a visual metaphor to see how the regex engine works, but there are no tiles and overlay chains in the computer; there are only strings of bytes (and multi-byte sequences, if we are talking UTF-8), so we briefly introduce a model that uses height to represent characters. This lets us talk about Unicode strings, and hints at the kind of optimizations compiling regex’s might allow Python to do.
6. For the presentation, there will be a physical model to show the example covered in point three above, but to make things manageable, we'll then switch to illustrations done in SketchUp (maybe even animations).

## SRL: Simple Regex Language

1. SRL is what is known as a “Little Language”, or a “Domain Specific Language”: a language built to handle a small problem area. In SRL’s case, the problem is the unreadability of regex’s, and the fact that each language has a different way of writing them.
1. You’d think we could skip this as we are only concerned with Python here, but it is useful to have this level of abstraction, even if you only do Python. You are likely to find that your editor uses a different flavor of regex’s.
1. An overview, live demos, and documentation can be found at the project’s website, <https://simple-regex.com>. I won’t duplicate them here; I’ll just say that the exposition of what we need for this talk will follow this source material, and include an SRL-to-Python cheat sheet that covers their translation and how they are expressed in the “Tile and Overlay” model.
1. To give reviewers a feel for what the illustrations of SRL will look like, I intend to either add them to the proposal or provide a link to them on my GitHub, so you can look at them as they are created for this talk.
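
The slide-and-lever-out behaviour from point three of the physical model, and the idea of anchors sitting in the cracks between tiles, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `overlay_search` and `anchor_cracks` are hypothetical names made up for this sketch, the search handles literal characters only, and it is not how the ‘re’ module is actually implemented. The standard `re.search` and `re.finditer` calls are used purely as a cross-check.

```python
import re


def overlay_search(pattern, text):
    """Naive literal search in the spirit of the tile-and-overlay model.

    Hypothetical helper for illustration only. Returns (index, trace),
    where index is the first match position or -1, and trace records
    what happened at each starting tile.
    """
    trace = []
    # Slide the first overlay along the row of tiles, one start at a time.
    for start in range(len(text) - len(pattern) + 1):
        # Swing down each taped-on overlay in turn.
        for offset, wanted in enumerate(pattern):
            tile = text[start + offset]
            if tile != wanted:
                # The overlay crashes into the wrong tile: lever the
                # whole chain out and go back to sliding.
                trace.append(
                    f"start={start}: overlay {wanted!r} hits tile {tile!r}"
                )
                break
        else:
            # Every overlay sank flush into its tile.
            trace.append(f"start={start}: all overlays seated")
            return start, trace
    return -1, trace


def anchor_cracks(text):
    """Positions of the 'cracks' between tiles where \\b matches."""
    return [m.start() for m in re.finditer(r"\b", text)]


index, trace = overlay_search("ABC", "xxABDxABC")
for line in trace:
    print(line)
print("match at", index)
print("word-boundary cracks in 'ab cd':", anchor_cracks("ab cd"))
```

At `start=2` the trace shows the ‘A’ and ‘B’ overlays seating and the ‘C’ overlay hitting the ‘D’ tile, after which the search resumes from the next tile. The anchor example shows that `\b` matches positions between characters rather than characters themselves.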
markdown
Comment:
14:28
production notes
2018-07-29/12_30_36.ts
Apply:
12:30:36 - 12:44:57 ( 00:14:21 )
S:
12:30:36 -
E:
13:00:36
D:
00:30:00
(
End:
861.0)
vlc ~/Videos/veyepar/pyohio/pyohio_2018/dv/cartoon2/2018-07-29/12_30_36.ts :start-time=00.0 --audio-desync=0
Raw File
Cut List
12:30:36
seconds: 0.0
Wall: 12:30:36
Duration
00:30:00
13:00:36
seconds: 861.0
Wall: 12:44:57
Comments:
mp4
mp4.m3u
dv.m3u
Split:
Sequence:
:
delete
2018-07-29/12_30_36.ts
Apply:
12:44:57 - 13:00:36 ( 00:15:39 )
S:
12:30:36 -
E:
13:00:36
D:
00:30:00
(
Start:
861.0)
vlc ~/Videos/veyepar/pyohio/pyohio_2018/dv/cartoon2/2018-07-29/12_30_36.ts :start-time=0861.0 --audio-desync=0
Raw File
Cut List
12:30:36
seconds: 861.0
Wall: 12:44:57
Duration
00:30:00
13:00:36
seconds: 0.0
Wall: 12:30:36
Comments:
mp4
mp4.m3u
dv.m3u
Split:
Sequence:
:
delete
2018-07-29/13_00_36.ts
Apply:
13:00:36 - 13:19:00 ( 00:18:24 )
S:
13:00:36 -
E:
13:30:35
D:
00:29:59
(
End:
1104.0)
vlc ~/Videos/veyepar/pyohio/pyohio_2018/dv/cartoon2/2018-07-29/13_00_36.ts :start-time=00.0 --audio-desync=0
Raw File
Cut List
13:00:36
seconds: 0.0
Wall: 13:00:36
Duration
00:29:59
13:30:35
seconds: 1104.0
Wall: 13:19:00
Comments:
mp4
mp4.m3u
dv.m3u
Split:
Sequence:
:
delete
2018-07-29/13_00_36.ts
Apply:
13:19:00 - 13:30:35 ( 00:11:35 )
S:
13:00:36 -
E:
13:30:35
D:
00:29:59
(
Start:
1104.0)
vlc ~/Videos/veyepar/pyohio/pyohio_2018/dv/cartoon2/2018-07-29/13_00_36.ts :start-time=01104.0 --audio-desync=0
Raw File
Cut List
13:00:36
seconds: 1104.0
Wall: 13:19:00
Duration
00:29:59
13:30:35
seconds: 0.0
Wall: 13:00:36
Comments:
mp4
mp4.m3u
dv.m3u
Split:
Sequence:
:
delete
Rf filename:
root is .../show/dv/location/, example: 2013-03-13/13:13:30.dv
Sequence:
get this:
check and save to add this
2018-07-29/12_30_36.ts
2018-07-29/13_00_36.ts
Veyepar
Video Eyeball Processor and Review