pre-release: SF Python meeting announcement

Please take a moment to review your details and reply with OK or edits.
The subject line and the text below are what will go out, and they will also be used to title the videos.

ANN: SF Python at Bungalo East Sun October 8, 12:15p

SF Python
When: 12:15 PM Sunday October 8, 2023
Where: Bungalo East
PyBay features the most influential speakers presenting the most crucial technologies to help beginners and seasoned developers alike get up-to-date quickly, in a single-track format. Whether you’re interested in web technologies, data, devops, Python internals, or performance, PyBay will help you stay on top of your game AND network with engineers at companies that are hiring!

Working remotely and want to meet your teammates to boost team cohesiveness? Leverage the platform we’ve built. There are great talks, yummy food, fresh air, vitamin D... all the elements developers crave these days. If there are talks that don’t interest your team, take the opportunity to talk to speakers, create your own team activities, or book a tee time at the adjacent miniature golf course!

1. Pants: Cargo for Python
Benjy Weinberger

Python has a thriving ecosystem of single-purpose tools such as pytest, mypy, black and so on, but no standard orchestration tool to manage them efficiently. This makes it difficult to scale up Python codebases without a lot of bespoke scripting.

As a result, Python repos tend to be small, focused on building a single library or binary. Dependencies are managed by publishing versioned artifacts from one repo and consuming them in another repo by download.

But in the age of microservices, cloud functions, continuous delivery, and rapid iteration, this can be untenable. We often need to repeatedly build and deploy many small, interdependent parts out of a single large repo, and the sequential publishing cycle is too slow and cumbersome. 

Pants is a build system with a focus on Python. It aims to be for Python what Cargo is for Rust: the one-stop shop for efficiently testing, typechecking, formatting, packaging and deploying code. Pants uses static analysis to grok your code's dependencies automatically, so you don't have to maintain large amounts of metadata. It uses this dependency data, along with its local and remote caching and concurrency capabilities, to dramatically speed up the development and CI cycle. 

This talk will explain what Pants is and how it works. It will provide canonical examples of how to use Pants effectively with Python code, such as Django apps and AWS Lambdas, and show how to use it to package your code as a standalone binary or a Docker image.
 recording release: yes license: CC BY-NC-SA  

2. Programming Your Computer With Python
Glyph Lefkowitz

Using Python to write code for web applications, scientific applications, and data analysis is extremely popular. If you're here at PyBay, you're probably doing it.  And while there are desktop applications in Python, it's far less popular for that.

Those of us who write that back-end code are typically sitting in front of desktop or laptop computers for 6-10 hours a day.  And yet, while we may want those machines to do certain tasks for us, for some reason it rarely occurs to many of us to use Python to solve problems on *those* computers rather than the ones in the cloud.

In this talk, we'll explore some of the capabilities that local computation can give you which cloud and web applications can't, and look at some of the ways that Python can help you leverage that power.
 recording release: yes license: CC BY-NC-SA  

3. Ranking and Retrieval Techniques for Retrieval Augmented Generation with Haystack
Tuana Celik

Retrieval augmented generation has proven to be quite an effective technique to achieve good results with LLMs, so that they may provide answers based on your own data.

While retrieval is a key step in such applications, another step has also started to show promise for various use cases: ranking.
In this session we will discuss why retrieval and ranking play important roles in building effective applications with LLMs. In particular, we will see how we can use Lost in the Middle and Diversity Rankers with Haystack, an open source LLM framework, to improve the quality of our RAG pipeline results. We will also briefly discuss the role of hybrid retrieval.
 recording release: yes license: CC BY-NC-SA  

4. Data Science beasts (failures) and where to find them
Grishma Jena

The nature of the field of Data Science encourages trial and error, but we can do a better job of destigmatizing failure and learning from our collective experiences. Join me as I take us on an adventure to find the beasts, i.e., the different ways Data Science projects can fail. I will be talking about 4 major reasons for failure (data, infrastructure, implementation, and culture) and their different aspects, supplemented with my own experiences and case studies. I will also share how to control these beasts and recommend actions to be taken to ensure a successful end-to-end Data Science project.
 recording release: yes license: CC BY-NC-SA  

5. Type safe data validation using Pydantic v2
Adrian Garcia Badaracco

This talk will introduce Pydantic users, old or new, to the new APIs available in Pydantic v2, best practices for using them, and some of the powerful new features we added support for, like PEP 593's `Annotated` and PEP 695's `TypeAliasType`.

We'll then dive deeper into how Pydantic v2 interacts with Python's type system, what we've learned from that, and how we can improve runtime <-> static typing interactions even more.

Finally, we'll touch on some of the internals of Pydantic, including our use of Rust and how we've essentially ended up building a DSL that translates type hints and snippets of arbitrary user-defined logic into a DAG of computations in Rust (i.e. how we accidentally built a compiler).
 recording release: yes license: CC BY-NC-SA  

6. Craft Complex Mock Data
Jason Koo

Existing mock data generators can only create individual, unrelated tables of fake data. Synthetic data services that can produce interwoven datasets require real data to anonymize. This leaves only error-prone custom scripts to create realistic, interdependent datasets for development and testing.

In this session, learn how to define a .json configuration file and leverage the graph-data-generator PyPI package to quickly create custom, deeply interconnected fake datasets for your own Python projects.
 recording release: yes license: CC BY-NC-SA  

7. Embeddings: What they are and why they matter
Simon Willison

Embeddings are a Large-Language-Model-adjacent technology that allows data such as text or images to be represented as an array of floating point numbers, representing a location in a weird, multi-dimensional space.

They are surprisingly powerful. Embeddings can be used to implement semantic search, find related content and even build text search against image data.

I'll explain how they work, show you how to use them and teach you how to build weird and wonderful things with them that you couldn't build any other way.
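
As a rough illustration of the idea (not taken from the talk), semantic search over embeddings can be sketched as ranking items by cosine similarity to a query vector. The three-dimensional vectors and document titles below are made up for the example; real embeddings come from a model and typically have hundreds of dimensions or more.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written "embeddings"; a real system would get these
# from an embedding model rather than typing them in.
documents = {
    "feeding your cat": [0.9, 0.1, 0.0],
    "dog training tips": [0.7, 0.3, 0.1],
    "python packaging": [0.0, 0.2, 0.9],
}


def search(query_vector, docs):
    """Return document titles ordered from most to least similar to the query."""
    return sorted(
        docs,
        key=lambda title: cosine_similarity(query_vector, docs[title]),
        reverse=True,
    )


# A made-up query vector that points in roughly the "pets" direction.
ranked = search([0.8, 0.2, 0.0], documents)
print(ranked)
```

Items whose vectors point in a similar direction to the query rank first, which is what lets "related content" fall out of simple arithmetic on the arrays.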
 recording release: yes license: CC BY-NC-SA  

8. Python deployment with Docker and Poetry
Cristian Heredia

Docker and Poetry are tools often used to deploy Python code. Docker containerizes code, making it easy to deploy in the cloud. Poetry manages dependencies. However, did you ever consider how the two could work in concert to build slim, repeatable production containers for your code? 

This talk aims to give developers the basics of setting up a Poetry project inside a Python Docker environment. The goal is to generate a secure container with only source code present—no docs, tests, or secrets.

What this talk isn't: A deep dive into Docker, Poetry, or virtual environments. 

 recording release: yes license: CC BY-NC-SA  

9. Let's talk about JWT
Jessica Temporal

JSON Web Tokens, or JWTs for short, are all over the web. They can be used to track bits of information about a user in a very compact way and can be used in APIs for authorization purposes. Join me and learn what JWTs are, what problems they solve, how you can use JWTs, and how to be safer when using JWTs in your applications. All of that with some examples of how to validate and deal with JWTs in Python.
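
For a sense of what validating a JWT involves, here is a minimal sketch (not taken from the talk) that signs and verifies an HS256 token using only the standard library. The function names, secret, and payload are hypothetical, and the sketch deliberately skips claims such as expiry; a real application would more likely rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then base64url-decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature using HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Return the payload if the signature checks out, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    # Pin the algorithm instead of trusting whatever the header claims.
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking information through timing.
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))


# Hypothetical secret and claims, for illustration only.
token = sign_hs256({"sub": "user-123"}, b"example-secret")
claims = verify_hs256(token, b"example-secret")
```

Note that the payload is only encoded, not encrypted: anyone holding the token can read it, and the signature only proves it has not been altered.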
 recording release: yes license: CC BY-NC-SA  

10. Infrastructure as a Product: Lessons in Platform Engineering
Nick DiRienzo

Platform Engineering teams face unique challenges in product development organizations. They have a big mission—enabling the rest of the engineering organization to move fast without breaking things—while usually lacking product managers on the team. However, applying product principles can be useful in achieving that goal.

One key area Platform Engineering owns is how services are built and which tools are used. In this talk, we'll explore how a product-focused approach can guide creating principled developer products. Pulling from my own experiences, I'll share real-world insights and lessons learned as a Platform Engineer.
 recording release: yes license: CC BY-NC-SA  

11. FORKS? POOLS? ASYNC? Solving Wordle with Python’s concurrency tools
Christopher Neugebauer

I’ve played Wordle most days since late 2021. Maybe you have too? One thing I wonder after solving the puzzle for the day is whether I made a bad choice of words. Should I have chosen SMASH, or STASH? Just how lucky was I to solve a puzzle?

This talk will explore how to implement a Wordle statistics bot using Python's concurrent processing tools. No spoilers, I promise.
 recording release: yes license: CC BY-NC-SA  

12. Using pandas and pyspark to address challenges in processing and storing time series instrument data
Aaron Wiegel

Time series data from scientific instruments for fermentation, environmental sensors, or spectroscopy often comes in proprietary or unusual formats that require custom logic to process. In addition, processing data at scale is a challenge since enterprise laboratory information management systems (LIMS) typically rely on transactional, row-oriented databases that are not designed to handle millions of records at a time. However, with clever use of pandas for unusually formatted files or pyspark (via Databricks) for large numbers of records, this data can be processed into cleaner, more useful forms for further analysis.
 recording release: yes license: CC BY-NC-SA  

13. Elevating Python Development with Nix Package Manager
Salar Rahmanian

In the ever-evolving landscape of Python development, managing dependencies and ensuring reproducibility remain pivotal challenges. Enter the Nix Package Manager – a powerful tool that transcends conventional package management approaches. Join us in this talk as we embark on a journey through the intricacies of Nix and its profound impact on Python projects.

Dive into the heart of Nix as we demystify its functionality and reveal its potential to transform your Python development workflow. Uncover how Nix transcends the limitations of traditional package managers by providing declarative configuration, fine-grained control over dependencies, and unmatched reproducibility.

Our discussion delves deep into Nix's utility for Python projects, demonstrating how it streamlines package management and safeguards your projects against the pitfalls of dependency chaos. Witness how Nix ensures consistent environments across development, testing, and deployment, fostering collaboration and expediting development cycles.

Drawing upon a decade of Python expertise, our speaker brings firsthand insights into how Nix can enhance the Python ecosystem. From managing intricate dependency graphs to crafting resilient virtual environments, Nix empowers you to focus on code rather than package wrangling.

Throughout this talk, we will showcase practical examples and real-world scenarios, illuminating how Nix orchestrates Python projects with elegance and precision. Whether you're a seasoned Pythonista or a curious newcomer, this talk equips you with the knowledge to integrate Nix into your workflow, revolutionizing the way you approach Python development.

 recording release: yes license: CC BY-NC-SA  

14. Python in Hardware & Embedded Systems: A Deep Dive
Sriram Vamsi Ilapakurthy

"Python in Hardware & Embedded Systems: A Deep Dive" offers a comprehensive exploration of Python's growing influence in the realm of embedded systems, challenging the traditional dominance of languages like C. Beginning with specialized Python implementations such as MicroPython and CircuitPython, the talk illuminates Python's capability to interface with the physical world, from sensors to actuators. As we delve into robotics, attendees will discover Python's role in sensor fusion, computer vision, and advanced robotic applications. The discourse also sheds light on real-world Python-driven innovations, from drones to wearables, while addressing performance and memory challenges. Concluding with development tools and debugging techniques, this talk serves as both a testament to Python's versatility and a guidebook for its effective deployment in embedded contexts.
 recording release: yes license: CC BY-NC-SA  

15. Contain Yourself
Moshe Zadka

Building good containers for Python applications means dealing with several niggling pieces. Where do you get your Python? How do you install third-party packages? What kind of pinning should, and shouldn’t you do? How do you configure your app?

The talk will cover how to containerize Python applications. It will start by going over choices for a base image, how to install Python on base images which do not include it, and how to get the requirements installed. The talk will cover the various trade-offs involved, such as build speed and how often to upgrade the image. It will also cover security best practices like setting the right permissions on directories and running with the correct user.
 recording release: yes license: CC BY-NC-SA  

16. No More Nitpicks: effortless cleanup with fixers, formatters, and codemods
Zac Hatfield-Dodds

Contemporary idioms and a consistent style can make code a pleasure to work with - but fixing a stream of comments from linters or colleagues is less fun.  Let's see how to have a computer do that instead!

I'll explain my favorite tools for formatting, updating, refactoring, and generally cleaning up code; and workflows that make them easy to use - from editor integrations, to pre-commit and continuous integration, to regularly scheduled or one-off cleanup campaigns.  

 recording release: yes license: CC BY-NC-SA  

17. Better Together: Unleashing the Synergy of Pandas, Polars, and Apache Arrow
Chris Brousseau

Supercharge your data engineering workflows by merging the robustness of Pandas with the high-speed capabilities of Polars, all underpinned by Apache Arrow's in-memory technology. This technical deep-dive will unravel the nuances between Pandas and Polars, showcase their newest features, and demonstrate how to integrate them for optimal performance. Learn actionable techniques to make your data pipelines faster, more efficient, and ready for scale. Join us to see how you might elevate your data engineering toolkit!
 recording release: yes license: CC BY-NC-SA  

18. Python, Planets, and Portals: Designing Web Apps for Modern Astronomers
Dan Burger

Have you ever thought about the tech that powers our search for alien worlds? From 2010 to 2022 I worked in the astronomy department at Vanderbilt University designing Python-based web applications for the unique needs of astronomers searching for exoplanets: planets outside our solar system orbiting other stars. In this talk I will go over the innovative web-based astronomy tools I built at Vanderbilt and the unique challenges in building these tools. My focus will be on Filtergraph, a cutting-edge service for building web-based data visualization portals. Not only has it been used extensively by multiple NASA missions, it also caught the attention of mainstream media, being showcased in an episode of 60 Minutes with Anderson Cooper. While I'll discuss my experience in astronomy, the lessons I've gathered will be invaluable for anyone working in a complex, technologically advanced field.
 recording release: yes license: CC BY-NC-SA  

19. Scale Data Science by Pandas API on Spark
Xinrong Meng

As Python has become the go-to language for data science, pandas has quickly evolved into a standard library in the field. However, one key drawback of pandas is its inability to linearly scale with increasing data volumes, primarily due to its reliance on single-machine processing. Pandas API on Spark addresses this issue, empowering users to handle vast datasets by leveraging Apache Spark while preserving the pandas APIs.

In this talk, I will introduce the Pandas API on Spark, explain how it enables the scaling of data science workloads, and explore the reasons behind its highly optimized performance. By the end of the session, you will have the knowledge to scale your existing data science workloads seamlessly using this powerful tool.
 recording release: yes license: CC BY-NC-SA  

20. Beyond Conventional: Embracing Python & LLMs for Quality Assurance
Paul Pereyda Karayan

What do you do when your test suite is not fit for purpose? At Opto, we felt like we were locked in a daily battle against the tests written for our Java and TypeScript services. But, when we paired the adaptability of Python with the power of LLMs, we were able to enhance & extend what we had into something that really worked!

Join us on a journey of transformation where we'll cover:
- Proactive Monitoring: Implementing "production probes" - lightweight and fast Python request "tests" - to actively ensure our services were up and executing core functionalities.
- LLMs in Action: The unexpected efficacy of LLMs in aiding the creation of tests. We'll focus on our Python/TypeScript deployment tools here.
- Quality Over Quantity: Recognizing that just writing tests isn’t the end game. We harnessed FastAPI LLMs to swiftly assess our functional test coverage, helping us identify and address gaps.
- Living Documentation: A sneak peek into how we've sown the seeds for dynamic, ever-evolving documentation using Python.

Throughout the session, we’ll touch on the trade-offs, democratization & ownership of tests, and how this little endeavor set the stage for Opto's broader embrace of Python.
 recording release: yes license: CC BY-NC-SA  

21. Testing Strategies for Python
Liz Acosta

We all know testing is good for you just like we know eating your vegetables is good for you, but let's face it: Eating your vegetables isn't always that fun. This talk hopes to change your mind about testing and offer strategies to make those vegetables taste a little better.
 recording release: yes license: CC BY-NC-SA  

22. Understanding LangChain Agents and Tools with Twilio (or with SMS)
Lizzie Siegle

With LangChain, developers “chain” together different LLM components to create more advanced use cases around LLMs. Agents use LLMs to decide what actions should be taken. Get introduced to LangChain and to what you can do with Agents, Tools, and communication APIs!
 recording release: yes license: CC BY-NC-SA  

23. Shiny: Data-centric web applications in Python
Joe Cheng

Shiny is a web framework that is designed to let you create data dashboards, interactive visualizations, and workflow apps in pure Python or R. Shiny doesn't require knowledge of HTML, CSS, and JavaScript, and lets you create data-centric applications in a fraction of the time and effort of traditional web stacks.

Of course, Python already has several popular and high-quality options for creating data-centric web applications. So it's fair to ask what Shiny can offer the Python community.

In this talk, I will introduce Shiny for Python and answer that question. I'll start with some basic demos that show how Shiny apps are constructed. Next, I'll explain Transparent Reactive Programming (TRP), which is the animating concept behind Shiny, and the reason it occupies such an interesting place on the ease-vs-power tradeoff frontier. Finally, I'll wrap up with additional demos that feature interesting functionality that is made trivial with TRP.

This talk should be interesting to anyone who uses Python to analyze or visualize data, and does not require experience with Shiny or any other web frameworks.
 recording release: yes license: CC BY-NC-SA  

24. Unleashing Python's Power: Serverless Innovations with AWS Lambda
Mayank Jindal

In recent years, serverless computing has brought about a paradigm shift in the realm of software development and deployment. This transformation has not only revolutionized application construction but has also introduced unparalleled ease in scaling, cost-effectiveness, and maintenance. As the domain of serverless continues to grow, AWS Lambda has risen as a platform that grants developers an exceptional chance to deploy Python applications without the burden of traditional server administration.

This session aims to offer insights and practical expertise in fully utilizing the capabilities of serverless architecture. This will empower attendees to create Python applications that effortlessly combine innovation with efficiency, using the capabilities of AWS Lambda through the boto3 library.
 recording release: yes license: CC BY-NC-SA  

25. Python's Types: 5 Amazing Ways Python Type Hints Will Supercharge Your Code
Michael Kennedy

When Python's Type Hints were introduced in 2015, they were met with guarded optimism. Some people were excited for the added functionality and safety they brought to the language. Others saw the Java-ification of Python and thought, "no thanks". Eight years later, the Python community has generally embraced Python types. We've seen powerful and popular frameworks built upon them (Pydantic and FastAPI for example) and tools to fully analyze your code from a typing angle.

This talk will cover the foundations and history of Python typing. Then we will see some of the common syntax and examples for easily bringing typing into your programming habits. Then we will dive into 5 amazing ways typing can help you write and run better code. Time permitting, we'll close out the session with advice on Python typing guidance, patterns, and best practices.
 recording release: yes license: CC BY-NC-SA  

26. Design Patterns for Data Pipelines
Lisa Dusseault

Do you go beyond Extract, Transform, Load, or wish you had, when the Transform step gets super complicated? Let's share design patterns that help manage complex data pipelines in Python, with mentions of Django.
 recording release: yes license: CC BY-NC-SA  

Bungalo East

About the group
PyBay is the regional Python conference for the San Francisco Bay Area, bringing together Pythonistas from around the Bay Area and beyond. It is a volunteer-run organization dedicated to building a stronger Python community. PyBay offers deep-dive talks and networking opportunities that aim to enrich and empower the Python community. PyBay is part of BAPyA (Bay Area Python Association). BAPyA member organizations are the SF Python, Pyninsula, and BayPIGgies meetups.