EuroPython 2018 - My notes

After two years I had the opportunity to attend EuroPython again. Below are my notes from the talks.

Wednesday

David Beazley – Die Threads

What if Python threading got a nicer (not Java-like) interface? Professional talk, super jokes, lovely details in examples. Great design is when complex things look simple.

Michele Simionato – Python in scientific computing: what works and what doesn't

Lessons from studying earthquake simulations (working with 100-200 GB of data).

h5py/hdf5 – Complex and buggy
rabbitmq – For small messages, not huge data
zmq – Good experience
dask – Not used in production
No C extensions or Cython, pure Python only
xarray – Worth a look (mentioned during questions)

Jiri Benes – Reliability in distributed systems

Talk sponsored by Kiwi (great party btw:)). Unfortunately, having knowledge is not enough to pass it on, especially with such a complex topic.

Lesson learned: Do not use horror stories from former employers in your talks.

Peter Hoffmann – Using Pandas and Dask to work with large columnar datasets in Apache Parquet

Parquet in Python – it works and it’s production ready!

Two implementations now: fastparquet and Arrow
Integration in Pandas.
Predicate pushdown with Parquet:
- Using dictionaries
- Using min/max (best on sorted data)
Dask can take advantage of Parquet partitioning
turboodbc – NumPy and Arrow from database without unnecessary overhead
Data is saved in Azure. It can be accessed using standard tools once a file-like interface is implemented. I have to try this with S3!
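
The min/max variant of predicate pushdown is easy to picture without any Parquet library. The sketch below is my own illustration, not how fastparquet or Arrow implement it: `RowGroup`, `chunk` and `scan_greater_than` are hypothetical names, and the "row groups" are plain Python lists. It shows why the trick works best on sorted data: more groups can be skipped based on their statistics alone.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group with column statistics."""
    values: List[int]

    @property
    def min(self) -> int:
        return min(self.values)

    @property
    def max(self) -> int:
        return max(self.values)


def scan_greater_than(groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Return matching values and the number of row groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        # Skip the whole group when its statistics prove nothing can match.
        if group.max <= threshold:
            continue
        groups_read += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, groups_read


def chunk(values: List[int], size: int) -> List[RowGroup]:
    """Split a flat list into fixed-size row groups."""
    return [RowGroup(values[i:i + size]) for i in range(0, len(values), size)]


data = [7, 1, 9, 3, 8, 2, 6, 4, 5, 0, 11, 10]
unsorted_groups = chunk(data, 4)
sorted_groups = chunk(sorted(data), 4)

unsorted_result, unsorted_reads = scan_greater_than(unsorted_groups, 8)
sorted_result, sorted_reads = scan_greater_than(sorted_groups, 8)
print(unsorted_reads, sorted_reads)
```

Both scans return the same values, but the sorted layout reads fewer row groups because the matching values are concentrated at the end.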

For me, one of the most useful talks of the conference.

Emmanuel Leblond – Trio: A pythonic way to do async programming

Trio – asyncio with promises removed
Inspired by Curio by David Beazley, the first keynote speaker.
Read Nathaniel’s article (I recommend that too).
Unrelated promo - look at Hypothesis

Great show.

Dimiter Naydenov – All You Need is Pandas: Unexpected Success Stories

People use tools they know, whether it is Pandas or something else. Unfortunately, this talk didn’t show much.

Isabel Lopez – ETL pipeline to achieve reliability at scale

Nice talk explaining the Smarkets ETL pipeline.

Data stored in Parquet format in S3
Luigi in Docker started from Jenkins. Runs an AWS EMR cluster.
Data downloaded for development.

Almar Klein – Let’s embrace WebAssembly!

I feel ashamed that I haven’t had time to look at WebAssembly before. Three possible use cases for Python:

Compile Python interpreter to WebAssembly.
Compile Python code to WebAssembly.
Run WebAssembly from Python.

Demo of the third approach: a game written in Rust, AI written in C, run from Python on Windows (no web browser involved). Nice example of dependency injection in WebAssembly using imports/exports.

Marco Buttu – White Mars: living far away from any form of life

Live from Antarctica, 3233 m above sea level.

Thursday

Nicole Harris – PyPI: Past, Present and Future

PyPI has a designer! There is a talk about PyPI!

The new implementation is called Warehouse and EuroPython sprints were full of people willing to contribute to it. I’ve also closed one issue.

If you wish to help, you can start by verifying your email address at PyPI.

Hynek Schlawack – How to Write Deployment-friendly Applications

I had read a few articles by the speaker before and all of them were great, so I was looking forward to this talk. It definitely met my expectations and I highly recommend watching it. It summarizes engineering best practices and adds references to relevant tools.

Recommendations:

Use exec (to preserve process ID and respond to signals)
Log to stdout (or stderr)
Expose port 8000
Use environment variables instead of .ini files.
But do not pass sensitive data (passwords) in environment variables.
Handle SIGTERM.
Expose /-/readiness and /-/liveness endpoints.
Look at Warehouse (PyPI) for a good example.
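
A few of these recommendations fit into a short sketch. This is my own illustration, not code from the talk: the variable names `APP_PORT` and `APP_WORKERS` are hypothetical, the work loop is a placeholder, and the `exec` advice belongs to the shell start script, so it is not shown here.

```python
import logging
import os
import signal
import sys
import threading

# Log to stdout and let the process supervisor collect the output.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
log = logging.getLogger("app")

# Hypothetical setting names; non-sensitive configuration comes from the environment.
PORT = int(os.environ.get("APP_PORT", "8000"))
WORKERS = int(os.environ.get("APP_WORKERS", "2"))

shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Ask the main loop to finish instead of dying in the middle of work."""
    log.info("received signal %s, shutting down", signum)
    shutdown_requested.set()


def main() -> int:
    # Register the handler at startup; this must happen in the main thread.
    signal.signal(signal.SIGTERM, handle_sigterm)
    log.info("listening on port %d with %d workers", PORT, WORKERS)
    while not shutdown_requested.is_set():
        # Placeholder for real work, e.g. serving requests.
        shutdown_requested.wait(timeout=1.0)
    log.info("clean shutdown complete")
    return 0
```

A real entry point would end with `sys.exit(main())`. Sending SIGTERM to the process then lets the loop finish its current iteration before exiting.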

Jose Manuel Ortega – Microservices and Serverless in Python projects

This session was proof that microservices and serverless attract people. The room was full, with people sitting on the floor or standing along the walls. Unfortunately, most of them, including me, were disappointed, and a significant part of the audience left during the talk.

Radoslav Georgiev – Django structure for scale and longevity

Django’s division of code into models / views / forms is not sufficient for large applications. I’ve seen multiple examples of real projects where the standard structure got out of control. Don’t be afraid to implement business logic in separate modules. From the talk: “We need more boxes.”

The speaker excellently illustrates all the problems and suggests a solution – new modules services.py and selectors.py. I have reservations about the naming of those modules, but I like the distinction between read and write operations.
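
As I understood the proposal, functions that change state live in services.py and functions that only read live in selectors.py. The sketch below is my own and deliberately avoids Django so it runs anywhere: the `Order` class, the in-memory `ORDERS` dictionary and all function names are hypothetical, and both "modules" are shown in one file.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Order:
    id: int
    customer: str
    total: float
    paid: bool = False


# Stand-in for the database; a real project would use Django models here.
ORDERS: Dict[int, Order] = {}


# --- services.py: functions that write ---
def create_order(*, customer: str, total: float) -> Order:
    """Validate input and persist a new order."""
    if total <= 0:
        raise ValueError("total must be positive")
    order = Order(id=len(ORDERS) + 1, customer=customer, total=total)
    ORDERS[order.id] = order
    return order


def mark_order_paid(*, order_id: int) -> Order:
    """Change the state of an existing order."""
    order = ORDERS[order_id]
    order.paid = True
    return order


# --- selectors.py: functions that only read ---
def get_unpaid_orders(*, customer: str) -> List[Order]:
    """Return unpaid orders without modifying anything."""
    return [o for o in ORDERS.values() if o.customer == customer and not o.paid]


first = create_order(customer="alice", total=10.0)
second = create_order(customer="alice", total=25.0)
mark_order_paid(order_id=first.id)
unpaid = get_unpaid_orders(customer="alice")
```

Views and forms then call these functions instead of holding the business logic themselves, which also makes the logic testable without going through the HTTP layer.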

Lynn Root – asyncio in Practice: We Did It Wrong

It’s easy to run into unexpected problems when adopting new technologies. And to be honest, I’m still afraid of asyncio. Listening to real experience could help a lot, but this talk didn’t meet my expectations.

Alejandro Saucedo – Industrial Machine Learning Pipelines with Python & Airflow

Catchy presentation prepared for one hour, but presented in 30 minutes. Buzzwords with irony.

When you need to run tasks asynchronously, you can add Celery. When Celery gets messy and unmanageable, you need a tool like Apache Airflow. An introductory talk.

I didn’t know that Airflow can use Celery under the hood (through its CeleryExecutor).

Sarah Bird – The Web is Terrifying! Using the PyData stack to spy on the spies.

When I read various articles explaining possibilities for user tracking, I considered most of them too obscure to be used in the wild. This talk by a Mozilla employee showed that I was naive and that the “bad guys” will use whatever is possible.

Includes a nice visualization of how tracking IDs are shared between domains (companies), including respectable ones.

Develops and recommends Bokeh.

Martin Angelov – Proper Django Testing

Follow-up to the “Django structure for scale and longevity” talk. Business logic refactored to separate modules is easier to test.

They are using the django.test module, which extends unittest, not pytest.
Faker – generates mock data for testing

Ines Montani – How to Ignore Most Startup Advice and Build a Decent Software Business

Impressive talk, opposing today’s startup culture.

Optimize for median (not mean!) outcome.
Specialists vs. generalists. A compromise is having complementary people.
It’s not possible to A/B test everything.
Profit is the best KPI.

Friday

Ian Ozsvald – Citizen Science with Python

A few examples of data-driven humanitarian and healthcare projects developed using Python:

air quality in Macedonia,
wife’s sneezing,
updating outdated medical results about pregnancy,
searching for Orangutans using drones.

Shows that companies or organizations often have data they are not aware of.

Victor Stinner – Python 3: ten years later

The history of the migration from Python 2 to Python 3 can be an excellent lesson in software management.

The key problem was that running 2to3 and switching to Python 3 all at once was infeasible in most use cases. Writing code compatible with both versions was troublesome and brought no real advantage, because you cannot use Python 3 goodies when you have to support Python 2.

When Python developers realized that 2to3 is not the solution, enhancements for writing compatible code appeared – unicode prefixes in Python 3, bytes formatting, or various Python 3 backports.
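
Two of these enhancements are easy to show. This is my own minimal example, not one from the talk: the `u""` prefix was allowed again in Python 3.3 (PEP 414) and `%` formatting for bytes returned in Python 3.5 (PEP 461), so both lines below are valid in Python 2.7 as well as in Python 3.5 and later.

```python
# The u"" prefix is accepted again since Python 3.3 (PEP 414),
# so the same literal works in Python 2 and Python 3.
greeting = u"Ahoj"

# %-formatting of bytes is available again since Python 3.5 (PEP 461),
# which helps code that builds binary protocols on both versions.
header = b"Content-Length: %d\r\n" % 42

print(greeting)
print(header)
```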

Some bugs (typically race conditions) that were fixed in Python 3 are marked as wontfix in Python 2.

There are new modules in the Python 3 standard library, for example zipapp or faulthandler.

Pietro Mascolo – Good features beat algorithms

This talk wasn’t about feature engineering but about feature selection – removing useless features to improve performance of machine learning algorithms. Introduces basic options and shows how to combine them.

Chase Stevens – Exploring the Python AST Ecosystem

Superb talk and stand-up comedy, despite the fact that touching the AST is a bad idea in most cases.

Some useful tools:

Sarah Diot-Girard – Trust me, I'm a Data Scientist - ethics for builders of data-based Applications

Trending and important topic – how social biases are unintentionally learnt by machines. Shows good and understandable examples using an imaginary application for illustration.

Marco Bonzanini – Lies, damned lies, and statistics

Statistics show that eating ice cream causes death by drowning.

Compelling talk showing common misuses of statistics. Includes both textbook and real-world examples.
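
The ice cream example is a classic confounder: hot weather drives both ice cream sales and swimming. The simulation below is my own sketch on synthetic data, not an example from the talk. All numbers are made up, and `pearson` is a hand-written helper so that the code needs only the standard library.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)

# Synthetic year: temperature drives both quantities, which never
# influence each other directly.
temperature = [rng.uniform(0, 35) for _ in range(365)]
ice_cream = [10 * t + rng.gauss(0, 20) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1) for t in temperature]

r_xy = pearson(ice_cream, drownings)
r_xz = pearson(ice_cream, temperature)
r_yz = pearson(drownings, temperature)

# Partial correlation: what remains once temperature is controlled for.
partial = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

print("raw correlation:", round(r_xy, 2))
print("controlling for temperature:", round(partial, 2))
```

The raw correlation between ice cream and drownings is strong, while the partial correlation is close to zero, which is what you would expect when the relationship comes entirely from the shared cause.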

Sven-Hendrik Haase – Rust and Python - Oxidize Your Snake

Introduction to PyO3. Really cool how simple a Rust binding to Python can be. Unfortunately, it is not very stable yet.

Alec MacQueen – Python and GraphQL

A summary of GraphQL that I had wanted to get for a while but never had time for. Worth considering when designing the next API.

Sprints

For the first time, I attended the EuroPython sprints, which was a remarkable experience. I selected Warehouse (PyPI), because it's something I know and it's written in Python (unlike CPython).

I arrived a little bit late and the room was crowded. I guess that even the organizers did not expect so many attendees to be willing to work on Warehouse.

I was surprised how welcoming Warehouse is for first-time contributors. The excellent Docker setup allows anybody to start hacking and the friendly atmosphere makes contributing a pleasure.

The ticket I resolved shows that problems are sometimes a lot deeper than they appear.

Summary

I was slightly disappointed when I found out that there would be only three days of talks this year. In retrospect, I must admit that those three days still provided tons of knowledge and were definitely worth it.

For me personally, many attractive talks were part of the PyData track. I hope that this EuroPython/PyData connection will continue. I would appreciate more advanced and expert topics in other rooms.

Thanks to Akamai for allowing me to be part of this event:)

Photo of Edinburgh

Miloslav Pojman