Iterators for Curious Minds

My talk given at PyCon CZ 2019.

Abstract

Do you know the difference between iterators and iterables? Are you able to implement a custom collection? Why something so basic as a for-loop got its chapter in the famous Gang of Four book?

The iterator protocol is a prime example of Python language design.

I will not only show how to implement custom iterators but also illustrate how other Python features are provided using similar idioms. I will informally introduce the iterator design pattern and show how it is built into the language. After mentioning some examples of iterables, we will look into Python data model, implement some generators, and maybe touch async frameworks.

Video

Transcript

Iterators for Curious Minds - Slide 1

Here is a transcript of my talk.

PDF version of my slides is available too, but the slide deck is meant for illustration, so it may not be useful alone.

Iterators for Curious Minds - Slide 2

Hi, thank you for coming. It’s great to see so many curious minds.

My name is Miloslav Pojman, and my job to make the Internet faster. I’m very curious, especially about Python.

In this talk, I would like to share with you what my curiosity taught me about Python for loops.

Iterators for Curious Minds - Slide 3

You know, the for loops: For something in something else. Seriously, we often forget what smart software design is hidden beneath that statement.

After this talk, you should understand what’s happening under the hood when you write a for loop. With this knowledge, you should be able to write custom classes that can be iterated over. I will show you

Iterators for Curious Minds - Slide 4

how iteration fits into Python data model.

You probably know the iter() function, but you may not be why the following expression is useful.

Iterators for Curious Minds - Slide 5

Let’s make a short poll:

Who knows what does the iter(v) is v condition return? Who knows why this expressions can be useful, please raise your hand.

Iterators for Curious Minds - Slide 6

I have one more poll for you:

Who read the “Gang of Four” book? Who read “Design Patterns: Elements of Reusable Object-Oriented Software”, please raise your hand.

The Gang of Four describes 23 classic design patterns, including the iterator. I would not be afraid to call it a bible of software design.

Iterators for Curious Minds - Slide 7

Martin Fowler describes it as “the best book ever written on object-oriented design.”, maybe on software design overall.

Iterators for Curious Minds - Slide 8

Even so, many Python developers don’t feel the need to read it. I assume that we do not encounter the problems that the book is solving. I think that it is because Python solved those problems for us by incorporating design patterns into the language.

The iterator is a great example: we use iterators every day, but we may not realize it because they are hidden.

Language Comparison

Iterators for Curious Minds - Slide 9

Design patterns, including the iterator, are not specific to one programming language. So, for motivation, let’s look at examples of iteration in other languages.

Iterators for Curious Minds - Slide 10

In C++, maps are an equivalent of Python dictionaries. The maps in C++ are implemented using red-black trees. You don’t have to know about the red-black trees, the iterator does that.

The code example on the slide is quite compact if you consider what logic is necessary. But still, the example on the slide is longer than it would be in Python because the iterator is visible.

Iterators for Curious Minds - Slide 11

For comparison, look at JavaScript. JavaScript supports a for-in statement for iterating over objects.

Unfortunately, the statement is almost useless because its behavior is hardcoded and it cannot be customized. If you wrote a map class, it would not be supported by the for-in statement.

In C++, we can see iterators without syntactic sugar. In JavaScript, we can see syntactic sugar without iterators.

Both these states are suboptimal. Both C++ and JavaScript addressed that by introducing a new syntax that uses iterators.

Back to Python.

Iterators for Curious Minds - Slide 12

We should appreciate that Python since its beginning provides a nice for-loop syntax.

And since Python 2.2, since year 2001, the for-loop syntax uses iterators under the hood.

Iterators for Curious Minds - Slide 13

If you are interested about iterators’ history, I recommend you David Beazley’s talk on that topic. What I found most interesting is that the current model started as a hack abusing indexing of sequences.

Iterator Pattern

Iterators for Curious Minds - Slide 14

If this has to be a proper software engineering talk, we have to start with the serious stuff.

Iterators for Curious Minds - Slide 15

The UML.

You can see an UML diagram of the iterator pattern. A very similar one can be found in the Gang of Four book. Do you understand it? Do you like it? Don’t worry, it’s not as bad as it looks like

First, ignore that the diagram has two rows.

Iterators for Curious Minds - Slide 16

The top row is exactly same as the bottom one. The top row describes how it should look like, and the bottom says that it looks like it should:)

Now look what’s on the left and what’s on the right.

Iterators for Curious Minds - Slide 17

On the left is a container, aka an aggregate, aka a collection. The container is an object that holds other objects. It can be a list, a set, a dictionary, you name it.

On the right is an iterator. The iterator is what you may not see, but what is necessary, because the iteration is implemented there. The iterator is created when a for loop is encountered.

So the whole UML diagram shrinks to an observation that there is something hidden called iterator, and it is created when necessary.

Let’s look how this is translated to Python code.

Iterators for Curious Minds - Slide 18

In Python, the iter() function returns an iterator for a given container. The iter() function is called once for each for loop, at the begging of the iteration.

For getting individual items, the next() function is available. When next() is called for the first time, it returns the first item. When you call it again, it returns subsequent items. When there are no more items to be produced, the next() function raises StopIteration.

The next() function is called once for each loop cycle, plus once on top that of that to receive the StopIteration exception.

Iterators for Curious Minds - Slide 19

In Python, there is no method to check whether a next item is available. You have ask for that item and catch possible exception.

This approach is called EAFP, “Easier to ask for forgiveness than permission”. Some may argue that throwing exceptions for normal control flow is a bad practice, but this coding style is common in Python.

Iterators for Curious Minds - Slide 20

A curious mind should ask: What superpowers does the iter() function have? How is it able to construct an iterator for any container it gets?

A curious mind should ask: How can the next() function understand to all possible iterators?

Iterators for Curious Minds - Slide 21

Answers to previous questions will be obvious if we observe a common pattern in the the Python data model.

The str() builtin calls __str__(),
len()calls __len__(),
bool() calls __bool__(),
and so on.

It won’t surprise you that the iter() function calls __iter__() and next() calls __next()__. Just be careful, in Python 2, they forgot the underscores in __next()__, so upgrade to Python 3 to fix that bug.

Iterators for Curious Minds - Slide 22

I you saw some Python talks before, you probably noticed that these double-underscore methods are often abbreviated “dunder”. So, dunder-iter, dunder-next.

Python provides convenient syntax and builtins which delegate to double-underscore methods which can be overridden to customize Python behavior.

Containers vs. Iterators

Iterators for Curious Minds - Slide 23

A curious mind should ask: Why do we need iterators? Isn’t iteration something that should be implemented in containers? Why do we need separate classes for that?

I will give you two reasons.

Iterators for Curious Minds - Slide 24

The first reason is a software-engineering design principle called separation of concerns.

Let’s look at a Python list. I can call the iter() function and I will get items in order. Or, I can decide to call the reversed() function, and I will get items from the last one to the first one. Thanks to separation between containers and iterators, you can come up with any iteration order and write an iterator that will execute it.

Iterators for Curious Minds - Slide 25

The even more important reason for separating iterators from containers is that iteration state has to be stored somewhere.

Python strings or tuples are immutable, they cannot be changed after creation. Something that is immutable cannot hold a current index.

Even if we ignored immutability, it would not be a good idea to store position in containers. One container can be used in two nested loops, so storing the position with the container would not be sufficient.

That’s where iterators become handy. For each for loop, for each iteration, a new iterator with a new variable storing a position is created.

Iterators for Curious Minds - Slide 26

I have shown you that we need two kinds of things: the containers and the iterators. What I haven’t told you is that both the containers and the iterators are iterable.

We use containers in for loops routinely.

You can add an iter() call to a for loop and its behavior will not change.

Or more importantly, you request an alternative iterator, for example using the reversed() function.

Iterators for Curious Minds - Slide 27

This is specific for Python. For example in Java, iterators are not iterable. Also the Gang of Four book does not mention this tweak.

How does this work?

Iterators for Curious Minds - Slide 28

It’s quite simple. When you call iter() on an existing iterator, the iterator is returned unmodified.

Do you remember when I was explaining that each for loop creates a fresh iterator? Well, it’s not true if you use an existing iterator. It means that iterators cannot be reused.

Iterators for Curious Minds - Slide 29

Reuse of an existing iterator can lead to unexpected bug. For example, look at the slide. The second for loop does not print anything. The reversed iterator is exhausted after the first loop.

Iterators for Curious Minds - Slide 30

Now you have enough information to answer my question from the first poll: What does the iter(v) is v condition return and why it can be useful?

It checks whether the given value is an iterator. It can useful because when we know that an object is an iterator, then we know that it cannot be used more than once. So, if we plan multiple iteration, we can take some countermeasures, for example store product of the iterator to a list.

Iterators for Curious Minds - Slide 31

Speaking about iterables, special mention belongs to files and file like objects. You probably know that you can iterate over a file object to get its content line by line.

The question is: Are file objects containers or iterators? Can they be reused?

The correct answer is that files are iterators. They are exhausted after the first iteration. When you open a file, the operating system tracks position in that file for you, so you cannot have more than one position.

Think of the standard input stream. When you iterate over it, a user has to type one the keyboard. She will not retype all input just because you added another for loop to your code.

Custom Iterables

Iterators for Curious Minds - Slide 32

Now, you should understand now what happens when you write a for loop. Let’s use this knowledge to write a custom class, that can be iterated over.

Iterators for Curious Minds - Slide 33

Let’s write a DateRange class. The DateRange class will behave very similarly to the Python built-in range, but it will produce date instances instead of numbers.

The class will need the start and the stop of the range.

Iterators for Curious Minds - Slide 34

If the class has to be iterable, it has to implement the __iter__() method. This method has to return an iterator. Let’s call that iterator DateRangeIterator.

What you can see is a half of an UML diagram. We implemented the left side of it, the container. Let’s move on to the right side.

Iterators for Curious Minds - Slide 35

The DateRangeIterator has to remember the state of the iteration. Let’s store what date value should be produced next. And of course, we have to remember when to stop.

Don’t forget that iterators are iterables too. We have to implement the __iter__() method which should return self.

You should remember that the iterators are advanced by calling the __next__() method.

The __next__() method should raise StopIteration when the stop is reached.

Iterators for Curious Minds - Slide 36

In other cases, the current date should be returned and advanced by one day. And that’s all. We are done. We implemented the whole UML diagram. Was that difficult? What do you think? Can we simplify it? Do I hear generators?

Generators

Iterators for Curious Minds - Slide 37

Yes, we can do better. The magic keyword is called yield.

Iterators for Curious Minds - Slide 38

The DateRangeIterator class, the right side of the UML diagram, can be replaced by the __iter__ method that you can see on the slide.

Any function that contains the yield keyword returns a generator, which is a special type of an iterator. When next is called on that iterator, the function is executed to the yield statement. Then it is suspended, waiting to be resumed by another next call.

Iterators for Curious Minds - Slide 39

Our example can be simplified even more. You don’t need a class to return a generator. A simple function is enough. Five lines of code can replace both classes we defined a few slides ago.

But don’t forget that without a container, you are directly creating a generator, and generators are iterators, and iterators are exhausted after the first use.

Iterators for Curious Minds - Slide 40

Generators would be a great topic for a separate talk, or more likely a workshop. I won’t go into details here.

My recommendation is: When you see a list, consider replacing it by a generator.

The code on the slide somehow reads a file and appends processed lines to a list.

Iterators for Curious Minds - Slide 41

If I use a generator instead, I save two lines of code and potentially lots of memory.

Iterators for Curious Minds - Slide 42

List comprehensions can be replaced by generator comprehensions. Just replace square brackets by rounded parentheses.

Iterators for Curious Minds - Slide 43

Iterators and generators can be piped into each other, composing large data flows with almost no memory requirements.

This is a really powerful construct. The whole async/await support, which landed to Python recently, is built on top generators.

Summary

Iterators for Curious Minds - Slide 44

Time is running, so let’s sum it up. Iterators for Curious Minds - Slide 45

We have two kind of iterables:

Containers
Iterators

Iterators for Curious Minds - Slide 46

Containers, which can be iterated again and again.

Iterators for Curious Minds - Slide 47

Iterators are one-off, they are exhausted after the first use.

Iterators for Curious Minds - Slide 48

I hope that you learnt something new. Thank you for your attention.

License

Slides are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.