Python Generator

created:

updated:

tags: python

I remember being very confused when I first encountered yield in Python code. Once I learned more about what generators are, I came to think they are one of the coolest things about Python.

What are Generators?

Introduced with PEP 255, generator functions are a special kind of function that returns a lazy iterator. Lazy iterators are objects that you can loop over like a list. However, unlike lists, they do not store their contents in memory; each value is produced only when it is requested.
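One rough way to see the difference is to compare the size of a list with the size of a generator object that produces the same values. The sketch below is only an illustration: count_up is a hypothetical helper, one million is an arbitrary count, and sys.getsizeof reports only the size of the container itself, not of the items it refers to.

```python
import sys

def count_up(limit):
    # Hypothetical generator function: yields one number at a time.
    for n in range(limit):
        yield n

# The list holds references to all one million items at once.
numbers_list = list(range(1_000_000))

# The generator object only remembers where it is in the loop.
numbers_gen = count_up(1_000_000)

list_size = sys.getsizeof(numbers_list)
gen_size = sys.getsizeof(numbers_gen)

print(list_size > gen_size)  # True
```

The exact byte counts depend on the Python version and platform, so only the comparison is printed.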

Example of Reading Large Files

def csv_reader(file_name):
    # Read the entire file into memory, then split it into a list of lines.
    with open(file_name) as file:
        result = file.read().split("\n")
    return result

The above snippet opens the given file, reads its entire contents, and splits them line by line into a list. This can be a problem when the file is very large, as it can cause a MemoryError.

Generator way of Reading Large Files

Below is a generator function that we can use to achieve the same thing: open the file and read line by line.

def csv_reader(file_name):
    # Iterating over the file object reads one line at a time.
    with open(file_name, "r") as file:
        for row in file:
            yield row

The above snippet opens the given file and loops through it line by line. Here csv_reader() is a generator function: it yields each row instead of returning the whole list, so only one line needs to be in memory at a time, however large the file may be.

  • “Using yield will result in a generator object”
  • “Using return will result in the first line of the file only.”
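The two points above can be checked with a small experiment. This is a minimal sketch: rows_with_yield and rows_with_return are hypothetical names, and the three-line sample file is made up for the illustration.

```python
import os
import tempfile

def rows_with_yield(file_name):
    with open(file_name) as file:
        for row in file:
            yield row  # pauses here and hands back one row per request

def rows_with_return(file_name):
    with open(file_name) as file:
        for row in file:
            return row  # exits the function on the first iteration

# Write a small throwaway file (made-up sample data).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,1\nb,2\nc,3\n")
    path = tmp.name

gen = rows_with_yield(path)
print(type(gen).__name__)  # generator

all_rows = list(gen)
print(all_rows)  # ['a,1\n', 'b,2\n', 'c,3\n']

first = rows_with_return(path)
print(repr(first))  # 'a,1\n'

os.remove(path)
```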

Example of Generating an Infinite Sequence

A computer’s memory is finite, so it will eventually run out if we try to store an infinite amount of data. Generators come in handy in this case as well:

def infinite_seq():
    num = 0
    while True:
        yield num
        num += 1

for s in infinite_seq():
    print(s, end=" ")

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
[... continues until we stop it manually]

We can also use next() to iterate over a generator manually:

>>> gen = infinite_seq()
>>> next(gen)
0
>>> next(gen)
1
>>> next(gen)
2
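Two related tools are worth knowing when stepping through generators by hand. The sketch below assumes the same infinite_seq() as above and adds a hypothetical finite generator, two_items(). itertools.islice takes a fixed number of values from an infinite generator, and the optional second argument to next() supplies a default once a generator is exhausted, instead of raising StopIteration.

```python
from itertools import islice

def infinite_seq():
    num = 0
    while True:
        yield num
        num += 1

# Take only the first five values from the infinite sequence.
first_five = list(islice(infinite_seq(), 5))
print(first_five)  # [0, 1, 2, 3, 4]

def two_items():
    # Hypothetical finite generator for illustration.
    yield "a"
    yield "b"

gen = two_items()
next(gen)  # 'a'
next(gen)  # 'b'

# The generator is now exhausted; a plain next(gen) would raise StopIteration.
leftover = next(gen, "done")
print(leftover)  # done
```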

Generator Expression

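A generator expression (also known as a generator comprehension) is similar in syntax to a list comprehension, but it is written with parentheses instead of square brackets and produces a generator object rather than a list.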

csv_gen = (row for row in open(file_name))
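For a side-by-side comparison that does not need a file, here is a minimal sketch using squares of small numbers (the variable names are made up for the example). It also shows that a generator can be consumed only once.

```python
# List comprehension: builds the whole list immediately.
squares_list = [n * n for n in range(5)]
print(squares_list)  # [0, 1, 4, 9, 16]

# Generator expression: same syntax with parentheses, values computed on demand.
squares_gen = (n * n for n in range(5))
gen_type = type(squares_gen).__name__
print(gen_type)  # generator

first_total = sum(squares_gen)
print(first_total)  # 30

# The generator is exhausted after one pass, so a second pass sees nothing.
second_total = sum(squares_gen)
print(second_total)  # 0
```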

What is yield?

yield indicates where a value is sent back to the caller, but unlike return, the function does not exit afterward. Instead, the state of the function is remembered. That way, when next() is called on a generator object (either explicitly or implicitly within a for loop), execution resumes right after the yield: in infinite_seq() above, the variable num is incremented and then yielded again. Just as generator functions look and act much like other functions, generator expressions look and act much like the other comprehensions available in Python.
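To make the pause-and-resume behavior visible, the following sketch records what the generator body does in a list. The names counter and log are hypothetical and exist only for this illustration.

```python
log = []

def counter():
    log.append("start")
    num = 0
    while True:
        yield num                         # execution pauses here
        log.append(f"resumed after {num}")
        num += 1

gen = counter()
log_before = list(log)
print(log_before)  # [] : calling the function does not run its body yet

a = next(gen)  # runs until the first yield
b = next(gen)  # resumes after the yield, increments num, yields again
print(a, b)    # 0 1
print(log)     # ['start', 'resumed after 0']
```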

yield from for Nested Generators

yield from lets a generator yield all the values of a nested generator before continuing with its own code.

Description from Python Docs

PEP 380 adds the yield from expression, allowing a generator to delegate part of its operations to another generator. This allows a section of code containing yield to be factored out and placed in another generator. Additionally, the subgenerator is allowed to return with a value, and the value is made available to the delegating generator.
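The last point, that a subgenerator can return a value to the delegating generator, can be sketched as follows. The function names subgenerator and delegator are made up for the example.

```python
def subgenerator():
    yield 1
    yield 2
    return "sub done"  # becomes the value of the yield from expression

def delegator():
    # Yields 1 and 2 from the subgenerator, then captures its return value.
    result = yield from subgenerator()
    yield result

values = list(delegator())
print(values)  # [1, 2, 'sub done']
```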

# Example from Python docs
def g(x):
    yield from range(x, 0, -1)
    yield from range(x)

>>> list(g(5))
[5, 4, 3, 2, 1, 0, 1, 2, 3, 4]

# Example from Effective Python
def child():
    for i in range(1_000_000):
        yield i

def parent_using_yield_from():
    yield from child()

def parent_yield():
    for i in child():
        yield i

Advantages of using yield from

  • Code that uses yield from is clearer and more intuitive than an explicit loop over the nested generator
  • yield from can perform better than manually iterating over nested generators and yielding their outputs
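One way to check the performance point on your own machine is with timeit. This is a rough sketch, not a rigorous benchmark: drain is a hypothetical helper, child() uses a smaller range than the example above to keep the run short, and the timings vary with the Python version and hardware, so no expected numbers are shown.

```python
import timeit

def child():
    # Smaller range than the example above, chosen to keep the run short.
    for i in range(100_000):
        yield i

def parent_using_yield_from():
    yield from child()

def parent_yield():
    for i in child():
        yield i

def drain(gen_func):
    # Hypothetical helper: consume every value the generator produces.
    for _ in gen_func():
        pass

# Both parents produce exactly the same values.
same_output = list(parent_using_yield_from()) == list(parent_yield())
print(same_output)  # True

t_from = timeit.timeit(lambda: drain(parent_using_yield_from), number=10)
t_loop = timeit.timeit(lambda: drain(parent_yield), number=10)
print(f"yield from: {t_from:.3f}s, manual loop: {t_loop:.3f}s")
```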

References