Python: range is not an iterator

[-]

Noxitu@reddit

I learned something from this article, but I feel some readers might end up confused in a different way after reading it.

In Python 3, enumerate, zip, reversed, and a number of other built-in functions return iterators

This statement is my main issue. While the statement is factually correct, the objects returned by these functions are both iterators and iterables. And in the vast majority of cases they are used as iterables; in fact, direct use of iterators is very rare in Python. The fact that they are iterators at all feels a bit like an implementation detail that you shouldn't rely on.

[-]

JanEric1@reddit

But you definitely need to be aware that they are iterators, because you can consume an iterator only once. While a (non-iterator) iterable can be consumed many times.
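
For illustration, here is a minimal sketch of that difference. The variable names are made up for the example and are not from the article.

```python
# A list is a re-iterable: every list() call or for loop asks it for a fresh iterator.
numbers = [1, 2, 3]
first_pass = list(numbers)
second_pass = list(numbers)      # same contents again

# zip() returns an iterator: it hands out each item exactly once.
pairs = zip(numbers, "abc")
first_zip_pass = list(pairs)     # consumes the iterator
second_zip_pass = list(pairs)    # nothing left to hand out

print(first_pass, second_pass)
print(first_zip_pass, second_zip_pass)
```

The second pass over the zip object comes back empty without any error, which is why this tends to show up as a silent bug rather than a crash.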

[-]

PeaSlight6601@reddit

I think you can (and should) argue that python iterators are NOT iterable.

We all agree that: thing = (x for x in range(10)) should have 10 members right?

So why does this print 0:

 thing = (x for x in range(10))
 for _ in iter(thing):
    pass
 print(len(list(thing)))

How can walking down the iterator of an iterable modify the iterable?

[-]

theferrit32@reddit

It's wild to me that this code block is completely different if you change it to

thing = [x for x in range(10)]

I don't think I like that.

[-]

pojska@reddit

It's not that surprising. Different brackets mean different things in python. thing = {x for x in range(10)} yields a set, and thing = {x: True for x in range(10)} yields a dict.

[-]

theferrit32@reddit

Yes but the expressions (1, 2, 3) and [1, 2, 3] both construct sequential collections in Python. One being a tuple and one being a list. I expected (x for x in range(10)) to result in a tuple. range(10) is a generator and the syntax x for x in thing, normally, when it appears in code (since most instances I've seen are list or dict or set comprehension), iterates through whatever thing is, creating another realized collection.

[-]

pojska@reddit

That's fair, it's certainly not obvious that it would create a generator. A tuple would make sense.

[-]

ProfessorFakas@reddit

You... don't like that different syntax... does a different thing? Do you want one to be a duplicate of the other?

[-]

theferrit32@reddit

I wouldn't expect it to do the exact same thing, I expected (x for x in range(10)) to construct a tuple like (1, 2, 3) does, analogous to [x for x in range(10)] constructing a list like [1, 2, 3] does.

[-]

PeaSlight6601@reddit

They don't have to be duplicates. One could still have square brackets compute and store in memory and parens be lazy and compute at run time.

[-]

curien@reddit

How can walking down the iterator of an iterable modify the iterable?

Because that's just how some data structures work. If you "walk through" a traditional stack or queue, you end up with an empty container. If iterators had to be reusable, then some types of data structures can't be iterated at all.

[-]

PeaSlight6601@reddit

I can walk through stacks and queues all day, but they will still exist after I walk them.

The act of merely calculating the location of the elements of the stack/queue doesn't pop/unlink them.

[-]

curien@reddit

I can walk through stacks and queues all day, but they will still exist after I walk them.

They'll exist, but with nothing in them.

In a traditional stack, for example, you have two or three operations: push, pop, and sometimes top. If a stack has more than one element, you cannot get to its second element without removing the top one.

[-]

PeaSlight6601@reddit

If your stack had no way to walk it but popping you probably wouldn't make it iterable. A true stack like that is conceptually very different from iterables.

[-]

curien@reddit

If your stack had no way to walk it but popping you probably wouldn't make it iterable.

Only if we use your definition of iterable (which is different from what most of us actually use). That's what I said: "If iterators had to be reusable, then some types of data structures couldn't have iterators at all."

This isn't just a Python thing. Back in the 90s, the C++ STL (which wasn't even originally written in C++ but in Ada) used iterator terminology, and it defined classes of iterators based on capability. Input, Output, and Forward iterators were not re-useable, but "Bidirectional" and "Random" iterators were.

[-]

PeaSlight6601@reddit

Input and Output iterators are not exhausted by walking them. They are still valid iterators. Other users of those iterators may observe different contents, but they will observe something.

I think that is very different from the generator in python which once walked by one thread is forever empty.

Ultimately it just isn't necessary. It would be easy enough for generator expressions to be implemented as something that returns a new invocation of the iteration when iter is called on them. This would be much less surprising to most novice programmers.
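
As a rough sketch of what such behavior could look like in user code today: the Reiterable class below is hypothetical (it is not part of Python or of any proposal), and it assumes the caller can supply a zero-argument factory that builds the generator.

```python
class Reiterable:
    """Hypothetical wrapper: calls a zero-argument factory on every iter()."""

    def __init__(self, factory):
        self._factory = factory

    def __iter__(self):
        # A fresh generator per call, so each loop starts from the beginning.
        return iter(self._factory())


squares = Reiterable(lambda: (x * x for x in range(5)))

first = list(squares)
second = list(squares)   # not empty: the factory was invoked again
print(first, second)
```

The trade-off is that the expression is re-evaluated on every pass, so this only makes sense when recomputing is cheap and free of side effects.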

[-]

curien@reddit

Input and Output iterators are not exhausted by walking them.

If you walk an istream_iterator to EOF, it's exhausted.

Other users of those iterators may observe different contents

Yes, that's what we're talking about. That is the behavior that you have said, repeatedly, should not be allowed for something called an "iterator". So I'm not sure what point you think you're making now.

[-]

PeaSlight6601@reddit

Sorry, I wrote "iterator" when I meant to write "iterable." I hadn't had my coffee yet.

As I have been saying the python generators are not "iterable." I don't object to an iterator that reaches its end, that is expected of an iterator, the question is if the iterator itself is an iterable thing. I don't think it is.

A file is an iterable thing, when you open the file you get an iterator on that file. You can exhaust the iterator, that is fine and expected (when you iterate across a list you eventually reach the end), but the thing you iterated, the file or list respectively, still exists and is still iterable.

You cannot do that with a generator.

[-]

curien@reddit

the thing you iterated, the file or list respectively, still exists and is still iterable.

If the file is stdin piped from another program, you can try, but it'll still be at EOF.

Even for a normal file, you could lose access.

If the file is something like /dev/urandom, you can try going back to the beginning, but you won't get the same data.

[-]

PeaSlight6601@reddit

Yes, you could lose access, or some other thread/process could remove all the members of a list, or any number of things could happen, but they aren't caused by the act of you iterating the iterable.

Python generators, and data piped in from standard in, are unique in that it is the act of iterating across the thing itself which causes it to empty. That isn't a behavior that I think is appropriate for an iterable, and I don't think of those things as truly being iterable.

Programming concepts don't have to be perfectly followed, and I'm not such a perfectionist to suggest that reading from standard in should have some distinct syntax to mark it as being a "single-use iterator," but I do think that should be the exception not the norm. I think it is a mistake in the python language to have generators return self when iter is called on them. The vast majority of programmers would expect such a call to restart the execution of the closure from the beginning, not to pick up in the middle.

[-]

jdehesa@reddit

Python generators are primarily iterators. They are also iterables because all iterators are, for ergonomic reasons, but their iterator is themselves. In your example, thing and iter(thing) are literally the same object.

thing = (x for x in range(10))
print(thing is iter(thing))
# True

So it is not that an iterator is modifying an underlying collection or anything like that. Although it is not uncommon to have an iterator that consumes another one (in fact, technically, thing is consuming an iterator of range(10)).

[-]

PeaSlight6601@reddit

I called iter and walked what was returned. The original thing was modified in the process.

Python is duck typed right... So if it walks like a duck...

Don't give me bullshit about how it was the same object, it's not the programmer's responsibility to call id on something after calling iter.

[-]

ProfessorFakas@reddit

You got a pointer to an object, operated on it, and therefore operated on the object. I'm not sure why that's problematic. If you need it to be static, turn it into a list or something similar beforehand.

If you're handing off an iterator to some other code and you don't know what's going to happen to it, then you shouldn't depend on it. If you need it to be static, then put it in a list or similar.

Iterators are typically iterators for a good reason. Implicitly making them static and allocating memory for every possible item would be a terrible idea in many cases.

I really don't think this is a problem. They're a tool, use them in the right way. IMO the only issue is that the two words sound very similar and that can be confusing.

[-]

PeaSlight6601@reddit

I didn't do anything to the object.

In C terminology I got a pointer to a list, and its length, and then calculated the location of all the elements in the list.... And the list disappeared?

[-]

JanEric1@reddit

Just, the car would be driveable. However it would be driveable just once... Just like iterators are iterables that can only be iterated over once. But being able to iterate over something multiple times is not required for an iterable.

In python things are iterable if they implement __iter__, which gives you an object that implements __next__. And iterators do that.
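
A minimal sketch of the two protocols side by side. Countdown and CountdownIterator are hypothetical names invented for this example.

```python
class Countdown:
    """Iterable: __iter__ returns a new iterator on every call."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: implements __next__, and its __iter__ returns itself."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


countdown = Countdown(3)
once = list(countdown)     # fresh iterator
again = list(countdown)    # another fresh iterator

it = iter(countdown)
drained = list(it)         # uses up this particular iterator
leftover = list(it)        # iter(it) is it, so there is nothing left
print(once, again, drained, leftover)
```

Both classes count as iterable under the definition above; only the first one can be walked more than once.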

[-]

PeaSlight6601@reddit

I disagree. The act of iterating on an iterable should NOT change the iterable.

[-]

pojska@reddit

Too bad for you, I suppose.

[-]

Kache@reddit

Iterators maintain the state of walking through an iterable. They don't "modify" the iterable at all.

[-]

PeaSlight6601@reddit

Funny, because I just gave you code where walking the iterator did modify the "iterable" object.

[-]

Kache@reddit

Your mental model of your own code is incorrect, as demonstrated:

iterable = range(10)  # stateless
iterator = (x for x in range(10))  # maintains state
for _ in iter(iterator):
    pass
print(len(list(iterator)))  # 0, b/c state was modified
print(len(list(iterable)))  # 10, iterable is still the same & unmodified

[-]

PeaSlight6601@reddit

It is not a mental model issue. Python generator expressions return self when __iter__ is called.

Any function that responds to iter is "iterable" in the basic sense of the English language. You can call iter and get an iterator from it, therefore you can iterate it.

Obviously one is not going to write (x for x in range(10)) and this is just short-hand for some more complex statement like (compute(foo) for foo in something if condition(foo)) where you definitely cannot just drop in the list-like something.

[-]

masklinn@reddit

While a (non-iterator) iterable (or more specifically a sequence) can be consumed many times.

The caveat is critical, because the sentence is otherwise just wrong, there is no guarantee that an iterable is repeatable in the general case e.g. files are iterable.

[-]

Mysterious-Rent7233@reddit

All iterators are iterables, so your statement applies to all of them.

[-]

jdehesa@reddit

The key insight is that iterators are iterables - by means of returning themselves on __iter__. The opposite is, in general, not true. The nice thing is that for most purposes wherever an iterable is expected (such as in a for loop) you can use an iterable or an iterator.

So, for example, if you need to process the elements of a list in a loop but want to skip the first element in the list, you can do the ugly thing which is to enumerate it and check the index is not zero on every iteration. Or you can make an iterator from the list with iter, call next on it once and then loop through the rest of the iterator. But the distinction between the original iterable (the list) and the iterator is important. For example, typically the iterator of a collection will raise an exception or work incorrectly if the underlying collection has been modified mid iteration.
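
A small sketch of both points, using made-up data. The dict case is used for the second part because it is one where CPython raises an exception instead of silently misbehaving.

```python
readings = [10, 20, 30, 40]

it = iter(readings)               # explicit iterator over the list
header = next(it)                 # consume the first element
rest = [value for value in it]    # the loop picks up at the second element

# The list itself is untouched; only the iterator advanced.
print(header, rest, readings)

# Changing a dict's size while iterating over it raises RuntimeError.
counts = {"a": 1, "b": 2}
error = None
try:
    for key in counts:
        counts[key + "_copy"] = 0
except RuntimeError as exc:
    error = exc
print(type(error).__name__)
```

If the list might be empty, next(it, None) returns the default instead of raising StopIteration.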

[-]

zjm555@reddit

The article goes out of its way to explain that all iterators are themselves iterables (by virtue of returning themselves via __iter__).

[-]

jdehesa@reddit

In fact, it is a Sequence.

[-]

BleakBeaches@reddit

Which itself is an iterable

[-]

lood9phee2Ri@reddit

Well, the key point is that Iterable and Iterator are different things in Python, there just are various things in python that are Iterable while not being Iterator.

    import collections.abc

    r = range(0, 9)
    isinstance(r, collections.abc.Iterator)
False
    isinstance(r, collections.abc.Iterable)
True

Funny enough, the names Iterator and Iterable also identify distinct things in Java too - though not the exact same as the Python case. Something to bear in mind coming to Python from Java or vice-versa.

[-]

BleakBeaches@reddit

Right, iterators ingest and/or operate on iterables. This is a common object model across most OOP languages no?

[-]

lood9phee2Ri@reddit

Well, in general terms. But worth noting in context Python gets especially weird about things, because duck typing, historical python-specific idiosyncratic meanings of "abstract base class" etc. As the Python documentation says "The only reliable way to determine whether an object is iterable is to call iter(obj).".

So you can also easily construct iterables that also aren't instances of Iterable. Fun for all the family.

    class Blah(object):        
        def __init__(self):
            pass
        def __getitem__(self, key):
            if key == 0:
                return "blorp"
            else:
                raise IndexError(key)
        def __len__(self):
            return 1

    b = Blah()

    isinstance(b, collections.abc.Sequence)
False

    isinstance(b, collections.abc.Iterable)
False

    for i in b:
        print(i)
blorp

Intradesting.
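
Following the documentation's advice quoted above, a check could be sketched like this. is_iterable and OnlyGetItem are hypothetical names for the example, not standard library functions.

```python
from collections.abc import Iterable


def is_iterable(obj):
    """Hypothetical helper: follow the docs' advice and just try iter(obj)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


class OnlyGetItem:
    """Old-style sequence protocol: no __iter__, only __getitem__."""

    def __getitem__(self, index):
        if index < 2:
            return index * 10
        raise IndexError(index)


obj = OnlyGetItem()
abc_says = isinstance(obj, Iterable)    # the ABC only looks for __iter__
iter_says = is_iterable(obj)            # iter() also accepts __getitem__
values = list(obj)
number_is_iterable = is_iterable(42)
print(abc_says, iter_says, values, number_is_iterable)
```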

[-]

BleakBeaches@reddit

So kinda like manually implementing an Interface by declaring the function overrides directly in the class?

[-]

yanitrix@reddit

The compiler doesn't care for the type, it just cares for methods __getitem__ and __len__ when it comes to for loop.

Fun fact: foreach and async/await works the same way in c#

[-]

PeaSlight6601@reddit

foreach is at least a different keyword from for and it makes it a bit clearer exactly what is going on.

In a traditional c-style for loop one is directly tracking the index. In other words you build your own iterator by declaring i and incrementing it.

In a foreach loop you are asking the object/compiler to do that work for you. In principle foreach would allow the object to be indexed without using a sequence of integers as the keys.

[-]

lood9phee2Ri@reddit

note Python in particular simply has no c-style for though, its for is akin to some other languages' foreach, the subject of the original article (range) is used in the normal python idiom

for i in range(0, 10): print(i)

Note range() is lazy (in modern python, once upon a time range() was eager and xrange() was lazy), this is not especially inefficient.
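
A quick sketch of what "lazy" means here. The size comparison relies on CPython's sys.getsizeof and is an implementation detail, not a language guarantee.

```python
import sys

big = range(0, 10**12)

# On CPython a range stores only start, stop and step, so a huge range
# takes no more memory than a tiny one.
size_small = sys.getsizeof(range(0, 10))
size_big = sys.getsizeof(big)

# It still behaves like a sequence: length, indexing and membership all
# work without generating the elements.
length = len(big)
last = big[-1]
present = 999_999_999_999 in big

print(size_small, size_big, length, last, present)
```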

So long as you're not trying to learn Python based on analogy from some other language you already know, it's fine ;-)

[-]

PeaSlight6601@reddit

Agreed.

The problem of course is that everyone learns every language by analogy. So lots of people come to python from other languages and get confused by the behavior of generator expressions and for.

If python had used foreach things might be a little clearer (and then perhaps they could reserve for with an explicit call to iter for other cases).

  foreach x in thing: # allowed and sugar for 
  for x in iter(thing):

Then generators could potentially distinguish between themselves and the iteration of themselves.

[-]

lood9phee2Ri@reddit

Meh. Just the way it is in python, what amounts to implicit iter() is defined to always be there.

https://docs.python.org/3/glossary.html#term-iterable

When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop.
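
Spelled out by hand, the loop described in that glossary entry looks roughly like the sketch below. manual_for is a hypothetical helper for illustration, not how the interpreter is literally implemented.

```python
def manual_for(iterable):
    """Roughly what `for item in iterable: collected.append(item)` does."""
    collected = []
    iterator = iter(iterable)        # the implicit iter() call
    while True:
        try:
            item = next(iterator)    # one step of the loop
        except StopIteration:        # the end-of-iteration signal
            break
        collected.append(item)
    return collected


via_for = [item for item in range(3)]
via_manual = manual_for(range(3))
print(via_for, via_manual)
```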

Then generators could potentially distinguish between themselves and the iteration of themselves

If you want a new pass, can always just get a new generator instance anyway, like that's exactly what a new call to your generator-function gives you anyway. it's just not a big deal.

def foo(up): yield from (x for x in range(0, up) if x % 3 == 0)

a = foo(20)

for i in a:
    print(i)

b = foo(30)

for i in b:
    print(i)

of course beware trying to use shortcut generator expression syntax overly nontrivially, you're expected to switch to named generator-functions for anything complicated, stylistically, and the threshold should be pretty low.

For more complex applications, full generator definitions are always superior in terms of being obvious about scope, lifetime, and binding - PEP289

Think a bit like the way Python lambda has also been deliberately constrained. Python tends to stylistically favor giving names to even the little things.

I don't necessarily like that personally even, I use a lot more anonymous stuff in Lisp, but in a work context best not fight basic idioms of the language you're using.

[-]

lood9phee2Ri@reddit

technically it doesn't even care for __len__, I guess I just added that out of habit. The protocol that signals end of iteration in the example is the raising of the IndexError, that the for is then defined to swallow.

https://docs.python.org/3/reference/datamodel.html#object.getitem

Note: for loops expect that an IndexError will be raised for illegal indexes to allow proper detection of the end of the sequence.

If implementing an iterator properly instead of the __getitem__() fallback you should also use StopIteration not IndexError.

https://docs.python.org/3/library/stdtypes.html#iterator.next
https://docs.python.org/3/library/exceptions.html#StopIteration

[-]

lood9phee2Ri@reddit

Eeeh, it doesn't really matter if an interface is implemented in any static nominative typing Java-like sense though.

Note in the python case a thing is iterable when at runtime iter() can return an iterator for it, and that really only depends on a __getitem__() existing (it's better to implement __iter__(), but for reasons python has fallback to __getitem__() ) at runtime.

I'm not saying this is wrong, but it is different to what some people may be used to.

Underneath, Python is very much not doing what programmers coming from other more static languages often seem to think it is, it is usually far more dynamic. That can have significant performance implications of course but e.g. the below also works fine (probably best not do it in your code without exceptionally good reason, just illustrative).

    class Foo:
        pass

    f1 = Foo()

    for i in f1:
        print(i)

TypeError: 'Foo' object is not iterable

    f2 = Foo()

    def wangle(obj):
        def my_get_item(self, key):
            match key:
                case 0: return "blip"
                case 1: return "bloop"
                case _: raise IndexError(key)
        # yes you can change an existing instance's class at runtime
        obj.__class__ = type(
            'Wangled', (obj.__class__,), {'__getitem__': my_get_item}
        )

    wangle(f2)

    for i in f2:
        print(i)

blip
bloop

[-]

slaymaker1907@reddit

I’d say it’s more that iterable objects produce iterators.

[-]

PeaSlight6601@reddit

Java is correct. Python is wrong.

This is almost universally true of things related to the type-system. Python's type hierarchy is trash.

Obviously there are some downsides of the Java type system and it can be a bit excessive, but at least it follows the English language.

An iterable is something that is "able to be iterated". An iterator is something that does the walking. So what you should do is construct an iterator from an iterable, by calling iter on it. But in python the iter becomes syntactic sugar and leads to some ambiguity, particularly around generator expressions.

What is the difference between:

  list_thing = [compute(v) for v in range(1000)]
  gen_thing = (compute(v) for v in range(1000))

In principle very little. We are expressing a preference between storing list_thing in memory, and computing gen_thing as needed, but there is no fundamental reason for them to behave differently... but they do:

def run(thing):
    for _ in thing:
        if condition(_):
            break
    for _ in thing:
        frobnicate(_)

Will do very different things when given list_thing vs gen_thing. If you want a list_thing to behave like a gen_thing you can do so by calling iter on it, but making a gen_thing behave like a list requires either lots of work with itertools or loading the whole thing into memory by converting it to a true list.
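
A runnable version of run above, as a sketch. The original compute, condition and frobnicate are not defined in the thread, so simple hypothetical stand-ins are used and the values seen by each loop are recorded.

```python
def run(thing):
    first_loop, second_loop = [], []
    for item in thing:
        first_loop.append(item)
        if item == 2:               # hypothetical stand-in for condition(item)
            break
    for item in thing:
        second_loop.append(item)    # hypothetical stand-in for frobnicate(item)
    return first_loop, second_loop


list_result = run([v for v in range(5)])
gen_result = run(v for v in range(5))

print(list_result)   # the second loop starts over from the first element
print(gen_result)    # the second loop resumes where the first one stopped
```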

[-]

backfire10z@reddit

To make a gen_thing work like a list_thing, you just call list(gen_thing).

[-]

PeaSlight6601@reddit

Maybe you can't hold all that stuff in memory? Maybe it is easy to compute, but too large to store in memory.

You end up having to do things like: thing = lambda: (compute(x) for x in ...) and then calling it as for y in thing().

Sure you can work around it, but this is something that commonly confuses python newbies, and it is all because generators return self in their __iter__ method so as to support some syntactic sugar in for loops.

I'm not sure that the convenience of the for loop outweighs the type confusion.

[-]

abotoe@reddit

Why wouldn’t generators return themselves if you want an iterator? Generators are iterators. Seems pretty logical to me. I’m sorry but if you’re getting confused because a language provides different mechanisms that allow you to efficiently implement a common need (like iteration) that appears in circumstances with different design constraints, that sounds more like a “you” problem rather than a problem with the language.

[-]

NoInkling@reddit

I guess the alternative is that __iter__ could return a new iterator instead of the existing one, in effect resetting the iteration (allowing you to use for on it multiple times). But in that case why wouldn't you just define a generator function and call it each time as needed instead?

[-]

backfire10z@reddit

Maybe you can’t hold all that stuff in memory?

Then how would you use a list? This is my point.

[-]

PeaSlight6601@reddit

You can't. That is the point.

I have a thing that is:

Very long
Possibly unknown length
Computing the next value is very fast and easy
Storing everything in memory is impossible due to size
Random access is not allowed.
No restriction or reason it could not be traversed multiple times.

But for the last requirement this thing is perfect for a generator expression. However that last requirement means you can't really use a generator expression.
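
One way to meet all of those requirements with existing Python, sketched under the assumption that the values can be recomputed from a few stored parameters: give a small class an __iter__ method that is itself a generator function. Multiples is a hypothetical example class.

```python
class Multiples:
    """Hypothetical lazy, re-traversable stream of multiples of `step` below `limit`."""

    def __init__(self, step, limit):
        self.step = step
        self.limit = limit

    def __iter__(self):
        # Because this method contains `yield`, every call returns a brand-new
        # generator: nothing is stored, and each traversal starts over.
        value = 0
        while value < self.limit:
            yield value
            value += self.step


threes = Multiples(3, 10)
first_walk = list(threes)
second_walk = list(threes)
print(first_walk, second_walk)
```

There is no random access and no stored list, yet the object can be passed around and looped over repeatedly like any other re-iterable.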

[-]

lood9phee2Ri@reddit

Can also use a generator to generate generators. ;-)

def gengen(up):
    while True:
        yield (x for x in range(0, up))

gs = gengen(3)

or there's

gs = iter(lambda: (x for x in range(0, 3)), None)

That's slightly different insofar as the latter is technically not a generator, but you can then do

gs = (y for y in iter(lambda: (x for x in range(0, 3)), None))

if you really just gotta make it a generator generator.

Anyway, then you can do

for i, js in zip(range(0, 2), gs):
    for j in js:
        print(i, j)

0 0
0 1
0 2
1 0
1 1
1 2

Shrug.

[-]

pojska@reddit

```python
for i in create_gen_thing():
    print(i)

for j in create_gen_thing():
    print(j)
```

[-]

backfire10z@reddit

Sorry I made some edits to my comment not sure if you saw, I will ask again: is there something in Java that does this better? What would you suggest?

lambda: (compute(x) for…)

Seems like a fine option to me, but I’m not that experienced.

[-]

netfeed@reddit

if it quacks like a duck and walks like a duck?

[-]

smelly-dorothy@reddit

A sequence is an iterable that implements __getitem__ and __len__. Two of the iterator types are sequences and generators.

But yeah, someone made a blog article about reading the documentation..

[-]

treyhunner@reddit

This is a confusing section of documentation, mostly because the title is misleading. The first part talks about the one method that iterables need, __iter__, which must return an iterator. The next part talks about the actual iterator type objects.

It seems like this documentation section should either be split into two sections (iterable types and iterator types) or it should be renamed to "Iterable Types".

[-]

tenken01@reddit

Don’t care. I have a CS degree (not a boot camper) and use statically typed languages for things that matter and python for throw away scripts like it was designed for.

[-]

joey_nottawa@reddit

Much of the AI work I witnessed at Google was done using Python.

[-]

tenken01@reddit

Who cares

[-]

joey_nottawa@reddit

Curious, are you a Java programmer?

[-]

nikvid@reddit

Your mom, she does a lot of work with my python.

[-]

ptousig@reddit

"do not use the information below as an excuse to be unkind to anyone"

More programming articles should include this line. We tend to be an arrogant bunch.

[-]

XtremeGoose@reddit

Not sure how you can write a whole article about iterables and iterators and not give the definitions (straight from collections.abc)

class Iterable[T](ABC):
    @abstractmethod
    def __iter__(self) -> Iterator[T]: ...

class Iterator[T](Iterable[T]):
    @abstractmethod
    def __next__(self) -> T: ...

    def __iter__(self) -> Self:
        return self

[-]

probability_of_meme@reddit

It might seem like I’m nitpicking in saying that range isn’t an iterator, but I really don’t think I am.

I do. We're really just talking about the difference between iterator and iterable and 99.9% of the time it's just pedantry.

[-]

Mysterious-Rent7233@reddit

There's a huge difference between a list and an iter(list). You can loop over the former multiple times and the latter only once. Not understanding this can cause very confusing bugs.

[-]

probability_of_meme@reddit

Sure, and a programmer should be aware of this difference for the 0.1% of times when it matters when we talk about iterators. Your example is pretty basic and something I don't think veteran programmers get wrong very often.

[-]

Mysterious-Rent7233@reddit

Veteran programmers don't get it wrong often because they know the difference between an iterator and an iterable!

How else would you avoid this error?

[-]

probability_of_meme@reddit

How about by not calling iter() when there's no need to? Like the author of this blog, I feel like you're trying to make it sound more important than it is.

[-]

Mysterious-Rent7233@reddit

As soon as you enumerate a list you'll end up with an iterator.

[-]

probability_of_meme@reddit

Maybe you're just learning so I'll excuse your confusion, but 99.9% of the time, you use the results of these functions to drive a loop and there's no need to delve further into the fine details.

for i, obj in enumerate(somelist):
    ...

It seems very important to you that everyone be super-aware of the remaining 0.1% and at the end of the day, you got to do you.

But this article and your arguments supporting its critical importance to me are very close to ridiculous. And that's just me. You're free to disagree and downvote but it doesn't make me wronger.

[-]

LIGHTNINGBOLT23@reddit

This article is clearly for those who do work with Python outside of making the trivial scripts that you're used to. It's not a finer detail but a basic explanation of Python objects. It doesn't even go into __iter__ and __next__.

[-]

Mysterious-Rent7233@reddit

No, I've been programming Python since v 1.5.2 and I implemented one of the lazy functions that we would now call iterators.

I'm not sure why you think it is unusual to assign the results of e.g. zip to a variable.

This is the sample code describing zip from the python.org documentation:

>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> zipped = zip(x, y)
>>> list(zipped)
[(1, 4), (2, 5), (3, 6)]
>>> x2, y2 = zip(*zip(x, y))
>>> x == list(x2) and y == list(y2)
True

I guess the people who wrote that code are also "new to Python".

One important reason that even someone extremely new to Python needs to know the difference is because if you assign the value to a variable for debugging, and then you print the value out, you will break your program because now the iterator is exhausted before it gets to your for-loop.
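
A sketch of that debugging trap and two possible workarounds. The data is made up; itertools.tee is a real standard library function, but whether it or a plain list is the better fix depends on how much data is involved.

```python
import itertools

names = ["a", "b"]
scores = [1, 2]

# The trap: inspecting the iterator consumes it.
zipped = zip(names, scores)
debug_view = list(zipped)                  # "just looking" while debugging
loop_items = [pair for pair in zipped]     # the real loop now sees nothing

# Workaround 1: materialise once and reuse the list.
materialised = list(zip(names, scores))
loop_from_list = [pair for pair in materialised]

# Workaround 2: split the iterator; tee buffers items one branch has not read yet.
for_debug, for_loop = itertools.tee(zip(names, scores))
debug_copy = list(for_debug)
loop_copy = list(for_loop)

print(debug_view, loop_items)
print(loop_from_list, debug_copy, loop_copy)
```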

[-]

happyscrappy@reddit

Far more than 0.1% of the time. The change to have iterators and not just lists was a huge part of why python3 couldn't run so many python2 scripts.

[-]

nekokattt@reddit

Unlike other programming languages, Python muddies the waters here by saying that iterators are iterable across themselves. That is the issue here. Iterators are iterable but iterables are not iterators, and iterating across an iterable uses the iterable itself (usually).

[-]

JanEric1@reddit

iterating across the iterator uses the iterator itself. And that is actually required according to the python docs

Iterators are required to have an __iter__() method that returns the iterator object itself

though obviously not enforced.

[-]

augustusalpha@reddit

Type-vangelists, assemble!!

[-]

BearBearBearUrsus@reddit

Nice article.