I learned something from this article, but I feel some readers might end up confused in a different way after reading it.
In Python 3, enumerate, zip, reversed, and a number of other built-in functions return iterators
This statement is my main issue. While factually correct, objects returned by these functions are both iterators and iterables. And in the vast majority of cases they are used as iterables - in fact direct use of iterators is very rare in Python. The fact that they are iterators at all feels a bit like an implementation detail that you shouldn't rely on.
But you definitely need to be aware that they are iterators, because you can consume an iterator only once. While a (non-iterator) iterable can be consumed many times.
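A quick sketch of that difference, using zip as the one-shot iterator and a plain list as the reusable iterable (the variable names are just for illustration):

```python
pairs = zip([1, 2, 3], "abc")   # an iterator: good for one pass only
first_pass = list(pairs)
second_pass = list(pairs)       # already exhausted, so this is empty

numbers = [1, 2, 3]             # a non-iterator iterable
first_sum = sum(numbers)
second_sum = sum(numbers)       # each call gets a fresh iterator from the list

print(first_pass)               # [(1, 'a'), (2, 'b'), (3, 'c')]
print(second_pass)              # []
print(first_sum, second_sum)    # 6 6
```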
It's not that surprising. Different brackets mean different things in python.
thing = {x for x in range(10)} yields a set, and thing = {x: True for x in range(10)} yields a dict.
Yes but the expressions (1, 2, 3) and [1, 2, 3] both construct sequential collections in Python. One being a tuple and one being a list. I expected (x for x in range(10)) to result in a tuple. range(10) is a generator and the syntax x for x in thing, normally, when it appears in code (since most instances I've seen are list or dict or set comprehension), iterates through whatever thing is, creating another realized collection.
I wouldn't expect it to do the exact same thing, I expected (x for x in range(10)) to construct a tuple like (1, 2, 3) does, analogous to [x for x in range(10)] constructing a list like [1, 2, 3] does.
How can walking down the iterator of an iterable modify the iterable?
Because that's just how some data structures work. If you "walk through" a traditional stack or queue, you end up with an empty container. If iterators had to be reusable, then some types of data structures can't be iterated at all.
I can walk through stacks and queues all day, but they will still exist after I walk them.
They'll exist, but with nothing in them.
In a traditional stack, for example, you have two or three operations: push, pop, and sometimes top. If a stack has more than one element, you cannot get to its second element without removing the top one.
If your stack had no way to walk it but popping you probably wouldn't make it iterable. A true stack like that is conceptually very different from iterables.
If your stack had no way to walk it but popping you probably wouldn't make it iterable.
Only if we use your definition of iterable (which is different from what most of us actually use). That's what I said: "If iterators had to be reusable, then some types of data structures couldn't have iterators at all."
This isn't just a Python thing. Back in the 90s, the C++ STL (which wasn't even originally written in C++ but in Ada) used iterator terminology, and it defined classes of iterators based on capability. Input, Output, and Forward iterators were not re-useable, but "Bidirectional" and "Random" iterators were.
Input and Output iterators are not exhausted by walking them. They are still valid iterators. Other users of those iterators may observe different contents, but they will observe something.
I think that is very different from the generator in python which once walked by one thread is forever empty.
Ultimately it just isn't necessary. It would be easy enough for generator expressions to be implemented as something that returns a new invocation of the iteration when iter is called on them. This would be much less surprising to most novice programmers.
Input and Output iterators are not exhausted by walking them.
If you walk an istream_iterator to EOF, it's exhausted.
Other users of those iterators may observe different contents
Yes, that's what we're talking about. That is the behavior that you have said, repeatedly, should not be allowed for something called an "iterator". So I'm not sure what point you think you're making now.
Sorry, I wrote "iterator" when I meant to write "iterable." I hadn't had my coffee yet.
As I have been saying the python generators are not "iterable." I don't object to an iterator that reaches its end, that is expected of an iterator, the question is if the iterator itself is an iterable thing. I don't think it is.
A file is an iterable thing, when you open the file you get an iterator on that file. You can exhaust the iterator, that is fine and expected (when you iterate across a list you eventually reach the end), but the thing you iterated, the file or list respectively, still exists and is still iterable.
Yes, you could lose access, or some other thread/process could remove all the members of a list, or any number of things could happen, but they aren't caused by the act of you iterating the iterable.
Python generators, and data piped in from standard in, are unique in that it is the act of iterating across the thing itself which causes it to empty. That isn't a behavior that I think is appropriate for an iterable, and I don't think of those things as truly being iterable.
Programming concepts don't have to be perfectly followed, and I'm not such a perfectionist to suggest that reading from standard in should have some distinct syntax to mark it as being a "single-use iterator," but I do think that should be the exception not the norm. I think it is a mistake in the python language to have generators return self when iter is called on them. The vast majority of programmers would expect such a call to restart the execution of the closure from the beginning, not to pick up in the middle.
Python generators are primarily iterators. They are also iterables because all iterators are, for ergonomic reasons, but their iterator is themselves. In your example, thing and iter(thing) are literally the same object.
thing = (x for x in range(10))
print(thing is iter(thing))
# True
So it is not that an iterator is modifying an underlying collection or anything like that. Although it is not uncommon to have an iterator that consumes another one (in fact, technically, thing is consuming an iterator of range(10)).
You got a pointer to an object, operated on it, and therefore operated on the object. I'm not sure why that's problematic. If you need it to be static, turn it into a list or something similar beforehand.
If you're handing off an iterator to some other code and you don't know what's going to happen to it, then you shouldn't depend on it. If you need it to be static, then put it in a list or similar.
Iterators are typically iterators for a good reason. Implicitly making them static and allocating memory for every possible item would be a terrible idea in many cases.
I really don't think this is a problem. They're a tool, use them in the right way. IMO the only issue is that the two words sound very similar and that can be confusing.
In C terminology I got a pointer to a list, and its length, and then calculated the location of all the elements in the list.... And the list disappeared?
Just, the car would be driveable. However it would be driveable just once... Just like iterators are iterables that can only be iterated over once. But being able to iterate over something multiple times is not required for an iterable.
In python things are iterable if they implement __iter__ which gives you an object that implements __next__. And iterators do that.
Your mental model of your own code is incorrect, as demonstrated:
iterable = range(10) # stateless
iterator = (x for x in range(10)) # maintains state
for _ in iter(iterator):
    pass
print(len(list(iterator))) # 0, b/c state was modified
print(len(list(iterable))) # iterable is still the same & unmodified
It is not a mental model issue. Python generator expressions return self when __iter__ is called.
Any function that responds to iter is "iterable" in the basic sense of the english language. You can call iter and get an iterator from it, therefore you can iterate it.
Obviously one is not going to write (x for x in range(10)) and this is just short-hand for some more complex statement like (compute(foo) for foo in something if condition(foo)) where you definitely cannot just drop in the list-like something.
While a (non-iterator) iterable (or more specifically a sequence) can be consumed many times.
The caveat is critical, because the sentence is otherwise just wrong, there is no guarantee that an iterable is repeatable in the general case e.g. files are iterable.
The key insight is that iterators are iterables - by means of returning themselves on __iter__. The opposite is, in general, not true. The nice thing is that for most purposes wherever an iterable is expected (such as in a for loop) you can use an iterable or an iterator.
So, for example, if you need to process the elements of a list in a loop but want to skip the first element in the list, you can do the ugly thing which is to enumerate it and check the index is not zero on every iteration. Or you can make an iterator from the list with iter, call next on it once and then loop through the rest of the iterator. But the distinction between the original iterable (the list) and the iterator is important. For example, typically the iterator of a collection will raise an exception or work incorrectly if the underlying collection has been modified mid iteration.
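For what it's worth, the skip-the-first-element trick looks roughly like this (a minimal sketch; items and the upper() call are made up for the example):

```python
items = ["header", "row1", "row2"]

it = iter(items)    # a separate iterator object; the list itself is untouched
next(it, None)      # discard the first element (the None default avoids StopIteration on an empty list)

rest = [item.upper() for item in it]   # the loop picks up where next() left off
print(rest)         # ['ROW1', 'ROW2']
print(items)        # ['header', 'row1', 'row2']
```

The list can still be looped over from the start afterwards; only the iterator has moved.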
Well, the key point is that Iterable and Iterator are different things in Python, there just are various things in python that are Iterable while not being Iterator.
import collections.abc
r = range(0, 9)
isinstance(r, collections.abc.Iterator)
False
isinstance(r, collections.abc.Iterable)
True
Funny enough, the names Iterator and Iterable also identify distinct things in Java too - though not the exact same as the Python case. Something to bear in mind coming to Python from Java or vice-versa.
Well, in general terms. But worth noting in context Python gets especially weird about things, because duck typing, historical python-specific idiosyncratic meanings of "abstract base class" etc. As the Python documentation says "The only reliable way to determine whether an object is iterable is to call iter(obj).".
So you can also easily construct iterables that also aren't instances of Iterable. Fun for all the family.
class Blah(object):
    def __init__(self):
        pass
    def __getitem__(self, key):
        if key == 0:
            return "blorp"
        else:
            raise IndexError(key)
    def __len__(self):
        return 1
b = Blah()
isinstance(b, collections.abc.Sequence)
False
isinstance(b, collections.abc.Iterable)
False
for i in b:
    print(i)
blorp
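Following the documentation's advice, a check that actually agrees with what for will accept might look like this. It's only a sketch: is_iterable and OnlyGetItem are hypothetical names, not anything from the standard library.

```python
import collections.abc

def is_iterable(obj):
    """Hypothetical helper: True if iter(obj) succeeds."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

class OnlyGetItem:
    # No __iter__ at all; iter() falls back to calling __getitem__ with 0, 1, 2, ...
    def __getitem__(self, key):
        if key == 0:
            return "blorp"
        raise IndexError(key)

obj = OnlyGetItem()
print(isinstance(obj, collections.abc.Iterable))  # False
print(is_iterable(obj))                           # True
print(is_iterable(42))                            # False
```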
foreach is at least a different keyword from for and it makes it a bit clearer exactly what is going on.
In a traditional c-style for loop one is directly tracking the index. In other words you build your own iterator by declaring i and incrementing it.
In a foreach loop you are asking the object/compiler to do that work for you. In principle foreach would allow the object to be indexed without using a sequence of integers as the keys.
note Python in particular simply has no c-style for though, its for is akin to some other languages' foreach, the subject of the original article (range) is used in the normal python idiom
for i in range(0, 10):
    print(i)
Note range() is lazy (in modern python, once upon a time range() was eager and xrange() was lazy), this is not especially inefficient.
So long as you're not trying to learn Python based on analogy from some other language you already know, it's fine ;-)
The problem of course is that everyone learns every language by analogy. So lots of people come to python from other languages and get confused by the behavior of generator expressions and for.
If python had used foreach things might be a little clearer (and then perhaps they could reserve for with an explicit call to iter for other cases).
foreach x in thing: # allowed and sugar for
for x in iter(thing):
Then generators could potentially distinguish between themselves and the iteration of themselves.
When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop.
Then generators could potentially distinguish between themselves and the iteration of themselves
If you want a new pass, can always just get a new generator instance anyway, like that's exactly what a new call to your generator-function gives you anyway. it's just not a big deal.
def foo(up): yield from (x for x in range(0, up) if x % 3 == 0)
a = foo(20)
for i in a:
    print(i)
b = foo(30)
for i in b:
    print(i)
of course beware trying to use shortcut generator expression syntax overly nontrivially, you're expected to switch to named generator-functions for anything complicated, stylistically, and the threshold should be pretty low.
For more complex applications, full generator definitions are always superior in terms of being obvious about scope, lifetime, and binding - PEP289
Think a bit like the way Python lambda has also been deliberately constrained. Python tends to stylistically favor giving names to even the little things.
I don't necessarily like that personally even, I use a lot more anonymous stuff in Lisp, but in a work context best not fight basic idioms of the language you're using.
technically it doesn't even care for __len__, I guess I just added that out of habit. The protocol that signals end of iteration in the example is the raising of the IndexError, that the for is then defined to swallow.
Eeeh, it doesn't really matter if an interface is implemented in any static nominative typing Java-like sense though.
Note in the python case a thing is iterable when at runtime iter() can return an iterator for it, and that really only depends on a __getitem__() existing (it's better to implement __iter__(), but for reasons python has fallback to __getitem__() ) at runtime.
I'm not saying this is wrong, but it is different to what some people may be used to.
Underneath, Python is very much not doing what programmers coming from other more static languages often seem to think it is, it is usually far more dynamic. That can have significant performance implications of course but e.g. the below also works fine (probably best not do it in your code without exceptionally good reason, just illustrative).
class Foo:
    pass
f1 = Foo()
for i in f1:
    print(i)
TypeError: 'Foo' object is not iterable
f2 = Foo()
def wangle(obj):
    def my_get_item(self, key):
        match key:
            case 0: return "blip"
            case 1: return "bloop"
            case _: raise IndexError(key)
    # yes you can change an existing instance's class at runtime
    obj.__class__ = type(
        'Wangled', (obj.__class__,), {'__getitem__': my_get_item}
    )
wangle(f2)
for i in f2:
    print(i)
blip
bloop
This is almost universally true of things related to the type-system. Python's type hierarchy is trash.
Obviously there are some downsides of the Java type system and it can be a bit excessive, but at least it follows the english language.
An iterable is something that is "able to be iterated". An iterator is something that does the walking. So what you should do is construct an iterator from an iterable, by calling iter on it. But in python the iter becomes syntactic sugar and leads to some ambiguity, particularly around generator expressions.
What is the difference between:
list_thing = [compute(v) for v in range(1000)]
gen_thing = (compute(v) for v in range(1000))
In principle very little. We are expressing a preference between storing list_thing in memory, and computing gen_thing as needed, but there is no fundamental reason for them to behave differently... but they do:
def run(thing):
    for _ in thing:
        if condition(_):
            break
    for _ in thing:
        frobnicate(_)
Will do very different things when given list_thing vs gen_thing. If you want a list_thing to behave like a gen_thing you can do so by calling iter on it, but to make a gen_thing behave like a list either requires lots of work with itertools, or loading the whole thing into memory by converting to a true list.
Maybe you can't hold all that stuff in memory? Maybe it is easy to compute, but too large to store in memory.
You end up having to do things like: thing = lambda: (compute(x) for x in ...) and then calling it as for y in thing().
Sure you can work around it, but this is something that commonly confuses python newbies, and it is all because generators return self in their __iter__ method so as to support some syntactic sugar in for loops.
I'm not sure that the convenience of the for loop outweighs the type confusion.
Why wouldn’t generators return themselves if you want an iterator? Generators are iterators. Seems pretty logical to me. I’m sorry but if you’re getting confused because a language provides different mechanisms that allow you to efficiently implement a common need (like iteration) that appears in circumstances with different design constraints, that sounds more like a “you” problem rather than a problem with the language.
I guess the alternative is that __iter__ could return a new iterator instead of the existing one, in effect resetting the iteration (allowing you to use for on it multiple times). But in that case why wouldn't you just define a generator function and call it each time as needed instead?
Storing everything in memory is impossible due to size
Random access is not allowed.
No restriction or reason it could not be traversed multiple times.
But for the last requirement this thing is perfect for a generator expression. However that last requirement means you can't really use a generator expression.
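One way to get all three at once, at the cost of writing a small class instead of a one-line expression, is to make __iter__ itself a generator method. This is a sketch only; LazySquares is a hypothetical name and x * x stands in for whatever the real computation is.

```python
class LazySquares:
    """Hypothetical re-iterable: nothing is stored, every pass recomputes."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # A generator method: each call to iter() starts a brand new generator.
        for x in range(self.n):
            yield x * x

squares = LazySquares(5)
first = list(squares)
second = list(squares)            # a second full pass works

print(first)                      # [0, 1, 4, 9, 16]
print(second == first)            # True
print(iter(squares) is squares)   # False
```

Here the object is an iterable but not an iterator, which is exactly the split the generator expression doesn't give you.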
This is a confusing section of documentation, mostly because the title is misleading. The first part talks about the one method that iterables need, __iter__, which must return an iterator. The next part talks about the actual iterator type objects.
It seems like this documentation section should either be split into two sections (iterable types and iterator types) or it should be renamed to "Iterable Types".
Don’t care. I have a CS degree (not a boot camper) and use statically typed languages for things that matter and python for throw away scripts like it was designed for.
There's a huge difference between a list and an iter(list). You can loop over the former multiple times and the latter only once. Not understanding this can cause very confusing bugs.
Sure, and a programmer should be aware of this difference for the 0.1% of times when it matters when we talk about iterators. Your example is pretty basic and something I don't think veteran programmers get wrong very often.
How about by not calling iter() when there's no need to? Like the author of this blog, I feel like you're trying to make it sound more important than it is.
Maybe you're just learning so I'll excuse your confusion, but 99.9% of the time, you use the results of these functions to drive a loop and there's no need to delve further into the fine details.
for i, obj in enumerate(somelist):
    ...
It seems very important to you that everyone be super-aware of the remaining 0.1% and at the end of the day, you got to do you.
But this article and your arguments supporting its critical importance to me are very close to ridiculous. And that's just me. You're free to disagree and downvote but it doesn't make me wronger.
This article is clearly for those who do work with Python outside of making the trivial scripts that you're used to. It's not a finer detail but a basic explanation of Python objects. It doesn't even go into __iter__ and __next__.
No, I've been programming Python since v 1.5.2 and I implemented one of the lazy functions that we would now call iterators.
I'm not sure why you think it is unusual to assign the results of e.g. zip to a variable.
This is the sample code describing zip from the python.org documentation:
>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> zipped = zip(x, y)
>>> list(zipped)
[(1, 4), (2, 5), (3, 6)]
>>> x2, y2 = zip(*zip(x, y))
>>> x == list(x2) and y == list(y2)
True
I guess the people who wrote that code are also "new to Python".
One important reason that even someone extremely new to Python needs to know the difference is because if you assign the value to a variable for debugging, and then you print the value out, you will break your program because now the iterator is exhausted before it gets to your for-loop.
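A small sketch of that debugging trap (names and numbered are made up for the example):

```python
names = ["a", "b"]
numbered = enumerate(names)

print(list(numbered))          # debugging peek: [(0, 'a'), (1, 'b')], and it drains the iterator

seen = []
for index, name in numbered:   # nothing is left, so the body never runs
    seen.append(name)

print(seen)                    # []
```

If you need to look at the values and still loop over them, store list(enumerate(names)) once and use that list for both.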
Unlike other programming languages, Python muddies the waters here by saying that iterators are iterable across themselves. That is the issue here. Iterators are iterable but iterables are not iterators, and iterating across an iterable uses the iterable itself (usually).
Noxitu@reddit
I learned something from this article, but I feel some readers might end up confused in a different way after reading it.
This statement is my main issue. While factually correct, objects returned by these functions are both iterators and iterables. And in the vast majority of cases they are used as iterables - in fact direct use of iterators is very rare in Python. The fact that they are iterators at all feels a bit like an implementation detail that you shouldn't rely on.
JanEric1@reddit
But you definitely need to be aware that they are iterators, because you can consume an iterator only once. While a (non-iterator) iterable can be consumed many times.
PeaSlight6601@reddit
I think you can (and should) argue that python iterators are NOT iterable.
We all agree that:
thing = (x for x in range(10))
should have 10 members right?
So why does this print 0:
How can walking down the iterator of an iterable modify the iterable?
theferrit32@reddit
It's wild to me that this code block is completely different if you change it to
I don't think I like that.
pojska@reddit
It's not that surprising. Different brackets mean different things in python.
thing = {x for x in range(10)} yields a set, and thing = {x: True for x in range(10)} yields a dict.
theferrit32@reddit
Yes but the expressions (1, 2, 3) and [1, 2, 3] both construct sequential collections in Python. One being a tuple and one being a list. I expected (x for x in range(10)) to result in a tuple. range(10) is a generator and the syntax x for x in thing, normally, when it appears in code (since most instances I've seen are list or dict or set comprehension), iterates through whatever thing is, creating another realized collection.
pojska@reddit
That's fair, it's certainly not obvious that it would create a generator. A tuple would make sense.
ProfessorFakas@reddit
You... don't like that different syntax... does a different thing? Do you want one to be a duplicate of the other?
theferrit32@reddit
I wouldn't expect it to do the exact same thing, I expected (x for x in range(10)) to construct a tuple like (1, 2, 3) does, analogous to [x for x in range(10)] constructing a list like [1, 2, 3] does.
PeaSlight6601@reddit
They don't have to be duplicates. One could still have square brackets compute and store in memory and parens be lazy and compute at run time.
curien@reddit
Because that's just how some data structures work. If you "walk through" a traditional stack or queue, you end up with an empty container. If iterators had to be reusable, then some types of data structures can't be iterated at all.
PeaSlight6601@reddit
I can walk through stacks and queues all day, but they will still exist after I walk them.
The act of merely calculating the location of the elements of the stack/queue doesn't pop/unlink them.
curien@reddit
They'll exist, but with nothing in them.
In a traditional stack, for example, you have two or three operations: push, pop, and sometimes top. If a stack has more than one element, you cannot get to its second element without removing the top one.
PeaSlight6601@reddit
If your stack had no way to walk it but popping you probably wouldn't make it iterable. A true stack like that is conceptually very different from iterables.
curien@reddit
Only if we use your definition of iterable (which is different from what most of us actually use). That's what I said: "If iterators had to be reusable, then some types of data structures couldn't have iterators at all."
This isn't just a Python thing. Back in the 90s, the C++ STL (which wasn't even originally written in C++ but in Ada) used iterator terminology, and it defined classes of iterators based on capability. Input, Output, and Forward iterators were not re-useable, but "Bidirectional" and "Random" iterators were.
PeaSlight6601@reddit
Input and Output iterators are not exhausted by walking them. They are still valid iterators. Other users of those iterators may observe different contents, but they will observe something.
I think that is very different from the generator in python which once walked by one thread is forever empty.
Ultimately it just isn't necessary. It would be easy enough for generator expressions to be implemented as something that returns a new invocation of the iteration when iter is called on them. This would be much less surprising to most novice programmers.
curien@reddit
If you walk an istream_iterator to EOF, it's exhausted.
Yes, that's what we're talking about. That is the behavior that you have said, repeatedly, should not be allowed for something called an "iterator". So I'm not sure what point you think you're making now.
PeaSlight6601@reddit
Sorry, I wrote "iterator" when I meant to write "iterable." I hadn't had my coffee yet.
As I have been saying the python generators are not "iterable." I don't object to an iterator that reaches its end, that is expected of an iterator, the question is if the iterator itself is an iterable thing. I don't think it is.
A file is an iterable thing, when you open the file you get an iterator on that file. You can exhaust the iterator, that is fine and expected (when you iterate across a list you eventually reach the end), but the thing you iterated, the file or list respectively, still exists and is still iterable.
You cannot do that with a generator.
curien@reddit
If the file is stdin piped from another program, you can try, but it'll still be at EOF.
Even for a normal file, you could lose access.
If the file is something like /dev/urandom, you can try going back to the beginning, but you won't get the same data.
PeaSlight6601@reddit
Yes, you could lose access, or some other thread/process could remove all the members of a list, or any number of things could happen, but they aren't caused by the act of you iterating the iterable.
Python generators, and data piped in from standard in, are unique in that it is the act of iterating across the thing itself which causes it to empty. That isn't a behavior that I think is appropriate for an iterable, and I don't think of those things as truly being iterable.
Programming concepts don't have to be perfectly followed, and I'm not such a perfectionist to suggest that reading from standard in should have some distinct syntax to mark it as being a "single-use iterator," but I do think that should be the exception not the norm. I think it is a mistake in the python language to have generators return self when iter is called on them. The vast majority of programmers would expect such a call to restart the execution of the closure from the beginning, not to pick up in the middle.
jdehesa@reddit
Python generators are primarily iterators. They are also iterables because all iterators are, for ergonomic reasons, but their iterator is themselves. In your example, thing and iter(thing) are literally the same object.
So it is not that an iterator is modifying an underlying collection or anything like that. Although it is not uncommon to have an iterator that consumes another one (in fact, technically, thing is consuming an iterator of range(10)).
PeaSlight6601@reddit
I called iter and walked what was returned. The original thing was modified in the process.
Python is duck typed right... So if it walks like a duck...
Don't give bullshit about how it was the same object, it's not the programmer's responsibility to call id on something after calling iter
ProfessorFakas@reddit
You got a pointer to an object, operated on it, and therefore operated on the object. I'm not sure why that's problematic. If you need it to be static, turn it into a list or something similar beforehand.
If you're handing off an iterator to some other code and you don't know what's going to happen to it, then you shouldn't depend on it. If you need it to be static, then put it in a list or similar.
Iterators are typically iterators for a good reason. Implicitly making them static and allocating memory for every possible item would be a terrible idea in many cases.
I really don't think this is a problem. They're a tool, use them in the right way. IMO the only issue is that the two words sound very similar and that can be confusing.
PeaSlight6601@reddit
I didn't do anything to the object.
In C terminology I got a pointer to a list, and its length, and then calculated the location of all the elements in the list.... And the list disappeared?
JanEric1@reddit
Just, the car would be driveable. However it would be driveable just once... Just like iterators are iterables that can only be iterated over once. But being able to iterate over something multiple times is not required for an iterable.
In python things are iterable if they implement __iter__ which gives you an object that implements __next__. And iterators do that.
PeaSlight6601@reddit
I disagree. The act of iterating on an iterable should NOT change the iterable.
pojska@reddit
Too bad for you, I suppose.
Kache@reddit
Iterators maintain the state of walking through an iterable. They don't "modify" the iterable at all.
PeaSlight6601@reddit
Funny, because I just gave you code where walking the iterator did modify the "iterable" object.
Kache@reddit
Your mental model of your own code is incorrect, as demonstrated:
PeaSlight6601@reddit
It is not a mental model issue. Python generator expressions return self when __iter__ is called.
Any function that responds to iter is "iterable" in the basic sense of the english language. You can call iter and get an iterator from it, therefore you can iterate it.
Obviously one is not going to write (x for x in range(10)) and this is just short-hand for some more complex statement like (compute(foo) for foo in something if condition(foo)) where you definitely cannot just drop in the list-like something.
masklinn@reddit
The caveat is critical, because the sentence is otherwise just wrong, there is no guarantee that an iterable is repeatable in the general case e.g. files are iterable.
Mysterious-Rent7233@reddit
All iterators are iterables, so your statement applies to all of them.
jdehesa@reddit
The key insight is that iterators are iterables - by means of returning themselves on __iter__. The opposite is, in general, not true. The nice thing is that for most purposes wherever an iterable is expected (such as in a for loop) you can use an iterable or an iterator.
So, for example, if you need to process the elements of a list in a loop but want to skip the first element in the list, you can do the ugly thing which is to enumerate it and check the index is not zero on every iteration. Or you can make an iterator from the list with iter, call next on it once and then loop through the rest of the iterator. But the distinction between the original iterable (the list) and the iterator is important. For example, typically the iterator of a collection will raise an exception or work incorrectly if the underlying collection has been modified mid iteration.
zjm555@reddit
The article goes out of its way to explain that all iterators are themselves iterables (by virtue of returning themselves via __iter__).
jdehesa@reddit
In fact, it is a Sequence.
BleakBeaches@reddit
Which itself is an iterable
lood9phee2Ri@reddit
Well, the key point is that Iterable and Iterator are different things in Python, there just are various things in python that are Iterable while not being Iterator.
Funny enough, the names Iterator and Iterable also identify distinct things in Java too - though not the exact same as the Python case. Something to bear in mind coming to Python from Java or vice-versa.
BleakBeaches@reddit
Right, iterators ingest and/or operate on iterables. This is a common object model across most OOP languages no?
lood9phee2Ri@reddit
Well, in general terms. But worth noting in context Python gets especially weird about things, because duck typing, historical python-specific idiosyncratic meanings of "abstract base class" etc. As the Python documentation says "The only reliable way to determine whether an object is iterable is to call iter(obj).".
So you can also easily construct iterables that also aren't instances of Iterable. Fun for all the family.
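A small sketch of that situation, using a hypothetical `Countdown` class that defines only `__getitem__`. `iter()` accepts it through the legacy sequence fallback, yet the `collections.abc.Iterable` check says no, because that check only looks for `__iter__`.

```python
from collections.abc import Iterable


class Countdown:
    """Hypothetical class: no __iter__, only the legacy __getitem__ protocol."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        # iter() falls back to calling this with 0, 1, 2, ... until IndexError.
        if index >= self.start:
            raise IndexError(index)
        return self.start - index


countdown = Countdown(3)
values = list(countdown)                             # works: iter() accepts it
passes_abc_check = isinstance(countdown, Iterable)   # the ABC only looks for __iter__

print(values)
print(passes_abc_check)
```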
Intradesting.
BleakBeaches@reddit
So kinda like manually implementing an Interface by declaring the function overrides directly in the class?
yanitrix@reddit
the compiler doesn't care for the type, it just cares for methods `__getitem__` and `__len__` when it comes to the `for` loop.
Fun fact: `foreach` and `async/await` work the same way in C#
PeaSlight6601@reddit
`foreach` is at least a different keyword from `for` and it makes it a bit clearer exactly what is going on.
In a traditional C-style `for` loop one is directly tracking the index. In other words you build your own iterator by declaring `i` and incrementing it.
In a `foreach` loop you are asking the object/compiler to do that work for you. In principle `foreach` would allow the object to be indexed without using a sequence of integers as the keys.
lood9phee2Ri@reddit
note Python in particular simply has no C-style `for` though, its `for` is akin to some other languages' `foreach`, the subject of the original article (`range`) is used in the normal python idiom `for i in range(0, 10): print(i)`
Note `range()` is lazy (in modern python, once upon a time `range()` was eager and `xrange()` was lazy), this is not especially inefficient.
So long as you're not trying to learn Python based on analogy from some other language you already know, it's fine ;-)
PeaSlight6601@reddit
Agreed.
The problem of course is that everyone learns every language by analogy. So lots of people come to python from other languages and get confused by the behavior of generator expressions and `for`.
If python had used `foreach` things might be a little clearer (and then perhaps they could reserve `for` with an explicit call to `iter` for other cases).
Then generators could potentially distinguish between themselves and the iteration of themselves.
lood9phee2Ri@reddit
Meh. Just the way it is in python, what amounts to implicit iter() is defined to always be there.
https://docs.python.org/3/glossary.html#term-iterable
If you want a new pass, can always just get a new generator instance anyway, like that's exactly what a new call to your generator-function gives you anyway. it's just not a big deal.
of course beware trying to use shortcut generator expression syntax overly nontrivially, you're expected to switch to named generator-functions for anything complicated, stylistically, and the threshold should be pretty low.
Think a bit like the way Python lambda has also been deliberately constrained. Python tends to stylistically favor giving names to even the little things.
I don't necessarily like that personally even, I use a lot more anonymous stuff in Lisp, but in a work context best not fight basic idioms of the language you're using.
lood9phee2Ri@reddit
technically it doesn't even care for `__len__`, I guess I just added that out of habit. The protocol that signals end of iteration in the example is the raising of the `IndexError`, that the `for` is then defined to swallow.
https://docs.python.org/3/reference/datamodel.html#object.getitem
If implementing an iterator properly instead of the `__getitem__()` fallback you should also use `StopIteration` not `IndexError`.
lood9phee2Ri@reddit
Eeeh, it doesn't really matter if an interface is implemented in any static nominative typing Java-like sense though.
Note in the python case a thing is iterable when at runtime `iter()` can return an iterator for it, and that really only depends on a `__getitem__()` existing (it's better to implement `__iter__()`, but for reasons python has fallback to `__getitem__()`) at runtime.
I'm not saying this is wrong, but it is different to what some people may be used to.
Underneath, Python is very much not doing what programmers coming from other more static languages often seem to think it is; it is usually far more dynamic. That can have significant performance implications of course but e.g. the below also works fine (probably best not do it in your code without exceptionally good reason, just illustrative).
slaymaker1907@reddit
I’d say it’s more that iterable objects produce iterators.
PeaSlight6601@reddit
Java is correct. Python is wrong.
This is almost universally true of things related to the type-system. Python's type hierarchy is trash.
Obviously there are some downsides to the Java type system and it can be a bit excessive, but at least it follows the English language.
An iterable is something that is "able to be iterated". An iterator is something that does the walking. So what you should do is construct an iterator from an iterable, by calling `iter` on it. But in python the `iter` becomes syntactic sugar and leads to some ambiguity, particularly around generator expressions.
What is the difference between:
In principle very little. We are expressing a preference between storing `list_thing` in memory, and computing `gen_thing` as needed, but there is no fundamental reason for them to behave differently... but they do:
Will do very different things when given `list_thing` vs `gen_thing`. If you want a `list_thing` to behave like a `gen_thing` you can do so by calling `iter` on it, but to make a `gen_thing` behave like a `list` either requires lots of work with itertools, or loading the whole thing into memory by converting to a true list.
backfire10z@reddit
To require a gen thing to work like a list thing you just call `list(gen_thing)`
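To make the disagreement concrete, here is a sketch assuming a hypothetical helper, `total_and_count`, that walks its argument twice. The names `list_thing` and `gen_thing` follow the thread. Everything else is made up for illustration.

```python
def total_and_count(things):
    """Hypothetical helper that walks its argument twice."""
    total = sum(things)                # first pass
    count = sum(1 for _ in things)     # second pass
    return total, count


list_thing = [x * x for x in range(5)]
gen_thing = (x * x for x in range(5))

from_list = total_and_count(list_thing)
from_gen = total_and_count(gen_thing)   # second pass sees an exhausted generator
from_converted = total_and_count(list(x * x for x in range(5)))

print(from_list)
print(from_gen)
print(from_converted)
```

The generator silently reports a count of zero rather than raising, which is what makes this kind of bug confusing. Converting with `list(...)` fixes it at the cost of holding every element in memory.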
PeaSlight6601@reddit
Maybe you can't hold all that stuff in memory? Maybe it is easy to compute, but too large to store in memory.
You end up having to do things like: `thing = lambda: (compute(x) for x in ...)` and then calling it as `for y in thing()`.
Sure you can work around it, but this is something that commonly confuses python newbies, and it is all because generators return `self` in their `__iter__` method so as to support some syntactic sugar in `for` loops.
I'm not sure that the convenience of the `for` loop outweighs the type confusion.
abotoe@reddit
Why wouldn’t generators return themselves if you want an iterator? Generators are iterators. Seems pretty logical to me. I’m sorry but if you’re getting confused because a language provides different mechanisms that allow you to efficiently implement a common need (like iteration) that appears in circumstances with different design constraints, that sounds more like a “you” problem rather than a problem with the language.
NoInkling@reddit
I guess the alternative is that `__iter__` could return a new iterator instead of the existing one, in effect resetting the iteration (allowing you to use `for` on it multiple times). But in that case why wouldn't you just define a generator function and call it each time as needed instead?
backfire10z@reddit
Then how would you use a list? This is my point.
PeaSlight6601@reddit
You can't. That is the point.
I have a thing that is:
But for the last requirement this thing is perfect for a generator expression. However that last requirement means you can't really use a generator expression.
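One possible workaround, sketched under assumptions: a hypothetical `Reiterable` wrapper whose `__iter__` builds a fresh generator on every call, so the object stays lazy but can be looped over more than once. `compute` is a placeholder for whatever the real computation is.

```python
class Reiterable:
    """Hypothetical wrapper: lazy like a generator, replayable like a list."""

    def __init__(self, make_iterator):
        self._make_iterator = make_iterator  # zero-argument callable

    def __iter__(self):
        # A fresh iterator per call, so every for loop starts from the beginning.
        return self._make_iterator()


def compute(x):
    # Placeholder for the real, presumably expensive, computation.
    return x * 2


thing = Reiterable(lambda: (compute(x) for x in range(4)))

first_pass = list(thing)
second_pass = list(thing)

print(first_pass)
print(second_pass)
```

The trade-off is that every pass recomputes every element. Nothing is cached, which is the point when the data is too large to store.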
lood9phee2Ri@reddit
Can also use a generator to generate generators. ;-)
gs = gengen(3)
or there's
gs = iter(lambda: (x for x in range(0, 3)), None)
That's slightly different insofar as the latter is technically not a generator, but you can then do
gs = (y for y in iter(lambda: (x for x in range(0, 3)), None))
if you really just gotta make it a generator generator.
Anyway, then you can do
Shrug.
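A guess at how this might fit together. The body of `gengen` here is an assumption, not lood9phee2Ri's actual definition, and `islice` is only used to stop after two passes because both streams are endless.

```python
from itertools import islice


def gengen(n):
    """Assumed definition: an endless generator of fresh range(n) generators."""
    while True:
        yield (x for x in range(n))


gs = gengen(3)
passes = [list(g) for g in islice(gs, 2)]   # two independent passes

# The iter(callable, sentinel) form: the lambda never returns None,
# so this stream of generators never ends either.
gs2 = iter(lambda: (x for x in range(0, 3)), None)
more_passes = [list(g) for g in islice(gs2, 2)]

print(passes)
print(more_passes)
```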
pojska@reddit
```python
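def create_gen_thing():  # hypothetical factory: a fresh generator on every call
    return (x * x for x in range(3))

for i in create_gen_thing():
    print(i)
for j in create_gen_thing():
    print(j)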
```
backfire10z@reddit
Sorry I made some edits to my comment not sure if you saw, I will ask again: is there something in Java that does this better? What would you suggest?
Seems like a fine option to me, but I’m not that experienced.
netfeed@reddit
if it quacks like a duck and walks like a duck?
smelly-dorothy@reddit
A sequence is an iterable that implements `__getitem__` and `__len__`. Two of the iterator types are sequences and generators.
But yeah, someone made a blog article about reading the documentation..
treyhunner@reddit
treyhunner@reddit
This is a confusing section of documentation, mostly because the title is misleading. The first part talks about the one method that iterables need, `__iter__`, which must return an iterator. The next part talks about the actual iterator type objects.
It seems like this documentation section should either be split into two sections (iterable types and iterator types) or it should be renamed to "Iterable Types".
tenken01@reddit
tenken01@reddit
Don’t care. I have a CS degree (not a boot camper) and use statically typed languages for things that matter and python for throw away scripts like it was designed for.
joey_nottawa@reddit
Much of the AI work I witnessed at Google was done using Python.
tenken01@reddit
Who cares
joey_nottawa@reddit
Curious, are you a Java programmer?
nikvid@reddit
Your mom, she does a lot of work with my python.
ptousig@reddit
"do not use the information below as an excuse to be unkind to anyone"
More programming articles should include this line. We tend to be an arrogant bunch.
XtremeGoose@reddit
Not sure how you can write a whole article about iterables and iterators and not give the definitions (straight from collections.abc)
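As a rough summary rather than the actual `collections.abc` source: `Iterable` asks for `__iter__`, and `Iterator` additionally asks for `__next__` while supplying an `__iter__` that returns `self`. A quick way to see the consequences, with illustrative variable names:

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
numbers_iter = iter(numbers)

checks = {
    "list is Iterable": isinstance(numbers, Iterable),
    "list is Iterator": isinstance(numbers, Iterator),
    "iter(list) is Iterable": isinstance(numbers_iter, Iterable),
    "iter(list) is Iterator": isinstance(numbers_iter, Iterator),
    "Iterator subclasses Iterable": issubclass(Iterator, Iterable),
}

for label, result in checks.items():
    print(label, result)
```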
probability_of_meme@reddit
I do. We're really just talking about the difference between iterator and iterable and 99.9% of the time it's just pedantry.
Mysterious-Rent7233@reddit
There's a huge difference between a list and an iter(list). You can loop over the former multiple times and the latter only once. Not understanding this can cause very confusing bugs.
probability_of_meme@reddit
Sure, and a programmer should be aware of this difference for the 0.1% of times when it matters when we talk about iterators. Your example is pretty basic and something I don't think veteran programmers get wrong very often.
Mysterious-Rent7233@reddit
Veteran programmers don't get it wrong often because they know the difference between an iterator and an iterable!
How else would you avoid this error?
probability_of_meme@reddit
How about by not calling iter() when there's no need to? Like the author of this blog, I feel like you're trying to make it sound more important than it is.
Mysterious-Rent7233@reddit
As soon as you enumerate a list you'll end up with an iterator.
probability_of_meme@reddit
Maybe you're just learning so I'll excuse your confusion, but 99.9% of the time, you use the results of these functions to drive a loop and there's no need to delve further into the fine details.
It seems very important to you that everyone be super-aware of the remaining 0.1% and at the end of the day, you got to do you.
But this article and your arguments supporting its critical importance to me are very close to ridiculous. And that's just me. You're free to disagree and downvote but it doesn't make me wronger.
LIGHTNINGBOLT23@reddit
This article is clearly for those who do work with Python outside of making the trivial scripts that you're used to. It's not a finer detail but a basic explanation of Python objects. It doesn't even go into `__iter__` and `__next__`.
Mysterious-Rent7233@reddit
No, I've been programming Python since v 1.5.2 and I implemented one of the lazy functions that we would now call iterators.
I'm not sure why you think it is unusual to assign the results of e.g. zip to a variable.
This is the sample code describing zip from the python.org documentation:
I guess the people who wrote that code are also "new to Python".
One important reason that even someone extremely new to Python needs to know the difference is because if you assign the value to a variable for debugging, and then you print the value out, you will break your program because now the iterator is exhausted before it gets to your for-loop.
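A sketch of that debugging trap with made-up data. Note that `print(pairs)` on its own would only show something like `<zip object ...>` and consume nothing. The damage comes from the usual next step of wrapping it in `list(...)` to see the values.

```python
names = ["a", "b", "c"]
scores = [1, 2, 3]

pairs = zip(names, scores)
debug_view = list(pairs)                  # the "harmless" debug look consumes the iterator
seen_by_loop = [pair for pair in pairs]   # the real loop now gets nothing

# Safer: materialize once, then both inspect and loop over the list.
pairs = list(zip(names, scores))
debug_view_again = list(pairs)
seen_by_loop_again = [pair for pair in pairs]

print(debug_view)
print(seen_by_loop)
print(seen_by_loop_again)
```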
happyscrappy@reddit
Far more than 0.1% of the time. The change to have iterators and not just lists was a huge part of why python3 couldn't run so many python2 scripts.
nekokattt@reddit
Unlike other programming languages, Python muddies the waters here by saying that iterators are iterable across themselves. That is the issue here. Iterators are iterable but iterables are not iterators, and iterating across an iterable uses the iterable itself (usually).
JanEric1@reddit
iterating across the iterator uses the iterator itself. And that is actually required according to the python docs
though obviously not enforced.
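To illustrate the "not enforced" part, here is a sketch with a hypothetical `Sloppy` iterator whose `__iter__` returns a new object instead of `self`. Python runs it without complaint, but it restarts where a well-behaved iterator would continue.

```python
class Sloppy:
    """Hypothetical iterator that breaks the rule: __iter__ does not return self."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def __iter__(self):
        return Sloppy(self._items)   # a conforming iterator would `return self`

    def __next__(self):
        if self._pos >= len(self._items):
            raise StopIteration
        value = self._items[self._pos]
        self._pos += 1
        return value


well_behaved = iter([1, 2, 3])
next(well_behaved)
rest_builtin = list(well_behaved)                 # continues after the first item
returns_self = iter(well_behaved) is well_behaved

sloppy = Sloppy([1, 2, 3])
next(sloppy)
rest_sloppy = list(sloppy)                        # list() calls iter(), which starts over

print(rest_builtin, returns_self)
print(rest_sloppy)
```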
augustusalpha@reddit
Type-vangelists, assemble!!
BearBearBearUrsus@reddit
Nice article.