Why does indexing start with zero?
Posted by Fit-Camp-4572@reddit | learnprogramming | 167 comments
I have stumbled upon a computational dilemma. Why does indexing start from 0 in any language? I want a solid reason for it, not "Oh, that's because it's simple". Thanks
MagickMarkie@reddit
Because in order to use zero at all in computing you need to start with it.
carcigenicate@reddit
Afaik, it's because indices started as offsets from the start.
If you have an array at address 5, the first element is also at address 5. To get to the first element, you add 0 to the address of the array because you're already at the correct address.
To get to the second element, you add 1 to the address of the array, because the second element is one after the first.
Basically, it's a consequence of the pointer arithmetic used to get an element's address.
Academic_Broccoli670@reddit
Interesting tidbit: because of this, in C `a[b]` and `b[a]` are the same. This becomes even clearer if you write it in pointer arithmetic: `*(a + b) == *(b + a)`
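A small sketch of that equivalence, in case it helps. The function name `same_element` is made up for illustration; it only checks that the three spellings land on the same address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if a[i], i[a] and *(a + i)
   all refer to the same address. In C they are the same
   expression spelled three ways. */
int same_element(const int *a, size_t i)
{
    assert(a != NULL);
    return &a[i] == &i[a] && &a[i] == a + i;
}
```

The swapped form compiles because `x[y]` is defined as `*(x + y)` and addition does not care about operand order. It is a curiosity, not something to write in real code.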
CamelOk7219@reddit
Also, in 'virtually' two-dimensional arrays (there is no such thing in low-level computing, but you can pretend to have one using a one-dimensional array and some conventions) you get the element at coordinates `[i, j]` from `address + (i * row_length) + j`
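A minimal sketch of that convention in C, assuming 0-based `row` and `col` and a grid stored row by row. The names `flat_index` and `grid_get` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: maps a 0-based (row, col) pair to a position
   in a flat array that stores the grid one row after another. */
size_t flat_index(size_t row, size_t col, size_t row_length)
{
    assert(col < row_length);          /* col must stay inside its row */
    return row * row_length + col;
}

/* Reads cell (row, col) of a grid kept in a one-dimensional array. */
int grid_get(const int *grid, size_t row, size_t col, size_t row_length)
{
    return grid[flat_index(row, col, row_length)];
}
```

Because both coordinates start at 0, the first cell lands on position 0 with no correction term.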
flatfinger@reddit
Note that given e.g. `int arr[5][3];` the Standard allows implementations to behave nonsensically if a program receives inputs that would cause an access to `arr[i][j]` for values of `j` outside the range 0 to 2, even if `i` would be in the range 0 to 4 and `3*i+j` would be in the range 0 to 14, and gcc is designed to exploit this permission rather than follow the pre-Standard behavior which had been defined in terms of pointer arithmetic.
Ok-Dragonfruit5801@reddit
The fastest way on various old 8-bit computers to access display memory, e.g. a 40x25 character screen. And looping to multiply, as there was no coprocessor or MUL instruction.
jmack2424@reddit
TY sir. So many people who didn't have to program using offsets get this wrong. It's effectively carryover from assembly. BASIC and C/C++ allowed you to directly invoke ASM registers, and that's where the debate started. Higher level languages allowed you to use whatever indexes you wanted, but at ASM level, not using the 0 index could have real consequences.
Fit-Camp-4572@reddit (OP)
Can you elaborate? It's intriguing.
OrionsChastityBelt_@reddit
In C/C++, when you have an `int` array, say `arr`, and you access its elements via `arr[3]`, this is really just shorthand for telling the compiler to jump 3 `int`-sized steps from the memory location where `arr` is located and get that element. The reason why 0 is the first is literally because the first element is located exactly 0 jumps from the memory location where `arr` is stored.
There is support in modern assembly languages for the bracket notation for accessing arrays now, but in older assembly languages you literally accessed array elements by doing this arithmetic manually. If you want the element at index n in an array, you add n times the size of each element to the memory address of the array itself.
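Here is a rough sketch of doing that arithmetic by hand in C, roughly what the compiler does for you when you write `arr[n]`. The function names `byte_offset` and `load_manually` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance from the start of an int array to the element at `index`. */
size_t byte_offset(size_t index)
{
    return index * sizeof(int);
}

/* Fetches the element at `index` by computing its address manually:
   base address plus index times element size. */
int load_manually(const int *arr, size_t index)
{
    assert(arr != NULL);
    const unsigned char *base = (const unsigned char *)arr;   /* raw bytes */
    const int *slot = (const int *)(base + byte_offset(index));
    return *slot;
}
```

With index 0 the offset is 0 bytes, so the load reads straight from the array's own address.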
fractalife@reddit
Truly makes you appreciate having modern dynamically sized arrays that you don't even have to worry about allocating memory for, let alone have to commit to an array size at compile time.
BlazingFire007@reddit
Yes, I’ve implemented vectors/arraylists in languages like C before as a learning exercise (highly recommend btw).
The basics are really easy, just decide on a % where once it gets that full, you double the size.
So if the size is 10 and your ratio is 50%, when the 5th item is added you manually double the size to 20.
Then, once you do the easy part, you realize just how hard it gets. (Shrinking the size, making it even remotely fast, etc). It can get kinda complicated, at least, for me
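For anyone curious, a minimal sketch of the "easy part" in C, assuming the rule described above: double the capacity when a push would bring the vector to 50% full. `IntVec` and the `intvec_*` functions are made-up names, and shrinking is left out on purpose.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *items;
    size_t len;   /* elements in use */
    size_t cap;   /* elements allocated */
} IntVec;

/* Returns 0 on success, -1 if the allocation fails. */
int intvec_init(IntVec *v, size_t cap)
{
    assert(v != NULL && cap > 0);
    v->items = malloc(cap * sizeof *v->items);
    v->len = 0;
    v->cap = v->items ? cap : 0;
    return v->items ? 0 : -1;
}

/* Appends value. If this push would make the vector at least half
   full, the capacity is doubled first. Returns 0 on success, -1 if
   growing failed (the vector is left unchanged in that case). */
int intvec_push(IntVec *v, int value)
{
    if ((v->len + 1) * 2 >= v->cap) {
        size_t new_cap = v->cap * 2;
        int *grown = realloc(v->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;               /* old buffer is still valid */
        v->items = grown;
        v->cap = new_cap;
    }
    v->items[v->len++] = value;
    return 0;
}

void intvec_free(IntVec *v)
{
    free(v->items);
    v->items = NULL;
    v->len = v->cap = 0;
}
```

Starting with a capacity of 10, the first four pushes leave it alone and the fifth grows it to 20, matching the numbers in the comment. Many real implementations only grow when the buffer is completely full; the 50% threshold here just follows the description.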
Dismal-Cancel6791@reddit
when you say you did them as a learning exercise, was it a school assignment or is it a self learning thing? Interested in learning about that and then figure out how to solve problems.
sudomeacat@reddit
Not the same person, but you’ll usually find a simpler version as an assignment in school/uni. A sample description would be (in terms of Java):
Implement an ArrayList-like class without using the built-in ArrayList or Vector. The class must be able to:
- construct with a given size
- construct given an object (i.e. copy constructor)
- add(Object o): push the object (copy or reference) to the back; resize if needed
- insert(Object o, int i): insert the object (copy or reference) to position i; resize if needed
- get(int i) -> Object: get the object at position i. Throw an exception if i is out of range
- remove(int i) -> Object: remove the object at position i; downscaling optional
- toString() -> String: return a string representation of the list, formatting up to you
A C++ version would be similar. The return type can be ints for ease, or you can use template types. I would probably include/change:
- operator[] -> Object o: same as get
- friend operator<<(ostream os, List list) -> ostream: replaces toString
getfukdup@reddit
If you really wanted to make it easy you could just use a string for the whole thing!
sudomeacat@reddit
That took me a sec to get; I thought you meant storing a list of strings haha. But storing ints as a char*/string is pretty fancy, but sounds like a bit of effort for retrieval lol
ReasonableLoss6814@reddit
Yeah. If the assignment is ints, this will only work while ints are less than 256. Then it depends on the sizeof int (64 bit vs 32 bit).
BlazingFire007@reddit
It wasn’t for a school assignment, I was just curious why arraylists in systems programming languages typically were less robust than something like JS
monster2018@reddit
Man idk how you do it in C. Or does C have operator overloading? I feel like it can’t really because it doesn’t have classes, but maybe you can do it with structs or something, idk. I’ve done it in C++ using operator overloading so that you can still access the elements with the normal [] notation.
Without operator overloading, I’m genuinely at a loss for how you would implement it (where you can use the regular [] notation for indexing) without just making your own language lol.
BlazingFire007@reddit
Good catch, it must’ve been c++, it’s been a while and I’m not too familiar with either language now
There’s also a chance I just used some awkward syntax, I honestly don’t recall
monster2018@reddit
Ah. Yea I didn’t mean to like call you out haha, I just honestly thought you did it in C and I assumed you just somehow made it work with [] indexing because you knew a lot more than me. It still could be the case for all we know. I’m honestly not sure if it’s possible or not in C, but I certainly don’t know how to do it.
BlazingFire007@reddit
I’ll see if it’s still on my laptop.
My guess is I tried it in C, realized it would be annoying, then switched to C++ :P
Gugalcrom123@reddit
See GObject, they implemented objects in C. Of course not with dot notation and operator overloading but it's still OO. Also, Java claims to be OO but doesn't have operator overloading for some reason.
FakePixieGirl@reddit
Oh, but as someone who coded in C for a couple of years...
You can do so much fun stuff with pointers. Once you get used to it, it can be quite elegant.
rocdive@reddit
I do not know if modern compilers accept this or not but previously a code written like this would compile and work correctly for C. Apparently it worked because both eventually translate to *(arr+i)
// arr is the array and i is the variable to index it
int value = i[arr] ; // i and arr are interchanged from the normal convention.
Joeman106@reddit
This is why I love learning C. A lot of fundamental questions about the nature of computers are answered on their own as you learn. It’s quite beautiful really.
My hot take is they should teach data structures in c/c++ instead of java. I could not wrap my head around even the most basic data structures or even pointers until I started trying to implement them in C, then it all clicked. Java does too much for you even though creating them as objects is nice
RomuloPB@reddit
When I learned about more complex data structures, linked lists, graphs and so on using C it really ticked me on how damn complicated and fascinating some projects at low level can be.
braaaaaaainworms@reddit
Whether it's using brackets or not in assembly is a feature of the CPU, not the assembler. Some CPUs support loading a value from memory using an address stored in a register plus a fixed offset (x86, m68k, SuperH), and some don't, in which case you have to calculate it yourself.
Alarming_Chip_5729@reddit
And the cool thing is, at least in C, you can do the reverse and do
Logical_Angle2935@reddit
which means this syntax (or something like it - it has been a while) also works:
`fourth_val = *(arr+3)`.
RomuloPB@reddit
When people talk about real consequences, it was about the concerns around such explicit memory management. Indices were used mostly to explicitly manage machine addresses and memory. A few patterns dominated most index work back then, and working with 0-based indices was just more mathematically elegant for them, for example: slicing, offsets, ranges, modulo and many cyclical patterns.
That carried through to the hardware: the elegant solution meant simpler math and, in turn, lower hardware complexity. Loop counters and pointer arithmetic were less expensive in terms of performance and hardware complexity; it was the difference between multiple machine instructions and a single one.
lateratnight_@reddit
If you get further into C++ you should look at some assembly. Don’t let it scare you; it makes so many things make sense.
If you had an array of three integers, 3, 5, 8:
The array could start at 0x1000. The size of an integer is four bytes, so the first one would be located at 0x1000, then 0x1004, then 0x1008, etc…
SpaceCorvette@reddit
Imagine you have a string of beads of different colors laying on the table, and 4 of them in a row are yellow. You have a silly plastic pointing finger on a stick sitting on the table, pointing at the first yellow bead. How many times do you have to move the pointer to the right to get to the first element? 0 times. That's your array index to the first bead in the array.
am_Snowie@reddit
You start with zero, cuz the actual formula is element = base_address + size_of_the_element * index.
QFGTrialByFire@reddit
I guess you could then say its a carry over from opcodes .. and who knows maybe even from transistors. I mean say you had a 2 transistor computer why wouldn't you use state 00 it'd be a waste not to.
SeeTigerLearn@reddit
Mainframe Assembler taught by an old TI engineer in Dallas was one of the best classes I ever took. One of the most fundamental things I learned was “because it’s wired that way.”
Tuepflischiiser@reddit
This. For languages close to the machine or derivatives thereof.
And Fortran is 1-based. Because it was designed for scientists used to count indices from 1.
Less-Waltz-4086@reddit
and because it is simple ;)
Critical_Pin@reddit
Centuries are also counted this way (starting at zero) - 1900s are the 20th century. This is more about thinking of centuries as buckets of years.
Fit-Camp-4572@reddit (OP)
Thanks you're a lifesaver.
Spite_account@reddit
In the old days an array would be identified by the address of its first element, with the promise that the elements are equally spaced and consecutive in memory. So to get the next element you would go
Start + element size for element 2
Start + element size × 2 for element 3
To generalise
Start + n × element size
To get the first element you set n=0.
Eventually programming languages created the shorthand
Variable name[n] = start + n × element size, where n=0 gives you the first element.
Particular_Camel_631@reddit
It’s a convention. Lots of languages used to start indexing at 1, but people stopped using them so much. Now everyone is used to them starting at zero.
Also, the compiler had to do some work, subtracting 1 from the index before multiplying by the size of the object to get the address.
xnachtmahrx@reddit
Damn, it is always because of these darn pointers!
rocqua@reddit
This is counting 'how many items from the beginning is this'.
It turns out that this, instead of "the how-manyth item is this", is a lot more natural. This way, you need many fewer +1 or -1 expressions.
durmiun@reddit
It’s because arrays (at least in most older languages) are an implementation of a mathematical function. An array consists of: the variable name (a pointer to a location in memory), the type of data the array contains (which tells the system how large a block of memory each item in the array needs), and then the index, which tells the system how many steps from the origin location we need to travel to find our target item.
Effectively, it is listing where we start, how big our steps are, and how many steps we need to take to find each item. If you define an array of 16-bit ints, and we imagine the computer helpfully gives us memory address 100 to start with (counting addresses in bits to keep the numbers simple; with byte addressing each step would be 2)… the first item in the array (index 0) is located at 100 + (0 * 16) = 100. The second item (index 1) is located at 100 + (1 * 16) = 116. The third item (index 2) is located at 100 + (2 * 16) = 132.
This is also why indexing out-of-bounds is so dangerous if not protected against. When you create the array in a language like c++, you tell the compiler how big each item in the array is, and also how many items the array can hold. When the program starts, the system allocates that much memory to your app as sequential blocks, but the OS doesn’t guarantee that all of the other memory needed by your application is in sequential blocks throughout the system. So if you tried to access a 4th item in the earlier example, you would move past the end of your array into memory potentially in use by another application.
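Since C will not stop you from stepping past the end, one common defence is to carry the element count around and check the index before using it. A hedged sketch, where `checked_get` is a hypothetical helper and not a standard function:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies arr[index] into *out and returns 1,
   or returns 0 without touching *out when index is out of bounds. */
int checked_get(const int *arr, size_t count, size_t index, int *out)
{
    assert(arr != NULL && out != NULL);
    if (index >= count)
        return 0;              /* would read past the end of the array */
    *out = arr[index];
    return 1;
}
```

With 0-based indexing the valid indices are exactly those below `count`, so a single comparison covers the whole check.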
cosmin10834@reddit
Because an array is just a pointer to its first element, so if you dereference it you get the element at that location (the first in the array). If you want the next one it's pointer+1 (the second element), and if you want the nth one it's pointer + (n-1), since the first one is always at the pointer address. Why like this? It's super fast to retrieve an element at the nth position: you just add base + offset and that's the location of your element. If you instead assume the first element is at base+1, then you will use up a byte (or more, depending on the data type) and do nothing with it (them).
robkaper@reddit
Because all zeroes is simply the lowest value in any (unsigned) data store:
0000, 0001, 0010, 0011, etcetera. (Binary is just the example, this works for trinary, decimal etc etc as well.)
Not using that value is a waste of resources, which mattered a lot in the earlier days of computing.
In similar fashion: for the first year of your life your age is 0, in the 24-hour clock the first hour is 00:xx (and in Japan am/pm is occasionally 0-11 instead of 12 and then 1-11).
South-Tip-4019@reddit
In many languages it might be arbitrary and chosen out of convention; Matlab, for example, uses base-1 indexing. Why many languages use the base-0 convention has, I think, to do with pointer/index identity, i.e. `*adrr === *(adrr+0) === adrr[0]`. Using base-1 indexing would make the two types of element access needlessly different, i.e. `*adrr === *(adrr+0) === adrr[1]`.
Mission_Spinach_7429@reddit
I like to see it as the same reason the distance between two cities starts at mile zero. You have to travel a mile to get to the first milestone.
schungx@reddit
That's because in most CPUs the addressing mode expects a base address plus an offset.
zzmgck@reddit
There are 10 types of people
Mission-Landscape-17@reddit
An array is just a contiguous block of memory starting at some address. The index is really an offset into that block. So the first item is at index 0 because it starts at that spot in memory. Other items can be found directly by taking the array address and adding the index multiplied by the size of the data type in the array.
essential61@reddit
xpath begs to differ ;-}
tillemetry@reddit
Depends on the language. Fortran arrays start at 1.
Phoenixon777@reddit
It looks like most answers here are talking about programming-specific reasons, but here are examples where even non-programmers, and you too, 'naturally' start with zero:
When someone is born, they are 0 years old. Their "first" year of life all takes place while they are '0' years old. Interestingly, there are some cultures that start this indexing from 1, e.g. in traditional chinese age counting, a baby is 1 when they are born. Even then though, you can generalize this to other time periods. A person's first 'decade' of life all takes place while they are 0 decades old. This is the same reason why we are living in the "21st" century even though the year begins with "20" and not "21". (Although note there's some annoying aspects of the definition of this type of 'century').
In many buildings throughout the world, the "1" floor of the building is the one above the ground floor. More rarely, although I've seen it, the ground floor may even be labelled the '0' floor. I suspect this probably has other reasoning behind it, but it's at least tangentially related. Here's some simple reasoning for why counting floors like this works and might even help you to see what's "nice" about zero indexing in the first place. The ground floor is "0" floors above the ground. The second floor (labelled 1) is 1 floor above the ground. And so on, the nth floor is labelled n-1 and it is n-1 floors above the ground.
(Side note: This "number of floors offset from the ground" idea is how arrays are implemented in C and many other programming languages. The first element has offset 0 to the 'start' of the array, the second has offset 1, and so on. So the reasoning and math lines up exactly with this floor offset stuff).
Here is some mathematical reasoning for why such indexing is nice. Let's say you have 100 people and you want to split them into groups of 10 each. You could label them 1 to 100 and then split up the groups so that people labelled 1 through 10 are in the first group, 11 through 20 in the second, and so on. However, there is a nice property that you are almost able to exploit here... What if everyone in the first group has a "0" as their tens digit, everyone in the second has a "1" in their tens digit, and so on. But this isn't the case because the first group has the person labelled 10, the second has the person labelled 20, and so on. You could get this nice labelling if you instead labelled from 0 to 99, so the first group is people labelled 0 through 9, second is 10 through 19, and so on.
It might seem like the example above is contrived (and it does work 'extra nicely' cuz I chose 100 and we use a base 10 numbering system), but you can generalize it as follows. Say you have n people (and n is divisible by p) and you want to split them into p groups. Say that n = p * q, so that each group has q people in it. Then, if you label these people from 0 to n-1, you could ask each person labelled i to find the result of i / q (truncated), and that gives them the "group index" they are in. So group 0 would be for people that are labelled 0 through q-1, group 1 would be for people labelled q through 2*q - 1, and so on. We wouldn't get this nice scheme if we labelled our people from 1 to n (in fact, we would then have to use the equation (i-1) / q, which is effectively re-labelling our people with zero indexing!) Another interesting thing to note here is that not only does this setup work nicely with zero indexing, but it also naturally results in a zero-indexed group numbering system.
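The grouping rule above can be sketched in C, where integer division already truncates. Both function names are made up; the 1-based variant is written to return a 1-based group number, which is an assumption added here to show where the extra corrections end up.

```c
#include <assert.h>

/* 0-based labels 0..n-1: person i belongs to group i / q. */
unsigned group_of_zero_based(unsigned i, unsigned q)
{
    assert(q > 0);
    return i / q;
}

/* 1-based labels 1..n: shift down to 0-based, divide,
   then shift back up to get a 1-based group number. */
unsigned group_of_one_based(unsigned i, unsigned q)
{
    assert(q > 0 && i >= 1);
    return (i - 1) / q + 1;
}
```

With q = 10, the 0-based version sends labels 0 through 9 to group 0 and label 10 to group 1, while the 1-based version needs both the `- 1` and the `+ 1` to send labels 1 through 10 to group 1.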
The above example is related to why, when working in modular arithmetic, let's say the integers mod N, the 'canonical' form of the elements is usually considered to be from 0 through N-1. When you start to learn more algorithms, you'll see that many algorithms will work nicer or the algebra may be neater if we use zero indexing. (Note that there definitely are algorithms which work nicer with 1-indexing too, so this is more anecdotal than anything, but I think it'll still give you a feeling for why zero indexing is nice). The last example also relates to why using half open intervals i.e. [0, N), is such a common paradigm in programming (for example, a python range includes the 'start' but excludes the 'stop'), and the 'niceness' of using half-open intervals (which may also seem strange at first) is somewhat related to the 'niceness' of using zero indexing.
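To make the half-open interval point a bit more concrete, here is a small C sketch. `range_length` and `sum_range` are hypothetical names; the range is `[start, stop)`, the same shape as a Python range.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in the half-open range [start, stop). */
size_t range_length(size_t start, size_t stop)
{
    assert(start <= stop);
    return stop - start;           /* no +1 needed */
}

/* Sums arr[start] .. arr[stop - 1]. An empty range
   (start == stop) sums to 0 without any special case. */
long sum_range(const int *arr, size_t start, size_t stop)
{
    long total = 0;
    for (size_t i = start; i < stop; i++)
        total += arr[i];
    return total;
}
```

Two adjacent ranges `[0, k)` and `[k, n)` cover `[0, n)` exactly, with no element counted twice and none skipped, which is a big part of why this style pairs so well with 0-based indices.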
I'm sure there's more such examples, but hopefully this answers your question in a more broad sense, and you see that 'indexing by zero' is not just limited to programming, and, perhaps unintuitively, feels more 'natural' when you think about it.
andrew-mcg@reddit
In Britain, the floor at ground level is the "Ground Floor" and the one above that is the "First floor". It wouldn't historically have been the "zeroth" floor -- typically a label or elevator button would show "G", though you do see "0" more recently. (Similarly a basement might be "B", or sometimes more recently "-1").
On the real subject, there are pros and cons to 1 or 0 indexing. Most widely used languages today live in an ecosystem based on C, so C's 0-base predominates. (i.e. if you call C libraries, even from something exotic, it would be an extra problem if the array conventions were different).
y-c-c@reddit
Thank you. All these comments about memory offsets are missing the point of why so many programming languages (that is, most of them) use 0-indexing, with similar patterns used in mathematics all the time.
Python for example really takes advantage of this and has indexing wrap around when you do `somearray[-1]`. Can’t do that with 1-indexing.
1vader@reddit
I don't think Python's backwards indexing is a good argument for 0-based indexing. You can see it as wrapping around but that's rarely how you actually want to use it. It's usually rather annoying that 0 is the first index from the left but -1 is the first from the right, so if you want to get elements with the same offsets from both sides, you always need to add or subtract 1 somewhere. Also, you can get pretty hard to spot mistakes if the index accidentally/unintentionally becomes negative. In other languages, you get a clear exception instead. Imo it would be much nicer to have specific backwards indexing syntax instead, which also starts at 0. Iirc there's at least one semi-popular language which has something like this, but I can't remember which one (something like Kotlin or Swift or similar).
Accomplished_Pea7029@reddit
Huh, I've never thought of this as wrapping around. Just counting back from the end.
ArtisticFox8@reddit
It does not work as modular "wrapping around", as the lowest negative index allowed is minus the array length.
Tontonsb@reddit
I happen to live in the country where the ground floor is "1". I'd prefer 0-indexing instead.
If I'm on the floor "5" and go 3 floors down, I'm on the floor "2". Makes sense as `5-3=2`.
If I'm on the floor "2" and go 3 floors down... I'm on the floor "-2". Makes no sense mathematically.
boadmax@reddit
I always assumed it was because binary you count from 0. And it was probably easier to match that in languages.
We could start at 1 but I don’t think it hurts anything.
mwesthelle@reddit
https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD831.html
UltGamer07@reddit
Cos arr[n] is just shorthand for *(arr + n)
PickltRick@reddit
I guess it's since Boolean algebra started with on/off signals, either 0 (off) or 1 (on).
adxaos@reddit
Because indexing is just an order preserving map from naturals to the specified set and naturals include minimal element, namely 0.
Last_Being9834@reddit
Because 0 is the first number in decimal and binary. It's the reference point. Also, electronics work with binary and so does memory; the first memory location is 0. (It works like a spreadsheet whose first cell is 0.)
jax_cooper@reddit
Because if you have a byte with 8 bits, you can represent 256 characters, anything between 0-255, because b00000000 is 0 and b11111111 is 255. I know it seems unrelated but for me it always seemed that the first number I can represent is 0 and not 1 and since arrays go way back, low level programming languages did not set arrays to start with 1 and we got used to it?
+ In C you get the memory address of the nth element by adding the start of the array + n*size(elements), and since the first element is the start of the array (with the exact same memory address), we need n to be 0 and not 1.
Fragrant_Steak_5@reddit
Early languages like C were designed very close to assembly. Since hardware addresses start at 0, it was natural to carry that over. Other languages adopted it for consistency. That's the reason :o
MegaCockInhaler@reddit
It’s so modular arithmetic algorithms work well
Case 1: Zero-based indexing
Indices: 0, 1, 2, …, n-1
The index of the element after shifting k steps from position i is simply:
(i + k) mod n
Example with n = 5, start at i = 3, step k = 4: (3 + 4) mod 5 = 7 mod 5 = 2 → directly gives index 2.
No adjustments needed.
Case 2: One-based indexing
Indices: 1, 2, 3, …, n
Now the formula is messier, because modular arithmetic naturally produces 0..n-1. So you have to shift by 1:
(((i - 1) + k) mod n) + 1
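The two cases side by side as a C sketch. The function names are invented for illustration, and overflow of `i + k` is ignored to keep it short.

```c
#include <assert.h>

/* Case 1, positions 0..n-1: the modulo result is already a valid index. */
unsigned step_zero_based(unsigned i, unsigned k, unsigned n)
{
    assert(n > 0 && i < n);
    return (i + k) % n;
}

/* Case 2, positions 1..n: shift to 0-based, wrap, shift back. */
unsigned step_one_based(unsigned i, unsigned k, unsigned n)
{
    assert(n > 0 && i >= 1 && i <= n);
    return (i - 1 + k) % n + 1;
}
```

For the example above (n = 5, k = 4), the 0-based call starts at 3 and lands on 2. The same physical slot in 1-based terms is 4, and the second function lands on 3, which is again the slot one past index 2.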
photo-nerd-3141@reddit
Many of the uses for lists involve finding locations. Arithmetic for finding the locations works most simply with offsets (e.g., finding relative locations w/in an array is an offset, not a count). At that point using offsets from the start saves off-by-one errors when computing locations.
cluxter_org@reddit
Because the first value that is represented in binary for a byte is: 00000000 = 0 in decimal. This is the lowest and simplest value of a byte. Then the second value is: 00000001 = 1 in decimal. Value number 3: 00000010 = 2 in decimal. Value number 4: 00000011 = 3 in decimal. And so on, until: 11111111 = 255 in decimal. So we logically start with the simplest value, which is zero, and we count from here by logically adding 1 every time we need to increase the value.
As simple as it gets.
tr14l@reddit
For calculation of offsets. When you know each object takes, for instance, a 64 bit reference, you reference the first element by adding 0*64 to the memory address (because you are already at the first element). To get to the next element, you'd add 64 bits. Then another 64 for the next element. Now we can jump to any element in the array with one simple multiplication, which is highly efficient.
Starting at 1 just makes you have to do extra operations and confuses people who actually care about the references because now you have to subtract 1 from the index for each calculation. Extra complexity that isn't needed.
In other words, the "index" is actually "how many chunks are we from the start". The start would be 0 chunks, because you started there.
RyeonToast@reddit
Some things are best looked at in binary, and I suspect this is one. Pure speculation here, but hear me out.
Let's start with zero, one, and two in binary bytes. That would be 0b00000000, 0b00000001, and 0b00000010. There's a natural progression there. I think it just made sense to the people making compilers for various programming languages to start with the first available byte value, which is all zeros, which comes out to a decimal zero.
I also suspect this is related to the limitations of early systems. Way back, programmers were trying to make use of every bit they could because so little memory was available. This is the reason for two year dates and the Y2K problem. Back at the time, programmers thought "Hey, that's two whole bytes I could use somewhere else that could actually be useful." I think starting from the first available byte value, instead of skipping it, appeals to that tradition as much as it's just natural to do.
Snezzy763@reddit
Actually the two-digit year code started on punch cards. There were only 80 columns and it made no sense to waste two columns on "19" because the year 2000 was half a century in the future. "Hey, technology advances, and by the year 2000 we'll probably have cards with 160 columns." Meanwhile, the year 2038 is already causing problems for old Unix-related software.
Fit-Camp-4572@reddit (OP)
Nice one 😄 thanks
Pvt_Twinkietoes@reddit
Why does it matter? It's arbitrary
Birnenmacht@reddit
I know this has been answered, but another reason it is still kept like this in higher-level languages is that indexing with -1 to refer to the end makes more sense.
Jazzlike-Poem-1253@reddit
In math it starts with 1. in CS as others pointed out it is the offset from the first element - 0 for the first.
Look into pointer arithmetic and the reason for the convention becomes obvious.
eduvis@reddit
The question has been answered, so I just add my two cents.
1st cent: best answer is: look at the binary representation of a number + the limitations of early systems (both hardware and software)
2nd cent: I would prefer array indices to start at 1, with positive indices counting from the beginning of the array, negative indices counting from the end of the array, and index 0 triggering a computer shutdown
Different_Counter113@reddit
Indexing starts at 0 mainly because of how memory addressing works.
When you create an array, it is stored in a contiguous block of memory. The name of the array represents the memory address of the first element.
To access the element at position i, the machine calculates the memory address as:
address_of_element = base_address + (i × size_of_element)
If indexing started at 1, then the formula would be:
address_of_element = base_address + ((i-1) × size_of_element)
That extra -1 makes things slightly less natural at the hardware level.
By starting at 0, the index directly represents the offset from the base address.
0 means “no offset → first element.”
1 means “offset by 1 element → second element.”
This is why languages like C (and those influenced by it) start from 0 — it maps cleanly to hardware addressing and is efficient to compute.
Not all languages follow this rule (e.g., Fortran, Lua, MATLAB start indexing at 1), but zero-based indexing became dominant because it simplifies array access and aligns with how computers use pointers.
Plus-Violinist346@reddit
It's based on the perspective of location and distance rather than cardinality. Address x plus size of type times 0.
But I would wager it probably doesn't really need to be, it's kind of just how it evolved. Just the way it is.
Imagine how annoying it would be if the next version of Java was like ok everything is 1 indexed now.
Floppie7th@reddit
Because 0 is the minimum unsigned integer. You can make a data structure that has a custom "minimum" index, but that's going to involve an extra subtract instruction on every access.
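A sketch of what such a structure could look like in C, to show where the extra subtraction sits. `OffsetArray` and `offset_at` are hypothetical names, not from any library.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int *data;      /* storage, still 0-based underneath */
    long lowest;    /* the index callers use for the first element */
    size_t count;   /* number of elements */
} OffsetArray;

/* Returns a pointer to the element the caller calls `index`. */
int *offset_at(OffsetArray *a, long index)
{
    assert(index >= a->lowest && (size_t)(index - a->lowest) < a->count);
    return &a->data[index - a->lowest];   /* the extra subtraction */
}
```

Setting `lowest` to 1 gives a 1-based array, and setting it to 0 makes the subtraction a no-op, which is the case the hardware handles for free.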
notacanuckskibum@reddit
Older programming languages BASIC and FORTRAN used 1 based arrays. C really set the standard at zero based, which more recent languages have followed.
0 based seems to produce fewer off by 1 errors, it allows the standard loop
for (i = 0; i < numberofitems; i++) { array[i]….. }
Ok_Appointment9429@reddit
It's a crappy remnant of pointer arithmetic and I can't fathom why more modern languages perpetuated it.
AngeFreshTech@reddit
How do you count? Do you start at zero or 1? Some programming languages start indexing at 1. Java and other programming languages start at zero. Choose your battle!!
RevolutionaryRush717@reddit
The real question is why we're using two's complement representation.
ammar_sadaoui@reddit
Okay, imagine you’re lining up toys on the floor:
So the number is not “which toy,” it’s “how many steps from the start.” That’s why computers start counting at 0.
custard130@reddit
when you access an element from an array, the number you give as the index is used as the offset from the start of the array
eg lets say i have an array with 100 integers starting at memory address 0x1000
i will have a variable storing this address
then if i access index 0 of the array, that will fetch the integer from that address + 0 * 4 (integer is 4 bytes)
if i access index 1, that will load from the address + 1 * 4 aka 0x1004
to have a 1 indexed array, you either make the array 1 element longer than wanted and then ignore the 0 entry (just pretend that the array starts at 0x1004 even though you still store the start as 0x1000), or you need to subtract 1 as part of every array lookup
another scenario would be say you have an array representing pixels on a screen/in an image
with 0 indexed arrays + coordinates, the index in the array for a given pixel [x,y] will be
x + y * width
with 1 indexed arrays + coordinates this would be something like
1 + (x - 1) + ((y - 1) * width)
basically the values here need to be 0 indexed for the maths to work out correctly so you would have to constantly convert between them
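The two pixel formulas above can be compared directly in a small Python sketch. The function names and the `WIDTH` value are assumptions made up for the example:

```python
def index_zero_based(x, y, width):
    # x and y both start at 0, and so does the resulting array index
    return x + y * width


def index_one_based(x, y, width):
    # x, y and the resulting index all start at 1, so each one needs shifting
    return 1 + (x - 1) + (y - 1) * width


WIDTH = 4  # assumed image width for the example
top_left_0 = index_zero_based(0, 0, WIDTH)
top_left_1 = index_one_based(1, 1, WIDTH)
```

Both functions address the same pixel layout; the 1-based version just carries three corrections around.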
jshine13371@reddit
FWIW, this isn't true. Some languages do start counting indexes at 1 instead of 0, and it's kind of annoying if you ever need to work in both kinds of languages. An example of this is VB.
-Wylfen-@reddit
You don't start measuring things at 1 meter, right? Same reason.
zhivago@reddit
0 is the additive identity.
If it did not start at 0 then adding indexes or offsets would need to compensate.
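To make the compensation concrete, here is a small Python sketch. The function names are invented for illustration; it combines the start of a window with a position inside that window:

```python
def compose_zero_based(start, inner):
    # position `inner` within a window that begins at `start`: plain addition
    return start + inner


def compose_one_based(start, inner):
    # with 1-based positions both counts include a "1", so one must be taken back out
    return start + inner - 1


data = ["a", "b", "c", "d", "e", "f"]
# The second element of the window that begins at "c" is "d".
zero_based_pos = compose_zero_based(2, 1)  # window starts at index 2, inner index 1
one_based_pos = compose_one_based(3, 2)    # window starts at position 3, inner position 2
```

In the 0-based version an inner index of 0 leaves the start unchanged, which is what "additive identity" buys you.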
Antypodish@reddit
Not all programming languages index start from 0. Lua for example starts by default from 1.
Gnaxe@reddit
Fortran, Lua, Julia, Matlab, Mathematica, and R would like to object. Languages imitating traditional math notation rather than building up from assembly start at 1.
In C arrays are kind of sugar for pointer arithmetic. That explains where the idea came from, but not why it persists. It's not just because we're used to it. Starting at zero is actually better for intervals.
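A minimal Python sketch of the interval argument, assuming half-open ranges `[lo, hi)` as Python's `range` uses them. `split_half_open` is a hypothetical helper name:

```python
def split_half_open(lo, hi, mid):
    # [lo, hi) splits into [lo, mid) and [mid, hi) with no +1/-1 adjustments
    return range(lo, mid), range(mid, hi)


items = list("abcdefgh")
left, right = split_half_open(0, len(items), 3)

length_is_difference = len(range(2, 7)) == 7 - 2  # size is simply hi - lo
empty_is_natural = len(range(4, 4)) == 0          # the empty interval is just lo == hi
```

Starting at 0 means the whole sequence is `range(0, len(items))`, so the upper bound is the length itself.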
Mozanatic@reddit
I would not call it traditional math notation. I have a master's in math and I have seen plenty of proofs where indexing also starts at 0. It really depends on the definition of natural numbers that the teacher uses. Some consider 0 to be part of the natural numbers and some don’t. For me, mathematically, starting from 0 is as natural as starting from 1.
superluminary@reddit
Traditional as in ancient. Roman numeral / finger counting style. Before we realised that the number line was a thing.
aa599@reddit
In APL you get a choice: the system variable `⎕IO` (Index Origin) can be set to `0` or `1`. `A[⎕IO]` is always the first element of the array.
no_regerts_bob@reddit
A niche language I used back in the late 80s called BASIC09 also had a mechanism for setting the index origin to 0 or 1. Probably copied from APL
Gnaxe@reddit
Lua uses tables for everything, even as arrays. There's nothing stopping you from assigning a zero key to an "array". But the standard array-like functions don't expect that.
A language like Python could similarly use a dict instead of a list or put a dummy value in the zero index.
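Roughly what those two Python workarounds might look like. The variable names are made up for the example:

```python
# Option 1: a dict keyed from 1 (gives up list features such as slicing)
one_based_dict = {1: "a", 2: "b", 3: "c"}

# Option 2: a list with an unused placeholder in slot 0
one_based_list = [None, "a", "b", "c"]

first_from_dict = one_based_dict[1]
first_from_list = one_based_list[1]
real_length = len(one_based_list) - 1  # the dummy slot has to be discounted
```

As with Lua's zero key, built-ins such as `len`, `enumerate` and slicing still assume the 0-based layout, so the dummy slot leaks into them.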
superluminary@reddit
Because zero is the middle of the number line.
The fact we traditionally count from 1-10 is a historical artifact based on finger counting where one finger is the smallest number of fingers. The number 0 wasn’t invented until the 7th century, and we still carry that legacy.
Starting from 1 excludes 0. 0 has no home.
Grithga@reddit
Not every language does start from zero. Most of the most popular languages do, but there are plenty that start at 1.
Languages are created by humans. The humans who created them decided to start at 0 (except for the ones who decided to start at 1). The ones who chose to start at 0 often did so because:
Array indices are often treated as an offset from the start of the array. You are effectively requesting "the element 0 elements away from the start of the array". This is especially true in languages like C that let you get closer to the memory, where `arr[x]` (item at position x) is directly equivalent to `*(arr + x)` (advance the address `arr` by `x` positions and dereference).
keh2143@reddit
R, usually used for statistics, also starts at 1
Accomplished_Pea7029@reddit
And MATLAB. I usually work in Python or C, so occasionally when I need to use MATLAB I immediately get an indexing error because I forgot about 1-indexing.
tms10000@reddit
I see your R and I raise you a COBOL!
wildgurularry@reddit
This is a great answer. I grew up learning Pascal, where array indices start at 1. I quickly got into graphics programming which required a mix of Pascal and assembly code.
I quickly realized that I had to subtract 1 from array indices to make the pointer arithmetic work in the assembly code. Since then, 0-based indices just make more intuitive sense to me, and require fewer instructions on the processor to convert into pointer values.
kihei-kat@reddit
Fortran also started at 1
Temporary_Pie2733@reddit
Pascal even let you choose the starting index; IIRC, the only constraint was that indices had to be a contiguous range of positive integers.
Suspicious-Bar5583@reddit
Open stopwatch on phone. Why does it start at zero?
Look at a measuring tape. Why does it start at zero?
When you decide to collect something new, why does your collection start at zero?
Upon starting your career, why do you have 0 years of experience?
kodaxmax@reddit
it's mostly tradition for modern languages. If it bothers you, you could just use dictionaries, unless you're truly desperate for every bit of performance.
Paxtian@reddit
Say you have an array a[1, 2, 3]
The memory address of a is ADDR.
The memory address of 1 is also ADDR. So it's ADDR+0.
The memory address of 2 is ADDR+1.
The memory address of 3 is ADDR+2.
Fit-Camp-4572@reddit (OP)
Best reason, simple and complex at the same time.
Lidex3@reddit
This is the best answer. If you want to understand this a bit more, I encourage you to learn how arrays and pointers work in C.
TrueKerberos@reddit
Fun fact: Did you know that in our calendar there is no year 0? The sequence goes directly from 1 BC to AD 1, because the system was created before zero existed and it used Roman numerals.
chipstastegood@reddit
Because in assembly language you start with an address to a memory location, which is the first element in the array, and then add an offset to it to get the rest of the array elements. Then higher level languages like C kept the idea of a pointer to a memory location and an index. C then came up with syntactic sugar where you could write x = p[i] and most other C-like languages kept it. This is really just shorthand for *(p+i) where p is the address of the first element and i is the offset. When i=0 you get the first element.
ConsiderationSea1347@reddit
Oof this question and these answers are making me feel old.
ottawadeveloper@reddit
In C and other languages that have to deal with pointers, if you have an array of 4 byte integers, starting at memory x = 0xF67489 (whatever, some number), then the first entry is at x, the next at x+4, the next at x+8, etc (each being 4 bytes long). Therefore, the address in memory of the n-th array item is x + 4n where n is the 0-indexed index of the array. 0 indexing keeps the relationship between index and memory locations easy.
Some languages are 1 indexed, like Lua, Fortran, MATLAB, COBOL, etc. These languages are typically aimed at math /science / business people instead of hardcore programmers and therefore make the effort to connect with the 1-indexing people typically use. But more modern programming languages aimed at programmers like Java, Python, Go, Rust have kept the 0-indexing because it's what programmers are used to now.
mortimere@reddit
memory address + (x * byte_size_of_array_type)
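That formula can be tried out in Python with the standard `struct` module. This is only a sketch: the buffer contents and the helper name `element_at` are assumptions for the example, and a `bytes` object stands in for raw memory, with offset 0 playing the role of the base address:

```python
import struct

ITEM_FORMAT = "<i"                          # little-endian 4-byte signed integer
ITEM_SIZE = struct.calcsize(ITEM_FORMAT)    # byte size of the array's element type

# Five integers packed back to back, like a contiguous array in memory
buf = struct.pack("<5i", 10, 20, 30, 40, 50)


def element_at(buffer, index):
    # start of buffer + (index * byte size of the element type)
    offset = index * ITEM_SIZE
    (value,) = struct.unpack_from(ITEM_FORMAT, buffer, offset)
    return value


first = element_at(buf, 0)   # offset 0: nothing to skip
fourth = element_at(buf, 3)  # offset 12: skip three 4-byte elements
```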
sarnobat@reddit
Offset from base address
Business-Decision719@reddit
This is language dependent. In some languages starting with 1 is normal. It happens in Lua and I would say it was fairly normal in Pascal and Basic, just off the top of my head. But I would also say, it's been my experience that languages without a strong convention of zero indexing also are prone to have a very flexible and general approach to indexing.
Pascal liked the idea that array indexes could start and stop wherever you wanted, and that they didn't even have to be integers, just something that could reasonably be recited in order. So you could have a type like `array ['a'..'z'] of integer` and that would be fine. Lua likes the idea that literally anything can be an index, so you can use 0 as an index if you want, but you can also use strings or something else entirely.
The real reason for zero indexing being really common is that a lot of languages evolved from C, and C happened to have zero indexing. I'm not saying there wouldn't be zero indexed languages without that or that there weren't zero indexed languages before that. But the driving question for a lot of the languages has been, "How can we make C more convenient, or make C++ easier, or at least look familiar to C and C++ programmers while doing our own thing?" If some other language had been just as influential then maybe some other indexing strategy would have been just as influential. We start with zero for the same reason we group statements with curly braces. We don't have to, and we don't in every language, but C did it and so many other languages did it that we now expect it.
nerdly90@reddit
It starts at 1 in Lua
Jim-Jones@reddit
What else? 1? Then you can go 1 less and still have a non-negative number.
12 o'clock is really zero.
msiley@reddit
Memory starts at zero. If you have a sequence of things laid out in memory contiguously, then to get the very first thing you start at zero and end at the thing's size. So let’s say the size is 8. The first thing starts at 0 and occupies the chunk of memory from 0 up to 8. The second thing has index 1 because you need to skip over the first thing. So (1 * 8) is its start position and it goes up to (1 * 8) + 8.
dragonflymaster@reddit
Back when I worked on them, in Ericsson electronic telephone exchanges device numbering started at 0, so the 1st device had address 0, the second 1, etc. They used Eripascal and Assembler/machine language as their programming languages. It was interesting to watch how people used to analogue (mechanical) exchanges had so much trouble adapting to that. Some never adapted.
Pale_Height_1251@reddit
They are memory offsets.
Say if you start measuring a wall to hang shelves or something, do you start at 0 or 1 cm?
TemporaryWeird4300@reddit
..
Chickfas@reddit
When you start to watch a video, does it start on 1:00 or 0:00? When you say “first floor” you mean ground floor? When you measure distance between two points, you start with 1cm or 0cm? Etc.
In Lua it starts with 1 actually :D
mapadofu@reddit
Dijkstra wrote a note about this
https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD831.html
huuaaang@reddit
Offset from the start of memory slot. Start + 0
da_Aresinger@reddit
People already mentioned pointers, but that is not the only reason. (although it is clearly the main reason)
Indices starting with 0 means they produce an algebraic closure as residue fields.
This means you can do "normal" math on them and 0 remains a meaningful value.
YetMoreSpaceDust@reddit
“Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration.” - Stan Kelly-Bootle
sessamekesh@reddit
It doesn't always - notoriously, arrays in Lua start with 1.
In C and C++, there's no such thing as an "array" as we know them in modern languages - an array is just a variable that instead of pointing to a chunk of memory with a single value in it, it points to a larger chunk of memory with many values next to each other. The "index" represents "how many variables worth of data should we look forward to find the one we're interested in".
C and C++ are the grandparents of most modern programming languages, so the pattern of accessing arrays stuck. In more modern, memory managed languages, there's no inherent reason that 0 needs to be the start - as Lua demonstrates - but changing that pattern also makes a pretty strong annoyance for any programmer who works in multiple languages - as Lua demonstrates.
Traditional_Crazy200@reddit
There is a reason, having 1 as the starting Index adds one extra computation
sessamekesh@reddit
For compiled languages, the extra computation happens at compile time and is pretty trivial (in the range of "shorter variable names are better because they parse faster" trivial).
For runtime languages I can see this being a thing, but an extra add op is pretty quick. The possibility of cache missing on a `length` property for bounds checking probably dwarfs the subtraction cost.
JIT languages (Java, C#) and immediately compiled languages (JavaScript) probably behave more like properly compiled languages here too.
TheUltimateSalesman@reddit
Because zero is where it starts reading and goes to the beginning of the next one.
Todegal@reddit
Imagine you are iterating using an 8 bit unsigned integer (as they did back in the day), which has a maximum value of 255. If you start at 1 then you can only index up to 255 different values, but if you start at 0 you can now index 256 different values. So why wouldn't you?
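The count is easy to check in Python. A tiny sketch; the constant name `U8_MAX` is just chosen for the example:

```python
U8_MAX = 2**8 - 1  # largest value an 8-bit unsigned integer can hold (255)

zero_based_indices = range(0, U8_MAX + 1)  # 0..255
one_based_indices = range(1, U8_MAX + 1)   # 1..255, the bit pattern for 0 goes unused

zero_based_count = len(zero_based_indices)
one_based_count = len(one_based_indices)
```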
nameisokormaybenot@reddit
It's easier to understand why if you study Assembly and understand how data is kept in registers and/or memory. We have to remember that data has a physical dimension to it inside the machine. Think of each storage unit as a box and each box has an address. If you move to a certain address, you are moving to a location in memory. Then you read from that position onward. From that location to the next, you move a "word" (say, 8 bytes). Then you have moved one position. Therefore, the first "read" goes from 0 until you move 1 location. That's one word. Moving two positions would be going from 0 until you "walk" 2 locations. The sequence of words then goes like this: 0 (first), then 1 (second), and then you are at location 2 (the start of the third location).
Thinking with numbers: you go to address 1000 [0]. You have to read from this position to get the data from this position onward. If you skip this and start reading from 1001 [1], you will lose this data in your reading. The next data is at address 1001 [1]; the next at 1002 [2], and so on.
0                 1                 2                 3
| - - - - - - - - | - - - - - - - - | - - - - - - - - | - -
Another way of thinking about this is you go to address 10142 [0]. To read what is at this address you have to add 0 to it, else if you add 1 you would be reading address 10143 [1], and then 10144 [2], and so on.
Robert__Sinclair@reddit
This is the way
Narrow-Coast-4085@reddit
The first item in the list is zero steps from the start, the next is one step from the start, the next is 2 steps, and so on. If you're at the start, you need 0 steps to get the item.
sparant76@reddit
I want you to take 2 people from a line of people. Starting with person 10.
Are you picking person 10 and 11 or 11 and 12?
Person 10 and 11 right?
So the first person starting at person 10 in line is 10+0 and the second is 10+1 etc.
AffectionatePlane598@reddit
because when counting in hex it goes 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f
Tissemat@reddit
Well, the first bit is the smallest number, and the smallest bit is 0.
Extra_Intro_Version@reddit
Fortran also starts at 1. FWIW
KalasenZyphurus@reddit
There are some rare languages that use 1-indexing. We don't like to talk about those.
Mostly though, it's because we use the same data types as we use for other numbers to refer to the index. At the lowest level, everything is binary, like most people mention. But we use that binary to represent things. That could be true/false, it could be ASCII characters, it could be the entire contents of your computer's memory, with memory addresses pointing to various spots in that giant binary sequence. It can also map to different numbers than the literal binary number. It could be floating point numbers, it could be signed integers, it could be unsigned integers, Whatever is useful to map a series of flipped switches to. Even negative numbers have to be mapped to an otherwise positive binary sequence, using the Two's Complement method where the leftmost digit represents the sign rather than the number. For example, the binary "11111101" is 253 in decimal, but under Two's Complement, "11111101" is -3. The data type, the context of what the binary is supposed to represent, is important to keep in mind always.
Since arrays hold a countable number of things, they don't need a negative index. Some languages that allow you to specify a negative index use that to let you "wrap around" from the end, rather than referring to an actual negative slot. When referring to the actual slots in the array though, you don't need a negative number.
For that reason, the data type used for the index of arrays is generally an unsigned integer type, whether that's a 0-255 byte type or 0-2,147,483,647 or what-have-you. Those start at zero for those data types because "0" is a viable count of things to have, and it maps cleanly to the literal binary. "00000000" is 0, "00000001" is 1, etc. Programmers found it more useful to have a 0-255 type with that clean representation as opposed to a 1-256 type where "00000000" maps to 1, "00000001" maps to 2, "11111111" maps to 256, etc. 0 is a useful number, part of the natural numbers.
So if arrays use one of those types as the index input, 0 is one of the values that can get passed in as an array index. Since 0 has to be accepted, they label the first slot in the array as 0. The confusion comes in because the index number labelling the slot is different from the count of things. Slot 0 is the first, slot 1 is the second, and so on.
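The "same bits, different data type" point from the two's complement example above can be checked in Python with `int.from_bytes`. A minimal sketch; the variable names are arbitrary:

```python
raw = bytes([0b11111101])  # the same eight bits in both cases

# Interpreted as an unsigned byte, the bits are the plain binary number
as_unsigned = int.from_bytes(raw, byteorder="big", signed=False)

# Interpreted as a signed byte, Python decodes the bits as two's complement
as_signed = int.from_bytes(raw, byteorder="big", signed=True)
```

An index type that reads `00000000` as 0 is the unsigned reading with no remapping at all.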
waffleassembly@reddit
Why does every counting system start with zero?
emote_control@reddit
I think the simplest answer is this:
You have a finite number of memory registers. They are numbered in binary like 0, 1, 10, 11, etc. You put an array in memory. What are you going to choose for the first index? If you choose 1, then you're skipping 0 and not putting anything in it. You have finite resources. Why would you skip 0 if you can use it? If you say "oh, I'll use 0, but call the index 1", then now you have to store that conversion somewhere in memory, and it'll take more space than just starting the index at 0 would have.
When the structure of computers was being laid down, resources were *tight*, and you had to use every bit you possibly could. We're talking on the order of a few kilobytes or even less. Now we do it because that's the way it's done, and to change it would be confusing, and would break algorithms that assume that the structure is the way it is.
tellingyouhowitreall@reddit
x = y
e = x + 50
while (x < e)
a[x++]
IrrerPolterer@reddit
The idea of indexes started as positional offsets in arrays of data. Say you have an array of bytes in memory. In order to read any byte in your array, you need 1. the starting position of your array, and 2. the offset from the starting position. Your first byte starts right at the start of the array, so offset is 0.
Another thing is that counting in binary makes most sense starting at 0. otherwise you're effectively wasting number space.
aleques-itj@reddit
It's easier to think of it as an offset.
Say you have an array of things. They're just sitting next to each other in memory.
There's nothing to add to the address if you're already at the beginning. The first one is effectively just arrayAddr+0.
Linestorix@reddit
You have to forget about how you learned to count. That was an arbitrary thingy and was only marginally connected with representations of reality.
_stroCat@reddit
If I had to guess, it's probably a remnant of binary and switches. The first position when counting is always everything turned off or all zeroes. One, would be first position turned on.
bit_shuffle@reddit
Fortran starts from 1 to be more like math equations.
Happy programming learning.
QuirkyFail5440@reddit
Historically in machine language or assembly, people were working with memory addresses and offsets.
If you had five things, each taking 10 bytes, starting at 0x12A0 or whatever, the first thing is at 0x12A0+(0*10).
The second thing is at 0x12A0+(1*10). The 3rd is at 0x12A0+(2*10).
Higher level languages were like 'Let's have an array! a is an array with five ints...a[0]'
In C at least, it's really just doing the arithmetic for you. Like 'a[N]' is the same as *(a+N), and the compiler knows that N*sizeof(a[0]) bytes past a is where to look.
So starting with 0 makes things easier.
Higher level languages abstract all of this away and then it's like, hold on, why do we have a 0th thing? Just have [1] be the first. And some languages did exactly that. Like....VB6, COBOL, Fortran and I dunno.
I think we've mostly accepted that 0 is a preferred convention.
Hugo1234f@reddit
The notation ’a[b] = c’ means that you first go to the memory address of the array a, then go b * (element size) bytes further and write c there.
Starting at 0 simply means that you go to the start of the array, and then move 0 elements further into the list.
Affectionate_Horse86@reddit
A lot of languages start at 1. Some start where you want, like Ada.
1luggerman@reddit
Its because of how arrays work under the hood.
Let's start simple: each variable is stored in memory, and the memory has addresses. So when you write something like `int num = 10`, the compiler of the language finds an empty address in memory, let's say 3, and puts the number 10 there. `num` actually refers to the address in memory where you put that value.
An array is a continuous block of memory, so when you declare an array of size 5 the compiler looks for 5 consecutive free addresses, let's say 4, 5, 6, 7, 8, and gives you the address of the first one, 4, to save in the variable.
So how do you access each element this way? You go to the beginning address and jump as much as you need.
arr[1] is translated to the address 4+1. The first element is at address 4 + 0, which is accessed by arr[0]
Xatraxalian@reddit
Not every language does. Many versions of Pascal started at 1.
teerre@reddit
To understand this you need to understand memory. The tldr version is that arrays are literally "blocks" of memory organized one after the other. Accessing "the array" is really accessing the first block. If you want some other element, you need to add an offset from this first block. See:
Memory layout of an array:
┌────────┬────────┬────────┬────────┬────────┐
│ arr[0] │ arr[1] │ arr[2] │ arr[3] │ arr[4] │
└────────┴────────┴────────┴────────┴────────┘
^
│
Base address (pointer to arr[0])
Accessing arr[i] means: address = base_address + (i * size_of_element)
Example: arr[2] = base_address + (2 * size_of_element)
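The same relationship can be observed on real addresses from Python using the standard `ctypes` module. This is a sketch rather than anything definitive: `element_view` is a hypothetical helper, and the element size is whatever `c_int` is on the machine running it:

```python
import ctypes

arr = (ctypes.c_int * 5)(10, 20, 30, 40, 50)
base_address = ctypes.addressof(arr)            # address of arr[0]
size_of_element = ctypes.sizeof(ctypes.c_int)   # platform dependent, commonly 4


def element_view(i):
    # View the memory at base_address + i * size_of_element as a c_int
    return ctypes.c_int.from_buffer(arr, i * size_of_element)


third = element_view(2)
third_address = ctypes.addressof(third)
```

Here `third.value` matches `arr[2]`, and `third_address` is exactly `base_address + 2 * size_of_element`.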
Ronin-s_Spirit@reddit
Because it's very comfortable programmatically.
The first element in a binary block of elements 8 bytes long would start at `8×0`, the 4th element would start at `8×3` and end at `8×4`. This logic is very simple, you can draw it on a strip of paper and verify that yourself.
Writing `i<arr.length` at least seems more efficient than `i<=arr.length`, and `let i=1` lets you know that you have skipped `1` element.
code_tutor@reddit
People are saying pointers but it's also good for modulus math.
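One way to picture the modulus point is wrapping around a ring of `n` slots. A small Python sketch with invented function names:

```python
def next_zero_based(i, n):
    # indices 0..n-1: the remainder is already a valid index
    return (i + 1) % n


def next_one_based(i, n):
    # indices 1..n: i % n is the 0-based successor of (i - 1), then shift back up by 1
    return (i % n) + 1


def bucket_for(key, n):
    # a remainder modulo n always lands in 0..n-1, i.e. directly on a 0-based slot
    return key % n
```

With 0-based slots the result of `%` can be used as an index as-is; with 1-based slots every such result needs a `+ 1`.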
dajoli@reddit
EWD831 is a nice exploration of this from a theoretical point of view.
VibrantGypsyDildo@reddit
so that you could address Nth element with `initial_address + N * element_size`.
or so that you didn't lose one value (`0`) when addressing elements.
FLSurfer@reddit
https://www.reddit.com/r/learnpython/comments/vn4gzc/comment/ie52doi/
Lovecr4ft@reddit
Nice souvenir and very clever
leitondelamuerte@reddit
it's about binary and memory usage
because when you index something you are allocating a piece of memory (bytes) to do so.
And the first number in the sequence is the full zero: 0000
So it's a way to save memory.
ChaosCon@reddit
Because indexing is a different operation from counting.
LowB0b@reddit
because of C and pointers. *(ptr + 0) = ptr[0] = first element
_Atomfinger_@reddit
It doesn't start with 0 in every language. For example, Lua is 1-indexed.
I don't know the actual reason, but I think it is because 0 is a very natural number in programming. I.e. the first position being position 0, and that it is a bit fiddly to "exclude" 0 when all other numbers are, technically, valid.
Internal_Outcome_182@reddit
because computer language (binary) starts from 0, and there is only 0 and 1.