dd block size
Posted by etyrnal_@reddit | linux | 60 comments
is the bs= in the dd parameters nothing more than manual chunking for the read & write phases of the process? if I have a gig of free memory, why wouldn't I just set bs=500m ?
I see so many seemingly arbitrary numbers out there in example land. I used to think it had something to do with the structure of the image like hdd sector size or something, but it seems like it's nothing more than the chunking size of the reads and writes, no?
kopsis@reddit
The idea is to use a size that is big enough to reduce overhead while being small enough to benefit from buffering. If you go too big, you end up largely serializing the read/write which slows things down. Optimal is going to be system dependent, so benchmark with a range of sizes to see what works best for yours.
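For example, a rough benchmark loop along these lines (purely a sketch; /dev/sdX is a placeholder for your actual source device, and the sizes are arbitrary):
# Read the first 1 GiB of the device at several block sizes and compare the
# throughput dd reports. iflag=direct bypasses the page cache so repeated
# runs aren't just served from RAM.
for bs in 64K 256K 1M 4M 16M; do
    echo "== bs=$bs =="
    dd if=/dev/sdX of=/dev/null bs=$bs count=$(( (1 << 30) / $(numfmt --from=iec $bs) )) iflag=direct 2>&1 | tail -n 1
done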
DFS_0019287@reddit
This is the right answer. You want to reduce the number of system calls, but at a certain point, there are so few system calls that larger block sizes become pointless.
Unless you're copying terabytes of data to and from incredibly fast devices, my intuition says that a block size above about 1MB is not going to win you any measurable performance increase, since system call overhead will be much less than the I/O overhead.
EchoicSpoonman9411@reddit
The overhead on an individual system call is very, very low. A dozen instructions or so. They're all register operations, too, so there's no waiting millions of cycles for data to come back from main memory. It's likely not worth worrying too much about how many you're making.
It's more important to make your block size some multiple of the read/write block sizes of both of the I/O devices involved, so you're not wasting I/O cycles reading and writing null data.
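For reference, the kernel exposes those device block sizes via sysfs, so you can check them before picking a number (a sketch; sdX is a placeholder for the actual device name):
# Logical and physical sector sizes of the device
cat /sys/block/sdX/queue/logical_block_size
cat /sys/block/sdX/queue/physical_block_size
# The device's preferred I/O granularity, if it reports one (0 means "none reported")
cat /sys/block/sdX/queue/optimal_io_size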
That being said, I agree with your intuitive conclusion.
dkopgerpgdolfg@reddit
Sorry, but that's a lot of nonsense.
Yes, you've shown the register setup for the "syscall" instruction. You've not shown how long the context switch takes, or how much impact it has on the MMU caches. This "one instruction" (syscall) can easily cost you a five-digit number of cycles, and that's without the actual handling logic within the kernel code.
As the topic here is dd, try dd'ing 1 TB with bs=1 vs bs=4M.
Otherwise, syscall slowness is a serious topic in many other areas. Specific examples include the reasons why things like DPDK and io_uring were made, CPU vuln mitigations (e.g. Spectre), ...
EchoicSpoonman9411@reddit
That's kind of harsh, man.
That's... not a lot. It's a few microseconds on any CPU made in the last couple of decades.
Almost none of the overhead in that example will be because of system call overhead.
So, the average I/O device these days has a write block size of 2K or 4K, something like that. Let's call it 2K for the sake of argument. When you dd with bs=1, you're causing an entire 2K disk sector to be rewritten in order to change 1 byte, then again for the next byte, so each 2K disk sector gets rewritten 2048 times before dd moves on to the next one, which is also rewritten 2048 times, and so on.
Of course that's going to take a long time.
dkopgerpgdolfg@reddit
It's thousands of times more than those "12 register operations". And as syscalls aren't a one-time thing, it adds up over time.
About the dd example: Try it with /dev/zero, so you don't have any disk issues.
Btw. I just tried on that computer I'm using currently. The difference is a factor of about 29000x.
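For anyone who wants to try the same comparison, something like this works (a sketch only, scaled down to 64 MiB so the bs=1 run finishes in a minute or three; the exact ratio will vary by machine):
# one read()+write() pair per byte: ~67 million syscall pairs for 64 MiB
time dd if=/dev/zero of=/dev/null bs=1 count=$((64 * 1024 * 1024))
# one read()+write() pair per 4 MiB: 16 syscall pairs for the same 64 MiB
time dd if=/dev/zero of=/dev/null bs=4M count=16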
EchoicSpoonman9411@reddit
Of course it does. But the system call overhead under real-world conditions, meaning for bs= values which actually make plausible sense, is negligible compared to the I/O load.
What's the point of doing that? Of course if you eliminate the I/O load from the equation, the system call load becomes relevant, because the CPU isn't idle waiting for I/O to finish, but then it's not germane to the original problem.
lelddit97@reddit
just a wandering subject matter expert: the other person knows what they are talking about and you don't
EchoicSpoonman9411@reddit
There is sufficient demand for Linux kernel expertise so that SMEs don't need to live in their parents' basement.
You're that other guy's alt. You have the same rudimentary skill at reading comprehension.
lelddit97@reddit
no, i am an engineer who knows what they are talking about and you are arguing for the sake of arguing
EchoicSpoonman9411@reddit
If you're not that guy's alt, then you wandered into a discussion thread for... what, exactly? You could have read it and not commented, it's really fucking easy. So who's arguing for the sake of arguing?
You sound like a fucking toddler. Communication is an important skill for "engineers" too.
dkopgerpgdolfg@reddit
Then just forget about hard disks and look at everything else. Page faults, pipes, ...
And I once again point you to the projects etc. mentioned above. It's everywhere. ... If you see someone saying they set mitigations=off so that their computer gets faster, and that they can accept the reduced security because they only play games, then their problem was syscall overhead.
Afaik, the topic was not whether hard disk IO is slow; the topic was that a syscall takes much more than just a dozen register operations.
In any case, I said what I want, not going to fight about semantics. Bye.
DFS_0019287@reddit
My understanding is that the overhead of a system call is more than just the instructions; there's also the context switch to kernel mode and then back to user mode. A system call is probably 10x more expensive than a normal user space function call.
But as you wrote, this is still pretty negligible overhead compared to disk I/O.
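You can actually see both effects with strace's syscall summary, e.g. something like the following (a sketch; note strace itself adds a lot of per-syscall overhead, so the call counts are the interesting part, not the absolute times):
# -c prints a per-syscall count and the time spent in each; compare the
# read/write counts between a tiny and a sane block size for the same 100 MB.
strace -c dd if=/dev/zero of=/dev/null bs=512 count=200000
strace -c dd if=/dev/zero of=/dev/null bs=1M count=100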
LvS@reddit
I believe the larger problem is that you blow the CPU caches. If /u/etyrnal_ sets the size to 500M, then each read will fill up the whole L3 cache multiple times over, which means that once you start writing, the data has to be fetched back from RAM.
And avoiding the detour through RAM is kinda important for performance.
etyrnal_@reddit (OP)
isn't ram many times faster than any of the current storage media?
LvS@reddit
Yes. The problem here is that by exhausting the cache, you can also evict other cachelines - like the ones containing the application's data structures. Plus, you access the data multiple times - once for writing, once for reading, no idea if it's used elsewhere.
So you're using RAM (or caches) much more frequently, while the disk is only accessed once for reading and once for writing.
etyrnal_@reddit (OP)
Are we mixing up the terms cache and buffers?
considering the speed of RAM (the RAM the process sets up as buffer(s) for reading/writing the bs=500m chunks) and cache (the CPU cache holding the dd process's instructions), i would think that RAM would be so much faster than storage like microSD cards, hard drives etc. that this wouldn't cause noticeable slowdowns with slower media? I could understand that in a data center like Meta's, or whatever, every single processor cycle and resource becomes hyper-critical and needs forensic-level accounting... but for reading / writing images to/from microSD cards?
I remember back during 16bit cpu & floppy drive days, we used a file manager / disk copier that just read the entire floppy into RAM, it sped up copy operations a LOT.
LvS@reddit
Those numbers are a lot less different than they used to be. Since the introduction of SSDs, disks got a lot faster.
Google has this page comparing the speeds of the different layers, though it matters a lot what kind of hardware you have: SSD vs HDD vs microSD and desktop vs mobile phone vs rpi and so on.
If you're working with this, I'd recommend checking those numbers for updates every 5 or so years, because there's always new inventions that change the differences between those layers (or even introduce new ones).
Note that I don't know if I'm actually right with my assumption. It might be useful to look up your cache size and see if setting block size to half of cache size (so you're sure it fits) versus twice the cache size (so you're sure it doesn't fit) makes a big difference, compared to 1/3 and 3x respectively.
If it does, my idea sounds very plausible, if it doesn't I'm likely wrong.
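If anyone wants to test that, the cache size is easy to query; a sketch (the /dev/zero to /dev/null pair only exercises memory, which is the point here, and the 16 MiB figure is just an assumed example):
# L3 cache size in bytes (glibc getconf); lscpu shows it too
getconf LEVEL3_CACHE_SIZE
# Suppose it reports 16 MiB: compare a bs well under it with one well over it,
# moving the same ~32 GiB total in each run.
time dd if=/dev/zero of=/dev/null bs=8M count=4096
time dd if=/dev/zero of=/dev/null bs=32M count=1024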
etyrnal_@reddit (OP)
i think this can be sort of tested. somebody mentioned to me a few tools to actively test the best dd buffer size... it'd be interesting to see if that number lines up the way you are suggesting.
etyrnal_@reddit (OP)
i appreciate the additional clarifying insights
lelddit97@reddit
I do 1MB for < 1TB copied, then some multiple of two otherwise. I think I did 16MB for cloning an NVME SSD which worked well. Maybe 1MB would have worked better even then idk
triffid_hunter@reddit
In theory, some storage devices have an optimal write size, eg FLASH erase blocks or whatever tape drives do.
In practice, cat works fine for 98% of the tasks I've seen dd used for, since various kernel-level caches and block device drivers sort everything out as required.
The movement of all this write block management to kernel space is younger than dd - so while it makes sense for dd to exist, it makes rather less sense that it's still in all the tutorials for disk imaging stuff.
Yes.
Maybe you're on a device that doesn't have enough free RAM for a buffer that large.
Conversely, if the block size is too small, you're wasting CPU cycles with context switching every time you stuff another block in the write buffer.
Or just use cat and let the relevant kernel drivers sort it out.
etyrnal_@reddit (OP)
cat gives no progress indicator
fearless-fossa@reddit
Then use rsync.
etyrnal_@reddit (OP)
rsync can write images to sd cards?
fearless-fossa@reddit
Yes, why wouldn't it?
SteveHamlin1@reddit
rsync can write a file to a file system. As far as I know, rsync can't write a file to a device, which is what u/triffid_hunter was talking about.
For an unmounted device named '/dev/sdX', do "rsync testfile.txt /dev/sdX" and see if that works.
etyrnal_@reddit (OP)
i had no reason to assume it was intended to be adapted to that purpose. I was under the impression it was a file-level tool.
triffid_hunter@reddit
Then use pipeworks
etyrnal_@reddit (OP)
how does cat deal with errors?
triffid_hunter@reddit
It doesn't.
That's why I said 98% rather than 100% 😉
ConfuSomu@reddit
Or even cp your disk image to your block device!
smirkybg@reddit
Isn't there a way to make dd benchmark which block size is better? I mean who wouldn't want that?
etyrnal_@reddit (OP)
would be great if that was baked in and invokable by some cli-passed option.
natermer@reddit
'dd' was originally designed for dealing with tape drives. Some of which have very specific requirements when it comes to things like block sizes.
It isn't even originally for Unix systems. It is from IBM-land. That is why its arguments are so weird.
The block devices in Linux don't care about the "block size" argument in dd. You can pretty much use whatever is convenient, as the kernel does the hard work of actually writing it to disk.
If you don't give it an argument it defaults to a block size of 512, which is too low and causes a lot of overhead. So the use of the argument is just to make it big enough to not cause problems.
A lot of times the use of 'dd' is just because it is cargo cult command line. People see other people use it so they use it. They don't stop to think as to actually why they are using it.
Many times use of 'dd' to write images to disk can be replaced by something like 'cat' and not make any difference. Except maybe to be faster.
'dd' is still useful in some cases. Like you can specify to skip so many bytes and thus do things like edit and restore parts of images... (like if you want to backup the boot sector or replace it with something else) but it is a very niche use and there are usually better tools for it.
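The classic example of that niche use is grabbing just the boot sector (a sketch; /dev/sdX is a placeholder, and this only applies to MBR-style layouts):
# Save the first 512-byte sector (MBR: boot code plus partition table)
dd if=/dev/sdX of=mbr-backup.bin bs=512 count=1
# Restore only the boot code, leaving the partition table (bytes 446-511) untouched
dd if=mbr-backup.bin of=/dev/sdX bs=446 count=1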
Try using cat sometime. See if it works out better for you.
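For the image-writing case that usually looks something like this (a sketch; disk.img and /dev/sdX are placeholders, and the device should be unmounted first):
# Write an image straight to the device, then make sure it's actually
# flushed to the media before unplugging anything.
cat disk.img > /dev/sdX
sync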
dkopgerpgdolfg@reddit
Try working without the page cache (direct flag in dd) and see.
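With GNU dd that's something like the following (a sketch; disk.img and /dev/sdX are placeholders):
# oflag=direct bypasses the page cache for the writes; status=progress is just feedback.
# bs must be a multiple of the device's sector size or the O_DIRECT writes will fail.
dd if=disk.img of=/dev/sdX bs=4M oflag=direct status=progress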
asp174@reddit
And do that with blocks smaller than the storage systems' chunk size, where the storage has to read a chunk, change a few bits, write it back - multiple times over.
dkopgerpgdolfg@reddit
No, it doesn't do that. When O_DIRECT is used with a too-small block size, it just fails.
asp174@reddit
When you have a RAID controller that runs without write cache, it will do exactly this.
Just the same as controllers without cache have the read-before-write penalty when dealing with unaligned drive numbers for a RAID5 or 6.
dkopgerpgdolfg@reddit
Ok, if you put it that way... Afaik we were talking about Linux kernel behaviour here.
If the storage (whatever it is) wants a certain block size, because it can't handle anything else, then Linux with O_DIRECT will not help in any way.
marozsas@reddit
Controversial subject. Fact: it's an ancient tool specifically designed to handle tape drives. Fact: nowadays, the kernel and device drivers handle the specifics of reading and writing on modern devices very well.
I've abandoned the use of dd in favor of using cat and redirecting stdin and stdout, making the command line as simple as possible.
etyrnal_@reddit (OP)
and you don't care that you cannot get status/progress or control error handling that way?
marozsas@reddit
In general, no. If I badly want the progress of a large copy I use the command pv. And if there's an error, there's not much one can do about it, regardless of whether you're using dd or another equivalent command. Remember, I am talking about ordinary devices like HDDs and SSDs directly attached to a SATA interface or USB, not a fancy SCSI tape writer.
etyrnal_@reddit (OP)
I'm just cloning microSD cards to an image on the computer, and then to another microSD card later.
marozsas@reddit
Yes, I work with orangePi devices professionally and I have the same need to copy to/from USB-connected SD cards, and cp is just fine with /dev/sdX as source or destination.
etyrnal_@reddit (OP)
i'm going to try it sometime, for small copies. but for huge copies where i can't tell if something is hanging or whatever, i'll prob stick with what's familiar. I think the only reason i decided to use it this time was because some users had reported that a certain popular sd card 'burner' was somehow turning out non-working copies of the sd card. So, i did it to avoid whatever that rumor was about. It was probably some userland pebkac, but for a process that takes hours, i just didn't want to lose time to some issue like that.
I normally just use balena etcher, or rufus, or whatever app depending on the platform i'm using (windows/macos/linux/android/etc).
Thanks for the insights
marozsas@reddit
I suggest you learn about pv.
You can use it to write an image 3G in size, previously compressed with xz, to an SD card at /dev/sda with something like this:
xzcat Misc/orangepi4.img.xz | pv -s 3G > /dev/sda
If the image is not compressed, you can use pv directly, with no need to specify the size of the input; both give you the feedback you want.
pv Misc/orangepi4.img > /dev/sda
michaelpaoli@reddit
Most of the time what's notable is obs, which if not explicitly set uses bs, which if not explicitly set generally defaults to 512. So, quite depends what one is writing, but, e.g. for most files on most filesystems these days, [o]bs=4096 would be an appropriate minimum, and should generally use powers of 2 to avoid block misalignment and problems/inefficiencies thereof. If writing directly to a drive, most notably solid state rather than hard drive, generally best to pick something fair bit larger - the larger of either erase block size or physical write block size - so that would typically be the erase block size that would be larger. If unsure, an ample power of 2, e.g. [o]bs=1048576 will generally quite suffice.
No. Not only is 500M not well aligned, it's also going to eat almost half a gig of RAM and won't be that efficient: dd may well buffer that full amount before writing any of it out, and if it's not multi-threaded it's likely to be pretty inefficient and slow as it switches back and forth between such long reads and then writes. Much better would generally be a much smaller but ample block size, e.g. a suitable power of 2 between 4096 and 1048576. That will likely also be much more efficient - it swallows up a whole lot less RAM, and as the writes will generally be buffered, dd will typically switch back and forth between reads and writes quickly and efficiently, mostly limited only by I/O speeds - so probably by whatever's slower, the reads or (often) the writes, depending on media type, etc. With a much larger/excessive bs, buffers/caches will fill on the writes, so one will typically spend most of the time waiting on write I/O, but inefficiently: with such large reads, the same happens on the read side while the write side goes idle.
And if you're writing, say, to a device that's RAID-0 or RAID-10 or RAID-5 across multiple drives, you'll want an integral multiple of whatever size covers an entire "stripe". E.g. say you have 5 drives configured as RAID-5, so that's 4 data + 1 (distributed) parity. You'll want an integral multiple (minimum multiplier of 1) of whatever fully covers those 4 chunks of data - so you write that, and all of that plus the parity is calculated and written in one go. If you write less than that, at best you'll be recalculating and rewriting at least one data chunk and the parity data multiple times; likewise if you're not at an integral multiple of that size. When in doubt, pick something that's "large enough" to cover it, but not excessive.
If you're dealing with particularly huge devices, it may be good to test some partial runs first. But note also that buffering may make at least the initial bits appear artificially fast. One may use suitable (if available) dd sync option(s), and/or wait for completion of sync && sync after dd, and include that in one's timing, to be sure all the data has actually been flushed out to media.
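With GNU dd that might look like the following (a sketch; disk.img and /dev/sdX are placeholders, and conv=fsync makes dd flush the output device before it exits so the flush is included in the timing):
# Either let dd do the flush itself...
time dd if=disk.img of=/dev/sdX bs=1M conv=fsync status=progress
# ...or include an explicit sync in what you time
time sh -c 'dd if=disk.img of=/dev/sdX bs=1M status=progress && sync'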
So, yeah, [o]bs does make a difference. Pick a decent clueful one for optimal, or at least good, efficiency.
dkopgerpgdolfg@reddit
Other than the performance topic, another possibly important factor is how partial r/w is handled.
In general, if a program wants to read/write to a file handle (disk file, pipe, socket, anything) and specifies a byte size, the call might succeed but process fewer bytes than the program asked for. The program could then just make another call for the rest.
And dd has a "count" flag, so that only a specific number of blocks (of "bs" size each) is copied, instead of everything in the file etc.
If you specify such a limited "count" and dd gets partial reads/writes from the kernel, by default it will not "correct" this - it will just call read/write "count" times, period. Because of the partial I/O, you'll get fewer total bytes copied than intended.
With disk files, this usually doesn't happen. But with network file systems, slowly-filled pipes, etc., it's common. There are additional flags that can be passed to dd (at least for the GNU version) so that the full amount of bytes is processed in each case.
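With GNU dd the relevant flag is iflag=fullblock. A sketch of the difference when reading from a pipe (some_slow_producer is just a stand-in for whatever is feeding the pipe):
# Without fullblock: each read() may return less than 1M but still uses up one
# of the 100 "count" slots, so fewer than 100 MiB may end up in out.bin.
some_slow_producer | dd of=out.bin bs=1M count=100
# With fullblock: dd keeps reading until each 1M block is actually full.
some_slow_producer | dd of=out.bin bs=1M count=100 iflag=fullblock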
s3dfdg289fdgd9829r48@reddit
I literally only used a non-default bs once (with bs=4M) and it completely bricked a USB drive. I haven't tried since. It's been about 15 years. Once bitten, twice shy, I suppose. Maybe things have gotten better.
etyrnal_@reddit (OP)
i was recommended this read, and it tries to explain dd behavior. i wonder if it could explain what happened in your scenario.
https://wiki.archlinux.org/title/Dd#Cloning_an_entire_hard_disk
s3dfdg289fdgd9829r48@reddit
Since this was so long ago, I suspect it was just buggy USB firmware or something.
etyrnal_@reddit (OP)
was that on READING the device, or writing to it?
s3dfdg289fdgd9829r48@reddit
Writing to it.
etyrnal_@reddit (OP)
interesting. i am using it to clone a new microSD card that came from the OEM loaded with an operating system and files, to an image i can later use to restore onto another microSD if necessary. so this is especially interesting, since i want a working image and i do NOT want to brick devices/microSD cards.
BigHeadTonyT@reddit
https://www.baeldung.com/linux/dd-optimal-blocksize
https://github.com/theAlinP/dd-bs-benchmark
To test it.
FryBoyter@reddit
Regarding block size, I think the information at https://wiki.archlinux.org/title/Dd#Cloning_an_entire_hard_disk is quite interesting.
daemonpenguin@reddit
I don't know what you mean by "chunking", but I think you're basically correct. The bs parameter basically sets the buffer size for read/write operations.
Try it and you'll find out. Setting the block size walks a line between having a LOT of reads/writes (like if bs is set to 1 byte) versus having a giant buffer that takes a long time to fill (bs=1G).
If you use dd on a bunch of files, with different block sizes, you'll start to notice there is a tipping point where performance gets better and better and then suddenly drops off again.
e_t_@reddit
If you don't specify block size, then dd will go 512B sector by 512B sector. There are... a lot... of 512B sectors on a modern hard drive. At the same time, whatever bus you connect to your hard drive with has only so much bandwidth. You want a number that effectively saturates the bandwidth with a minimum of buffering.
MsInput@reddit
You'd be able to hold 2 big files instead of many smaller files