droxile@reddit
I’d be curious to learn more about the CI/static analysis that can flag the use of certain functions, beyond just the lints that something like Clang provides?
For example, if your codebase uses a library that replaces a series of functions from a C header that you want to prevent use of.
lelanthran@reddit (OP)
Wouldn't grepping suffice?
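I cannot parse that. Do you mean:
- You are using a library to replace dangerous functions (gets, snprintf, etc.)
or
- You are using a library that replaces your safe functions with gets, snprintf, etc.
Which of the two do you mean?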
rsclient@reddit
Here's an example where grepping isn't good enough: imagine a library with two functions, AAA and BBB. AAA is acceptable; BBB is banned.
You can call BBB() if you happen to know the byte offset of the banned function from AAA(). Let's say BBB is 1234 bytes away from AAA in the library. Instead of calling BBB() you instead call (AAA+1234)().
Yes, I've done this, and yes it's both groddy and delicate. Every new release of the library will almost certainly change the magic calling offset.
kevkevverson@reddit
I mean things still get reviewed by humans who will ask what the hell you’re doing
lelanthran@reddit (OP)
I can't think of any static analysis that can flag usage of BBB.
Especially since you're going to have to cast the address to the type of a function, effectively silencing any compiler or static analysis tool that does warn you about it.
Unless your tool emits a warning on any and every cast, this can't really be caught.
droxile@reddit
Suppose my codebase uses a library “foo” that provides a special string type. I want to prevent people from using std::string. I'm looking for some tool, compiler warning, or lint that points them to foo::string instead.
levodelellis@reddit
I find that turning up the warnings in gcc and clang does a good enough job. I tried clang-tidy and some of it is just junk (it ignores the casting between signed and unsigned and claims there's a signed/unsigned mismatch) and some parts of it are useful (there's a rule telling you if you forgot O_CLOEXEC).
If you want to delete functions you can use a define. Git has a banned header file that you can use as an example: https://github.com/git/git/blob/master/banned.h
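A minimal sketch of the define approach, assuming a project-local header. It is loosely modelled on the idea behind git's banned.h, not copied from it; the helper bounded_copy is a hypothetical replacement invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical ban list: any use of a banned name expands to an undeclared
 * identifier, so the build fails with the function's name in the error.
 */
#define BANNED(func) sorry_##func##_is_a_banned_function

#undef strcpy
#define strcpy(dst, src) BANNED(strcpy)
#undef strcat
#define strcat(dst, src) BANNED(strcat)

/*
 * Hypothetical approved replacement: copies at most dst_size - 1 bytes and
 * always NUL-terminates. Returns the number of bytes copied.
 */
static size_t bounded_copy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t len = strlen(src);
    if (len >= dst_size)
        len = dst_size - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}

/*
 * Uncommenting the call below should stop compilation, because it expands to
 * the undeclared identifier sorry_strcpy_is_a_banned_function:
 *
 *     strcpy(buf, "hello");
 */
```

The header has to be included after the standard headers, otherwise the macros would mangle the real declarations. It only catches direct calls by name, which is exactly the limitation the offset trick earlier in the thread exploits.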
syklemil@reddit
It's possible to use a banned.h the way the git project and MS do. They contain a bunch of macros that make using e.g. gets a compilation error.
cdb_11@reddit
#pragma GCC poison printf snprintf
noodles_jd@reddit
You want something like Coverity; it goes way beyond linting. We use that, and I'm sure there are many others like it.
TTachyon@reddit
I don't know how curl does it, but how we do it is just searching the undefined symbols/imports in the built binary.
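One way the symbol-search idea could be sketched, assuming the undefined symbols are listed by a tool such as nm -u and fed in one line at a time. The ban list and the function name are hypothetical, and the assumed line shape (optional whitespace, "U", the symbol, an optional "@VERSION" suffix) should be checked against the tool actually used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ban list; a real project would pick its own. */
static const char *const banned_symbols[] = { "gets", "strcpy", "sprintf" };

/*
 * Returns true if one line of `nm -u` style output names a banned symbol.
 * Assumed shape: optional whitespace, "U", whitespace, symbol name,
 * optionally followed by "@VERSION".
 */
static bool line_names_banned_symbol(const char *line)
{
    assert(line != NULL);

    /* Skip leading whitespace, then require the "U" (undefined) marker. */
    line += strspn(line, " \t");
    if (line[0] != 'U' || (line[1] != ' ' && line[1] != '\t'))
        return false;
    line += 1;
    line += strspn(line, " \t");

    /* The symbol name ends at '@', whitespace, or end of line. */
    size_t name_len = strcspn(line, "@ \t\r\n");

    for (size_t i = 0; i < sizeof banned_symbols / sizeof banned_symbols[0]; i++) {
        if (strlen(banned_symbols[i]) == name_len &&
            strncmp(line, banned_symbols[i], name_len) == 0)
            return true;
    }
    return false;
}
```

Because this looks at what the linker still has to resolve, it also catches uses that hide from grep behind macros. It does not catch the address-offset trick, since that never imports the banned name at all.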
__konrad@reddit
I don't like curl C style: https://github.com/curl/curl/blob/49ef2f8d1ef78e702c73f5d72242301cc2a0157e/src/tool_getpass.c#L106
loup-vaillant@reddit
I don’t mind it too much. Though my personal preference is:
- True brace style (just like Curl)
- Always use braces.
- else goes in the same line as the preceding closing brace: } else {
If I made a language, the parentheses around the conditional would be optional, and the braces around the following block/instruction would be mandatory.
xoner2@reddit
I mostly agree.
I'd take out "always use braces" as auto-indentation takes care of catching missing braces. Then I would add space before parens, in the style of Lisp and English.
loup-vaillant@reddit
I do that for if, while, and for, but for function calls I stick them to the function name: f(x). I’ve seen f (x) in the wild, but to me it makes more sense in languages like Lisp or ML, which use juxtaposition for function calls:
    f(x, g(y));  // C, Java…
    (f x (g y))  // Lisp
    f x (g y)    // ML, Haskell…
sammymammy2@reddit
Space separating function from its parameter list makes no sense to me.
levodelellis@reddit
What about if (cond) break;?
loup-vaillant@reddit
Though if I could, I’d rather go if cond { break; }
levodelellis@reddit
I generally prefer return/break/continue to not have a curly brace. If someone adds an expression before it without putting in a curly brace, the loop will misbehave pretty much 100% of the time.
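A small sketch of that failure mode, with hypothetical function names. The second function is the first one after someone adds a statement in front of the unbraced break, shown the way an auto-indenter would lay it out.

```c
#include <assert.h>
#include <stddef.h>

/* Intended behaviour: count the values that come before the first negative. */
static int count_before_negative(const int *v, int n)
{
    assert(v != NULL || n == 0);

    int count = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] < 0)
            break;
        count++;
    }
    return count;
}

/*
 * Same loop after an edit that adds a counter before the break but no braces.
 * Only the increment is guarded by the if; the break now runs on the first
 * iteration regardless, so the function returns 0 for every input.
 */
static int count_before_negative_edited(const int *v, int n, int *negatives_seen)
{
    assert(v != NULL || n == 0);
    assert(negatives_seen != NULL);

    int count = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] < 0)
            (*negatives_seen)++;
        break;
        count++;
    }
    return count;
}
```

The edited loop is wrong on every call rather than on some rare path, which is the argument for leaving the braces off: the mistake is hard to miss.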
loup-vaillant@reddit
Ah, good point. Though personally I don’t distrust the people who touch my code that much. I do distrust the people whose code I touch, but that’s because I can see the crap they wrote.
noodles_jd@reddit
I mostly agree.
I'll cuddle the braces for everything but functions. But I'll skip braces if none of the if/else conditions need them. If one needs them, they all get them in that block of code.
The if/else on line 102 of the linked file is a good example. It bothers me. The second if should be with the else and there should be braces around that.
guepier@reddit
> Code should be written narrow. It is hard on the eyes to read long lines, so we enforce a strict 80 column maximum line length.
This explanation — though often repeated — is completely baseless. It’s true that there are studies showing that reading wide text is tiring because it requires moving the pupils, and that ~80 columns (often less) is therefore easier to read.
However, this is only true for continuous text that’s being read top to bottom. The way we read prose and source code differs fundamentally, and (as far as I’m aware) there’s not a single study showing that enforcing an 80-column limit improves readability, or that this would be because of the stated reason. In fact, all experience points to this not being the case.
Sure, endlessly long lines are harder to read — at a minimum we want to avoid having to scroll our editor; so there needs to be some limit. But setting a hard cutoff, or setting it at 80 char, is not based on any evidence and probably hurts more than it helps. The assertion that “it’s hard on the eyes”, in the context of source code, is almost certainly false. I wish this myth would just die.
jlombera@reddit
The reason long lines are harder to read is not only because they require more eye movement, but also because you have to "track" the line all the way back to the beginning when you have to change to the next line. This tires the eye and the brain (you have to focus more) and is prone to error (often you get lost when moving to the next line and have to re-scan/re-read the text to correct). It's much easier for our brain to identify the next line when lines are short (i.e. if the whole text width is visible at once). I don't know if there are studies about this for code, but I can tell, with absolute certainty, that I have the same difficulty reading long lines in code as in prose.
Besides, not everyone has a wide screen or works with the editor in full-screen. For instance, I usually work with 2-3 text screens side-by-side when programming or reading code. Thus short lines are more "accessible" in this regard too.
lelanthran@reddit (OP)
This is true.
This is [Citation Needed] stuff.
It's true that there are no studies showing 80-cols is better for code, but there are likewise no studies showing 80-cols is worse for code.
guepier@reddit
That’s why I wrote “almost certainly”. The point is, there’s no a priori reason to assume it’s true. People only think that because they inappropriately extrapolate evidence that is completely inapplicable here.
And beyond that, we can all only resort to our personal experience, and my own personal experience tells me that it’s very obviously not the case: too narrow code decreases readability, if for no other reason than because it often forces people to work around this limitation, to the detriment of readability: somebody else already commented on 2- vs 4-column indent. It also encourages cryptic, abbreviated identifiers (I agree that short names are better than long names, but only if this doesn’t come at the cost of clarity), and breaking statements which are just slightly too long across lines. Which of these reads better?
    result = a_computation_involving(some_terms * (with_sub * expressions)) + more
or
    result = (
        a_computation_involving(some_terms * (with_sub * expressions)) + more
    )
There’s a reason why many modern style guides and formatters make the line length limit flexible to avoid breaking such expressions (for instance, ruff by default will reformat the above Python code to fit into a single line even though it’s 82 characters — assuming it is inside a function).
phillipcarter2@reddit
Missing in the list: have the architect and contributor of the most code be one of the world's best C programmers :)
cpp_is_king@reddit
“Who is also a giant chode and actively hostile to contributors”. Might as well add that if we’re keeping score
gmes78@reddit
"contributors"
Halkcyon@reddit
That CVE list does not bode well for the rest of C software if that's "world's best"
lelanthran@reddit (OP)
It's probably the second most deployed library in the world, and having a 5 year period with no critical vulnerabilities is pretty damn good considering the surface area and the high value of RCE-ing curl.
There is plenty of less-used code written in something other than C which has more CVEs.
And even if they did have CVEs, you'd only count those that are due to using C for your statement "That CVE list does not bode well for the rest of C software"
Rain-And-Coffee@reddit
What's the most deployed? SQLite?
NYPuppy@reddit
SQLite and curl are distributed everywhere so it's likely one of those two. Even Windows ships with SQLite.
mlieberthal@reddit
I was thinking glibc but have no idea really
yoch3m@reddit
That, or gcc / a C compiler?
Jmc_da_boss@reddit
The curl project has a really good cve track record though
ClassicPart@reddit
The fact that the CVE list is as long (rather, as short) as it is is actually a point in curl’s favour given how much of the world’s infrastructure runs on it.
You (I assume) work in software. This shouldn’t be a surprise to you.
SpaceMonkeyAttack@reddit
From the article:
> Over the last five years, we have received no reports identifying a critical vulnerability and only two of them were rated at severity high. The rest (60 something) have been at severity low or medium.
A dozen low/med CVEs a year doesn't sound that bad to me, more like an indication that cURL is heavily scrutinised.
phillipcarter2@reddit
cURL is the world’s most-used system for client networking and as such, it’s an incredibly large attack vector with many creative ways attackers could cause damage. Don’t mistake the scale of the problem for a skill issue or anything else, really.
Also, “has CVEs filed on them” can just as well mean “some scold who couldn’t hack it in an actual R&D role tried to puff up their chest against a system they don’t understand”, so I take any and all CVEs with a grain of salt. The system and the IT security community don’t deserve the benefit of the doubt anymore, IMO.
matthieum@reddit
> We use two-spaces indents to still allow us to do some amount of indent levels before the column limit becomes a problem.
I used to write with two-spaces indents, but nowadays I find such code hard to read. This is not an eyesight problem, and I already use patterns -- such as "guard-style" -- which minimize indentation... two-spaces is just not good enough for my brain any longer, I guess.
So I switched quite some time ago already to 4-spaces indent, it's just much more comfortable for me.
I do use slightly longer lines, though that's just because I can fit 3 editors at 120-column width across my screen (complete with file-tree on the left-hand side and file overview on the right-hand side).
lelanthran@reddit (OP)
Meh; I just compromised between the 2-space and 4-space indentation proponents; I wrote a little vim script that alternated between 2 and 4 space indentation on every alternate line.[1]
Now everybody's happy.
=======================
[1] Of course I'm joking! My very first PR with that got shut down, after all!
-Y0-@reddit
For code I size indent levels as prime numbers. The first indent level is 2 spaces, then 3 spaces, 5 spaces, etc. I call it Eratosthenes indentation.
In YAML otoh, I use BB(1), BB(2), etc. as corresponding indentation level. Yes. My YAML contains the Collatz conjecture.
evaned@reddit
Reminds me of my style of int * p, which I picked so it makes both the int *p and int* p people mad. ;-)
MechanicalHorse@reddit
I thought you were gonna say you use 3-space indentation
EducationalBridge307@reddit
Ada style guide recommends three-space indents: https://www.adaic.org/resources/add_content/docs/95style/html/sec_2/2-1-2.html
noodles_jd@reddit
And that's why tabs are better. Change the indentation spacing at any time by changing a setting in your IDE. You want 2 spaces today and 4 tomorrow. No problem.
y-c-c@reddit
That's only true in naive situations, if every line of code is indented perfectly.
Oftentimes in coding style, you need ways to handle multi-line code. There's a lot of value in being able to make sure how it looks on your screen is how it looks on others'. In code blocks like the following it's often unclear what should be a tab and what should be a space:
    /*
     * Some inline comments
     */
    callSomeStuff(param1,
                  param2,
                  param3);
If you look at some code like this, do you immediately know which one is tab and which one is space? Without constantly turning on/off editor visualizations for them? In a lot of practical codebases, using tabs just ends up creating a lot of confusing situations. It could work, but I think it tends to force you to spend some mental energy.
syklemil@reddit
The old rule is tabs for indentation, spaces for alignment. So you'd have tabs up to where the c begins, and spaces between there and p.
The new rule is "just let the autoformatter handle it".
syklemil@reddit
Also they're a logical indentation character. One indentation character equals one indentation level. No possibility for partial indent levels sneaking in.
Uristqwerty@reddit
Don't dismiss partial indentation until you've tried putting labels on half-indents! Gives switch statements a far more readable silhouette. Ideally it'd be a presentation option in the IDE rather than whitespace characters on disk, though.
endgamedos@reddit
The real solution is "tabs for indentation, spaces for alignment", but you'll never get everyone to write invisible characters correctly.
FyreWulff@reddit
This is my biggest observation about the tabs vs spaces debate: if you use tabs, you can change the spacing to your liking without reformatting the code. Best of both worlds. You want single-space indentation? Sure. 4 spaces? Go ahead. 10 spaces because you have an ultrawide? Sure, why not.
xtravar@reddit
4 spaces is superior because it makes (cyclomatic) complexity more of an eyesore. Just my opinion, man.
loup-vaillant@reddit
Hmm, can’t the same argument be made for 8 spaces? 16?
WellMakeItSomehow@reddit
Of course. Linux uses 8-character tabs.
xtravar@reddit
Maybe, but there are diminishing returns and it runs counter to the 4 default now seen everywhere.
The international tab standards committee lowered it from 8 due to modern programming languages' safety and verbosity when paired with the advent of 16:9 monitors.
Firepal64@reddit
Every line should have its own monitor to be displayed on. That's how you catch bugs
loup-vaillant@reddit
Use tabs.
Tabs are better than spaces, because the reader can set the tab width to their own preference. Sometimes it is an accessibility issue: depending on their visual disability, a reader might need a different tab width. As for the blind, a tab is only one character, and a clear indicator of indentation if you do the sane thing and keep using spaces for alignment.
And of course, when you use tabs, you can just change the width, if and when your personal preference ever changes.
There are two minor downsides:
- You need to make sure your code still looks pretty under different tab widths. It hardly changes anything in practice, but you do have to mind a couple of edge cases.
- You need to choose how many spaces tabs are worth when setting your line length limit. And you need to document that choice. People can choose whichever tab width they prefer when reading your code, but they do need to know how many spaces a tab is worth if they want to contribute.
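The second downside can be made concrete with a small sketch of how a line-length check might count tabs. The function name is hypothetical, and it assumes tabs advance to the next multiple of the documented tab width and that every other byte occupies one column (so multi-byte UTF-8 text would be over-counted).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the display width of one line, expanding each tab to the next
 * tab stop. Counting stops at the end of the string or at a newline.
 */
static size_t display_width(const char *line, size_t tab_width)
{
    assert(line != NULL);
    assert(tab_width > 0);

    size_t col = 0;
    for (; *line != '\0' && *line != '\n'; line++) {
        if (*line == '\t')
            col += tab_width - (col % tab_width); /* jump to the next tab stop */
        else
            col++;
    }
    return col;
}
```

The same line measures differently at width 4 and width 8, which is why the project has to document which width its limit is counted against.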
BlueGoliath@reddit
Have you tried writing it in Rust?
xoner2@reddit
I downvoted you for the lolz...
BlueGoliath@reddit
A furry!
IllustriousBeach4705@reddit
I am pretty sure they actually did. But it is being removed.
steveklabnik1@reddit
It is being removed as an HTTP backend, but there are still options for Rustls and QUIC to be used via Rust implementations.
levodelellis@reddit
Sometimes I wonder if I should write a post on how I write C++. Other days I assume everyone will be upset because I'm writing about C++
levodelellis@reddit
Fil-C runs Curl!