Why are SQL, HTML, and JS prone to injection while C, C++, Java, and Python aren't?
Posted by Possible-Beyond6305@reddit | learnprogramming | 33 comments
Why are SQL, HTML, and JS prone to injection while C, C++, Java, and Python aren't? What structural flaw makes them so susceptible? I've received conflicting AI answers and need a definitive technical explanation. Someone please help!
Pyromancer777@reddit
All injection attacks boil down to "a user is inserting something that the code is interpreting as more code"; that's where the term "injection" comes from.
This means frontend vulnerabilities can lead to higher rates of injection attacks when user input isn't properly sanitized. This can happen in form fields, API calls, XSS exploits (where users use something like a third-party chat log to insert malicious code), or even via URL inserts (usually just a client-side injection that doesn't always propagate to actual servers).
Any language is susceptible, but the common denominator is that there needs to be a way for the end user to insert some code and have it interpreted by the underlying scripts as valid executable code.
For example, if you have a non-sanitized form and your backend API builds SQL statements, a SQL injection can cause database info to be leaked once the form is submitted to the API. However, if that same vulnerable form was handled by something that never passes the input to a SQL engine, nothing would register the SQL as actual code, and the attempted SQL injection would do nothing malicious.
WystanH@reddit
This feels obligatory: XKCD: Exploits of a Mom.
The source of a compiled language isn't available at runtime. You can't inject anything into C; you're instead attacking the program it produced.
If you can see the thing that's going to be executed, the source code, and can manipulate it before it runs, that's injection.
If you hear about things like buffer overruns, that could be C or C++ or anything else that ultimately produces machine code. Java and Python compile to bytecode (a kind of p-code) that runs on a virtual machine, and are exploited differently.
Alive-Cake-3045@reddit
The framing is off: C and Python are absolutely vulnerable too, and buffer overflows in C are the same class of problem. SQL, HTML, and JS get hit more because they mix instructions and user input in the same string. The parser can't tell where your code ends and their data begins. Keep data and instructions in separate channels and the problem mostly goes away. Parameterized queries exist for exactly this reason.
syklemil@reddit
Yep, C has some known bad functions like gets (see e.g. SO on why gets is considered bad). Input & string handling in C has a history full of incidents, leading to stuff like MS' banned.h and git's banned.h.
Alive-Cake-3045@reddit
Yeah exactly, gets is the classic example of what happens when a language trusts the developer completely and the developer trusts the user too much. The banned.h files are a good read for anyone who thinks this is theoretical: real codebases, real incidents.
syklemil@reddit
man 3 gets is pretty funny too, as it's mostly yelling not to use the function and telling the reader how bad it is. Fortunately, it has not been part of the standard since C11, but people can still unlock the problem by choosing to compile with, say, C89.
Possible-Beyond6305@reddit (OP)
Hello Alive-Cake-3045, thank you for the detailed answer. Why do SQL, HTML, and JS mix commands and user input within the same string ?
Alive-Cake-3045@reddit
Because they were designed to be written as strings from the start.
SQL is just text you send to a database, HTML is just text a browser renders, JS is just text an engine executes. When you build that string by concatenating user input, the parser has no way to know what was yours and what was theirs.
C has the same problem with memory; Java and Python just handle memory for you, so that specific attack surface shrinks.
Majestic_Rhubarb_@reddit
It’s not that SQL is prone to injection. A C++ app takes user input and builds some SQL statement, naively assuming the user always types in expected content.
If the user guesses that SQL is being used under the covers, they could type in specific constructions that would change the statement and do something nasty.
It’s the C++ app that is prone to injection.
Ninchad@reddit
C/C++ and Python are prone to format string vulnerabilities. In C, code like
printf(unsanitized_user_input) can be exploited to achieve arbitrary code execution.
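Python's print() does not interpret format specifiers, so printing user input is harmless on its own. The closer Python cousin of a C format string bug appears when a user controls the template passed to str.format(), which can leak data rather than run code. A minimal sketch, where the User class, SECRET_KEY, and both render functions are hypothetical names:

```python
# Hypothetical secret that lives in the same module as the rendering code
SECRET_KEY = "hunter2"


class User:
    def __init__(self, name):
        self.name = name


def render_unsafe(template, user):
    # The user supplies the *template*, so they choose which fields get looked up
    return template.format(user=user)


def render_safe(name):
    # The developer owns the template; user text only fills a slot and is never parsed
    return "Hello, {}!".format(name)


# A normal template behaves as expected
print(render_unsafe("Hello, {user.name}!", User("Ada")))  # Hello, Ada!
# A malicious template walks from the object to the module globals
print(render_unsafe("{user.__init__.__globals__[SECRET_KEY]}", User("Ada")))  # hunter2
# The same text passed as data is printed literally
print(render_safe("{user.__init__.__globals__[SECRET_KEY]}"))
```

The fix mirrors parameterized SQL: keep the template fixed and pass user text only as an argument.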
Infinite2k@reddit
It’s mainly down to the mixing of data and instructions. SQL injections happen when data isn’t sanitised and is misinterpreted as SQL instructions. XSS is the same thing: JS is loaded in through HTML, and the browser will run all the scripts it sees inside the page whether they’re meant to be there or not. Languages like C and C++ are actually prone to injection too! Look up buffer overflow attacks. Python prevents this with memory safety, but injection can still happen if ‘exec()’ is used, which runs strings as Python code.
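A minimal Python sketch of the XSS case, assuming a server that builds HTML by string concatenation (comment_unsafe and comment_safe are hypothetical names; html.escape is from the standard library). Escaping turns markup characters into entities, so the browser displays the text instead of running it:

```python
import html


def comment_unsafe(text):
    # User text is pasted straight into the markup, so any tags in it are real tags
    return '<p class="comment">' + text + "</p>"


def comment_safe(text):
    # html.escape converts <, >, &, and quotes to entities, so the text stays text
    return '<p class="comment">' + html.escape(text) + "</p>"


payload = "<script>alert(document.cookie)</script>"
print(comment_unsafe(payload))  # the <script> tag survives and would run
print(comment_safe(payload))    # &lt;script&gt;... is shown as plain text
```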
Possible-Beyond6305@reddit (OP)
Hello Infinite2k, thank you for the detailed answer. Why does the issue stem from the mixing of data and instructions (commands) ? How are data and instructions mixed together ?
Infinite2k@reddit
At a low level, there is no difference between data and instructions; it is all just binary after all. For this reason, a computer cannot distinguish between the two. This has been a problem for a very long time, and the abuse of it is called arbitrary code execution (ACE). If a malicious user is somehow able to inject their own computer instructions, they basically have complete control over the machine. Modern systems are usually more robust against these attacks thanks to advancements such as memory safety; however, poorly designed systems and programmer mistakes still allow these vulnerabilities to be exploited.
As an example of SQL injection, if we are using an email address that has been given by a user, a naïve programmer might construct an SQL query to retrieve the user data like this:
sql = "SELECT * FROM user WHERE email = '" + user_email + "'"
The problem is that in trying to insert the email address into the SQL query, the user input has now become part of the SQL query. A malicious user would be able to use this vulnerability to insert their own SQL statements into the query, which would allow them to take control over the database and steal information. Because the SQL is sent as a single string to the database, the database can't tell the difference between the user input and the instructions, so it will just accept whatever is given. This could be considered a bit of a design flaw of SQL, considering that it is such a common mistake to make and has led to many companies getting hacked.
These kinds of injection vulnerabilities are generally prevented by keeping user input separate, and in situations where they must mix, you sanitise your data by removing unsafe characters and enforcing limits on the length of the text.
In SQL, the safe approach is to use parameterised queries. This is where you put placeholders into your statement and send your parameters separately.
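A minimal sketch of the difference using Python's built-in sqlite3 module, assuming an in-memory database with a hypothetical user table. SQLite uses ? as its placeholder; other drivers may use %s or named parameters instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO user (email, name) VALUES (?, ?)",
    [("ada@example.com", "Ada"), ("alan@example.com", "Alan")],
)


def find_user_unsafe(user_email):
    # Concatenation: the input becomes part of the SQL text itself
    sql = "SELECT name FROM user WHERE email = '" + user_email + "' ORDER BY id"
    return conn.execute(sql).fetchall()


def find_user_safe(user_email):
    # Placeholder in the statement; the value travels separately as a parameter
    sql = "SELECT name FROM user WHERE email = ? ORDER BY id"
    return conn.execute(sql, (user_email,)).fetchall()


payload = "nobody@example.com' OR '1'='1"
print(find_user_unsafe(payload))  # [('Ada',), ('Alan',)]: every row matches
print(find_user_safe(payload))    # []: no user has that literal email
print(find_user_safe("ada@example.com"))  # [('Ada',)]
```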
Monster-Frisbee@reddit
Yeah, buffer overflow attacks are some of the oldest types of injection, going back to early assembly languages. Plenty of classic game consoles were also hacked this way.
Todo_Toadfoot@reddit
Log4j enters the chat. Giggity.
dafugiswrongwithyou@reddit
I think a good way to answer that is to focus in on how one of those things happens, because it may be illuminating.
So you have a SQL database with a table called "custs". A routine to pull out information for one customer might be:
SELECT id, forename, surname
FROM custs
WHERE custs.id = 5;
This will show you the customer details for customer 5; put in a different number, and you'll get a different customer's details.
Then you, a beginning web developer, are making a web interface to show information; you have a field where people can put in a number, and you want to display details for that customer. The quickest and most obvious way is just to have a string of text containing the routine above, but where "5" is, you swap in whatever they put in the field, and then tell the SQL server to execute that string as a command. So if they type "10", it'll go away and pull the information for the customer with ID 10.
You have just opened your server to SQL injection.
The problem here is: what if the user doesn't type "3" or "15" or "94236" in that field? What if, instead, they type something like:
1; TRUNCATE TABLE custs
Well, now the string your routine makes, and so the routine the SQL server executes, is:
SELECT id, forename, surname
FROM custs
WHERE custs.id = 1; TRUNCATE TABLE custs;
The server will go ahead and return the customer details for customer 1, and then wipe all data from the custs table, because the site, which is set up with permissions to access it, told it to.
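The scenario above can be reproduced with Python's sqlite3 module under two assumptions: SQLite has no TRUNCATE TABLE, so DELETE FROM stands in for it, and sqlite3's normal execute() refuses to run more than one statement, so executescript() stands in for a database driver that accepts stacked queries. lookup_unsafe is a hypothetical name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE custs (id INTEGER PRIMARY KEY, forename TEXT, surname TEXT)")
conn.executemany(
    "INSERT INTO custs VALUES (?, ?, ?)",
    [(1, "Ada", "Lovelace"), (5, "Alan", "Turing"), (10, "Grace", "Hopper")],
)
conn.commit()


def lookup_unsafe(field_value):
    # Swap the user's text in where "5" was, then run the whole string as SQL
    sql = "SELECT id, forename, surname FROM custs WHERE custs.id = " + field_value + ";"
    # executescript() runs every statement in the string but returns no rows,
    # so this sketch only shows the side effect, not the customer details
    conn.executescript(sql)


print(conn.execute("SELECT COUNT(*) FROM custs").fetchone())  # (3,)
lookup_unsafe("1; DELETE FROM custs")
print(conn.execute("SELECT COUNT(*) FROM custs").fetchone())  # (0,)
```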
The trick here is that this isn't a flaw with SQL, or HTML; it's a flaw with how the command given to the server was created. There should have been permission limitations to prevent the web process from being allowed to do that, and/or sanitisation to prevent invalid inputs, and/or the use of SQL parameters rather than bare SQL. With SQL parameters, the "structure" of the SQL is sent separately from the variables and the server assembles them itself, with an understanding of what each should be. In this case it would understand that
1; TRUNCATE TABLE custs
should be treated purely as a customer id to search for, not more code to execute, and would simply fail to execute or return no results.
So, why aren't C, Python, etc. prone to injection? Mostly just because they're not used in ways that allow it. If you built a website where the user could type in unsanitised text which would then be inserted into a piece of Python code and executed on the server, that absolutely could result in "Python injection"; it's just not very common for Python to be used that way.
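A minimal sketch of that kind of "Python injection", assuming a hypothetical endpoint that adds a user-supplied number to 10 by building Python source and passing it to eval(). add_unsafe and add_safe are illustrative names:

```python
import os


def add_unsafe(field_value):
    # Builds Python source from user text and evaluates it: "Python injection"
    return eval("10 + " + field_value)


def add_safe(field_value):
    # Treats the text strictly as data: it must parse as an integer or fail
    return 10 + int(field_value)


print(add_unsafe("5"))  # 15
# Any expression runs on the server; this harmless one calls os.getcwd()
attack = "len(__import__('os').getcwd())"
print(add_unsafe(attack) == 10 + len(os.getcwd()))  # True
print(add_safe("5"))  # 15
# add_safe(attack) raises ValueError instead of running the attacker's code
```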
theLOLflashlight@reddit
I'm not sure if Python belongs there or not, but the difference can be summarized by the fact that half of those languages are interpreted while the other half are compiled. It's much more complicated than that, but that's your high-level overview.
grtbreaststroker@reddit
I’m not a cybersecurity expert, so maybe there’s another kind of injection you’re referring to, but I can confidently say Python is prone to SQL injection. Use prepared statements for anything that leaves your machine.
aneasymistake@reddit
Python is prone to SQL injection in the same way that C++ is prone to SQL injection. If you use either language to execute SQL queries that you’ve constructed using user input, and if you don’t handle that properly, then you can get in a pickle.
Choice_Supermarket_4@reddit
I can confidently say that python isn't prone to SQL injection.
zeekar@reddit
It's semantics/jargon. Injection attacks can happen any time code is constructing and executing other code at runtime. That's more likely to happen with some technologies than others. When dealing with a SQL database, you construct and execute queries at runtime - and SQL queries are code. Dynamic web pages construct HTML for the browser to render - and that's code.
I don't know that JavaScript belongs on the list, because JS attacks aren't usually about getting JavaScript to execute other JavaScript, but rather about adding your JavaScript to HTML so it gets executed.
You very rarely have C, C++, or Java code that is constructing C, C++, or Java code and then compiling and executing it. So that particular type of attack doesn't apply.
Python can construct and execute Python code more easily because it's interpreted rather than compiled, but it's still not a common thing to do, so you don't see injection attacks.
That said, there are a number of exploits that let you "inject" arbitrary code to be run; they're just not called injection attacks in that case. They're buffer overflows, remote code execution, etc. But the basic idea is much the same: sneak your code in someplace the computer will execute it inadvertently.
TechBriefbyBMe@reddit
SQL and HTML are interpreted languages where user input becomes code. C and Python compile/execute separately so injected text just stays text. Your AI was probably arguing about different things lol.
divad1196@reddit
The statement is wrong.
If you use system (or the equivalent) in C/C++, Java, or Python, you could have shell injection. If you use eval/exec in Python, you can have RCE. Log4j in Java allowed code execution. A buffer overflow in C/C++ can be used to inject a stub as well. In any of them, you can have an SQL injection the same way JS would. Etc.
Some exceptions are specific to the web, like XSS/CSRF, but if you have a stored XSS, is it the fault of JS in the browser that your Python backend didn't sanitize it?
So no, they are not more prone to injections.
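A sketch of the shell injection case in Python, assuming a POSIX system with echo on the PATH. Passing shell=True behaves much like C's system(): the whole string is handed to /bin/sh to parse. greet_unsafe and greet_safe are hypothetical names:

```python
import subprocess


def greet_unsafe(name):
    # The name is pasted into a command line that /bin/sh parses
    cmd = "echo Hello, " + name
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout


def greet_safe(name):
    # A list of arguments goes straight to the program; no shell parses it
    return subprocess.run(["echo", "Hello,", name], capture_output=True, text=True).stdout


payload = "Ada; echo INJECTED"
print(repr(greet_unsafe(payload)))  # 'Hello, Ada\nINJECTED\n': a second command ran
print(repr(greet_safe(payload)))    # 'Hello, Ada; echo INJECTED\n': just text
```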
Living_Fig_6386@reddit
SQL is prone to injection when someone passes non-validated queries directly from third parties to the SQL interpreter. It's not a matter of the language; it's that people just have a bad habit of forwarding user input.
HTML isn't prone to injection as it's just a text markup and doesn't execute any code or anything.
JavaScript is similar to SQL in that you can just pass on user input for the interpreter to execute, but unlike SQL there aren't many cases where there's a reason to do so. If you are talking about client-side JavaScript, obviously the person with the web browser can do anything they like in JavaScript and fiddle with the code in the browser. That's not injection, just control over the execution environment.
C is translated to machine code. It doesn't interpret any input and execute it naturally, you have to go out of your way to have it execute things in its environment. That's not to say you can't do it, it's just more difficult. Same with C++ and Java.
Poorly written Python code can execute user input, or it can run user input in a shell, but generally the programmer needs to be explicit in doing that to user input.
Maggie7_Him@reddit
From HTTP automation and scraping work: HTTP itself has this exact problem. CRLF injection exists because the HTTP spec uses \r\n as the delimiter between headers. If you reflect user input into a response header without stripping carriage returns and newlines, an attacker can inject additional headers or split the response entirely. Same root cause as SQL injection, just one layer down. The language doesn't protect you there; Python will happily relay whatever string you pass to the response headers. The recurring pattern is always: wherever data and control signals share the same channel without structural separation, injection is possible.
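A minimal sketch of the header injection described above, assuming a hand-built raw HTTP response. Real frameworks and Python's http.client may already reject CR/LF in header values, so this only illustrates the mechanism; all function names are hypothetical:

```python
def redirect_unsafe(target):
    # User-controlled value reflected straight into a header line
    return "HTTP/1.1 302 Found\r\nLocation: " + target + "\r\n\r\n"


def redirect_safe(target):
    # Refuse CR and LF, the characters that delimit header lines
    if "\r" in target or "\n" in target:
        raise ValueError("header value contains CR or LF")
    return redirect_unsafe(target)


def header_lines(response):
    # Everything before the first blank line is the status line plus headers
    head = response.split("\r\n\r\n", 1)[0]
    return head.split("\r\n")[1:]


payload = "/home\r\nSet-Cookie: session=attacker"
print(header_lines(redirect_unsafe(payload)))
# ['Location: /home', 'Set-Cookie: session=attacker']: an extra header appeared
```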
sessamekesh@reddit
The category of issue that you see with HTML, SQL, and (less often, but still realistically) JS is that the line between "code" and "data" is pretty blurry unless you're really careful.
Because of that, you can put code somewhere that the programmer expected data, and the environment (browser HTML parser, JS engine, SQL query engine, etc.) will happily execute the instructions.
The answer is to carefully separate code from data, and to be extra skeptical of any user-influenced (especially user-input) data. This is pretty easy to do in C++, JavaScript, and Java, decently easy to do in C, but still requires a bit of thought in HTML and SQL. Generally speaking if you're following best practices and/or using industry standard tools, you'll be fine. Generally.
It's technically possible to achieve the same thing in C, C++, and Java, but usually much more difficult. From your CPU's perspective, code is data, so it's possible (but usually hard) to convince your CPU to start executing commands in an unexpected place that an attacker can modify.
If you want to jump down a really fun rabbit hole, start Googling "Arbitrary Code Execution (ACE) in speedrunning". Speedrunners (video game hobbyists who complete video games as quickly as possible) have relied on the same category of bug by intentionally corrupting certain portions of memory and then causing certain execution branches to hit those segments.
taedrin@reddit
What makes you think that Java, Python, C and C++ aren't vulnerable to injection attacks? Depending on how you use these languages, they can absolutely be vulnerable to an injection attack, just the same as SQL or Javascript.
Here's a trivial example of a Python program that is vulnerable to injection attacks:
C/C++ and Java are "less vulnerable" because they don't support evaluation/execution of user input out of the box, but it's still possible for these vulnerabilities to sneak in, especially if you use libraries. As an example, Java's Log4J library was infamously vulnerable to log injection attacks.
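Log4j's real flaw involved JNDI lookups and is not reproduced here. The sketch below is only a toy Python analogue of the same shape, assuming a hypothetical logger that expands ${env:NAME} lookups after user data has been inserted into the message, so a lookup smuggled in as data gets evaluated:

```python
import os
import re

# Hypothetical lookup syntax, modelled loosely on Log4j's ${prefix:name} form
LOOKUP = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_lookups(text):
    # Replace each ${env:NAME} with that environment variable's value
    return LOOKUP.sub(lambda m: os.environ.get(m.group(1), ""), text)


def log_unsafe(template, *args):
    # Inserts user data first, then expands lookups across the whole message,
    # so lookups hidden inside the data are evaluated too
    return expand_lookups(template % args)


def log_safe(template, *args):
    # Never expands lookups in the final message; user data stays literal
    return template % args


os.environ["DB_PASSWORD"] = "s3cret"  # hypothetical secret for the demo
attacker_username = "${env:DB_PASSWORD}"
print(log_unsafe("login failed for user %s", attacker_username))  # ...user s3cret
print(log_safe("login failed for user %s", attacker_username))    # ...user ${env:DB_PASSWORD}
```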
PalpitationOk839@reddit
It is not about which language is safer; it is about how code is executed. When systems treat user input as runnable code or queries, injection becomes possible. Proper handling like prepared statements and sanitization prevents it regardless of language.
carcigenicate@reddit
The original premise is a bit off. C and C++ aren't vulnerable to injection in the sense you mean, because C and C++ aren't interpreted. It's highly unlikely that user input into a C program will be successfully compiled and then run by accident. Buffer overflows can be exploited to achieve a similar result, but that doesn't seem to be what you're referring to. Python can be vulnerable to injection attacks, though, via calls like exec. Because it's trivial to directly run Python code with user input, it's also trivial to introduce vulnerabilities where user input is run as code.
That said, it's also not difficult to naively inject user input into an exec/system call in C and give the attacker a shell. Any time you're inserting user input in a "trusted" context, you're introducing a potential vulnerability.
AssiduousLayabout@reddit
It's not so much a technical flaw as it is how the technologies are used. Injection attacks can occur when you combine user input (or rather any kind of external input) with your own code. For example, a SQL query that includes an entered username as part of the query, or this very website, which displays user-entered text within a page served by Reddit.
Any time you permit external input to form any part of the code that is executing, or the content being displayed to your users, you have to think about how you can prevent malicious input from hijacking the intended behavior of your application or site.
Interpretable languages are much more vulnerable to this than compiled languages.
rooygbiv70@reddit
It’s not that those things aren’t susceptible to injection, it’s just that the access patterns less frequently allow for it. You’re not as likely to find a mark that’s blindly taking Java/C/C++ from a user, compiling it, then running it. If you did, however, you’d have a bona fide injection site. On the other hand, SQL and JS are more commonly used in such a way that input data is being interpolated into queries/scripts, so you get those attack vectors as a hallmark of insecure code.
groogs@reddit
All 3 are often built dynamically as a string, that eventually ends up being executed. They get user data mixed in, and now it's easy to escape and do whatever you want.
SQL is most commonly executed on the database server, but the results are used by the app making the call. A simple example is appending to a user login query so that it selects the admin user.
JS is almost always executed on users browsers, but there it can be used to add a script that steals session cookies or credentials (by sending them to another server the attacker controls).
HTML is mostly just exploited to inject JS.
Everything else you mentioned is executed on the server, and typically you don't take strings and run them as code there.
I-Am-The-Jeffro@reddit
Compiled languages like C, C++, and Java cannot practically be injected into. Non-compiled, plain-text scripts that are interpreted at runtime and aren't strongly typed are (relatively speaking) extremely easy to inject malicious code into.