Creating application filtering questions
Posted by Ok-Leopard-3520@reddit | ExperiencedDevs | View on Reddit | 21 comments
Hey, I'm a senior engineer who is designing the application questions for a new job posting at my company (specifically for new grads, juniors, and interns).
Of course we can't interview every candidate who applies, and we also realize most candidates end up using AI to answer take-home coding challenges.
So right now, I'm designing questions that I think ChatGPT will find hard to answer, but that also show the person actually knows how to use coding assistants (not just copy and paste).
What do you think of these questions:
* How do you know if your coding assistant is hallucinating or lying?
* How do you tell if your prompt to your coding assistant is or isn't specific enough?
* How do you tell if your coding assistant is writing bad code?
* How do you tell if your coding assistant is writing code that has unexpected side effects?
How would you answer these questions?
teerre@reddit
Why would an LLM find these hard? LLMs are "aware" of all those problems. They can easily bullshit all these questions
What we found to work is to simply embrace it. Make the tech test assuming they will use an LLM. LLMs are terrible at iterating, so just reveal the goal part by part; a pure copy-and-paste approach results in obviously generated code. Similarly, contradicting goals make LLMs spin in place too. Finally, and this should be obvious, for live interviews we ask candidates to share their chat too
Of course, if someone knows what they are doing and processes the input first or clears the context every time, then it's likely these wouldn't work, but that's the point, isn't it? Then said person knows what they are doing
Ok-Leopard-3520@reddit (OP)
What exactly do you mean by "processes the input first"?
teerre@reddit
I mean rewrite whatever the test is asking in such a way that the LLM can better understand it.
sheriffderek@reddit
What is the job?
I’d have them take a 5-minute video of themselves talking about how they would prepare and plan to create a program that figures out how much paint someone needs to buy to paint a room. Most of them won’t want to be on video talking because they have essentially unsocialized themselves, and most of the rest don’t know how to talk about their process or work.

The tiny sliver of people who make it through will either complete it and be done -- or will complete it and then realize there are TONs of edge cases to figure out. The people who run out of time will divide into two camps: people who want to be right and so don’t submit at all out of fear -- and people who are mature enough to realize that things take time. That last tiny sliver will be the people who can actually learn on the job.
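For reference, the naive core of the paint task described above could be sketched as below. This is a minimal illustrative version (metric units and the coverage figure are assumptions, not part of the exercise); the edge cases the comment mentions (doors, windows, ceiling, number of coats, partial cans) are exactly what it leaves out.

```python
import math

def paint_cans_needed(length_m: float, width_m: float, height_m: float,
                      coverage_m2_per_can: float = 10.0) -> int:
    """Naive estimate: cans of paint for the four walls of a rectangular room.

    Ignores doors, windows, the ceiling, and multiple coats -- the edge
    cases a careful candidate would be expected to surface.
    """
    wall_area = 2 * (length_m + width_m) * height_m
    return math.ceil(wall_area / coverage_m2_per_can)

# Example: a 5m x 4m room with 2.5m ceilings -> 45 m^2 of wall -> 5 cans
print(paint_cans_needed(5, 4, 2.5))
```

The interesting part of the exercise, per the comment, isn't this arithmetic but whether the candidate realizes how much this sketch omits.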
apartment-seeker@reddit
The tiny sliver of people "who make it through" are the ones desperate enough to go along with the bullshit of a 5 minute video, how cringey and weird, jesus
sheriffderek@reddit
Well - I’m into hiring humans who aren’t afraid of other humans and who believe in themselves. Good luck!
apartment-seeker@reddit
No, you're into hiring performing monkeys
sheriffderek@reddit
(Those dashes are two dashes, like I pause -- not em-dashes like ChatGPT ;)
apartment-seeker@reddit
Bad questions on multiple levels.
What does a take-home have to do with this, if this is the filter stage?
EmberQuill@reddit
Those questions make it sound like you're trying to hire an LLM wrangler instead of a developer.
nachohk@reddit
Your questions presume that the respondent uses a coding assistant, and also does not specify that you are asking about LLM coding assistants. If I saw this on an application, I would spend my time on a different posting. Possibly I'd answer as though "coding assistant" was a roundabout way of saying "junior developer mentee", just for the lols.
David_AnkiDroid@reddit
Take away the fact that an AI might find them hard, and they're bad questions.
Here's a better question that an LLM will be terrible with:
Just for fun:
You can poison ChatGPT code samples:
Ok-Leopard-3520@reddit (OP)
Besides my other comment to you, why are they (in your opinion) "bad questions"?
David_AnkiDroid@reddit
If I were being brutally honest in an interview, my answers would suck, be generic, or not show off the positives of an agent-based workflow.
Reading through something like this (fairly 'base' solid advice), barely any of it would shine through in answers to your questions: https://awslabs.github.io/mcp/vibe_coding/
Ok-Leopard-3520@reddit (OP)
I like you. I'm definitely adding this one.
David_AnkiDroid@reddit
Enjoy!
OffiCially42@reddit
The questions are directed at working with a coding assistant and aim to explore the interaction between the assistant and the developer.
I have two observations/issues with this:
- Firstly, I understand the prerequisite of asking questions that are difficult for an LLM to answer, but the questions don’t prioritise engineering knowledge; it is implicit or secondary in nature.
- Secondly, I find the questions too situational, which makes them hard to abstract away. Asking broader questions allows the developer to elaborate at their level, which actually highlights their competences and depth of knowledge.
Ok-Leopard-3520@reddit (OP)
Engineering knowledge can be asked about during the interview. Right now, we want to see how they work with coding agents (filtering out the people who just copy and paste).
"Too situational" sounds equivalent to "broad" to me. If something has multiple answers because there is more than one categorical situation, then *broadly* listing the possible situations and how you would tackle each one would be a good answer that shows competency and depth.
OffiCially42@reddit
Based on your approach the first point is reasonable then.
For the second point, I think we misunderstood each other. I used the terms “situational” and “broad” as the two ends of a spectrum (I might not have used the correct terms…). By situational, I meant that the question asked doesn’t allow for a higher-order abstraction which would show competence and depth of knowledge. I did not mean to imply having multiple categorical answers for the same situation, since that would define a broader concept which would allow for a higher-order abstraction.
Nevertheless, the second issue (asking questions in broader terms) can be tackled in the interview as well. If the primary focus of the questionnaire is to function as a prefilter mechanism, then asking specific (situational) questions is reasonable.
Ok-Leopard-3520@reddit (OP)
Thanks for the clarification! Two follow-up questions:
1) In your opinion, how would you (a senior+) answer these 4 questions versus a junior?
2) I'm trying to figure out a way to sift through hundreds of applications/responses quickly; any tips?
OffiCially42@reddit
Great questions, and difficult to answer, but I will try my best.
What I have found in working with LLMs is that competent and more senior developers use these agents as explorative tools rather than as a verification process that compensates for a lack of knowledge and understanding. The prerequisite for this is a high degree of competence and deep technical understanding, whereby the developer can prompt (guide) the LLM agent rather than the other way around. The reason this is difficult without the seniority and competence is that LLM models are extremely assertive in their communication and are “creative” due to the wide range of answers they are trained on, causing less experienced developers to “believe their own misunderstanding”.
Honestly, this is the hardest part. Most likely there is an inverse relationship between quality and speed here, unfortunately. I am not familiar with the volume of applications, but if it’s feasible to read or glance through them, then that’s an evil you might be able to live with.
If we are talking multiple hundreds of applications, then that’s a whole different problem. I honestly cannot give you a better answer than “the best way is probably to come up with an automated filtering process”, but that is very high level and not really informative… A different approach would be to look at the structure of the responses, since LLM output is fairly well structured and semantically correct; a more junior developer will most likely answer in a less coherent manner.
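The "look at the structure of the responses" idea could be sketched as a crude first-pass heuristic like the one below. Everything here is an illustrative assumption (the markers, the threshold, and the premise that heavy markdown formatting correlates with LLM output), so at best it flags answers for a closer human look, not an automated verdict.

```python
import re

# Formatting patterns that LLM-generated answers often contain;
# these markers are illustrative guesses, not validated signals.
STRUCTURE_MARKERS = [
    r"^#{1,3} ",       # markdown headings
    r"^[-*] ",         # bullet points
    r"^\d+\. ",        # numbered lists
    r"\*\*[^*]+\*\*",  # bold text
]

def structure_score(answer: str) -> float:
    """Fraction of non-empty lines matching a 'formatted' marker."""
    lines = [l for l in answer.splitlines() if l.strip()]
    if not lines:
        return 0.0
    hits = sum(1 for l in lines
               if any(re.search(p, l) for p in STRUCTURE_MARKERS))
    return hits / len(lines)

def flag_for_review(answer: str, threshold: float = 0.5) -> bool:
    """Flag heavily formatted answers for closer human review."""
    return structure_score(answer) >= threshold

# A bullet-heavy answer gets flagged; plain prose does not.
print(flag_for_review("- First point\n- Second point\n**Key idea**"))
print(flag_for_review("i read the diff and run the tests locally"))
```

The threshold would need tuning on real responses, and a determined applicant can trivially defeat it by reformatting; it only narrows the pile, as the comment above suggests.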