Log Statements Burn my Eyes. Code is Ugly and Hard to Read
Posted by frompadgwithH8@reddit | ExperiencedDevs | 115 comments
Our department is using the strangler pattern to modernize a legacy monolith. My team built a domain service. Other teams did too.
Merges to main automatically deploy to development. We must manually bump git tag version numbers in .yaml files to make helm/kubernetes deploy to staging or prod. Sidebar: promoting takes most of my day (when I am tasked with it) and I hate it. So many things must be manually checked or we deploy a crashed service. It is stressful. Our department will improve its processes someday… I wish that day would come sooner. When I return from vacation I’ll write a new argo/kubernetes job to prevent argo from deploying broken services. Such a facepalm Homer Simpson “D’oh” situation is preventable, yet we haven’t… ANYWAYS.
We use OpenTelemetry for logging and metrics.
It’s been many months.
I’ve noticed the other teams don’t seem to care about code quality. Nor mine, actually. It’s frustrating. But that’s beside the point.
The only way for us to know what went wrong in the cloud (dev/staging/prod) is to look at logs. So we’re all getting better at log querying.
I’ve noticed I seem to be in a minority in my department when it comes to my approach to testing. I prefer to set up the systems I’m working on locally with Docker in shared network mode so they can communicate through their own queue topics and make RESTful API calls to each other. I’d rather not develop a feature one repo at a time, push it up to the cloud after going through 2+ PRs (we enforce multiple approvers, so getting PRs through can take half a day or more), only to find I made a mistake and need to do more work on one repo or the other. Again, that’s not how I do it. But it is how almost all the engineers in my department do it.
Don’t get me wrong there are engineers whom I truly respect. They’re stellar engineers I aspire to be like someday. I’m building a homelab this weekend to learn all the kubernetes stuff myself, since I just don’t get exposed to it enough on an application domain team. Although I am super looking forward to patching up our promotion pipeline failure point w/ my argo improvement next week. But getting back to the point: I did notice these ultra-high level engineers do the same as me; they’ll set everything up locally. So I know I’m not crazy. Not that I ever doubted my sanity…
The original topic of this thread though:
My team is putting logs everywhere! It’s insanity! After practically any branching logic there’s a log! Before return statements, a log! At the entry point to API invocations (controller methods) there are multiple logs! And the logs take multiple lines of code. I’ll read a method that’s 20 lines long and there’s, what, 4 blank lines, 4 lines of code, and 12 lines of logs. That’s an example I made up because I’m not on my company computer looking at source code right now, but it’s basically seared into my brain… Just imagine you have two if statements, both following the guard pattern, short-circuiting an early return. That’d be about 3 LOC per if statement (if (thisGuard(…)) { return; }) and then maybe a return true at the end, so 7 lines of code for some typical guard-pattern code. But now there’s a 3-line log statement before each of the two returns, and the method body goes from 7 to 13 lines. It’s basically doubled! And that’s what reading my team’s code feels like! I hate it! It takes me so much longer to understand the code I’m reading.
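To make that concrete, here’s a made-up Java sketch of the same guard-pattern method with and without the log bloat (method and field names are hypothetical; plain java.util.logging stands in for whatever logger the team uses):

```java
import java.util.logging.Logger;

public class GuardExample {
    private static final Logger log = Logger.getLogger(GuardExample.class.getName());

    // The 7-line version: two guards plus a final return.
    static boolean activatePlain(String userId, int quota) {
        if (userId == null) { return false; }
        if (quota <= 0) { return false; }
        return true;
    }

    // The same method once each early return grows a multi-line log call.
    static boolean activateLogged(String userId, int quota) {
        if (userId == null) {
            log.warning(() -> String.format(
                "activate rejected: userId was null (quota=%d)", quota));
            return false;
        }
        if (quota <= 0) {
            log.warning(() -> String.format(
                "activate rejected: non-positive quota %d for user %s", quota, userId));
            return false;
        }
        return true;
    }
}
```

Both versions behave identically; the logged one just doubles the vertical space, which is exactly the readability tax being described.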
Another example: I’ll read a method and only one out of ten lines actually mutates any data or has any effect on the program. Everything else is just logs. Now imagine all the methods are like this. It makes code review so hard, because I am spending mental energy wondering “ok, is there actually any business logic happening here, or is it all just logs?” (Sometimes it’s all just logs!!!)
So yesterday I pulled one of my coworkers aside after I reviewed a PR they wrote, specifically to talk about basically what I’m talking about in this thread. I voiced all my concerns. My coworker listened. Then they made the great point: “who cares about how hard it is to read, compared to being actually able to debug stuff that breaks in prod?” And I had no answer. My coworker agreed that it’d be nicer to read the code without all the log statements shitting it up. We both agreed. And I agreed with my coworker that being able to actually “debug” stuff (via logs) in prod is waaaaay more important.
So I am at a loss. I decided I might write a custom plugin for my editor of choice that automatically collapses log statements, just so that when I am reading code on my editor I can process it better. I already know major IDEs support plugins integrating with services like bitbucket or GitHub to do things like PR review. So I could potentially move more of my workflow off of websites and into my IDE. Another solution would be a custom formatter that forcefully ensures log statements only take a single line even if it’s 400 characters long (lol).
But ughhhh, brother ughhhh, just ughhhhh!!!! What do you guys do about this? I was thinking maybe there’s a way to get log statements out with meta programming or annotations or some other trick/hack/wrapper approach so we can get the logs we want to debug stuff in live cloud environments without littering metric tons of log statements all over our codebase.
The other teams, btw, apparently don’t have logs at all, and purportedly discovering “what went wrong” with their services is a “shitshow”. I recall back when i was learning how to program, reading about how in elixir/erlang if a process crashed you could remotely ssh into a debug session live in that thread to inspect the state/variables. That’s absolutely not what my team/department needs but it’s cool if for anything but to demonstrate that I just feel it in my jellies (was Ryan his dad the whole time??) that there’s gotta be a more clever way to approach “figuring out what went wrong in the cloud”.
And that’s what the logs are for, remember. To figure out what went wrong in the cloud. When customer X encounters a bug doing Z, we look at the logs and try to figure out if something errored out, or, also very commonly, just didn’t get called at all. For example, oops, we never updated system Y to publish a message to the event queue. So system G never received the message, which explains why the customer’s pet never got a new fur dye job! All we gotta do is make sure system Y publishes event “dye dog fur”. (Example out of my ass).
How do you guys solve diagnosing issues in prod/stage/dev? Are your codebases shitted up with logs too? Do you take a fundamentally different approach that sidesteps this entire pattern/problem?
rover_G@reddit
Log levels should help here. What you described sounds like a lot of debug and even some trace logs.
When I run a project locally or look at remote environment logs I filter to the level that suits me. Usually info unless an app has particularly noisy logs at that level.
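A quick sketch of how that filtering works in plain java.util.logging terms (the same idea applies to SLF4J/Logback levels; the probe method here is hypothetical, just to show which messages survive a given threshold):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LevelDemo {
    // Returns whether a message at messageLevel would be emitted
    // by a logger whose threshold is set to the given level.
    // FINEST/FINER/FINE play the role of trace/debug logs.
    static boolean visible(Level threshold, Level messageLevel) {
        Logger log = Logger.getLogger("LevelDemo.probe");
        log.setLevel(threshold);
        return log.isLoggable(messageLevel);
    }
}
```

A message only survives if its level is at or above the logger’s threshold, so trace/debug noise disappears the moment you filter at INFO.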
Cell-i-Zenit@reddit
This is just wrong.
Guess which code produces more errors? Readable or unreadable code? LOL
How often do you have prod issues that you can only "debug" via millions of logs? Logs are pretty expensive at that scale, btw.
If you use Java, you could look into the OpenTelemetry agent, which automatically "logs" everything you do in your application with minimal visual clutter.
My advice here is to look into open telemetry
edgmnt_net@reddit
Also modern debuggers should be able to do some sort of less-intrusive debugging, like tracing all calls without stopping the program, possibly while aggregating and showing some extra data.
The_Northern_Light@reddit
Sounds like an education problem? Start mentoring people and teach them how many problems it solves
frompadgwithH8@reddit (OP)
I use jetbrains
We use open tel.
Cell-i-Zenit@reddit
do you use java? If yes checkout https://opentelemetry.io/docs/zero-code/java/agent/
angelicosphosphoros@reddit
The stupid part would be if some business logic gets evaluated when the log statement’s arguments are evaluated.
CallinCthulhu@reddit
Does this coworker know that logging isn’t free?
bakingsodafountain@reddit
Logging is an art. Log just enough that you can figure out what happened, don't log too much.
I put logging around critical decisions in the code, or some areas where it would be very hard to build a mental model of what happened without a log.
Logs just need to give me enough of an idea about what went wrong and where and so I can figure out how to reproduce it. I practice TDD so from there I'll just reconstruct the failure in a test and debug the test case to get more information. That test case then gets locked in with the issue fixed.
All my systems are pretty much event driven so it's generally pretty easy to pull the events, see their data, and reproduce the issue.
Generally so long as I'm logging the inputs and outputs from every API call I can do this. I don't need many more internal logs (though having the right amount of internal logs does make it easier).
Another thing I do in my projects is heavy integration testing. When I'm running integration tests (my preferred way of reproducing bugs too), I will try and rely on logs as much as possible before stepping into the debugger. It helps me find a balance of what logs would be useful in production to diagnose issues and what is irrelevant.
Finally, I tend to start with fewer logs and add them as time goes on. If I struggled to debug an issue because of having too few logs, I'll add more. If I struggled to debug an issue because the logs were full of junk, I'll delete or modify logs.
frompadgwithH8@reddit (OP)
Interesting point about extracting queue messages to recreate issues. My department should do that, but I haven’t heard of any of us ever actually accessing our dashboards in the cloud for our queues. I don’t think they get deployed, actually; they’re just in our local machine docker compose files.
You’ve helped me with a solution here. If I can get our queue web dashboards to be accessible in our cloud environments, us engineers can extract the queue messages that created bugs, and then we can recreate them locally without bringing up all the systems.
Aside from that: we have no integration or e2e tests. We have unit tests but they aren’t worth much.
Do your integration tests involve side effects? Like db calls or api calls? Or do you mock/stub those out for the tests?
bakingsodafountain@reddit
My integration tests use an in-memory alternative to external resources. So an in-memory database, the mock consumer/producer for Kafka, static reference data where appropriate.
I leverage dependency injection for my project (Java) so have a strict rule to specify any external dependency implementation behind an interface and in a separate module. In the integration tests I use all the same bindings as production except the external ones, then bind specific (usually specially crafted) implementations for external resources.
My integration tests are my most valuable resource (complex distributed system). I could write an entire essay on this area alone. I invested significant resources early on in the project, and continually since, so that my integration tests provide an "as close as possible" representation of production, be fully deterministic, and be fast. We have close to 1000 integration tests on my project now and the value they provide is immense (these are in addition to unit tests which are reserved for testing more discrete functionality).
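A minimal sketch of that binding rule, with hypothetical names (the real project would wire these through a DI framework like Guice or Spring rather than by hand):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// External dependency hidden behind an interface, per the rule above.
interface CustomerStore {
    void save(String id, String name);
    Optional<String> find(String id);
}

// Production would bind a JDBC/remote implementation of CustomerStore.
// Integration tests bind this deterministic in-memory one instead.
class InMemoryCustomerStore implements CustomerStore {
    private final Map<String, String> rows = new HashMap<>();
    public void save(String id, String name) { rows.put(id, name); }
    public Optional<String> find(String id) { return Optional.ofNullable(rows.get(id)); }
}

// Business logic only ever sees the interface, so the same bindings
// work in production and in tests, except for the external resource.
class CustomerService {
    private final CustomerStore store;
    CustomerService(CustomerStore store) { this.store = store; }
    String register(String id, String name) {
        store.save(id, name);
        return store.find(id).orElseThrow();
    }
}
```

In production the constructor receives the real database-backed implementation; the integration tests receive the in-memory one, so the same business logic runs deterministically and fast in both.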
ZunoJ@reddit
Log what’s necessary to replay the events in a dry run, and in case of an error, the exact error. Everything else is noise.
okayifimust@reddit
He's very wrong on multiple levels.
Logging shouldn't reduce readability. Almost nothing should ever reduce readability. If you can't log whatever it is you want to log, in whatever way you want to log, you have a skill issue.
You don't need to place logs around every control statement in order to be able to debug production issues. That's not just a skill issue, it is outright amateurish.
If you log your inputs, you can reproduce what your code did. During reproduction, use proper debugging techniques - breakpoints and stack traces rather than log spam - to follow the path your code takes and to understand what happens to your data.
So?
You're using code, not magic. What happens is deterministic given the inputs. You shouldn't be having any problems here, and if you don't have the confidence that proper processes will do the trick, things are very broken. Production issues shouldn't be special, certainly not frequently so.
gfivksiausuwjtjtnv@reddit
Write a vscode extension to hide log statements. Then quit because you work with fucking morons
wrd83@reddit
Invest in service tracing. Invest in metrics. Invest in improving logging; if your language has exceptions, throw and log only at the outermost layer. If you use queues, invest in dead-letter queueing and poison-pill handling. Have recovery processing on the dead-letter queue.
Make sure you do proper input validation at the edges of the system; reject early and throw bad data back to the user. Bad data that does not enter the system reduces the complexity of error handling later.
If there is lots of batching, try to remove it if it does not impact performance.
A debugger in prod is often a no-no anyways.
Reduce the pipeline depth, make sure the requestor gets the error through the system, not by bothering engineers to go log diving.
Treat tests like code, maximum impact with minimum code.
It's a long path. Conserve your energy and maximise long term momentum over fast results.
I hope it helps.
frompadgwithH8@reddit (OP)
Thank you that is helpful. I think I’ll have to read your post a few times to really break it down.
I think the number one issue is that the legacy system does not publish events to the queue system when it should. And so the new distributed system doesn’t get the events. And so things don’t occur, and the parity between the legacy system and the new system gets broken. And then we have to figure out why the parity got broken.
It’s almost always because either an event didn’t get fired off that should have been, and so we have to go into the legacy system and add some new lines of code that just fire an event…
Or there was an error somewhere upstream, and that stopped an event from getting to another service…
We do get your traditional bugs where our own application code just has issues in it unrelated to distributed system design
But distributed systems problems are a major issue here
And so logging doesn’t really solve the problem… because it’s an absence of code running and an absence of logs; that’s how we identify a lot of problems
wrd83@reddit
Make a second queue for each input queue; if you get an error condition, throw an exception. Throw it far, far.
There should be a handler per worker application; if that handler is reached, put the message in the second queue.
Monitor the second queue and if something lands there, make a test with that message.
Find the error why it lands in that queue. If the error is valid, find where it should go. Otherwise fix the code and write a regression test with anonymous data.
Run the test suite for every merge to master.
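A toy sketch of that flow, with in-memory deques standing in for real broker queues and a hypothetical retry limit (a real system would lean on the broker's redelivery and dead-letter support instead of hand-rolling this):

```java
import java.util.Deque;
import java.util.function.Consumer;

public class DlqWorker {
    static final int MAX_ATTEMPTS = 3; // retry budget for transient errors

    // Drains the main queue; a message that still fails after
    // MAX_ATTEMPTS lands on the error queue instead of vanishing.
    static void drain(Deque<String> mainQueue, Deque<String> errorQueue,
                      Consumer<String> handler) {
        while (!mainQueue.isEmpty()) {
            String msg = mainQueue.poll();
            boolean done = false;
            for (int attempt = 1; attempt <= MAX_ATTEMPTS && !done; attempt++) {
                try {
                    handler.accept(msg); // may throw on transient or permanent errors
                    done = true;
                } catch (RuntimeException e) {
                    if (attempt == MAX_ATTEMPTS) {
                        errorQueue.add(msg); // becomes a ticket + regression test later
                    }
                }
            }
        }
    }
}
```

Anything that still fails after the retries lands on the error queue instead of being dropped, which is what makes "every error becomes a ticket with a test" enforceable.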
frompadgwithH8@reddit (OP)
So you’re saying capture every error in a queue and ensure each error in the queue becomes a ticket and gets a fix, eventually, along with a test?
I guess the advantage of this approach is that literally every error gets captured and has to be dealt with? Whereas with logs we can miss errors?
wrd83@reddit
Alert when the error queue has >0 items with a timestamp older than your processing SLA (e.g. 5 hours).
With this you can set a person as the bug hunter of the day, and he can do a batch of queue items and send out fixes periodically (advantage 1)
A standard implementation puts the items back on the main queue a few times (3-10) in case the error was transient (advantage 2).
But that assumes you classify errors as transient or permanent.
throwaway0134hdj@reddit
Learn the debugger. I know everyone hates it but learning it is so much better than print statements.
FrenchFryNinja@reddit
Log channels and logging levels help a lot.
That way you can look specifically at a log for a certain functionality, and you can activate or filter viewing it by severity (log level)
ryhaltswhiskey@reddit
Do you use logging levels? That's a pretty standard approach.
Your team is absolutely going overboard. If your functions are so complex that you can't deduce what's happening by logging just the input then you need to fix that. If your developers can't figure out what a function is doing by logging just the input, they are dumb and need help.
frompadgwithH8@reddit (OP)
Yeah sorry boss we can’t. Or at least almost no engineer can. To recreate the env involves setting up half a dozen services with databases and event queues. Yes, all of that can be dockerized, but there is only one person working on that, it’s a hard job, this person is struggling, and the thing isn’t finished. In the meantime, people like me set up subsections of the distributed system concerning our domains and debug that. I am in the minority, though; it only seems to be a few high-level engineers who do this (plus me).
Logging levels are nice, sure, but it doesn’t solve the issue of the code itself being doubled in size by log statements.
ryhaltswhiskey@reddit
Then log the input to the remote call and log the return value of the remote call, etc.
If you are unable to determine what happens in your code when you have all the information about what goes into remote calls and what comes back out of remote calls.... Yeah this seems off.
I deal with a very similar system all the time, I have tons of calls out to various APIs and gql endpoints.
Why would you need to set up other databases to sort out a bug in your code? It sounds like you don't have any idea where the bug is actually happening. That seems like a bigger problem.
frompadgwithH8@reddit (OP)
Yes, you’re correct. The way it goes is: a customer will say that something didn’t work right.
So then we go trying to figure out why it didn’t work right, and usually we will try to recreate the behavior using the same system the customer used, although in a lower environment.
And then we just start looking through logs and we say “hey, you know, our API didn’t even get called”, and so then we start looking upstream and we eventually identify something like “oh hey, this upstream service threw an error and never published a message to our downstream service” or “oh, the legacy service hadn’t been updated to fire an event into the event queue to notify the new distributed system”.
Oh, and I believe we do have all of the information about what goes into the remote calls, because whenever a RESTful API gets invoked I think we log that automatically, although I’ll need to double-check.
As far as what leaves the system… it’s usually a message published to the event queue… and I think we do log all of that. No, I know we do.
However, once a request hits one of our systems, and before it leaves in the form of an event queue message, our systems will make API calls to a legacy system. There’s a lot of opportunity for bugs there. Perhaps I might consider wrapping all of our API calls to other systems with some logging that happens implicitly.
I really like the point that you’re making, which is that we should be able to treat these programs like black boxes: we know what went in and we know what came out, so if an error occurred, we will know it was somewhere in that black box.
I think just due to the distributed nature of our systems, there’s a lot of different black boxes, so it takes some time to figure out which black box had the problem. And usually black box “A” had the problem, but it’s actually because black box “B” didn’t correctly invoke black box “A”.
Anyways, I’ll think more about what you said. Thank you for your comment.
ryhaltswhiskey@reddit
Well I appreciate that you're open to constructive criticism here.
It sounds like a messy system. What you said about implicitly logging everything that goes into an API call and comes back out is a smart choice.
I deal with a similar system. And usually the reason that it breaks is because the system that we depend on changed some data in production and didn't tell anybody. So we get a nice clear error message like "selection X is not valid for account Y" or "No accounts found for customer X".
Still, sometimes it takes an hour of investigating to actually figure out what happened because the system we depend on is incredibly opaque and not very good at logging.
frompadgwithH8@reddit (OP)
Yeah, sounds like a similar issue to mine. The legacy system doesn’t have logs. Our new systems do. The legacy system doesn’t throw a lot of errors though… it’s pretty solid. It’s legacy; it’s been worked on a lot
poincares_cook@reddit
I don't want to be harsh, but this indicates that you've never worked in complex systems, let alone legacy code.
Not everything is deterministic: there are env setups, potential race conditions, retries and timeouts, outside calls, etc.
ryhaltswhiskey@reddit
The key word in the sentence that you're arguing about was ideally.
flowering_sun_star@reddit
We're talking about logging - we've long since left the world of the ideal and entered that of practical realities.
visicalc_is_best@reddit
Sounds like your engineers are using logs as a basic debugger rather than a runtime diagnostic tool for edge cases. Usually a couple of things going on here:
- the team just doesn’t know how to write good tests, e.g. using mocks/fakes/dependency injection
- the team doesn’t have any integration or e2e tests, either because the system’s too large or has too many external dependencies for any single team to build an e2e testing framework
- if the crazy level of logging is really needed, there isn’t anyone who can write abstractions (e.g. logging middleware, logging decorators, logging function wrappers) that reduce the need to manually write log statements everywhere
Each of those problems fortunately has a solution built into the description. Go be that person.
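For the last point, one hypothetical shape such an abstraction can take in Java: a wrapper that logs inputs, outputs, and exceptions for any call, so the entry/exit logging lives in one place instead of being pasted into every method body (names are illustrative; plain java.util.logging stands in for the real logger):

```java
import java.util.function.Function;
import java.util.logging.Logger;

public class Logged {
    private static final Logger log = Logger.getLogger(Logged.class.getName());

    // Wraps any Function so every invocation gets input/output/exception
    // logging without any log lines in the wrapped code itself.
    static <T, R> Function<T, R> logged(String name, Function<T, R> fn) {
        return input -> {
            log.fine(() -> name + " <- " + input);
            try {
                R result = fn.apply(input);
                log.fine(() -> name + " -> " + result);
                return result;
            } catch (RuntimeException e) {
                log.warning(() -> name + " threw " + e);
                throw e;
            }
        };
    }
}
```

Callers wrap once, e.g. Logged.logged("fetchAccount", this::fetchAccount) (fetchAccount being a hypothetical method), and every invocation gets its boundary logging for free.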
frompadgwithH8@reddit (OP)
And yes i believe you’re right about logs as a debugger. It’s frustrating. I get paid less than them too, it’s comedy. Comedy gold
azuredrg@reddit
It's crazy too, debuggers are like magic, you can run like any code in them temporarily in a breakpoint. You don't have to change the logic for logging statement to show different code to test. Just run the code statement you want in the breakpoint. You can watch fields in the breakpoint. Not using the debugger is leaving your machine gun on the table and trying to hit a target 1000 feet away with a pistol.
frompadgwithH8@reddit (OP)
Yeah, I might literally give a presentation on how cool using the debugger is, and I will have to try very hard not to take a condescending tone and act like I’m spoon-feeding people
flowering_sun_star@reddit
Is attaching a debugger to a live prod instance safe? It wouldn't be an option for us anyway because we deploy without debug enabled. But I would be very nervous about dropping breakpoints into a live system under load. It'll also only work for a single service at a time. Doing it locally with a docker container is fine, but that's not the scenario this sort of logging is for.
My belief is that every branch in your code has a business reason to exist. That business reason should be documented. There’s also a practical desire to be able to trace events through a complex system (I’ve been bitten enough times by the lack of that ability). These two desires can be combined, with log messages that describe the meaning of that branch. So the code ends up being a bit verbose, rather than just logical statements. I’m fine with that if it means that we know why the logic exists and have that ability to get traces for individual customers through the system.
There are ways to get a lot of useful information into logs without explicitly adding it in your log statements. Our java app uses the MDC to carry information about the executing thread context. So you set details about the calling customer ID at the outer boundary, and that gets carried through to all the logging done by that thread.
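MDC itself is SLF4J's API, but the mechanism is easy to sketch with a ThreadLocal (names hypothetical): set the context once at the boundary, and anything that logs on that thread can pick it up without each call site repeating it:

```java
import java.util.HashMap;
import java.util.Map;

public class RequestContext {
    // One mutable map per thread, like SLF4J's MDC.
    private static final ThreadLocal<Map<String, String>> ctx =
            ThreadLocal.withInitial(HashMap::new);

    // Called once at the outer boundary (controller, message consumer).
    static void put(String key, String value) { ctx.get().put(key, value); }
    static void clear() { ctx.get().clear(); }

    // A log formatter would call this to prefix every line automatically.
    static String prefix() {
        Map<String, String> m = ctx.get();
        return "[customer=" + m.getOrDefault("customerId", "?")
             + " request=" + m.getOrDefault("requestId", "?") + "] ";
    }
}
```

A real MDC additionally handles clearing on request completion and copying context across thread pools, which is where most of the subtlety lives.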
Frankly I think you're wrong on this one, and even if you have a point it is debateable enough that it just isn't worth dying on this hill. Better to spend the energy on your lack of testing.
terdia@reddit
You're right to be cautious - traditional debuggers in production are risky. But there are now APM tools designed specifically for this: production-safe breakpoints that capture variable state without pausing execution or affecting performance. You set them through a dashboard, they fire once to capture context, and there's no halt. It's a different model than attaching gdb to a running process.
visicalc_is_best@reddit
Your receptive-to-feedback attitude and reasonable takes in this thread are abhorrent for Reddit.
Revision2000@reddit
Sounds like you first need to have a healthy discussion with the other developers in the team, to see if they also see this problem.
Hopefully that gives you some team support, which you can then use to have yet another discussion with the PO/PM to get the necessary “budget”.
Ultimately any sane PO/PM should recognize that investing in good solutions upfront should lead to product gains and stability long term.
frompadgwithH8@reddit (OP)
Yes, it’s Thanksgiving today, so I don’t want to dwell on it too much, but I’m making a mental note to devote some time to this in the upcoming work week. I didn’t realize it was such a cultural issue until this thread. I now have several issues I want to enumerate in a document that I will present to my team and my manager.
Firstly, the no-formatter thing, and the fact that many people have tried to introduce a formatter and all failed. That’s a cultural problem, the fact that no one cares. And the fact that people get denied a formatter over little things like “it should be a standard in the entire department before any one team does it”. Bullshit.
The fact that there are logging statements everywhere, and it seems to me that people don’t actually use the debugger built into their IDE. This is a skill issue. I literally gave a presentation on debugging applications running in a docker container over SSH and almost no one showed up and no one gave a shit.
All of our functions mutate arguments, so I never know if something got changed from one function invocation to the next, and you basically have to trace the entire function call tree to determine the true program flow. Whereas in functional programming, you can rely on the return value, since there are no side effects. I’m not saying we should practice functional programming at my job, but the fact that we mutate arguments all the time makes the code harder to understand as well. And I actually received a reply from someone with twice my experience, who gets paid way more than me, that “cloning objects” to prevent mutation would take more computation time. I mean, obviously it would. But who the fuck cares? All of that time is locked up in database calls; the extra time to clone objects would basically be free. Computers are very fast. This is a given. I’m a little incredulous that someone with twice my experience, and honestly a better engineer than me, would even care about something like that. I think it’s a textbook case of a senior engineer getting locked into their ways and not considering fundamentally new approaches.
There might be a few other things that are cultural problems. Oh yes: the fact that our deployment/promotion process is extremely manual and error-prone, and also that we don’t have end-to-end tests. I believe that eats up a lot of our time, and also, if we had end-to-end tests, we wouldn’t need as many of these log statements in the first place, because bugs would get caught first!
So yes, there’s lots for me to talk about and dissect, and it all seems to be a cultural problem. It’s like a leadership problem… Leadership should have prioritized end-to-end testing and they should have prioritized some coding standards
aqjo@reddit
n+1: The code has side effects, dependencies, etc. that make it virtually untestable.
azizabah@reddit
You really need to get distributed tracing going. It will change your life.
terdia@reddit
This is the right answer. Once you have tracing, you stop needing logs for "what happened in this request" - you just look at the trace. Logs become for business events, not debugging breadcrumbs. The codebase gets so much cleaner.
terdia@reddit
The core tension here is real: logs are your visibility, but they're also pollution. A few things that helped me escape this:
- Structured logging with context propagation - instead of sprinkling logs everywhere, you attach context (request ID, user ID, etc.) at the entry point and it flows through automatically. This cuts down on repetitive "what request is this?" logging.
- Distributed tracing - this is the bigger shift. Instead of logging every branch decision, you instrument at the boundaries (HTTP calls, DB queries, external services) and get a complete request timeline automatically. You see what happened without manually logging everything.
- On-demand observability - the holy grail is being able to capture variable state in production without adding logs ahead of time. Some APM tools now let you set breakpoints in production code and inspect state without redeploying.
I built something in this space (TraceKit.dev) specifically because I was tired of the "add log, redeploy, hope it reproduces" cycle. But even without my tool, moving toward tracing over logging will dramatically clean up your codebase.
Fair_Local_588@reddit
On the technical front, you should only really add logs for interesting events that will be useful when investigating production issues.
But more importantly, it sounds like this is cultural. If engineers on your team don’t know how to debug code or read through code without logs telling them exactly what happened at every single point, there’s not much you can do. You can attempt to upskill them, but think about the cost-benefit of that.
I’d suggest bringing up the logs as a pain point and feeling out if there’s already support for a change. If not, better to just ride along and move to a different team if you can. You’re not going to change the minds of legacy devs who have had opportunities to learn basic things and decided not to. This is coming from someone who used to work on a team like this.
frompadgwithH8@reddit (OP)
Yes, so we do have legacy devs. But we also have new blood, and the new blood is also mixed competency. Someone else mentioned what you’re discussing, which is that this all sounds like a skill issue. And yes, it’s cultural. I hadn’t considered that until you spelled it out. Multiple people, all with more political sway than I have, have tried to get a formatter in and failed. There are probably other things too. Currently I am trying to convince my team to stop mutating function arguments and instead make functions/methods return values, but the head honcho with twice my experience is against it. I did get the principal from another team to agree with me, though, so I know I’m not dead in the water on that front. I also know my team for the most part doesn’t debug with an actual debugger, and they don’t set up multiple systems on their computers to debug stuff. So it does all seem like a skill issue and a culture issue. Which is making me consider getting a job elsewhere, just a little bit. Juuuuust a little bit.
There’s room here for me to effect change and get promoted and get paid more. I’m shooting for a promotion in 3 months, although I should’ve been promoted this cycle… I’ve been made to learn the business side of things so I can demonstrate the business impact of my contributions and sound extra like the level I want to be promoted to. So I downloaded 6 business/product books, and honestly I’m learning a lot. But I didn’t need to learn any of that to understand all the stuff in this thread, and I already am one of the top contributors in the department in terms of literally everything. Not just code: I’m in chat threads helping people who aren’t on my team (and who are), I’ve written countless documents to help people diagnose common issues, I’ve given countless presentations, and I am a major contributor to the entire department’s core tech stack and strategy. And yet I am not at the level that 90% of my department’s engineers are. It’s salt in the wound
Fair_Local_588@reddit
Yeah I also had new blood and old blood. It will still trend toward the existing culture. You’ll basically be fighting for inches when you could join another team or another company and be a mile ahead.
zarikworld@reddit
Constant logging (besides the clutter and the readability issues) reveals more fundamental problems in a system's architecture and separation of concerns. When you need to log the entire process, that's a clear sign that responsibilities are mixed and there is no clean data flow, so you need words to explain the code rather than the code itself. We have a similar issue in our main product (a 25+ year old, successful enterprise solution), where methods reach tens of thousands of lines and files run to a couple of megabytes! No logging levels, the same plain text everywhere, sometimes tagged with gibberish terms left by teams from decades ago. Once I brought up improving the logging in a meeting and the whole team disagreed: "if it's working, why do we have to change it?" So I gave up. If I could go back, I would never suggest changing anything, since the issue is not in the code but in the team's culture. Long story short, I managed to change teams and join a young group of devs with a more modern, solid mentality, and now everything works! The other problem is that if you change it and it works, no one will appreciate it; but the moment things go south, all the mess gets blamed on you!
So if I were in your place, and since you're getting paid less, I would move either to another team or to another company!
frompadgwithH8@reddit (OP)
Yeah man, you really nailed it. Some other comments in this thread have elucidated similar points you’re making. I was not considering leaving my job and I’m still not seriously considering leaving my job, but if I was considering leaving my job, I’m considering it just a little bit more after critically thinking about what everyone has said in this thread. It’s making me realize I’m even more undervalued than I thought I was. I appreciate your comment. If I could talk to you on a phone/conference call to hear about your experiences that would be really cool. I would love to hear about the problems you had. They sound basically the same as what I have. I literally gave a presentation about a 4000 line stored procedure last year and there were several other files that were similar. There are also application code files that are thousands and thousands of lines long… The developers use collapsing code region tags to organize the code instead of literally just separating code into separate files…
We made these new domain based services that were supposed to be fresh and new and better, but everyone is just copying the code from the old legacy code base into the new one, so the new code base is quickly turning out to be just as annoying to work with as the old one, albeit with a newer tech stack and a newer version of the language we're using.
I hadn’t considered the culture until you and a few other people mentioned it but yes, even though we made a new department with a new project and a new initiative and we split up into new teams and we hired new people and we let go of some people, etc. the actual culture is still the problem.
The code quality never got better and nobody learned how to debug things correctly. There are a few extremely high-level engineers in the department who I think are carrying the entire project on their backs, and I'm not at their level, but I am carrying my team on my back, and other teams too.
At my other jobs I was at a higher level, but at this job during the interview rounds there was one round where I was supposed to talk about AWS API Gateway firewall stuff, and I totally flubbed it, so they hired me at a lower level than I was at my previous job. I would not have accepted the offer, but it was the hardest job search I've done since I initially tried to get my first job as an engineer, and I had been unemployed for months, so I just took it. But I've been salty ever since, because not once has anyone on my team had to do anything with an AWS API Gateway or a firewall, and I've consistently delivered more code, documentation, and presentations than anyone else, and I've been in threads helping other teams, jumping in to fix other people's problems and get things done. And all I get for it is "atta boy, good job, five stars for you buddy."
In order to get promoted, I have to demonstrate the business impact my contributions make… I literally wrote the entire web app for my team. At my last job, and the job before that, a senior engineer set all that up and I worked under those people and their guidance. At this job? No other engineer. I did everything. I literally am a senior; that's the definition. But no, gotta learn the business, so I pirated six business books. I bought another business book. I'm reading two business books simultaneously. Honestly, I'm learning a lot about business. But it's such bullshit, because almost no one at the company — at least no engineer I know — knows this stuff. By the time I actually get promoted, I'm going to know so much that I'll be halfway to the next level. It's all just such a fucking joke.
zarikworld@reddit
I totally get where you're coming from! Feel free to send me a message whenever you want; I'll be happy to help and share my experience! Since you mentioned you only accepted the lower salary and position because of the situation you were in, and since you're clearly (based on all the learning and reading you describe) someone who's cool with learning, why don't you invest this time in something more rewarding, something that takes you out of that system? The next promotion will land you more or less your previous salary and position, won't it? For now you're secure: get paid, do your job, and when you're ready, start interviewing and move on!
roynoise@reddit
Logs are better than no logs...trust me.
adhd6345@reddit
Friend, I have ADHD, I cannot read this even though I want to.
frompadgwithH8@reddit (OP)
I’ve summarized the important parts and removed everything else. Here’s the original post in case anyone was interested:
Our department is using the strangler pattern to modernize a legacy monolith. My team built a domain service. Other teams did too.
Merges to main automatically deploy to development. We must manually bump git tag version numbers in .yaml files to make helm/kubernetes deploy to staging or prod. Sidebar: promoting takes most of my day (when I am tasked with it) and I hate it. So many things must be manually checked or we deploy a crashed service. It is stressful. Our department will improve its processes someday… I wish that day would come sooner. When I return from vacation I’ll write some new argo/kubernetes job to prevent argo from deploying broken services. Such a facepalm Homer Simpson “D’oh” situation is preventable, yet we haven’t… ANYWAYS.
We use open telemetry for logging and metrics.
It’s been many months.
I’ve noticed the other teams don’t seem to care about code quality. Nor mine, actually. It’s frustrating. But that’s beside the point.
The only way for us to know what went wrong in the cloud (dev/staging/prod) is to look at logs. So we’re all getting better at log querying.
I’ve noticed I seem to be in a minority at my department when it comes to my approach to testing. I prefer to set up the systems I’m working on locally with Docker in shared network mode so they can communicate through their own queue topics and make RESTful API calls to each other. I’d rather not develop the feature one repo at a time, push it up to the cloud after going through 2+ PRs (and we enforce multiple approvers so getting PRs through can take half a day or more) only to find I made a mistake and I need to do more work on one or the other repo. Again, that’s not how I do it. But this is how almost all the engineers in my department do it.
Don’t get me wrong there are engineers whom I truly respect. They’re stellar engineers I aspire to be like someday. I’m building a homelab this weekend to learn all the kubernetes stuff myself, since I just don’t get exposed to it enough on an application domain team. Although I am super looking forward to patching up our promotion pipeline failure point w/ my argo improvement next week. But getting back to the point: I did notice these ultra-high level engineers do the same as me; they’ll set everything up locally. So I know I’m not crazy. Not that I ever doubted my sanity…
The original topic of this thread though:
My team is putting logs everywhere! It’s insanity! After practically any branching logic there’s a log! Before return statements, a log! At the entry point to API invocations (controller methods) there are multiple logs! And the logs take multiple lines of code. I’ll read a method that’s 20 lines long and there’s, what, 4 blank lines, 4 lines of code, and 12 lines of logs. That’s an example I made up because I’m not on my company computer looking at source code right now, but it’s basically seared into my brain… Just imagine you have two if statements, both following the guard pattern, short-circuiting an early return. That’d be about 3 LOC per if statement (if (thisGuard(…)) { return; }) and then maybe a return true at the end. So 7 lines of code for some typical guard pattern code. But now there are two 3-line log statements before both returns. So the method body goes from 7 to 13 lines of code. It’s basically doubled! And that’s what reading my team’s code feels like! I hate it! It takes me so much longer to understand the code I’m reading.
Another example: I’ll read a method and only one out of ten lines actually mutates any data or has any effect on the program. Everything else is just logs. Now imagine all the methods are like this. It makes code review so hard because I am spending mental energy wondering “ok, is there actually any business logic happening here, or is it all just logs?” (Sometimes it’s all just logs!!!)
So yesterday I pulled one of my coworkers aside after I reviewed a PR they wrote, specifically to talk about basically what I’m talking about in this thread. I voiced all my concerns here. My coworker listened. Then they made a great point: “who cares about how hard it is to read, compared to being actually able to debug stuff that breaks in prod?” And I had no answer. My coworker agreed that it’d be nicer to read the code without all the log statements shitting it up. We both agreed. And I agreed with my coworker that being able to actually “debug” stuff (via logs) in prod is waaaaay more important.
So I am at a loss. I decided I might write a custom plugin for my editor of choice that automatically collapses log statements, just so that when I am reading code on my editor I can process it better. I already know major IDEs support plugins integrating with services like bitbucket or GitHub to do things like PR review. So I could potentially move more of my workflow off of websites and into my IDE. Another solution would be a custom formatter that forcefully ensures log statements only take a single line even if it’s 400 characters long (lol).
But ughhhh, brother ughhhh, just ughhhhh!!!! What do you guys do about this? I was thinking maybe there’s a way to get log statements out with meta programming or annotations or some other trick/hack/wrapper approach so we can get the logs we want to debug stuff in live cloud environments without littering metric tons of log statements all over our codebase.
The other teams, btw, apparently don’t have logs at all, and purportedly discovering “what went wrong” with their services is a “shitshow”. I recall, back when I was learning how to program, reading about how in elixir/erlang if a process crashed you could remotely ssh into a debug session live in that thread to inspect the state/variables. That’s absolutely not what my team/department needs, but it’s cool, if only to demonstrate that I just feel it in my jellies (was Ryan his dad the whole time??) that there’s gotta be a more clever way to approach “figuring out what went wrong in the cloud”.
And that’s what the logs are for, remember. To figure out what went wrong in the cloud. When customer X encounters a bug doing Z, we look at the logs and try to figure out if something errored out, or, also very commonly, just didn’t get called at all. For example, oops, we never updated system Y to publish a message to the event queue. So system G never received the message, which explains why the customer’s pet never got a new fur dye job! All we gotta do is make sure system Y publishes event “dye dog fur”. (Example out of my ass).
How do you guys solve diagnosing issues in prod/stage/dev? Are your codebases shitted up with logs too? Do you take a fundamentally different approach that sidesteps this entire pattern/problem?
adhd6345@reddit
Thanks for summarizing. That made the issue much clearer.
The solution is log levels. Both you and your coworker(s) are correct. You want visibility, but you also want to be able to read the logs well.
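To illustrate with a minimal Python sketch (the thread never says which language OP's team uses, so this is just the idea): the code keeps all its trace/debug calls, but the handler's level decides what each environment actually emits, so prod stays quiet without touching the source.

```python
import io
import logging

# Capture log output in a buffer so we can inspect it.
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setLevel(logging.WARNING)  # prod config: WARN and above only

logger = logging.getLogger("demo")
logger.setLevel(logging.DEBUG)     # the code still *calls* debug/info...
logger.addHandler(handler)
logger.propagate = False           # keep the demo self-contained

logger.debug("entered handler")        # dropped by the WARNING handler
logger.info("published event %s", 42)  # dropped by the WARNING handler
logger.warning("retrying publish")     # kept

output = buf.getvalue()
```

In dev you'd configure the same handler at DEBUG and see everything; the call sites never change.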
nullbyte420@reddit
Dear lord you are not good at writing
frompadgwithH8@reddit (OP)
You’re correct
I’ve been working on my communication skill for years
It causes me practically physical pain to do it
Obviously on this Reddit post, I did not do that at all
nullbyte420@reddit
At least your summary is much more readable!
WittyCattle6982@reddit
Holy shit. The summary is even worse.
ryhaltswhiskey@reddit
This is not helpful or constructive.
WittyCattle6982@reddit
The person OP was replying to was saying they have ADHD and couldn't read the post. OP followed up with what, 3x the amount of text? That wasn't helpful or constructive, which makes your comment unhelpful and not constructive.
It's possible that adhd6345 was referring to log messages, but I don't think so.
ryhaltswhiskey@reddit
If you're going to tell somebody that something is even worse, give them some indication as to why. Because just saying even worse is not helpful or constructive because what the fuck are they supposed to change to make you happy?
WittyCattle6982@reddit
Here's the indication:
ryhaltswhiskey@reddit
No, what you said was
And that's what I responded to. Stay on topic.
WittyCattle6982@reddit
Yep, my mistake. You win!
ryhaltswhiskey@reddit
Gee that sure sounds like sarcasm
WittyCattle6982@reddit
I'm a bot, what do you expect.
skdeimos@reddit
I don't think you read correctly my guy. The post was originally the long thing, OP edited it to be shorter which is what you're seeing now.
WittyCattle6982@reddit
You think I read that long-assed comment?
skdeimos@reddit
No, but I thought maybe you would have read literally the first two sentences before writing a rude comment.
I thought this was a higher-effort subreddit for more thoughtful, mature people. My mistake I guess.
WittyCattle6982@reddit
Nope.
ryhaltswhiskey@reddit
Yes, that's what I see too
syklemil@reddit
I think what they've actually done is leave the original as a comment and edit the post to be the summary.
frompadgwithH8@reddit (OP)
lol when you put it in a quote like that it does seem comical doesn’t it
frompadgwithH8@reddit (OP)
Oh you misunderstood that’s the original post. The OP is the summary
frompadgwithH8@reddit (OP)
Summarize it for me then? IDK what you want.
WittyCattle6982@reddit
I can't summarize your own issue. I think the thing to do would be to ask adhd6345 what you can do to make it more digestible for them instead of a text dump. Questions lead to resolutions, assumptions lead to
Potterrrrrrrr@reddit
I think you’ve misunderstood. The reply to adhd includes the original post that OP made for anyone curious. The actual post has been edited to be concise due to adhd’s comments (and was done well, the information was a lot more succinct and relevant).
frompadgwithH8@reddit (OP)
Oh good point, I didn’t understand either. Nice, u/Potterrrrrrrr — yeah, it’s just confusion.
ben_bliksem@reddit
They're using logs for tracing instead of... well tracing.
budding_gardener_1@reddit
That to me (along with excessive code commenting) is a telltale sign of AI slop.
ImprovementMain7109@reddit
Yeah, this is just “printf debugging as architecture”. Seen it a lot.
What’s usually worked for me: push for a logging guideline where you only log at boundaries (entry/exit, external calls) and on meaningful state changes, with structured fields + correlation IDs. Fewer logs, higher quality, much better signal.
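A rough sketch of that guideline in Python (the helper name `log_boundary` and the fields are invented for illustration): one structured line per boundary event, tied together by a correlation ID, and nothing in between.

```python
import io
import json
import logging
import uuid

# Hypothetical helper: emit one structured JSON line per boundary event.
def log_boundary(logger, event, correlation_id, **fields):
    logger.info(json.dumps({"event": event, "correlation_id": correlation_id, **fields}))

buf = io.StringIO()
logger = logging.getLogger("svc")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(buf))
logger.propagate = False

cid = str(uuid.uuid4())  # one ID follows the request everywhere
log_boundary(logger, "request.received", cid, route="/dye-jobs", method="POST")
# ... business logic runs here with no logging at all ...
log_boundary(logger, "event.published", cid, topic="dye-dog-fur")

lines = [json.loads(line) for line in buf.getvalue().splitlines()]
```

Two lines cover the whole request, and grepping the correlation ID in the log platform reconstructs the flow across services.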
TransCapybara@reddit
Do you have log levels? (trace, debug, info, warn, crit) ?
metaphorm@reddit
you'll probably get better debugging information with modern APM tools that can report on uncaught errors and give you full stack traces.
logs are an inherently noisy signal and you will eventually have an intractable problem of log management and log searchability. good logs should be tight, focused, not spammy, and should reflect the kind of observations you actually care about (edge cases, mostly) rather than every execution branch.
sneeds_of_volition@reddit
You can mitigate the amount of log output by using logging levels, i.e. verbose/info/warning/error/debug etc. If your problem is the logging calls in the code, though, I think you're out of luck unless you can change hearts and minds. Trace logs for every decision point in the code is a bit much unless you are specifically diagnosing a very tricky issue in a deployed environment. By default I will tolerate one trace log per function call with a debug object included. That usually gives you enough info to understand which control flow occurred, based on the debug object.
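Something like this in Python (the function and its fields are made up for illustration): a single trace-level log per call, carrying a debug object that records which path ran, instead of a log after every branch.

```python
import io
import json
import logging

buf = io.StringIO()
logger = logging.getLogger("svc.trace")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(buf))
logger.propagate = False

def apply_discount(order):
    # Build up one debug object as the function runs...
    debug = {"order_id": order["id"], "total": order["total"]}
    if order["total"] >= 100:
        order["total"] *= 0.9
        debug["discount"] = "10%"
    else:
        debug["discount"] = None
    # ...and emit it exactly once: the single entry tells you which branch ran.
    logger.debug(json.dumps(debug))
    return order

result = apply_discount({"id": "a1", "total": 200})
entry = json.loads(buf.getvalue())
```

The method body reads as business logic, and the one log line still reconstructs the control flow after the fact.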
frompadgwithH8@reddit (OP)
Yes, the issue is the lines of code, not the logs I'm reading in our logging platform, as others have replied.
edgmnt_net@reddit
This is worse than one might expect, because it likely means you're also pushing crap code if it can't be tested locally.
SupermarketNo3265@reddit
Then how about you focus your attention on trying to solve that disease? Rather than complaining about the symptoms of the disease (log statements).
frompadgwithH8@reddit (OP)
I did
I use it every day
My coworkers won’t
I got them on conference calls, and I walked through setting it up on their machines and they still don’t use it
It’s docker compose FYI
Also, there’s an engineer in another team who has been tasked with coming up with this system that brings up all of the domain teams with docker compose
He’s interviewed me several times and I’ve helped him a lot.
dnult@reddit
Bare Metal Software has a couple of great utilities — BareGrep and BareTail. I can't recommend them highly enough. They have a grep-like pattern matching syntax as well as highlighting features that help emphasize the details and hide the noise.
syklemil@reddit
Did they grow those log statements over time to help make sense of the system, or have they put them in "just in case"?
Because it smells like the latter, in which case it absolutely should be reined in, but if it was the former, then your app is just cursed, and all we can offer is thoughts & prayers.
We've got a decent amount of structured logs, a shit-ton of metrics, and some Opentelemetry tracing. The tracing we can actually tune to be some reasonable amount of data with parent-based sampling rules, and the devs seem to love it.
We do occasionally have to do some tuning of log levels, including dropping logs we get from third-party stuff that's just … not interesting.
frompadgwithH8@reddit (OP)
New change requests have tons of new log statements. I also see log statements added to code that didn’t have it before, though that’s a decreasing occurrence since everything’s starting to be clogged up with log statements.
I’ve never heard of this “parent based tracing”. Could you tell me more about it? It sounds like something promising that could help my team and department.
Another poster already said something about how tracing could be helpful so I do plan to look more into tracing. We use open telemetry so I know tracing is supported. We have trace ids and span ids but they only show up when we invoke the logger or we throw an exception that gets logged.
I don’t see how that can help us identify when things don’t happen. For example, when messages don’t get put into the event queue.
Or when a function doesn’t get invoked.
As others have said, having end to end tests would be helpful here, and having our engineers be able to set up the software locally so they can debug it would also be good. We currently can’t do that. I have a setup so I can do that, but nobody wants to copy me even though I’ve published my setup to source control.
syklemil@reddit
It sounds like you could do with some instructions to ease up on the logging. Some logging is fine, but production should generally be silent in the old "no news is good news" way. FWIW we generally also don't permit INFO or lower level logs in production, only WARN/ERROR.
Essentially with opentelemetry you set a rule for whether a trace will be generated or not. The base levels are "never", "always" and "sample(ratio)", and then you can turn any of them parent-based, which means that incoming requests may indicate that a call should be traced.
E.g. you can set parent_based_ratio(0.0001) and then any call might start a trace, with its children included. Or you could set the top level to something like ratio(smallnum) and everything below it as parent_based_always_off, in which case they'll only ever trace when instructed to do so by a parent. See the SDK docs for the actual possible settings.
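If it helps, here's a toy model of the decision rule in plain Python — not the real OpenTelemetry SDK API (see the SDK's sampling docs for that), just the logic a parent-based sampler applies: defer to the caller's sampling decision, and only roll the dice for root spans.

```python
import random

def parent_based_ratio(ratio, parent_sampled=None, rng=random.random):
    """Simplified parent-based ratio sampling decision.

    parent_sampled: None for a root span, else the parent's decision.
    """
    if parent_sampled is not None:
        return parent_sampled  # children inherit the parent's decision
    return rng() < ratio       # root spans sample at the configured ratio

# A child of a sampled trace is always kept, even at a tiny ratio:
kept = parent_based_ratio(0.0001, parent_sampled=True)

# A root span at ratio 0.0 is never kept (rng value forced for determinism):
dropped = parent_based_ratio(0.0, parent_sampled=None, rng=lambda: 0.5)
```

The upshot for OP's case: once any service decides to trace a request, every downstream service keeps its spans too, so you get complete traces without tracing every request.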
frompadgwithH8@reddit (OP)
OK, yes, I really will have to just go read the documentation on this. I think what my team wants is to be able to see, line by line, which code executed. They basically want to be able to step through the code as if they were using a debugger…
Of course this is after an incident is reported. So after the program has finished executing, etc.
bluemage-loves-tacos@reddit
Feels like the over-logging is a symptom of either inadequate tests, or people not trusting the tests.
I advocate for good logging that lets you know something was done, or not done. That something should be a behaviour important enough that you'd want to find out whether it did or did not happen. So, for example, an endpoint gets a request to send an email. I want to know the endpoint was accessed, what the parameters were, and whether the email sent or not. I don't care about the email template loading, or the content of the email, or the email being in English, etc. I have tests to tell me that the code functions; I only want to know if something didn't happen as it should, and the inputs to the endpoint that made that occur.
From there I can alter my tests to see if it's an input thing, or if I need to log further to find out what the root cause is. I can then update the tests, and that problem is sorted. I don't need to keep the extra logs, I can delete them, I resolved it.
You should also ask, how many times has a log been used to ACTUALLY understand a production issue. What else can be done to better detect that issue in the future?
ceirbus@reddit
Two words - structured logging! Use an open source log sink and formatting for a specific log viewer with filtering/sorting, azure devops, ELK, whatever you like
Then the noise can be filtered out to see what matters
frompadgwithH8@reddit (OP)
We are using structured logging. That’s not the issue. The issue is that there are more log statements in our code than lines of business logic code.
ceirbus@reddit
Audit the logs to see if they provide anything useful, otherwise a platform to filter them would be ideal for you - i tend to think there is no such thing as too many logs but there is an extreme edge case where you could have a bunch of useless ones.
Going through and modifying this would probably be more tedious than just creating configurable filters in some log viewer that filters out the noise
frompadgwithH8@reddit (OP)
You’re fundamentally misunderstanding my issue. I don’t mind the noisy logs. I mind that when i open up the source code and read the files full of functions and methods, there are more lines of code with logging statements than actual lines of code for business logic.
Like
Logger.log("thing {@msg}", new Object {
Thing1: 1,
Thing2: 2,
Thing3: 3,
});
return true;
See what I mean?
ceirbus@reddit
Could you not add some way to enumerate the params in a log helper to clean that up?
frompadgwithH8@reddit (OP)
I’m considering writing a custom formatter that forces all log statements onto a single line of code, so that even if a statement ends up 400 characters “wide” it won’t make the code as hard to read, because I can just mentally skip the line…
We don’t have a formatter, but everyone’s editor automatically puts every key-value pair of an object on its own line, so when we use structured logging a single log statement ends up taking like seven lines of code.
If there were a single object to pass to the structured log it would not be that bad, but the engineers often combine disparate pieces of data from the method’s scope into one amalgamated blob object.
cs_legend_93@reddit
This really makes it so easy. You can even have a log interpreter that removes all lines that don't have the keyword of whatever you're looking for. You just put it in some [brackets].
teerre@reddit
First thing is understanding the difference between logging and tracing. If you have multiple services, you want tracing. Otherwise you'll never be able to have a complete view of the system. Once you do, there are countless tools to visualize traces
As for logging, I adopt the "one log entry per workflow" mantra. I never sprinkle logging statement randomly, instead I collect them into a single statement and output it once when a semantically meaningful amount of work is done. This has two benefits: much less noisy and, more importantly, it forces you to think what you're logging because if you logging the same info, it will be obvious
This approach does require some engineering, usually a custom logger, but it can be made very ergonomic. Again, precisely how depends on your stack
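A bare-bones Python sketch of that "one log entry per workflow" collector (class name and fields invented; a real version would hand the record to your logger and your stack's equivalent would differ):

```python
import json

class WorkflowLog:
    """Accumulate notes in memory; emit one structured record at the end."""

    def __init__(self, name):
        self.record = {"workflow": name, "steps": []}

    def note(self, step, **fields):
        # No I/O here: just remember what happened and in what order.
        self.record["steps"].append({"step": step, **fields})

    def emit(self):
        # A real implementation would pass this to the logger instead.
        return json.dumps(self.record)

wf = WorkflowLog("dye-dog-fur")
wf.note("validated", customer="X")
wf.note("published", topic="dye-jobs")
line = wf.emit()
```

One line per workflow keeps the log volume down, and because all the notes land in one record, duplicated or missing information becomes obvious at a glance.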
frompadgwithH8@reddit (OP)
Thanks, I’ll look into tracing. We have trace ids in our logs, so I can follow a request across services, and error messages have the trace ids. However, the main issue we are trying to solve is identifying when things DON’T happen — when errors don’t get thrown. Idk if tracing does this. I will investigate. Thx
unknownhoward@reddit
I can't comprehend why a log statement requires three lines. Surely that can be just one — or else wrapped in a helper method (that is then just one line)?
Sounds like you all could benefit from a workshop day to reach agreement on how to approach logging - and get buy-in from the powers that be to make a one-time global code pass to change the amount of logging in the codebase.
Pipe dreams, I know. You started it.
frompadgwithH8@reddit (OP)
I can’t even convince my team let alone the department to use a formatter. Formatting is all over the place.
The log statements often use structured logging, which involves instantiating a new object where each key-value pair is separated with a newline.
unknownhoward@reddit
Ugh, I feel your pain.
frompadgwithH8@reddit (OP)
Yeah, I might start looking for a job elsewhere, idk. I could stand to be paid more. I have better skills than most of the people in my department and deliver more too. By the time I get promoted I’ll be halfway to the level above the one I’m trying to be promoted to. I’m reading 6 business books to learn business stuff so I can get promoted in 3 months… it’s a farce. I’m an engineer. Ugh. It is fun though. Now I’m thinking about starting my own biz on the side with these mandatory business skills I’m learning.
soundman32@reddit
About 15 years ago, I used to write log analyzers as WinForms desktop apps that could highlight selections so you could search for all occurrences of a log source (class) or a regex. It made it really easy to see all the repeated entry points or certain things, just by scrolling through.
These days you would probably use some sort of query language like log insights on AWS CloudWatch or DataDog or Application Insights on Azure.
frompadgwithH8@reddit (OP)
When I write logs, I prefix the message with [thing] so I can search for [thing] in the cloud log service's query tool.
basically_alive@reddit
That sounds insane actually. I'm guessing there's zero testing anywhere? Testing is the thing that would sidestep that particular mess.
frompadgwithH8@reddit (OP)
Yeah, there’s a mandatory x% code coverage, but that’s satisfied with unit tests, which to date have hardly ever helped me. And they’re all written with AI. Nobody gives a shit about tests.
And we have no integration tests and no end to end tests. We get bugs all the time and none of them were stopped by our >x% unit test coverage. Lel
todo_code@reddit
Diagnosing an issue should be an exception, not something done often. You should log flow, and log anything that reaches a "crash" (a catch, in most languages) so you can tell when it happens. Use some sort of trace ID for the flow of an individual request. You should have observability into sticker level operations, db operations, CPU and memory usage, etc. Deploying to production taking a day is fine. There should not be a lot of manual intervention.
I also couldn't read your wall of text. But I think I addressed your major points
frompadgwithH8@reddit (OP)
The major issue then, though, is how do we identify when things DON’T happen? That’s a major problem in our applications which rely on distributed system design and event queues. Often an event doesn’t get fired, doesn’t go into the queue, doesn’t get consumed from the queue and so no RESTful API call is made. Often we are diagnosing the absence of expected behavior
norskie7@reddit
I think personally it's about knowing which branching decisions actually matter for debugging. Not every if-else needs two log statements, but functionality that gets handed off to other classes or functions might. Discretion is key, although I'd lean towards over-logging rather than under. I think keeping logs to one line should also be sufficient.