Log Statements Burn my Eyes. Code is Ugly and Hard to Read

Posted by frompadgwithH8@reddit | ExperiencedDevs | 115 comments

Our department is using the strangler pattern to modernize a legacy monolith. My team built a domain service. Other teams did too.

Merges to main automatically deploy to development. To deploy to staging or prod, we must manually bump git tag version numbers in .yaml files so helm/kubernetes picks them up. Sidebar: promoting takes most of my day (when I am tasked with it) and I hate it. So many things must be manually checked or we deploy a crashed service. It is stressful. Our department will improve its processes someday… I wish that day would come sooner. When I return from vacation I’ll write a new argo/kubernetes job to prevent argo from deploying broken services. Such a facepalm Homer Simpson “D’oh” situation is preventable, yet we haven’t prevented it… ANYWAYS.
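For what it’s worth, the guardrail I have in mind is roughly shaped like this (a sketch only, assuming Argo CD; every name here — `verify-rollout`, `my-service`, the kubectl image — is a placeholder, not our real config): a PostSync hook Job that fails the sync if the new pods never go healthy, so a broken service can’t silently “promote”.

```yaml
# Hypothetical Argo CD PostSync hook: fail the sync if the rollout
# never becomes healthy. All names are placeholders for illustration.
apiVersion: batch/v1
kind: Job
metadata:
  name: verify-rollout
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: verify
          image: bitnami/kubectl:latest
          command:
            - kubectl
            - rollout
            - status
            - deployment/my-service   # placeholder deployment name
            - --timeout=120s
```

If `kubectl rollout status` times out waiting for the deployment, the Job fails and the sync is marked failed instead of the broken version sitting in the environment unnoticed.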

We use OpenTelemetry for logging and metrics.

It’s been many months.

I’ve noticed the other teams don’t seem to care about code quality. Nor does mine, actually. It’s frustrating. But that’s beside the point.

The only way for us to know what went wrong in the cloud (dev/staging/prod) is to look at logs. So we’re all getting better at log querying.

I’ve noticed I seem to be in a minority in my department when it comes to my approach to testing. I prefer to set up the systems I’m working on locally with Docker in shared network mode so they can communicate through their own queue topics and make RESTful API calls to each other. I’d rather not develop the feature one repo at a time, pushing it up to the cloud after going through 2+ PRs (we enforce multiple approvers, so getting PRs through can take half a day or more), only to find I made a mistake and need to do more work on one repo or the other. Again, that’s not how I do it. But this is how almost all the engineers in my department do it.
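To make the local-first workflow concrete, here’s roughly the shape of my setup (a sketch with made-up service names, image tags, and env vars — not our actual stack; assuming docker compose, which puts every service on a shared default network so they can reach each other by service name):

```yaml
# Hypothetical compose file: two domain services plus a broker, all on
# the same default network, so queue topics and REST calls work locally.
services:
  orders-service:
    build: ../orders-service          # placeholder path
    environment:
      KAFKA_BROKER: broker:9092       # reachable by service name
      PETS_BASE_URL: http://pets-service:8080
    ports:
      - "8081:8080"
  pets-service:
    build: ../pets-service            # placeholder path
    environment:
      KAFKA_BROKER: broker:9092
    ports:
      - "8082:8080"
  broker:
    image: apache/kafka:3.7.0         # single-node dev broker
    ports:
      - "9092:9092"
```

With this running, I can exercise a cross-repo feature end to end before either PR goes up, instead of discovering the integration bug in the cloud after two rounds of review.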

Don’t get me wrong, there are engineers here whom I truly respect. They’re stellar engineers I aspire to be like someday. I’m building a homelab this weekend to learn all the kubernetes stuff myself, since I just don’t get exposed to it enough on an application domain team. Although I am super looking forward to patching up our promotion pipeline failure point w/ my argo improvement next week. But getting back to the point: I did notice these ultra-high-level engineers do the same as me; they’ll set everything up locally. So I know I’m not crazy. Not that I ever doubted my sanity…

The original topic of this thread though:

My team is putting logs everywhere! It’s insanity! After practically any branching logic there’s a log! Before return statements, a log! At the entry point to API invocations (controller methods) there are multiple logs! And the logs take multiple lines of code. I’ll read a method that’s 20 lines long and there’s, what, 4 blank lines, 4 lines of code and 12 lines of logs. That’s an example I made up because I’m not on my company computer looking at source code right now, but it’s basically seared into my brain… Just imagine you have two if statements, both following the guard pattern, short-circuiting with an early return. So that’d be like 3 LOC per if statement (if (thisGuard(…)) { return; }) and then maybe a return true at the end. So 7 lines of code for some typical guard-pattern code. But now there are two 3-line log statements before both returns. So now the method body goes from 7 to 13 lines of code. It’s basically doubled! And that’s what reading my team’s code feels like! I hate it! It takes me so much longer to understand the code I’m reading.
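To make the complaint concrete, here’s a made-up TypeScript rendition of that guard-pattern method, before and after the log treatment (everything here — the logger, the function names, the fields — is hypothetical):

```typescript
// Stand-in logger so this sketch is self-contained; imagine it
// forwarding to our OpenTelemetry setup in real code.
const logger = {
  info: (_msg: string, _ctx?: object): void => { /* no-op for the sketch */ },
};

// The 7-ish-line version: two guards, then the actual answer.
function approvePlain(id: string, amount: number): boolean {
  if (id === "") { return false; }
  if (amount <= 0) { return false; }
  return true;
}

// The same method after the log treatment: every guard grows a
// multi-line log call, roughly doubling the body.
function approveLogged(id: string, amount: number): boolean {
  if (id === "") {
    logger.info("approve: rejected, missing id", {
      amount,
    });
    return false;
  }
  if (amount <= 0) {
    logger.info("approve: rejected, non-positive amount", {
      id,
      amount,
    });
    return false;
  }
  logger.info("approve: accepted", { id, amount });
  return true;
}
```

Both functions behave identically; the second one is the shape my team writes, and the one my eyes have to parse in review.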

Another example: I’ll read a method and only one out of ten lines actually mutates any data or has any effect on the program. Everything else is just logs. Now imagine all the methods are like this. It makes code review so hard because I am spending mental energy wondering “ok, is there actually any business logic happening here or is it all just logs?” (Sometimes it’s all just logs!!!)

So yesterday I pulled one of my coworkers aside after I reviewed a PR they wrote, specifically to talk about basically what I’m talking about in this thread. I voiced all my concerns here. My coworker listened. Then they made a great point: “who cares about how hard it is to read, compared to actually being able to debug stuff that breaks in prod?” And I had no answer. My coworker agreed that it’d be nicer to read the code without all the log statements shitting it up. We both agreed. And I agreed with my coworker that being able to actually “debug” stuff (via logs) in prod is waaaaay more important.

So I am at a loss. I decided I might write a custom plugin for my editor of choice that automatically collapses log statements, just so that when I am reading code in my editor I can process it better. I already know major IDEs support plugins integrating with services like Bitbucket or GitHub to do things like PR review. So I could potentially move more of my workflow off of websites and into my IDE. Another solution would be a custom formatter that forcefully ensures log statements only take a single line, even if that line ends up 400 characters long (lol).
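The single-line-formatter idea could be prototyped very naively, something like this (a toy sketch, not a real formatter — real ones work on an AST, and this version ignores parentheses inside strings; the `logger.` prefix is an assumption about the codebase’s logging convention):

```typescript
// Toy formatter sketch: collapse any multi-line `logger.…(…)` call onto
// one line by joining lines until its parentheses balance out.

// Net open-parens on a line (naive: ignores parens inside strings).
function balance(line: string): number {
  let d = 0;
  for (const ch of line) {
    if (ch === "(") d++;
    if (ch === ")") d--;
  }
  return d;
}

function collapseLogCalls(source: string): string {
  const lines = source.split("\n");
  const out: string[] = [];
  let i = 0;
  while (i < lines.length) {
    const line = lines[i];
    if (/^\s*logger\./.test(line)) {
      // Keep appending following lines until the call's parens close.
      let joined = line;
      let depth = balance(line);
      while (depth > 0 && i + 1 < lines.length) {
        i++;
        joined += " " + lines[i].trim();
        depth += balance(lines[i]);
      }
      out.push(joined);
    } else {
      out.push(line);
    }
    i++;
  }
  return out.join("\n");
}
```

So a three-line `logger.info("…", { id, });` call would come out as one long line, leaving the surrounding business logic visually intact.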

But ughhhh, brother ughhhh, just ughhhhh!!!! What do you guys do about this? I was thinking maybe there’s a way to get log statements out via metaprogramming or annotations or some other trick/hack/wrapper approach, so we can get the logs we want for debugging stuff in live cloud environments without littering metric tons of log statements all over our codebase.
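On the wrapper idea: the cheapest version I can picture is a higher-order function that logs entry, exit, and errors once, so the method body itself stays log-free (a sketch under my own assumptions — the logger, the message format, and `withLogging` are all made up, not an existing library API):

```typescript
// Stand-in logger that records messages in memory; a real one would
// forward to our OpenTelemetry logging pipeline.
const records: string[] = [];
const logger = { info: (msg: string): void => { records.push(msg); } };

// Wrap any function so entry, exit, and thrown errors are logged in one
// place, instead of hand-writing log calls around every guard/return.
function withLogging<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => R
): (...args: A) => R {
  return (...args: A): R => {
    logger.info(`${name}: enter args=${JSON.stringify(args)}`);
    try {
      const result = fn(...args);
      logger.info(`${name}: exit result=${JSON.stringify(result)}`);
      return result;
    } catch (err) {
      logger.info(`${name}: error ${String(err)}`);
      throw err;
    }
  };
}

// The business logic stays clean and readable.
const approve = withLogging("approve", (id: string, amount: number): boolean => {
  if (id === "") { return false; }
  if (amount <= 0) { return false; }
  return true;
});
```

The trade-off my coworker would rightly point out: you only see args in and result out, not *which* guard fired, so per-branch logs still win on debuggability. But for the many methods where entry/exit context is enough, this pulls a lot of noise out of the body.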

The other teams, btw, apparently don’t have logs at all, and purportedly discovering “what went wrong” with their services is a “shitshow”. I recall back when I was learning how to program, reading about how in elixir/erlang, if a process crashed, you could remotely attach a live debug session to that process and inspect its state/variables. That’s absolutely not what my team/department needs, but it’s cool if for nothing else than to demonstrate that I just feel it in my jellies (was Ryan his dad the whole time??) that there’s gotta be a more clever way to approach “figuring out what went wrong in the cloud”.

And that’s what the logs are for, remember. To figure out what went wrong in the cloud. When customer X encounters a bug doing Z, we look at the logs and try to figure out if something errored out, or, also very commonly, just didn’t get called at all. For example, oops, we never updated system Y to publish a message to the event queue. So system G never received the message, which explains why the customer’s pet never got a new fur dye job! All we gotta do is make sure system Y publishes event “dye dog fur”. (Example out of my ass).

How do you guys solve diagnosing issues in prod/stage/dev? Are your codebases shitted up with logs too? Do you take a fundamentally different approach that sidesteps this entire pattern/problem?