Simple Apache Log Parser

Posted by Jeron_Baffom@reddit | linuxadmin | 4 comments

I was trying to find a simple CLI tool (for Linux) to parse Apache logs, compute some stats and produce plain-text output with some simple aggregate data (e.g., a view counter). This plain-text output would then be submitted to MySQL via a cron job.

The advantage I see in doing it this way is that the database would be hit outside the page request, and in batches.
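A minimal Python sketch of the parse-and-aggregate step described above. It assumes the log uses Apache's standard `combined` LogFormat and that a "view" simply means a successful (200) GET of a path, with query strings ignored. The function names `count_paths` and `to_tsv` are hypothetical, not taken from any existing tool.

```python
import re
from collections import Counter

# Apache "combined" LogFormat:
# %h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_paths(lines):
    """Count successful GET requests per URL path (assumed definition of a view)."""
    counts = Counter()
    for line in lines:
        m = COMBINED_RE.match(line)
        if not m:
            continue  # skip malformed or non-combined lines
        parts = m.group("request").split()
        if len(parts) != 3:
            continue  # e.g. "-" or a garbled request line
        method, path, _protocol = parts
        if method == "GET" and m.group("status") == "200":
            counts[path.split("?", 1)[0]] += 1  # drop the query string
    return counts


def to_tsv(counts):
    """Render counts as 'path<TAB>count' lines, most requested first."""
    return "\n".join(f"{path}\t{n}" for path, n in counts.most_common())
```

Fed with the lines of an access log (on Debian-style systems often `/var/log/apache2/access.log`, but that path is an assumption), `to_tsv(count_paths(lines))` yields one `path<TAB>count` line per page. A cron job could write that to a file and load it into MySQL in a single batch, for example with `LOAD DATA INFILE`.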

I could find several tools that plot graphs and do real-time monitoring (e.g., GoAccess, AWStats, ApacheTop...), but none that would create a simple plain-text output. So I was left with only poor alternatives:

Write my own parser script using 'awk', 'grep', 'cut', 'sed', 'tail -f'...
Or use Logstash, which is overkill for me.
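If a homemade script runs from cron instead of following the log with `tail -f`, each run needs to pick up only the lines written since the previous run. A minimal sketch, assuming a hypothetical JSON state file that records the log's inode and the byte offset reached last time; a changed inode or a shrunken file is treated as log rotation.

```python
import json
import os


def read_new_lines(log_path, state_path):
    """Return complete lines appended to log_path since the previous call.

    state_path is a hypothetical JSON file holding {"inode": ..., "offset": ...}.
    """
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    offset = state.get("offset", 0)

    st = os.stat(log_path)
    if state.get("inode") != st.st_ino or st.st_size < offset:
        offset = 0  # new or truncated file: assume rotation, start over

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only complete lines; a half-written last line waits for the next run.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()

    with open(state_path, "w") as f:
        json.dump({"inode": st.st_ino, "offset": offset + end}, f)
    return lines
```

Note that lines written to the old file between the last run and a rotation are not read by this sketch; handling that would require also reading the rotated file (e.g. `access.log.1`).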

Question

Any recommendations for a simple CLI tool to parse Apache logs into plain text?

Complex-Internal-833@reddit

This post might be too late for you, but it does everything you require and more. I just finished and released it this week. Here's a complete open-source Apache Log Parser & Data Normalization Solution. A Python module imports Apache2 access logs (LogFormats: vhost_combined, combined, common, extended) and error logs into a MySQL schema of tables, views and functions designed to normalize the data. Client & Server components capable of consolidating logs from multiple web servers & sites with complete Audit Trail & Error Logging! https://github.com/WillTheFarmer/ApacheLogs2MySQL
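Supporting several access-log LogFormats means recognizing which one a given line uses. The sketch below is not the project's code; it tries regexes for Apache's stock `vhost_combined`, `combined` and `common` formats, most specific first, and returns the first full match. The `extended` format is left out because its definition isn't given here.

```python
import re

# Fields shared by all three: %h %l %u %t "%r" %>s %O (or %b)
_CORE = (
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)
# "%{Referer}i" "%{User-Agent}i" (Apache escapes embedded quotes as \")
_REF_UA = r' "(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'

# Tried in order, most specific first; fullmatch keeps "common" from
# matching just the prefix of a combined line.
FORMATS = [
    ("vhost_combined", re.compile(r'(?P<vhost>\S+):(?P<port>\d+) ' + _CORE + _REF_UA)),
    ("combined", re.compile(_CORE + _REF_UA)),
    ("common", re.compile(_CORE)),
]


def parse_line(line):
    """Return (format_name, fields) for the first matching format, else None."""
    line = line.rstrip("\n")
    for name, pattern in FORMATS:
        m = pattern.fullmatch(line)
        if m:
            return name, m.groupdict()
    return None
```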

Jeron_Baffom@reddit (OP)

"Client & Server components capable of consolidating logs from multiple web servers & sites with complete Audit Trail & Error Logging!"

It seems you've been working hard for a while...
Did you do all this by yourself?

"Here's a complete open-source Apache Log Parser"

Before hitting the database, is it possible to:

Detect bad robots and insert them into a blacklist?
Provide an improved view counter instead of just a request counter?
Aggregate data?
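One way to approach the three questions above before anything reaches the database is sketched below. Every heuristic in it is an assumption for illustration, not a standard: the bot user-agent pattern, the static-file extension list and the per-batch request threshold (`MAX_REQUESTS_PER_IP`) are all hypothetical. A "view" is counted only for a 200 GET of a non-static path from an IP that was not flagged as a suspect.

```python
import re
from collections import Counter

# Hypothetical heuristics; tune them for your own site.
BOT_UA_RE = re.compile(r"bot|crawler|spider|curl|wget|python-requests", re.I)
STATIC_EXT = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".woff", ".woff2")
MAX_REQUESTS_PER_IP = 300  # hypothetical per-batch threshold


def classify(records):
    """Return (views_per_path, suspect_ips) for parsed log records.

    Each record is a dict with 'host', 'method', 'path', 'status' (int), 'agent'.
    """
    records = list(records)
    per_ip = Counter(r["host"] for r in records)

    # Blacklist candidates: too many requests, or a bot-like user agent.
    suspects = {ip for ip, n in per_ip.items() if n > MAX_REQUESTS_PER_IP}
    suspects |= {r["host"] for r in records if BOT_UA_RE.search(r["agent"])}

    # Views: successful page GETs from non-suspect IPs, static assets excluded.
    views = Counter(
        r["path"]
        for r in records
        if r["host"] not in suspects
        and r["method"] == "GET"
        and r["status"] == 200
        and not r["path"].lower().endswith(STATIC_EXT)
    )
    return views, suspects
```

The per-path `views` counter is already aggregate data, so only one row per path (plus the suspect IPs) would need to be sent to MySQL per batch.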

Complex-Internal-833@reddit

Have you run it yet? All of that can be done once the data is in MySQL. MySQL does all the data manipulation. I initially started doing it in Python, but SQL is way better at it.

A pre-import Stored Procedure could be executed on the LOAD DATA tables before the import Stored Procedure runs. The import process is where the data normalization occurs. Once the normalization is done, it becomes very clear which data is good and which is bad. This could easily be implemented as a post-import process as well.
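In the project described here this work happens in MySQL stored procedures; the Python sketch below only illustrates the general idea and is not that code. A hypothetical pre-import step drops rows that cannot be normalized, and the normalization step replaces repeated strings (paths, user agents) with integer ids from lookup tables, so the fact rows carry only keys.

```python
from dataclasses import dataclass, field


@dataclass
class LookupTable:
    """Maps each distinct value to a small integer id, like a dimension table."""
    ids: dict = field(default_factory=dict)

    def id_for(self, value):
        # setdefault assigns the next id only the first time a value is seen
        return self.ids.setdefault(value, len(self.ids) + 1)


def pre_import(rows):
    """Hypothetical pre-import filter: drop rows with no request line or agent."""
    return [r for r in rows if r.get("request", "-") != "-" and r.get("agent")]


def normalize(rows):
    """Return (fact_rows, path_ids, agent_ids) with strings replaced by ids."""
    paths, agents = LookupTable(), LookupTable()
    facts = []
    for r in rows:
        parts = r["request"].split()
        path = parts[1] if len(parts) == 3 else r["request"]
        facts.append((r["host"], paths.id_for(path), agents.id_for(r["agent"]), r["status"]))
    return facts, paths.ids, agents.ids
```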

Yes, I designed and developed every bit of this application.

I've been designing databases and data processes professionally since 1993.

https://farmfreshsoftware.com

Jeron_Baffom@reddit (OP)

"Have you run it yet?"

No, not yet. But it is on the radar for the next development iteration.

"I've been designing databases and data processes professionally since 1993."

Impressive.
Are you connected in some way with Linus Torvalds' or Richard Stallman's open-source projects?