Jake Scruggs

writes ruby/wears crazy shirts

I spent the last week writing a log parser of all things. The web site I'm currently working on is the sign-up page for a large company so they are very concerned about how far the average joe gets into the process before they give up. If everybody bails on the address page, then perhaps it's too complicated or unresponsive or... something. Up till now we've been redesigning pages based on some data and a lot of guesswork so we thought it might be about time to get some serious data on every session every day. Now the first thing you learn about parsing rails logs is that they are designed to be human readable but not so much computer parsable. An Apache request is one line. A Rails request log has lots more info, but is multi-line. And since we run a lot of Mongrel instances at once, all these requests are interleaved, so sorting them out is an intense computation (we actually ran out of memory a few times before we did some optimization).

Our first pass was mostly a ruby script that called off to a few classes which produced a csv file that our analysts could play with. But they wanted more data -- data only Apache could provide. So we added an Apache log parser and merged the results. And there were bugs (stuff wasn't logged, sessions seemed to end at the starting page or somehow start in the middle (both of which are impossible)). And the code was complicated and hard to understand. At this point we realized that we had been treating this task like a second class citizen -- a task to be automated but not really cared about. So we bit the bullet and refactored the log parser to be more Object Oriented. Now there's a log collector that has lots of requests. Each request mostly delegates to its Apache Request or its Rails request. Along the way things became more clear and the bugs started to become obvious, including one I have to share with you:

Remember how I said many sessions seemed to start in the middle while others ended at the beginning? Well, it turns out that a typical telemarketer will sign up one user and then head back to the first page to sign up another. At which point Rails grabs the session id of their previous session and logs it before processing the request (which wipes the session, but too late for the logging). So it looks like the previous session ended at the beginning. Of course, the next session has had it's beginning stolen so it looks like it started in the middle. Since Rails does this bit of logging before it even looks at our code we needed to adjust our logger to ignore the false session and look for the real session id in a extra bit of logging we added in.

Anyway now that we have our own custom log parser we can get some awesome info on our app for debugging weird stuff that only happens under load and the analysts get a csv file then can import into Excel and generate all sorts of cool graphs and statistics. Yes it did mean a week of writing regular expressions and some crazy command line statements (like: cat access.log | grep | awk | sort | uniq -c | sort ) but we now have a very sharp, well tested, tool.


Dan Manges said...

You mentioned the log was interleaved due to many mongrel processes. Did you change your logging so each mongrel process logs to a separate file?