Parse Apache Log Files With PHP
Published: 01/09/2010
Programming
Parsing the log files generated by Apache is one of those tasks that crops up at random in my world. Until recently it hadn't come up often enough to warrant a ready-made solution (and it was just fun enough that writing a custom one each time was OK). So every time it came up I would fire up Google and go on a scavenger hunt for a starter script written in PHP.
That always felt like a good idea at the time. These days, for some ungodly reason, parsing Apache logs comes up a little too often to keep that up. In the spirit of making tomorrow a hell of a lot easier, I've taken a shot at writing an Apache log parser in PHP.
One thing I decided to implement is a filtering system, so you can keep only the lines whose fields match a regex you provide. It might not be useful to everyone, but it should be trivial to remove.
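To make the filter format concrete, here's a minimal standalone sketch, assuming PHP 7 or later. The function name line_matches_filters() is hypothetical and not part of the class below; it only mirrors how the filters are meant to behave: each key names a parsed field (such as path), and a line is kept as soon as any one filter's regex matches that field.

```php
<?php
// Hypothetical helper illustrating the filter format:
// array('FIELD' => array('regex' => '/PATTERN/')).
// A line is kept when ANY filter's regex matches its field.
function line_matches_filters(array $line, array $filters): bool
{
    if (empty($filters)) {
        return true; // no filters: keep every line
    }
    foreach ($filters as $field => $filter) {
        if (isset($line[$field]) && preg_match($filter['regex'], $line[$field]) === 1) {
            return true;
        }
    }
    return false;
}

// Example: keep only requests for .flv files (case-insensitive)
$filters = array('path' => array('regex' => '/\.flv$/i'));
$lines = array(
    array('path' => '/videos/intro.flv', 'status' => '200'),
    array('path' => '/index.html', 'status' => '200'),
);
$kept = array_values(array_filter($lines, function ($line) use ($filters) {
    return line_matches_filters($line, $filters);
}));
```

Because the first matching filter wins, several filters act as an OR; requiring every filter to match would mean returning false on the first miss instead.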
Anyway, I hope someone finds this useful, whether to learn from or, of course, to use.
Here’s the main class:
<?php
/**
 * Apache Log Parser
 * Parses an Apache log file and runs the lines through filters to find what you're looking for.
 * @author Eric Lamb
 */
class apache_log_parser
{
    /**
     * The path to the log file
     * @var string
     */
    private $file = FALSE;

    /**
     * How many lines didn't match the log format
     * @var int
     */
    public $badRows = 0;

    /**
     * What filters to apply. Should be in the format of array('KEY_TO_SEARCH' => array('regex' => 'YOUR_REGEX'))
     * @var array
     */
    public $filters = FALSE;

    /**
     * Duh. (Throws if the file isn't readable, since a constructor can't return FALSE.)
     * @param string $file
     */
    public function __construct($file)
    {
        if (!is_readable($file)) {
            throw new InvalidArgumentException("Log file is not readable: $file");
        }
        $this->file = $file;
    }

    /**
     * Runs a parsed line through the filters
     * @param array|false $str The parsed line from format_line()
     * @return array|false The line if any filter matches (or no filters are set), FALSE otherwise
     */
    private function applyFilters($str)
    {
        if (!$str) {
            return FALSE; // the line didn't parse
        }
        if (!$this->filters || !is_array($this->filters)) {
            return $str;
        }
        foreach ($this->filters AS $area => $filter) {
            // Match the filter's regex against the named field, e.g. 'path'
            if (isset($str[$area]) && preg_match($filter['regex'], $str[$area])) {
                return $str;
            }
        }
        return FALSE;
    }

    /**
     * Returns an array of all the filtered lines
     * @param int|false $limit Maximum number of matching lines to return
     * @return array
     */
    public function getData($limit = FALSE)
    {
        $lines = array();
        $handle = fopen($this->file, 'rb');
        if ($handle) {
            // fgets() returns FALSE at end of file
            while (($buffer = fgets($handle)) !== FALSE) {
                $data = $this->applyFilters($this->format_line(rtrim($buffer, "\r\n")));
                if ($data) {
                    $lines[] = $data;
                    if ($limit && count($lines) >= $limit) {
                        break;
                    }
                }
            }
            fclose($handle);
        }
        return $lines;
    }

    /**
     * Splits a Combined Log Format line into its parts
     * @param string $line
     * @return array
     */
    function format_log_line($line)
    {
        preg_match("/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] \"(\S+) (.*?) (\S+)\" (\S+) (\S+) (\".*?\") (\".*?\")$/", $line, $matches); // pattern to format the line
        return $matches;
    }

    /**
     * Takes the format_log_line array and makes it usable to us stupid humans
     * @param string $line
     * @return array|false
     */
    function format_line($line)
    {
        $logs = $this->format_log_line($line); // format the line
        if (isset($logs[0])) // check that it formatted OK (no match gives an empty array)
        {
            // field names follow the Combined Log Format order
            $formated_log = array();
            $formated_log['ip'] = $logs[1];
            $formated_log['identity'] = $logs[2];
            $formated_log['user'] = $logs[3];
            $formated_log['date'] = $logs[4];
            $formated_log['time'] = $logs[5];
            $formated_log['timezone'] = $logs[6];
            $formated_log['method'] = $logs[7];
            $formated_log['path'] = $logs[8];
            $formated_log['protocol'] = $logs[9];
            $formated_log['status'] = $logs[10];
            $formated_log['bytes'] = $logs[11];
            $formated_log['referer'] = $logs[12];
            $formated_log['agent'] = $logs[13];
            return $formated_log; // return the array of info
        }
        else
        {
            $this->badRows++; // if the row is not in the right format add it to the bad rows
            return false;
        }
    }
}
?>
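The long regex in format_log_line() expects Apache's Combined Log Format. As a sketch, here is the same pattern rewritten with named capture groups (PCRE's (?P<name>...) syntax, which preg_match() supports), so the field names come straight out of the match. The helper parse_combined_line() and the sample line are illustrative assumptions; unlike the class, this version captures the referer and agent without their surrounding quotes.

```php
<?php
// Sketch: the Combined Log Format regex with named groups.
// parse_combined_line() is a hypothetical helper, not part of the class above.
function parse_combined_line(string $line)
{
    $pattern = '/^(?P<ip>\S+) (?P<identity>\S+) (?P<user>\S+) '
             . '\[(?P<date>[^:]+):(?P<time>\d+:\d+:\d+) (?P<timezone>[^\]]+)\] '
             . '"(?P<method>\S+) (?P<path>.*?) (?P<protocol>\S+)" '
             . '(?P<status>\S+) (?P<bytes>\S+) "(?P<referer>.*?)" "(?P<agent>.*?)"$/';
    if (preg_match($pattern, $line, $m) !== 1) {
        return false; // not a Combined Log Format line
    }
    // preg_match() also fills numeric indexes; keep only the named (string) keys
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

$sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        . '"GET /videos/intro.flv HTTP/1.0" 200 2326 '
        . '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';
$fields = parse_combined_line($sample);
```

Named groups trade a slightly longer pattern for not having to keep thirteen numeric indexes in sync with the field names by hand.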
And here’s an example of how to use it:
<?php
// $d and $entry presumably come from a dir()/read() loop over a log directory (not shown)
$parser = new apache_log_parser($d->path.'/'.$entry); // Create an apache log parser
$parser->filters = array(
    'path' => array('regex' => '/^.*\.(FLV|flv)$/') // pull only flv files
);
$data = $parser->getData();
?>
A couple of things to note about this script, though:
1. The regex and parsing were pretty much stolen from the Apache Log Parser on PHPClasses.org.
2. Without filters the script is pretty memory intensive, since getData() keeps every parsed line in one array. My needs don't involve anything client-facing, but heed my advice: don't use this on a public web server.
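Since the memory cost comes from collecting every line into one array, one way around it is to stream lines with a PHP generator, so only one parsed line is held at a time. This is a swapped-in technique, not how the class above works; stream_log_lines() and its $parse/$keep callables are hypothetical, and the usage below feeds it a tiny stand-in log format (not Apache's) so the sketch runs on its own. It assumes PHP 7.1 or later.

```php
<?php
// Sketch: yield parsed lines one at a time instead of building a big array.
// $parse turns a raw line into an array (or false for a bad line);
// $keep, if given, decides whether a parsed line is yielded.
function stream_log_lines(string $file, callable $parse, ?callable $keep = null)
{
    $handle = fopen($file, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $file");
    }
    try {
        while (($buffer = fgets($handle)) !== false) {
            $line = $parse(rtrim($buffer, "\r\n"));
            if ($line !== false && ($keep === null || $keep($line))) {
                yield $line; // only this one line is held in memory
            }
        }
    } finally {
        fclose($handle); // runs when the loop ends or the generator is destroyed
    }
}

// Usage: count status codes without keeping every line around
$tmp = tempnam(sys_get_temp_dir(), 'log');
file_put_contents($tmp, "a 200\nb 404\nc 200\nnot-a-line\n");
$parse = function ($raw) {
    $parts = explode(' ', $raw);
    return count($parts) === 2 ? array('path' => $parts[0], 'status' => $parts[1]) : false;
};
$counts = array();
foreach (stream_log_lines($tmp, $parse) as $line) {
    $counts[$line['status']] = ($counts[$line['status']] ?? 0) + 1;
}
unlink($tmp);
```

In real use, $parse could wrap the class's format_line() and $keep could apply the same regex filters; the caller then aggregates as it goes rather than holding the whole log.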