Aaronblog

Merging Awstats statistics via PHP-CLI

Because I previously hosted parts of this site on several subdomains, the Awstats cron job I ran daily generated separate statistics for each subdomain, too. Having since merged all subdomains into one subdomain-less site, I wanted to merge the old statistics into one file per month, so that they reflect the new site structure and give me more data to use for comparisons. Unfortunately, I couldn't find a script that did just that, so I wrote my own.

So, here's the deal: I wrote a PHP script containing three classes, to be used via PHP-CLI. The abstract class AwstatsFile serves as the common base for both the AwstatsFromFile class (which reads an existing Awstats statistics file) and the AwstatsMerger class (which merges instances of AwstatsFile). The CLI arguments are the filenames to merge; the merged statistics file is echoed to the output buffer, effectively STDOUT, so it can be piped or redirected.
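To make the class layout a bit more concrete, here is a minimal sketch of the same idea. It is not the actual script: the class names, the getData method and the "sum every field" rule are assumptions made for illustration, and it assumes the usual section layout of a statistics file (a BEGIN_ line, rows, an END_ line). A real merger needs per-section rules, for example for dates.

```php
<?php
// Simplified, hypothetical stand-ins for the three classes; names and
// method bodies are illustrative and not taken from the actual script.
abstract class SketchStatsFile
{
    // section name => array(row key => array of remaining fields)
    protected $data = array();

    public function getData()
    {
        return $this->data;
    }
}

// Parses statistics text (assumed layout: BEGIN_<SECTION> <count>, rows, END_<SECTION>).
class SketchStatsFromString extends SketchStatsFile
{
    public function __construct($contents)
    {
        $section = null;
        foreach (explode("\n", $contents) as $line) {
            $line = trim($line);
            if ($line === '') {
                continue;
            }
            if (strpos($line, 'BEGIN_') === 0) {
                // "BEGIN_DAY 1" opens the section named DAY
                $parts = explode(' ', $line);
                $section = substr($parts[0], strlen('BEGIN_'));
                $this->data[$section] = array();
            } elseif (strpos($line, 'END_') === 0) {
                $section = null;
            } elseif ($section !== null) {
                // First field is the row key, the rest are the statistics
                $fields = explode(' ', $line);
                $key = array_shift($fields);
                $this->data[$section][$key] = $fields;
            }
        }
    }
}

// Merges any number of SketchStatsFile instances by summing fields per row key.
class SketchStatsMerger extends SketchStatsFile
{
    public function __construct(array $files)
    {
        foreach ($files as $file) {
            foreach ($file->getData() as $section => $rows) {
                foreach ($rows as $key => $fields) {
                    foreach ($fields as $i => $value) {
                        $previous = isset($this->data[$section][$key][$i])
                            ? $this->data[$section][$key][$i]
                            : 0;
                        $this->data[$section][$key][$i] = $previous + $value;
                    }
                }
            }
        }
    }
}

$a = new SketchStatsFromString("BEGIN_DAY 1\n20100601 10 20 3000 1\nEND_DAY\n");
$b = new SketchStatsFromString("BEGIN_DAY 1\n20100601 5 5 1000 2\nEND_DAY\n");
$merged = new SketchStatsMerger(array($a, $b));
print_r($merged->getData());
```

Because the merger extends the same base class as the reader, a merged result could itself be fed into another merge.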

Let's give an example. Say you have two Awstats statistics files, awstats062010.www.aaronweb.net.txt and awstats062010.projects.aaronweb.net.txt, and the AwParse script saved as awparse.php. Assuming php is the PHP interpreter, you'd merge these files by executing:

$ php awparse.php awstats062010.www.aaronweb.net.txt awstats062010.projects.aaronweb.net.txt > awstats062010.aaronweb.net.txt

Without further ado: the script can be found in its GitHub project. Please feel free to submit pull requests or patches there!

Suggestions and other comments are more than welcome! :)

Comments

I just tried using your tool, great tool btw.

One bug I came across is that after I merged 2 log files, in awstats it displays the following error message:

Error: History file '/home/usr/tmp/awstats/awstats012017.mydomain.com.txt' is to old (version '0'). This version of AWStats is not compatible with very old history files. Remove this history file or use first a previous AWStats version to migrate it from command line with command: awstats.pl -migrate="/home/usr/tmp/awstats/awstats012017.proxyape.com.txt".

And in the file, the first 2 lines are giving me this output:

Warning: A non-numeric value encountered in /Users/Zamolxis/Downloads/Awmerge/src/AaronVanGeffen/AwstatsParser/AwstatsMerger.php on line 272

Warning: A non-numeric value encountered in /Users/Zamolxis/Downloads/Awmerge/src/AaronVanGeffen/AwstatsParser/AwstatsMerger.php on line 272
AWSTATS DATA FILE 6.9 (build 1.925)

Is there any way you could update it?

Thank you for building this tool!
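The warning quoted above is one that newer PHP versions (7.1 and up) can emit when an arithmetic operator is applied to a string that is not fully numeric. Since the merged file is written to STDOUT, warnings printed there end up above the AWSTATS DATA FILE header, which is presumably why AWStats then reports version '0'. Below is a minimal sketch of one possible workaround. The addStat helper is hypothetical and not part of the tool, and what line 272 actually does is not shown here, so treat this as an assumption about the cause rather than a confirmed fix.

```php
<?php
// Send PHP diagnostics to STDERR so they cannot end up in a redirected
// statistics file (the 'stderr' value is honoured by the CLI SAPI).
ini_set('display_errors', 'stderr');

// Hypothetical helper: sum two fields only when both are numeric.
function addStat($existing, $incoming)
{
    if (is_numeric($existing) && is_numeric($incoming)) {
        return $existing + $incoming;
    }

    // Non-numeric fields cannot be summed: keep the existing value,
    // unless there is none, in which case take the incoming one.
    return ($existing === null || $existing === '') ? $incoming : $existing;
}

var_dump(addStat('12', '5'));
var_dump(addStat('-', '5'));
```

The same effect on the output file can be had without touching the code by running the script with the option -d display_errors=stderr.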

One more attempt to post the same patch, leaving in tabs instead of blanks. If that doesn't work, I give up.

<pre>
--- awparse.php.O 2012-08-07 17:36:18.000000000 +0200
+++ awparse.php 2012-08-07 18:43:28.000000000 +0200
@@ -199,7 +199,6 @@
}
else
{
- $str = $this-&gt;data['GENERAL'][$item];
if ($item == 'FirstTime')
$this-&gt;data['GENERAL'][$item][0] = min($row[0], $this-&gt;data['GENERAL'][$item][0]);
else if (($item == 'LastTime') || ($item == 'LastUpdate'))
@@ -340,7 +339,7 @@
{
if (!isset($this-&gt;data[$section_name][$key]))
{
- $this-&gt;data[$key][$key] = $row;
+ $this-&gt;data[$section_name][$key] = $row;
continue;
}

@@ -376,7 +375,7 @@
{
if (!isset($this-&gt;data[$section_name][$key]))
{
- $this-&gt;data[$key][$key] = $row;
+ $this-&gt;data[$section_name][$key] = $row;
continue;
}
</pre>

Please totally ignore comment #3. There's no need to update TotalUnique, because the awstats web page ignores it. Instead, the number of unique visitors is taken from the number at the beginning of the BEGIN_VISITOR section, which was wrong. A patch for that bug is below, along with a patch to the same bug that was in helper_sum_merge. This patch also deletes the useless setting of $str that I had added in my first patch in comment #2.

I don't know why all the tabs were deleted in my comment #3. Just in case I changed them all to 8 blanks this time.

<pre>
--- awparse.php.O 2012-08-07 17:36:18.000000000 +0200
+++ awparse.php 2012-08-07 18:43:28.000000000 +0200
@@ -199,7 +199,6 @@
}
else
{
- $str = $this-&gt;data['GENERAL'][$item];
if ($item == 'FirstTime')
$this-&gt;data['GENERAL'][$item][0] = min($row[0], $this-&gt;data['GENERAL'][$item][0]);
else if (($item == 'LastTime') || ($item == 'LastUpdate'))
@@ -340,7 +339,7 @@
{
if (!isset($this-&gt;data[$section_name][$key]))
{
- $this-&gt;data[$key][$key] = $row;
+ $this-&gt;data[$section_name][$key] = $row;
continue;
}

@@ -376,7 +375,7 @@
{
if (!isset($this-&gt;data[$section_name][$key]))
{
- $this-&gt;data[$key][$key] = $row;
+ $this-&gt;data[$section_name][$key] = $row;
continue;
}
</pre>

Here's one more little patch, for the total number of unique visitors:

<pre>
--- awparse.php.O 2012-08-07 17:36:18.000000000 +0200
+++ awparse.php 2012-08-07 17:42:57.000000000 +0200
@@ -206,6 +206,9 @@
$this-&gt;data['GENERAL'][$item][0] = max($row[0], $this-&gt;data['GENERAL'][$item][0]);
else if ($item == 'TotalVisits')
$this-&gt;data['GENERAL'][$item][0] += $row[0];
+ else if ($item == 'TotalUnique')
+ /* don't really know total unique, just take max */
+ $this-&gt;data['GENERAL'][$item][0] = max($row[0], $this-&gt;data['GENERAL'][$item][0]);
}
}
}
</pre>

Thanks for this script. I made a few changes, patch below.

First I had to add the php run option "-d allow_call_time_pass_reference=true" to avoid errors about "Call-time pass-by-reference has been deprecated". This was with PHP 5.1.6.

Next it complained about an access to protected item $data, so I switched it to public.

It was adding some dates together to make a date really far in the future :-). Also it needed a little merging in the GENERAL section.

Finally, I found I needed to run "awstats.pl -LogFile=/dev/null -config=myconfig" after awparse.php to re-generate the MAP section.

<pre>
--- awparse.php.O 2012-08-06 21:05:14.000000000 +0200
+++ awparse.php 2012-08-06 22:31:07.000000000 +0200
@@ -40,7 +40,7 @@
*/
abstract class AwstatsFile
{
- protected $data = array();
+ public $data = array();

public function getFileContents()
{
@@ -197,6 +197,16 @@
$this-&gt;data['GENERAL'][$item] = $row;
continue;
}
+ else
+ {
+ $str = $this-&gt;data['GENERAL'][$item];
+ if ($item == 'FirstTime')
+ $this-&gt;data['GENERAL'][$item][0] = min($row[0], $this-&gt;data['GENERAL'][$item][0]);
+ else if (($item == 'LastTime') || ($item == 'LastUpdate'))
+ $this-&gt;data['GENERAL'][$item][0] = max($row[0], $this-&gt;data['GENERAL'][$item][0]);
+ else if ($item == 'TotalVisits')
+ $this-&gt;data['GENERAL'][$item][0] += $row[0];
+ }
}
}

@@ -335,12 +345,27 @@
}

// Merge rows, taking in account that not all vistors have a start and end date of their visit set.
+ // The 3rd and 4th index are begin and end dates
foreach ($row as $num =&gt; $stats)
- $this-&gt;data[$section_name][$key][$num] = isset($this-&gt;data[$section_name][$key][$num]) ? $this-&gt;data[$section_name][$key][$num] + $stats : $stats;
+ if (($num == 3) || ($num == 4))
+ $this-&gt;data[$section_name][$key][$num] = isset($this-&gt;data[$section_name][$key][$num]) ? max ($this-&gt;data[$section_name][$key][$num], $stats) : $stats;
+ else
+ $this-&gt;data[$section_name][$key][$num] = isset($this-&gt;data[$section_name][$key][$num]) ? $this-&gt;data[$section_name][$key][$num] + $stats : $stats;
}
}

/**
+ * Merges extra_1 statistics.
+ * Note: handled the same way as visitor.
+ * @param $rows existing set of rows
+ * @param $section_name identifier of the current section
+ */
+ private function merge_extra_1(&amp;$rows, &amp;$section_name)
+ {
+ return $this-&gt;merge_visitor($rows, $section_name);
+ }
+
+ /**
* Merges statistics by simply taking the sum of existing rows.
* @param $rows existing set of rows
* @param $section_name identifier of the current section
</pre>

Nice tool, but the problem is: what happens if the files don't cover the same days?
Example, file 1:
20120603 55 62 586059 1
20120604 162 167 866558 2
20120605 64 176 2104678 3
20120606 197 202 1004882 3
20120607 30 122 1277923 1
file 2:
20120608 1 6 737614 1
20120611 1997 3241 18431625 4
20120612 1505 1677 11318071 3
20120613 1085 1273 9629391 3
20120614 1555 1782 11289854 3
20120615 601 635 3092400 5
20120618 175 293 2899393 4
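One plausible way to handle files with different days, matching the "copy the row if its key is not set yet" branch visible in the patches above, is to take the union of the days and sum only the days that appear in both files. The sketch below illustrates that rule with a few of the rows from this comment; mergeDayRows is a hypothetical function name, not one from the script.

```php
<?php
// Hypothetical helper: merge two sets of per-day rows keyed by date.
function mergeDayRows(array $first, array $second)
{
    $merged = $first;
    foreach ($second as $day => $fields) {
        if (!isset($merged[$day])) {
            // Day only present in the second file: copy it as-is.
            $merged[$day] = $fields;
            continue;
        }
        // Day present in both files: sum field by field.
        foreach ($fields as $i => $value) {
            $merged[$day][$i] += $value;
        }
    }
    ksort($merged); // keep the days in chronological order
    return $merged;
}

$file1 = array(
    '20120606' => array(197, 202, 1004882, 3),
    '20120607' => array(30, 122, 1277923, 1),
);
$file2 = array(
    '20120608' => array(1, 6, 737614, 1),
    '20120611' => array(1997, 3241, 18431625, 4),
);

$merged = mergeDayRows($file1, $file2);
echo implode(' ', array_keys($merged)), "\n";
// prints "20120606 20120607 20120608 20120611"
```

With no overlapping days, as in this example, every row is simply carried over unchanged.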
