This work is licenced under a Creative Commons Licence.

Anonymising IP addresses in Apache logs

Apache logs are great. With the right settings they hold a wealth of information about browser versions and all manner of visitor statistics. However, being lazy, I didn't want to analyse them myself, nor did I want to set up more conventional monitoring (Google Analytics, for example), so I decided to get someone else to look at the logs for me. In a nod towards privacy, I did not want to hand over the IP addresses of the users who had hit the web server, but simply stripping the addresses out would have thrown away a large chunk of potentially useful data. Some simple hashing of each IP address seemed like it should suffice, but it turned out to be slightly harder than I had anticipated, largely due to my lack of familiarity with awk. I must give many thanks to gnomon on #awk on freenode for his help; the script provided here is all his work.

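This script assumes an Apache logfile named apache.log in which the IP address is the first field of each entry. If your log format is different, you can change which field is rewritten by editing the three "$1" values toward the end of the script. The script gives each distinct address a pseudo-IP, counting up from 0.0.0.1 in order of first appearance, so every request from the same real address gets the same pseudo-IP throughout the output. Before rewriting, it checks that the field is a valid dotted-quad IPv4 address; lines that fail the check (IPv6 addresses or hostnames, for instance) are passed through unchanged, so look out for those before sharing the output. Setting the awk variable ZEROPAD to a non-zero value (-v ZEROPAD=1 on the command line) zero-pads each octet to three digits.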
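To make the numbering concrete, here is a minimal sketch of the same idea. It is not gnomon's script: it skips the validity check and uses plain arithmetic instead of the hex lookup table. The names anon_sketch, to_quad, seen and count are made up for illustration, and the three log lines are invented samples using documentation-range addresses.

```shell
# anon_sketch: hypothetical helper that reads log lines on stdin and
# replaces the first field with a sequential pseudo-IP.
anon_sketch() {
  awk '
  # Turn a counter into a dotted quad, one byte per octet.
  function to_quad(n) {
    return sprintf("%d.%d.%d.%d", int(n / 16777216) % 256, int(n / 65536) % 256, int(n / 256) % 256, n % 256)
  }
  # First sighting of an address: give it the next counter value.
  !($1 in seen) { seen[$1] = to_quad(++count) }
  # Every line: swap in the remembered pseudo-IP, then print.
  { $1 = seen[$1]; print }
  '
}

# Invented sample input; the first and third lines share a client address.
printf '%s\n' \
  '203.0.113.7 - - [01/Jan/2010:00:00:01 +0000] "GET / HTTP/1.1" 200 512' \
  '198.51.100.23 - - [01/Jan/2010:00:00:02 +0000] "GET /a HTTP/1.1" 200 128' \
  '203.0.113.7 - - [01/Jan/2010:00:00:03 +0000] "GET /b HTTP/1.1" 404 64' |
anon_sketch
```

The first fields of the three output lines should come out as 0.0.0.1, 0.0.0.2 and 0.0.0.1, with the rest of each line untouched. Note that the numbering depends on the order in which addresses first appear, so the same real address will generally get a different pseudo-IP if you run it over a different logfile.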

First, copy and paste the following into apache-anon.awk:

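# True if oct is a decimal number in the range 0 to 255.
function octp(oct) {
	return (oct ~ /^[0-9]+$/) && ((oct+0) >= 0) && ((oct+0) <= 255)
}

# True if ip is a dotted-quad IPv4 address. o is a local scratch array.
function isValidIP(ip,	o) {
	return (split(ip,o,".")==4) && octp(o[1]) && octp(o[2]) && octp(o[3]) && octp(o[4])
}

# Decimal value of byte number oct (0 to 3) of the 8-digit hex string str.
function h2d(str, oct) {return hex2dec[substr(str, (oct*2)+1, 2)]}

# Return the pseudo-IP for ip, allocating the next one on first sight.
# h is a local variable.
function anonymize_ip(ip,	h) {
	if (ip in ANONIPS) {
		return ANONIPS[ip]
	} else {
		# Bump the counter and format it as eight hex digits, two per octet.
		h = sprintf("%08x", ++ANONIPS["index"])
		return ANONIPS[ip] = sprintf(IPFORMAT, h2d(h,0), h2d(h,1), h2d(h,2), h2d(h,3))
	}
}

BEGIN {
	# Counter of pseudo-IPs handed out so far.
	ANONIPS["index"] = 0
	# Lookup table from two-digit lowercase hex to decimal.
	for (i = 0; i < 256; i++) hex2dec[sprintf("%02x", i)] = i
	# Run with -v ZEROPAD=1 to zero-pad each octet to three digits.
	IPFORMAT = ((ZEROPAD) ? "%03d.%03d.%03d.%03d" : "%d.%d.%d.%d")
}

# Rewrite the first field when it is a valid IPv4 address.
isValidIP($1) {
	$1 = anonymize_ip($1)
}

# Print every line, rewritten or not.
1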

Then run:

awk -f apache-anon.awk apache.log > apache.log.anon
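
A quick sanity check after running the command above is to confirm that the anonymised log still has the same number of distinct clients as the original. The sketch below assumes both files are in the current directory and that the address is the first field; distinct_clients is a made-up helper name, not part of the script.

```shell
# distinct_clients: hypothetical helper, not part of the original script.
# Prints how many different values appear in the first field of a file.
distinct_clients() {
  awk '!seen[$1]++ { n++ } END { print n + 0 }' "$1"
}

# Report the count for each log that is present in the current directory.
for f in apache.log apache.log.anon; do
  if [ -f "$f" ]; then
    printf '%s: %s distinct clients\n' "$f" "$(distinct_clients "$f")"
  fi
done
```

The two counts should match, since each distinct address gets its own pseudo-IP and anything that fails the validity check is left as it was.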