February 21, 2011

Bash script: Harvest email addresses from a directory of files

I had to go through more than a year of server logs to rescue newsletter signups that may never have reached our newsletter service, because of server updates that happened without my knowledge. I wrote a short command that scrapes every log file in my directory and writes all the addresses it finds to a text file. Here we go:
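# recursively(!) scrapes the current directory and prints each unique email address on its own line
# bracket classes avoid the invalid "_-\" range in the old pattern; {2,} also catches longer TLDs like .info
grep -E -o -h -r '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' . | sort -u > email_addresses.txt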
Credit: I started with a command from Linux.com and customized it until the output was correct and nicely formatted.
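If you need to run this against several directories, the same command can be wrapped in a small function. This is a sketch that assumes GNU grep. The function name `harvest_emails` is hypothetical, and the pattern is a pragmatic approximation rather than a full RFC 5322 address matcher. It also lowercases every address before deduplicating, on the assumption that `Alice@Example.com` and `alice@example.com` are the same subscriber. The `-I` flag skips binary files, which means compressed rotated logs (`.gz`) are ignored and would have to be decompressed first.

```bash
#!/usr/bin/env bash
# Sketch: collect unique, lowercased email addresses from every text file under a directory.
# Assumptions: GNU grep; hypothetical function name; the regex is an approximation.
harvest_emails() {
  local dir="${1:-.}"   # directory to scan, defaults to the current one
  # -r recurse, -h omit filenames, -o print only the matched text,
  # -E extended regex, -I skip binary files
  grep -rhoEI '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' "$dir" \
    | tr '[:upper:]' '[:lower:]' \
    | sort -u             # sort and drop duplicates in one step
}
```

Usage, with a hypothetical log path: `harvest_emails /var/log/myapp > ~/email_addresses.txt`. Writing the result outside the scanned directory keeps a later run from re-reading an old output file.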
