Is there a !GOOD! program or method to get email addresses from multiple word docs?
Hi All,
I have an important situation at work where we need to obtain email addresses from hundreds of Word documents. Basically, we have acquired another company whose database comprised Word docs and spreadsheets. What we are hoping to do is find an application or some other method to scan all these Word documents and then spit out all the email addresses it finds into a separate file to be used in a mass mail-out.
Going through each document one by one would be an admin nightmare and we just want to avoid it.
I know applications for this exist because I have trialled several, but all of them have failed to do the job properly. I did find one that did a good job when up against around 3 documents, but when I gave it a directory with 15 Word documents it crashed out.
So I was hoping someone on here might have dealt with this before and had a good solution. I think by the time I try every dodgy piece of software they could have hired someone to sift through every doc!
Any help would be much appreciated!
Comments
-
astorrs Member Posts: 3,139 ■■■■■■□□□□
Something like this from Linux (or Cygwin) should work:
grep -Eihor '\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b' 'C:\FolderToSearch' | sort | uniq > emails.txt
-
astorrs Member Posts: 3,139 ■■■■■■□□□□
Okay I decided to figure out how to translate it into PowerShell 2.0, here you go:
$searchDirectory = "C:\Users\Andrew\Documents"
$searchExtensions = "*.doc", "*.docx", "*.xls", "*.xlsx"
$outputFile = "C:\Users\Andrew\Desktop\emails.txt"
Get-ChildItem * -Include $searchExtensions -Path $searchDirectory -Recurse | `
Select-String -Pattern "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[a-zA-Z]{2,4}\b" `
-AllMatches | Select-Object -ExpandProperty Matches | Select-Object `
-ExpandProperty Value | ForEach-Object { $_.ToString().ToLower() } | `
Sort-Object | Get-Unique | Out-File $outputFile
This will scan all files in $searchDirectory (and any subdirectories) with one of the extensions listed in $searchExtensions and save a list of all the unique email addresses it finds (no duplicates) in a file called $outputFile.
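One caveat worth flagging: .docx files are zip archives, so a pattern match over the raw file bytes may not see the text stored inside them. As a rough sketch only (not the method used in this thread), the same scan, lower-case and de-duplicate steps could look like this in Python. The function names `find_emails`, `docx_text` and `scan_folder` are made up for illustration, only .docx is handled, and XML entities are not decoded.

```python
import re
import zipfile
from pathlib import Path

# Same pattern as the scripts in this thread (top-level domains of 2-4 letters only).
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b")


def find_emails(text):
    """Return the unique, lower-cased addresses found in text, sorted."""
    return sorted({match.lower() for match in EMAIL_RE.findall(text)})


def docx_text(path):
    """Pull the readable text out of a .docx (a zip holding word/document.xml)."""
    with zipfile.ZipFile(path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    xml = xml.replace("</w:p>", "\n")    # keep paragraph boundaries as newlines
    return re.sub(r"<[^>]+>", "", xml)   # drop tags, rejoining text split across runs


def scan_folder(folder):
    """Collect the unique addresses from every .docx under folder (recursively)."""
    found = set()
    for path in Path(folder).rglob("*.docx"):
        found.update(find_emails(docx_text(path)))
    return sorted(found)
```

Calling `scan_folder` on the search directory and writing the returned list to a file, one address per line, would mirror the `Out-File` step above. Old-style .doc and .xls files are a different binary format and would need separate handling.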
Hopefully this will work for you. -
carboncopy Member Posts: 259
I don't think I will ever use that but that is pretty cool. -
albanga Member Posts: 164
Thanks astorrs, I handed it to my developer in the end. Your script didn't work as well as he had hoped, so he had to redo it. He is currently away sick, so I will post the end result once he is back.
Thanks for the reply though.
-
Hyper-Me Banned Posts: 2,059
That sounds like a typical developer response, lol -
astorrs Member Posts: 3,139 ■■■■■■□□□□
Either way, glad you got what you needed. -
Ahriakin Member Posts: 1,799 ■■■■■■■■□□
You can get egrep for Windows (and sed etc.), so you can do some of that nice Linux CLI text manipulation natively.
We responded to the Year 2000 issue with "Y2K" solutions...isn't this the kind of thinking that got us into trouble in the first place?