Three Is It

Because two isn't enough and four is just too many

The only defensible war is a war of defense.
G. K. Chesterton
Home Blogs Genealogy Brad's Bookshelf Subscriptions Contact Sign in
 

About the author

Brad Butts is a .NET developer and architect. He is married with children and enjoys reading, working out, and genealogy is his five minutes of spare time.
E-mail me Send mail
National Debt Clock

Recent comments

Authors

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2010

Powershell vs. Lotus Notes

I’m writing this post for the same reasons that I write most of my posts:

  1. Because once I solve a problem, the solution tends to decay quickly from my mind.  If I don’t get it down on paper in some coherent form, it will be gone—and a month from now, I’ll likely need it again. 
  2. And because I’m sure there are many better ways to solve this problem.  My horrible googling skills came up empty, so this is the best I could do.  It works, but I’m sure there are more elegant ways to do this.  Perhaps a more savvy script writer could educate me.

Overview

At work, we develop components, gadgets, and gizmos that others use to do their work.  When we have a new release of said gizmo or some other relevant event, we send out an email to our internal customers with the announcement.  Recently, we learned that there are many other employees in the company that are not on our email distribution list—folks who may also benefit from our products and services—so we wanted to contact these other people to see if they’d like to be added to our mailing list. 

Later, I may post on how I compared our current mailing list to this other list of people to find out who we were actually missing.  For now, assume that I have already sent out my solicitation email and have begun receiving replies.  In my solicitation, I said something to the effect, “if you would like to be added to our list, please reply to this email with the phrase, ‘add me to the list.’”  I wasn’t specific about where to place the phrase, so some respondents put the phrase in the body of their reply while others put it in the subject line of their reply.  Still others replied back with “no, thanks” and a few even replied that they wanted to be added, but didn’t use my phrase: eg. “sounds great!  Sign me up!”

I knew I didn’t want to go through all the replies by hand and add the affirmative replies, one-by-one, to my email list.  Rather, I wanted to find a way to programmatically process the replies and build a single list of affirmatives that I could add, in bulk, to my distribution list.  Here’s what I did:

Step 1: Export to the replies from Lotus Notes into a form I could process

Lotus Notes is not my email client of choice—that’s simply what I’m forced to use.  The SMTP infrastructure at my company is dark and mysterious; while I’m sure there are APIs I could program to, I decided that the quickest approach to extract the emails I wished to process was to simply do it the manual way.  I found that I could checkmark all the email I was interested in, then select the menu File > Export… and produce a single text file I could work with.  The file looked something like this:

email1

Step 2: Transform the replies into some delimited-type format that I could better process

Now, I had to get this dump of email into some form I could work with.  The first thing that came to mind was to see if I could use Powershell to transform the email dump into a delimited file.  If I could get the fields I needed to process into a delimited file, I could write a LogParser query to identify the individuals wishing to be added to my list.  Here’s the Powershell script I came up with:

$delim = "|"
$allResponses = Get-Content d:\Docs\email_dump

$principals = $allResponses | Select-String "^Principal:"
$abstracts =  $allResponses | Select-String "Abstract:"
$subjects = $allResponses | Select-String "Subject: [^My]"
$inetfroms =  $allResponses | Select-String "INetFrom:"

$newFile = ""

for($count = 0; $count -lt $principals.Count; $count++)
{
    $newFile += ($inetfroms[$count] -replace "INetFrom:  ", "") + $delim + `
        ($abstracts[$count] -replace "Abstract:  ", "") + $delim + `
        ($subjects[$count] -replace "Subject:  ", "") + $delim + `
        ($principals[$count] -replace "Principal:  ", "") + "`n"
}

$newFile > D:\Docs\delimResponses.txt

So, let me explain some of this….  I found that there were probably four relevant fields for my objective:

  • Principal—the English name (distinguished name) of the respondents
  • Abstract—the first line or first X number of characters of the reply
  • Subject—the subject line of the reply
  • INetFrom—the email address of the respondent

Most respondents placed the “add me to your list” phrase either at the beginning of their response (Abstract field) or in the subject line (Subject field).  Of course, I needed the respondent’s email address (INetFrom) to add to my distribution list, and the respondent’s full name (Prinicipal) was just nice to have in case the email address was too cryptic and I wanted to know the actual respondent’s name.

Anyway, I used the Select-String cmdlet to find all the instances of each field.  For the most part, the pattern match (a Regular Expression) was easy; however, initially, my Select-String pattern for the Subject field was just “Subject:”.  Unfortunately, this pattern not only picked up the respondents’ subject lines, but also the subject line from my original email, which was tacked onto the bottom of every email I received.  I had to figure out how to make sure not to capture that subject line.  So, I came up with the pattern “Subject:  [^My]”.  In Regular Expression-eze, this means, “match on all lines where the line begins with “Subject:  [and any characters that don’t begin with ‘My’]”.  After all, my original subject line was “Subject:  My team’s distribution list”.

[Side note: Regular Expressions drive me batty.  Fortunately, my buddy Rob is especially gifted at them and I tend to contact him when I get in a bind.  This tool was also helpful.  Recently, Jeff Atwood recommended this book as a helpful tool in crafting your regular expressions—think I’ll pick that one up.]

So, my script built arrays for these four fields from each email; then, I wrote a loop to iterate through the arrays, building a delimited line item for each index of the array.  I assumed a couple of things: 1) that each array was built starting at the top of the email dump file and working toward the bottom (such that, say, index 23 of each array contained values from the same email) and 2) that each email actually had all four fields.  My first assumption appeared to be true, as I spot-checked the results against the raw data and the values seemed to line up; however, it turned out that there were a few emails actually missing the INetFrom field, which threw off all the line items processed after the errant email.  Perhaps I could have written more script to catch such problems, but instead, I just identified the errant emails, added the missing fields in the raw email dump, then ran the script again and all was successful.  The resulting file looked something like this:

email2

Notice the dollar sign at the beginning of the second field.  This is the Abstract field.  In the raw email dump, the Abstract field name is actually “$Abstract:”.  In regular expressions, the dollar sign is a special character and to treat it as a literal, you have to escape it.  In my replace operation, I tried “\$Abstract:  “, “\\\$Abstract:  “, and “\\\\$Abstract:  “, but nothing seemed to work.  Frustrating, but I can live with it.

 

Step 3: Query against my delimited representation of the replies to produce a report of the affirmative replies

With the raw email dump processed into a delimited format, I can now query for the affirmative responses with LogParser.  Here’s the query I came up with:

SELECT
    Field1
FROM delimResponses.txt
WHERE Field2 LIKE '%add me to%' OR Field3 LIKE '%add me to%'

I saved this query in a file called process_email_responses.sql, then, from the command line, executed this:

LogParser file:process_email_responses.sql -i:TSV -iSeparator:"|" -headerRow:OFF -rtp:-1 > AddToList.txt

This query created a list of email addresses that I could just paste into my distribution list (manually, of course, using the Lotus Notes client).  The query yielded about 95% of the affirmative responses; to get the remainder (those that didn’t explicitly include the phrase “add me to your list” in either the body or subject line of the reply), I ran the reverse query and manually reviewed the results:

SELECT
    Field1
FROM delimResponses.txt
WHERE Field2 NOT LIKE '%add me to%' AND Field3 NOT LIKE '%add me to%'

 

Step 4: Add the affirmative emails to my distribution list

As I said, I took the results of my first query (about 95% of the affirmative responses), manually added the remaining affirmative responses from the second query, and then manually pasted the resulting email addresses into the Lotus Notes Edit Group window of my distribution list.bonehead

Over all, this stuff took me a couple of hours to come up with—mainly because I’m a bonehead.  I probably would have spent about the same amount of time processing the list manually—of course, I probably would have made mistakes, too, and ended up adding someone to the list who didn’t want to be added or vice-versa.  At least now that I’ve documented my efforts here, when I send out more solicitation emails in the future (we do anticipate doing this a few more times, at least), I should be able to significantly reduce the processing time of the results. 

The biggest question I have, though, is: is there a smarter way to use Powershell to process a list like my raw email dump?  A list where the fields are listed vertically with some funky character delimiting records?  I couldn’t find examples of anyone processing such files, but I’m glad I found at least one clunky way to do it.  I guess the other question is: did I really need to use LogParser or could I also get Powershell to do my query?  I probably could have gotten Powershell to do my query—with Regular Expressions, no less—but I guess I just felt more confident with LogParser, at least for now.  Anyway, if you have answers or suggestions to these deep and troubling questions of mine, please feel free to comment.

Currently rated 5.0 by 1 people

  • Currently 5/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Categories: Technology Blog
Posted by Brad on Sunday, June 14, 2009 12:34 PM
Permalink | Comments (5) | Post RSSRSS comment feed