Three Is It

Because two isn't enough and four is just too many

I am not blaming those who are resolved to rule, only those who show an even greater readiness to submit.
Thucydides
Home Blogs Genealogy Brad's Bookshelf Subscriptions Contact Sign in
 

About the author

Brad Butts is a .NET developer and architect. He is married with children and enjoys reading, working out, and genealogy is his five minutes of spare time.
E-mail me Send mail
National Debt Clock

Recent comments

Authors

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2010

Powershell vs. Lotus Notes

I’m writing this post for the same reasons that I write most of my posts:

  1. Because once I solve a problem, the solution tends to decay quickly from my mind.  If I don’t get it down on paper in some coherent form, it will be gone—and a month from now, I’ll likely need it again. 
  2. And because I’m sure there are many better ways to solve this problem.  My horrible googling skills came up empty, so this is the best I could do.  It works, but I’m sure there are more elegant ways to do this.  Perhaps a more savvy script writer could educate me.

Overview

At work, we develop components, gadgets, and gizmos that others use to do their work.  When we have a new release of said gizmo or some other relevant event, we send out an email to our internal customers with the announcement.  Recently, we learned that there are many other employees in the company that are not on our email distribution list—folks who may also benefit from our products and services—so we wanted to contact these other people to see if they’d like to be added to our mailing list. 

Later, I may post on how I compared our current mailing list to this other list of people to find out who we were actually missing.  For now, assume that I have already sent out my solicitation email and have begun receiving replies.  In my solicitation, I said something to the effect, “if you would like to be added to our list, please reply to this email with the phrase, ‘add me to the list.’”  I wasn’t specific about where to place the phrase, so some respondents put the phrase in the body of their reply while others put it in the subject line of their reply.  Still others replied back with “no, thanks” and a few even replied that they wanted to be added, but didn’t use my phrase: eg. “sounds great!  Sign me up!”

I knew I didn’t want to go through all the replies by hand and add the affirmative replies, one-by-one, to my email list.  Rather, I wanted to find a way to programmatically process the replies and build a single list of affirmatives that I could add, in bulk, to my distribution list.  Here’s what I did:

Step 1: Export to the replies from Lotus Notes into a form I could process

Lotus Notes is not my email client of choice—that’s simply what I’m forced to use.  The SMTP infrastructure at my company is dark and mysterious; while I’m sure there are APIs I could program to, I decided that the quickest approach to extract the emails I wished to process was to simply do it the manual way.  I found that I could checkmark all the email I was interested in, then select the menu File > Export… and produce a single text file I could work with.  The file looked something like this:

email1

Step 2: Transform the replies into some delimited-type format that I could better process

Now, I had to get this dump of email into some form I could work with.  The first thing that came to mind was to see if I could use Powershell to transform the email dump into a delimited file.  If I could get the fields I needed to process into a delimited file, I could write a LogParser query to identify the individuals wishing to be added to my list.  Here’s the Powershell script I came up with:

$delim = "|"
$allResponses = Get-Content d:\Docs\email_dump

$principals = $allResponses | Select-String "^Principal:"
$abstracts =  $allResponses | Select-String "Abstract:"
$subjects = $allResponses | Select-String "Subject: [^My]"
$inetfroms =  $allResponses | Select-String "INetFrom:"

$newFile = ""

for($count = 0; $count -lt $principals.Count; $count++)
{
    $newFile += ($inetfroms[$count] -replace "INetFrom:  ", "") + $delim + `
        ($abstracts[$count] -replace "Abstract:  ", "") + $delim + `
        ($subjects[$count] -replace "Subject:  ", "") + $delim + `
        ($principals[$count] -replace "Principal:  ", "") + "`n"
}

$newFile > D:\Docs\delimResponses.txt

So, let me explain some of this….  I found that there were probably four relevant fields for my objective:

  • Principal—the English name (distinguished name) of the respondents
  • Abstract—the first line or first X number of characters of the reply
  • Subject—the subject line of the reply
  • INetFrom—the email address of the respondent

Most respondents placed the “add me to your list” phrase either at the beginning of their response (Abstract field) or in the subject line (Subject field).  Of course, I needed the respondent’s email address (INetFrom) to add to my distribution list, and the respondent’s full name (Prinicipal) was just nice to have in case the email address was too cryptic and I wanted to know the actual respondent’s name.

Anyway, I used the Select-String cmdlet to find all the instances of each field.  For the most part, the pattern match (a Regular Expression) was easy; however, initially, my Select-String pattern for the Subject field was just “Subject:”.  Unfortunately, this pattern not only picked up the respondents’ subject lines, but also the subject line from my original email, which was tacked onto the bottom of every email I received.  I had to figure out how to make sure not to capture that subject line.  So, I came up with the pattern “Subject:  [^My]”.  In Regular Expression-eze, this means, “match on all lines where the line begins with “Subject:  [and any characters that don’t begin with ‘My’]”.  After all, my original subject line was “Subject:  My team’s distribution list”.

[Side note: Regular Expressions drive me batty.  Fortunately, my buddy Rob is especially gifted at them and I tend to contact him when I get in a bind.  This tool was also helpful.  Recently, Jeff Atwood recommended this book as a helpful tool in crafting your regular expressions—think I’ll pick that one up.]

So, my script built arrays for these four fields from each email; then, I wrote a loop to iterate through the arrays, building a delimited line item for each index of the array.  I assumed a couple of things: 1) that each array was built starting at the top of the email dump file and working toward the bottom (such that, say, index 23 of each array contained values from the same email) and 2) that each email actually had all four fields.  My first assumption appeared to be true, as I spot-checked the results against the raw data and the values seemed to line up; however, it turned out that there were a few emails actually missing the INetFrom field, which threw off all the line items processed after the errant email.  Perhaps I could have written more script to catch such problems, but instead, I just identified the errant emails, added the missing fields in the raw email dump, then ran the script again and all was successful.  The resulting file looked something like this:

email2

Notice the dollar sign at the beginning of the second field.  This is the Abstract field.  In the raw email dump, the Abstract field name is actually “$Abstract:”.  In regular expressions, the dollar sign is a special character and to treat it as a literal, you have to escape it.  In my replace operation, I tried “\$Abstract:  “, “\\\$Abstract:  “, and “\\\\$Abstract:  “, but nothing seemed to work.  Frustrating, but I can live with it.

 

Step 3: Query against my delimited representation of the replies to produce a report of the affirmative replies

With the raw email dump processed into a delimited format, I can now query for the affirmative responses with LogParser.  Here’s the query I came up with:

SELECT
    Field1
FROM delimResponses.txt
WHERE Field2 LIKE '%add me to%' OR Field3 LIKE '%add me to%'

I saved this query in a file called process_email_responses.sql, then, from the command line, executed this:

LogParser file:process_email_responses.sql -i:TSV -iSeparator:"|" -headerRow:OFF -rtp:-1 > AddToList.txt

This query created a list of email addresses that I could just paste into my distribution list (manually, of course, using the Lotus Notes client).  The query yielded about 95% of the affirmative responses; to get the remainder (those that didn’t explicitly include the phrase “add me to your list” in either the body or subject line of the reply), I ran the reverse query and manually reviewed the results:

SELECT
    Field1
FROM delimResponses.txt
WHERE Field2 NOT LIKE '%add me to%' AND Field3 NOT LIKE '%add me to%'

 

Step 4: Add the affirmative emails to my distribution list

As I said, I took the results of my first query (about 95% of the affirmative responses), manually added the remaining affirmative responses from the second query, and then manually pasted the resulting email addresses into the Lotus Notes Edit Group window of my distribution list.bonehead

Over all, this stuff took me a couple of hours to come up with—mainly because I’m a bonehead.  I probably would have spent about the same amount of time processing the list manually—of course, I probably would have made mistakes, too, and ended up adding someone to the list who didn’t want to be added or vice-versa.  At least now that I’ve documented my efforts here, when I send out more solicitation emails in the future (we do anticipate doing this a few more times, at least), I should be able to significantly reduce the processing time of the results. 

The biggest question I have, though, is: is there a smarter way to use Powershell to process a list like my raw email dump?  A list where the fields are listed vertically with some funky character delimiting records?  I couldn’t find examples of anyone processing such files, but I’m glad I found at least one clunky way to do it.  I guess the other question is: did I really need to use LogParser or could I also get Powershell to do my query?  I probably could have gotten Powershell to do my query—with Regular Expressions, no less—but I guess I just felt more confident with LogParser, at least for now.  Anyway, if you have answers or suggestions to these deep and troubling questions of mine, please feel free to comment.

Currently rated 5.0 by 1 people

  • Currently 5/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Categories: Technology Blog
Posted by Brad on Sunday, June 14, 2009 12:34 PM
Permalink | Comments (5) | Post RSSRSS comment feed

Thinking about Logging

I think about logging pretty often as I code solutions.  To me, logging is an important part of a quality software product.  In the middle of the night, when your software goes haywire, being able to explore a well-detailed application log file (not just the event logs or IIS logs, for example) can be critical in remediating the problem and reducing the number of lost hours of sleep.

I think a good logging solution should include:

  1. The ability to be turned off or on from outside the compiled solution (eg. from the configuration file).
  2. Persistence ignorance—maybe today I want to log to a flat file, but tomorrow I want to log to a database.  I should be able to easily swap persistence providers without recompilation.  With regard to flat file logging, I’d like the option to be able to specify a relative path or absolute path for the log file (although, for web apps, that could serve as a security vulnerability).  Also, a rolling log file provider is a must.
  3. The ability to crank up or crank down the verbosity of the logging solution. 
  4. Some control over the look of the log file.  Although I might, on occasion, crack open a log file in notepad and manually walk through it, most log files I deal with are quite large—I’d rather use tools like LogParser to produce human-readable reports from the generally-not-human-readable log file.  By having some control over the data that’s added to my log files and how that data is delimited, it can be easier for me to write scripts that render powerful reports (some day, I do hope to put up a post on how I’ve used LogParser to produce meaningful reports based on my application logs).

I could also add to the list that the solution must be hidden behind an interface for easy injection via Dependency Injection.  The easy winner here, in my opinion, is log4net.  If you’re not familiar with log4net, it is a light, open-source API that meets all of these needs and more.  Here are my quick notes on implementing the library:

Step 1: Configure it

…in your application configuration file

<configSections>
<section name="log4net" type="log4net.Config.Log4NetConfigurationSectionHandler, log4net"/>
</configSections>
<log4net>
<!-- A1 is set to be a ConsoleAppender -->
<appender name="A1" type="log4net.Appender.RollingFileAppender">
<file value="logs\log4net.log"/>
<appendToFile value="true"/>
<datePattern value="yyyyMMdd"/>
<rollingStyle value="Date"/>
<!-- A1 uses PatternLayout -->
<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%-4timestamp|%thread|%-5level|%logger|%ndc|%message%newline"/>
</layout>
</appender>
<!-- Set root logger level to DEBUG and its only appender to A1 -->
<root>
<level value="ALL"/>
<appender-ref ref="A1"/>
</root>
</log4net>

…and in your AssemblyInfo.cs file

[assembly: log4net.Config.XmlConfigurator(Watch = true)]

Step 2: Add a reference to log4net.dll

Step 3: Code Away

using System;
using System.Reflection;
using log4net;
namespace WebApplication1
{
public partial class log4net : System.Web.UI.Page
{
private static readonly ILog log = LogManager.GetLogger(MethodBase.GetCurrentMethod().DeclaringType);
protected void logButton_Click(object sender, EventArgs e)
{
log.Debug("some Debug msg");
log.Error("some Error msg");
log.Fatal("some Fatal msg");
log.Info("some Info msg");
log.Warn("some Warn msg");
}
}
}

 

So, log4net’s great for your everyday application needs, but I have a new requirement: most of what I do anymore is write libraries and frameworks for others to use.  I want to litter my APIs with lots of log messages so that, should a developer want to know what my API is doing, he should just be able to flip a configuration switch and get a log of the activities of my API.  It seems a little presumptuous of me to introduce yet one more dependency on my customer—the log4net dependency—just to be able to use my API.  In this situation, I’d prefer to be able to leverage any kind of logging opportunities in the base class library itself.  The System.Diagnostics namespace gives me some of this opportunity, but none of the native options are as rich or powerful as log4net.

System.Diagnostics.TraceSource

I find the newer System.Diagnostics.TraceSource not even a viable option for me.  TraceSource seems nothing more than the older System.Diagnostics.Trace with less options (eg. no throttling).  So, let’s skip TraceSource and talk about Trace.

 

System.Diagnostics.Trace

It used to be that I would configure a TraceListener like this:

    <system.diagnostics>
<switches>
<add name="MagicTraceSwitch" value="Verbose"/>
</switches>
        <trace autoflush="true" indentsize="4">
<listeners>
<clear/>
<add name="fileListener" 
type="System.Diagnostics.TextWriterTraceListener" 
initializeData="logs\SysDiag.log" 
traceOutputOptions="DateTime"/>
            </listeners>
</trace>
    </system.diagnostics>

Then I’d start coding against the namespace as such:

public partial class _Default : System.Web.UI.Page
{
private TraceSwitch MyTraceSwitch = new TraceSwitch("MagicTraceSwitch", "switch from config file");
protected void Button1_Click(object sender, EventArgs e)
{
Trace.WriteLineIf(MyTraceSwitch.TraceInfo, "log some informational message.");
Trace.WriteLineIf(MyTraceSwitch.TraceVerbose, "log some verbose message.");
}
}

Since System.Diagnostics is part of the BCL, no extra dependencies are required and I can ship just my APIs with no other third party assemblies.  I have at least two problems with this approach, though: 1) the TextWriterTraceListener has no rolling log file capabilities and 2) the way I’m pushing messages out to the listener, via the Trace.WriteLineIf statement, means that only the exact string message in my argument gets inserted into my log—no nicely delimited line with timestamp, thread id, etc.

To address my second problem, I could use a different set of methods in the Trace class:

Trace.TraceError("an error happened.");
Trace.TraceInformation("this is an information trace {0} {1}", "parm1", "parm2");
Trace.TraceWarning("this is a warning trace {0} {1}", "parm1", "parm2");

Interestingly, though, these methods don’t throttle—they pay no attention to my TraceSwitch setting.  All these messages will get logged regardless of the verbosity setting.  The log entries will, however, include the values “Error”, “Warning”, and “Information” if that’s any consolation.  To format the log entry, the listener configuration includes a “Delimiter” attribute and a traceOutputOptions attribute.

To address the first problem, I’ve recently discovered the Microsoft.VisualBasic.Logging.FileLogTraceListener, thanks to this post on StackOverflow.  Here’s a configuration I’ve used for this listener:

    <system.diagnostics>
<switches>
<add name="MagicTraceSwitch" value="Warning"/>
</switches>
        <trace autoflush="true" indentsize="4">
<listeners>
<clear/>
<!--<add name="fileListener" 
type="System.Diagnostics.TextWriterTraceListener" 
initializeData="logs\SysDiag.log" 
traceOutputOptions="DateTime"/>-->
<add 
name="fileListener2" 
type="Microsoft.VisualBasic.Logging.FileLogTraceListener, Microsoft.VisualBasic, Version=8.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a, processorArchitecture=MSIL" 
CustomLocation="C:\data_files\app_development\LoggingTest\WebApplication1\logs"  
Location="Custom" 
BaseFileName="MyApp" 
LogFileCreationSchedule="Daily" 
Delimiter="|" 
traceOutputOptions="ProcessId, DateTime"/>
</listeners>
</trace>
    </system.diagnostics>

One thing to note about this listener is that, unlike the TextWriterTraceListener, the FileLogTraceListener seems to only accept an absolute file path, not a relative one—if you can deal with that, this listener seems to be a viable option—although if I were writing an end-to-end application, not just a framework, I’d definitely go with log4net for my logging solution.

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Categories: Technology Blog
Posted by Brad on Saturday, February 21, 2009 3:15 PM
Permalink | Comments (1) | Post RSSRSS comment feed

Of low-tech address books and surname distributions

address_pageWarning: this post will either a) show that I’m obsessive/compulsive and must be stopped, b) prove that I’m a fool/moron and should be left to my own devices, or c) all of the above.

I’ve been reading David Allen’s Getting Things Done (GTD) with the hope of actually getting something done in 2009.  One of the core concepts of GTD is having a trusted system in which you place data on all your projects, next actions, calendar data, tickler entries, etc.  As a geek, I’d love my trusted system to be some electronic widget that can easily sync with my home and office Outlook, Google Calendars, and so on.  But should that widget be something I combine with my phone—as in a Blackberry, smart phone, iPhone, or whatever—or should it be something separate as in a Pocket PC, PDA, or the like?  Would a tablet PC or one of those tiny notebooks be a better choice?

Surprisingly, I’m finding that many tech-savvy GTDers are going to the low-tech moleskines, Hipster PDAs, and such.  For one thing, there’s no boot time with these low-tech organizers.  (Matthew Cornell has cataloged some discussion points on high-tech versus low-tech trusted systems here.)  So, for the time being, I’m going low-tech.

The wife bought me one of those Franklin-Covey planners and I began loading it into the seven-ring binder when I encountered a dilemma: the paper planner includes 17 double-sided blank address pages.  A little box in the upper right corner seems to shout, “write a letter in me!”  But what letter do I write?  If I write “A” on the first side of page 1, do I write “B” on the backside of page 1 or do I write “A” again and write “B” on the first side of page 2?  How many sides should I allocate for surname letters I perceive to be more popular, like R, S, and T?  To solve this dilemma, I thought I’d take a scientific approach to the problem: namely, google around to see if anyone has posted frequency data on surnames beginning with the various letters of the alphabet.  At this point, I would just be happy with data relegated to the United States.

Alas, as far as I can find, no one has published such information.  However, the United States Census Bureau has published surname frequency data from the 1990 Census (more info here).  This report is a sampling of around 88,000 surnames ranked by popularity and percentage frequency in the sample population.  Hmm.  How could I coalesce this information to show the percent frequency of surnames that begin with the various letters of the English alphabet?  I know, PowerShell (if you haven’t noticed, I’ve been on a bit of a PowerShell rant of late)!  Actually, the solution I came up with combined PowerShell and LogParser

I downloaded the report and found that it was persisted in a fixed width format.  Unlike Comma Separated Value (CSV) files, PowerShell doesn’t seem to deal well with fixed width data files.  MOW did some intense work here, but that looks like a lot of work, so I just borrowed this idea to convert the fixed width file to a CSV file and then process the results:

 

 

 

 

 

#from http://www.eggheadcafe.com/software/aspnet/33254303/fixed-width-column-import.aspx
$fixedWidthFile = 'C:\data_files\personal\other\dist.all.last'
$newCSVFile = $fixedWidthFile + ".csv"

if(!(test-path $newCSVFile))
{
    $pat = '^(.{15})(.{7})(.{10})(.{2})$'
    $rep = '$1,$2,$3,$4'
    (gc $fixedWidthFile) -replace $pat, $rep -replace ' ' > $newCSVFile
}

With the CSV file in hand, I could then write my LogParser query to perform my calculations.  I would have liked to use Linq for my queries and thus keep my solution purely within the domain of PowerShell, but I’m not quite sure how to use Linq in PowerShell—or if it’s even possible with version 1.0.  That’s one more research item to add to my list.

Anyway, while constructing my LogParser query, I found something interesting: the percent frequencies of the surnames in the sample don’t add up to 100%.  Rather, they add up to something like 147%.  I wasn’t expecting that.  So, I guess if I want to get an accurate percent frequency of a given surname (and then sum those percentages together for all surnames beginning with the same letter), I’d better divide the value by the true total, 147, and use those results.  Of course, I don’t want to hard-code that total value—I want my query to calculate it for me.

To do this, normally I would nest a SELECT SUM(Percent) AS TotalValue within my larger SELECT query that produces the report I’m interested in, but LogParser seemed to have a problem with this kind of nesting—my guess is that LogParser doesn’t like two or more SELECTs together that are reading from the same file.  Alright, I’ll perform two separate calls to LogParser: one to get my total percent value and the second to use this value to calculate the percentage of each letter in the sample population.  One thing I learned by doing that first query is that I can actually capture the LogParser results back into a PowerShell variable:

$query1 = "SELECT SUM(Field2) FROM " + $newCSVFile
$totalPercent = & logparser $query1 -i:CSV -headerRow:OFF -q:ON

Now, with that value in hand, I can use PowerShell and LogParser to render the report I’m looking for (notice that I’ve included a calculation for recommended Address Page allotment into my query by calculating the percentage frequency by 34—the number of blank address pages):

$query2 = "SELECT SUBSTR(Field1, 0, 1) AS SurnameLtr, `
        SUM(Field2) AS SummedFreqAmt, `
        DIV(SummedFreqAmt, " + $totalPercent + ") AS PercentOfFreqAmt, `
        ROUND(MUL(PercentOfFreqAmt, 34)) AS PagesToAllocate `
    FROM " + $newCSVFile + " GROUP BY SUBSTR(Field1, 0, 1) ORDER BY SurnameLtr"

& logparser $query2 -i:CSV -headerRow:OFF -rtp:26

And my report in the PowerShell console:

surnameDist1

And now I have a more reasoned way of allotting my blank address pages to particular letters of the alphabet without just guessing.  Of course, the page allotments total 35—I only have 34 to dole out.  And what happens if I meet someone with a last name starting in U or Z?

After I explained this work to my wife, she just said, “why don’t you keep the pages blank until you need them?  If you end up needing more, you can always buy more.”  Huh.  Never thought of that.  Guess that’s a more rational approach.  Darn it.

Currently rated 4.0 by 1 people

  • Currently 4/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Categories: Technology Blog
Posted by Brad on Sunday, January 04, 2009 9:26 PM
Permalink | Comments (2) | Post RSSRSS comment feed