Between Worlds

Being the Travels and Travails of a Programmer in the land of Engineers

Excel and You, Part II: .NET, C++/CLI, and Mixed-Mode DLLs

Last time, we looked at a straightforward way to create DLL “wrappers” allowing us to call libraries written in various languages from Excel and VBA, with a C++ library as our example. The general technique works for any language or platform that can expose an API for calling from C: O’Caml, Haskell, and Python, to name just a few.

Today I’d like to share some ideas for another common case: exposing functionality from a .NET library (assembly, really). The Microsoft .NET framework has seen remarkable uptake in the oil and gas industry and—regardless of its merits and the rumors of an impending return to a preferred native-application development model on Windows 8—it’s clearly here to stay for a while. The use of a platform providing automatic memory management and an “everything but the kitchen sink” library has considerable advantages for the development of many line-of-business and even engineering applications, and given the Microsoft monoculture we seem stuck with anyway, many companies in the industry have standardized on .NET (and C#, more often than not) as the only development platform for in-house application development.

Well, at least it’s not Java (kidding aside, while Java-the-language is dead and rotting, the JVM itself is the target platform for plenty of interesting projects). Let’s return to the case where we have some interesting functionality implemented in one or more .NET classes, compiled into a .NET assembly DLL. We may or may not have access to the (C#, VB.NET, …) source for the library, but the .DLL exposes all the metadata we’ll need to make use of it anyway. First, I’ll mention two solutions that I won’t elaborate on: one is prohibitively expensive, the other requires alteration to the library itself.

Microsoft Visual Studio Tools for Office (VSTO), which appears now to have simply been subsumed into certain editions of Visual Studio, is Microsoft’s recommended solution for creating Office add-ins in .NET. Presumably this includes some templates and tools that make this less difficult; in any case I’ve never had the opportunity to use it. Visual Studio is prohibitively expensive for a non-full-time software developer, and it’s simply unnecessary for most tasks: the .NET framework and Windows SDK are both available for free and contain all the compilers, linkers, headers, and libraries necessary to develop either native or .NET applications for Windows.

A more promising avenue is the .NET framework’s built-in capability for exposing types through COM. This can be done straightforwardly enough, but imposes constraints on the types to be exposed, requiring the library to have been built with COM interop in mind from the start. While a COM-visible type will provide trivial access to VBA (Visual Basic 6 essentially is COM reified as a language), this avenue may be untenable for a closed-source library, or one which simply can’t be altered to meet the prerequisites.

Love it or hate it (or more likely, never heard of it), C++/CLI is one of the most interesting things to come out of Microsoft in a while. Superseding the dead and little-mourned “Managed Extensions for C++”, C++/CLI takes a more radical approach to grafting the .NET Framework onto ISO C++. Syntactic extensions including new declarator syntax for GC’d handles and references to .NET types, “contextual keywords” like for each, and even the now-standard nullptr can make C++/CLI look like a completely new language, and some of the semantic additions—not to mention the “fast-track” ECMA standardization process—set C++ purists’ teeth on edge. All that aside, while I’d prefer to write ISO C++ (or Haskell, for that matter) given the chance, I think it’s a neat little tool for specific purposes that seems sadly to have fallen by the wayside at Microsoft.

While there doesn’t seem to be much of a future for C++/CLI, it’s still supported by the most recent Visual C++ compiler, and provides us with a nice way to expose arbitrary .NET types to native DLL clients such as Excel/VBA by creating mixed-mode assemblies. A mixed-mode assembly contains both .NET (managed) code and native code, allowing us to create a DLL which uses .NET types internally but exports a C-style interface for use by native clients.

Let’s assume we have a .NET assembly containing a type that does some interesting calculation which we want to expose. Here’s a C# sketch of the interface of this type:

namespace Util {

    public class CylonDetector : IDisposable {
        public CylonDetector()
        {
            ...
        }

        public bool isCylon(string name)
        {
            ...
        }

        public void Dispose()
        {
            // release resources---files, sockets, nuclear isotopes
            ...
        }
    };

}

Assuming this type is found in Util.dll, we can write a C++/CLI header importing the type and describing a C-style interface to it like so:

// In CylonDetectorNative.h

#using "Util.dll"

#include "msclr\auto_gcroot.h"

typedef msclr::auto_gcroot<Util::CylonDetector^> detector;

extern "C" {
    detector* get_cylon_detector();

    void release_cylon_detector(detector *det);

    bool is_cylon(detector *det, const char *name);
}

We can’t use a CylonDetector^ handle as an opaque “token” to return to native clients, as we did a FluxCapacitor* pointer in the last entry. Handles to CLR types on the .NET heap aren’t simple integer “addresses” like pointers, but must track a single object across potential relocation due to garbage collection. As such, we’ll make use of the handy Microsoft-provided auto_gcroot<T> class template. As with the more common gcroot<T> template, auto_gcroot allows us to instantiate a C++ type wrapping a .NET handle. Unlike ordinary gcroot, its destructor will also invoke the appropriate Dispose method for the object behind the wrapped handle, if any. Microsoft’s (minimal) documentation for auto_gcroot is available on MSDN, but its use is straightforward.

A simple implementation of the necessary glue code might look like:

// In CylonDetectorNative.cpp

#include "CylonDetectorNative.h"

detector* get_cylon_detector()
{
    try {
        return new detector(gcnew Util::CylonDetector);
    } catch (...) {
        return nullptr;
    }
}

void release_cylon_detector(detector *det)
{
    delete det;
}

bool is_cylon(detector *det, const char *name)
{
    try {
        return det->get()->isCylon(gcnew System::String(name));
    } catch (...) {
        // innocent until proven guilty...
        return false;
    }
}

Note that we must use the get member function of auto_gcroot to acquire the underlying .NET handle (CylonDetector^).

Compile and link the mixed-mode DLL using the /clr switch to the MSVC compiler, provide the appropriate VBA Declare statements as described last time, and we should be able to access our .NET type’s functionality from VBA, as below:

Public Sub UseDetector()
    Dim detector As Long

    detector = get_cylon_detector()
    If is_cylon(detector, "Starbuck") Then
        Debug.Print "uh oh"
    End If

    release_cylon_detector detector
End Sub

As before, I’ve left out some preprocessor noise related to managing DLL name export, calling convention, and so on. I’ll end the post with links to full versions of the C++/CLI header and implementation files as well as the VBA declarations, but I first want to share some key knowledge gained during an all-day troubleshooting session for a DLL created as described above.

Suffice it to say that I had created a C++/CLI wrapper for a type much like CylonDetector, using the general technique described above. I compiled and linked a mixed-mode DLL, and was successfully able to link to and call the C-style exported interface from other C++/CLI code.

However, when I attempted to call my DLL functions from Excel, I was met with a rude surprise. On calling the equivalent of get_cylon_detector, I received neither a valid nor a null pointer. Instead, Excel crashed. At first I suspected that the fault was to be found in the component actually provided by the .NET library (hint: it was MATLAB).

After reproducing the crash several times, each yielding a slightly different error dialog (and some yielding no error dialog at all), I finally received a Windows error reporting dialog which revealed the presence of an unhandled Windows SEH exception (not a C++ or .NET exception). Expecting to find that I was seeing an error inside the MATLAB runtime wrapped by the .NET library, I google’d the strange exception code 0xE0434F4D and made a shocking discovery:
"It's IN the frakkin' ship^H^H^H^H CLR!"

The Win32 exception was a generic wrapper for an underlying CLR (.NET) exception! But how could my code, which diligently caught any and all exceptions from the .NET library, be throwing a CLR exception? After a few minutes (well, more than a few) of consternation, I decided to attach to EXCEL.EXE in a debugger and attempt to reproduce the crash, hoping something would “pop out”.

And it did. An item in the output window at the time of the crash referenced an EEFileLoadException. Another Google search led me directly to a StackOverflow page, and an answer. When a mixed-mode assembly needs to load a pure CLR assembly, the places it looks aren’t intuitively obvious, viz., they do not include the present working directory. The CLR exception thrown when loading failed wasn’t being caught because it didn’t conceptually occur inside any try-catch block; rather, my function get_cylon_detector had been invisibly rewritten as:

detector* get_cylon_detector()
{
    // not real code, may throw EEFileLoadException
    LoadManagedAssembly("Util.dll");
    try {
        return new detector(gcnew Util::CylonDetector);
    } catch (...) {
        return nullptr;
    }
}

As there was no way to catch the EEFileLoadException, it was silently translated to a Win32 exception and passed on to Excel, resulting in disaster. The solution, as presented on the StackOverflow page, was to directly add the correct path to the .NET assembly resolution process. Full code will be presented below, but the get_cylon_detector function ended up split into two parts to catch the potential “invisible” exception:

// "Invisible"
// {
// LoadManagedAssembly("Util.dll")
static detector* real_initialize()
{
    return new detector(gcnew Util::CylonDetector);
}
// } may throw EEFileLoadException

DLL detector* DLLCALL get_cylon_detector()
{
    try {
        System::AppDomain::CurrentDomain->AssemblyResolve +=
            gcnew System::ResolveEventHandler(
                    AssemblyResolver::MyResolveEventHandler);
        return real_initialize();
    } catch (...) {
        return nullptr;
    }
}

Finally, I’ve uploaded the full header, implementation, and declaration files for the CylonDetector example. Note the trick used in CylonDetectorNative.h to allow the header to be included from native C++ code to call the mixed-mode DLL.

That’s all for yet another long post! Next time, hopefully something a little higher-level with less bit-slinging.

Excel, VBA, C++, and You: It’s Not Pretty, But It Pays The Bills

What’s the most popular programming language in the world? I’ll give you some hints:

  • It’s a (first-order) functional programming language.
  • It’s a dataflow language: the order of expression evaluation is generated from a topological sort on the dependency graph.
  • Like an assembly language, the only abstractions you get are addressable memory cells, built-in operations, and global names.
  • Engineers have a love for it bordering on perverse.

The answer is, of course, Microsoft Excel. While accountants and so on may press-gang the poor spreadsheet into serving as an ersatz database, engineers tend to use—often unknowingly—spreadsheets for what they truly are: a simple dataflow programming model. In the oil & gas industry, I’ve seen spreadsheets built to implement volumetric reservoir models, completion-design tools, and many other engineering calculations.

Strictly speaking, Excel is not quite as “powerful” as the programming languages one thinks of when one thinks of “programming languages”, in a specific and technical sense. Almost all general-purpose programming languages implement models of computation which are Turing-complete: the set of functions computable by programs written in these languages (ignoring obnoxious practical considerations like memory and CPU time!) is precisely the set of functions computable by a Universal Turing Machine, or in the lambda calculus, and so on. Key to Turing-equivalence is the allowance for unbounded recursion: this is why it’s possible to write infinite loops or non-terminating recursion in these languages.

Excel allows recursive formulas, but will only evaluate them up to a fixed depth of recursion which must be specified prior to execution (and is, as I recall, bounded at 32,767). This puts Excel spreadsheets in the class of computational models which can compute the primitive recursive functions (as expounded upon by Douglas Hofstadter’s “Bloop” language, this is equivalent to a programming language whose only looping construct is a FOR loop with a fixed upper bound specified prior to entering the loop), albeit with an additional implementation constraint.

In any case, writing “software” in Excel is fairly unpleasant: while the dataflow paradigm has advantages, the “language” itself feels like an assembly language: lots of memory cells, global names only, and you can’t even write re-usable subroutines! And so engineers who need to write non-trivial Excel programs end up turning to the built-in (and Turing-complete) macro language, Visual Basic for Applications. To damn VBA with faint praise, it is certainly superior to Excel formulas for writing complex algorithms. I won’t detail why VBA has such a bad rap among programmers here (google “VBA sucks” or the moral equivalent), but will note that the griping is entirely justified. Glaring design inconsistencies that create pitfalls for the unwary, a programming model that encourages shallow understanding and bad design decisions, and the most obnoxious “IDE” in the history of man, to start.

And so we prefer to write our code in Real Languages, whatever those are, but eventually somebody will need to call code we’ve written in a Real Language from Excel/VBA. One option is to use the Excel SDK and build an XLL add-in, but this gets fairly tedious and is a lot of work to expose what’s often just a few operations. For this common case, I’ve found it more useful to provide a C-style interface to the functionality in a Windows DLL and a VBA module which provides the appropriate declarations to use the DLL. This alternative I will illustrate presently.

Although it’s not my favorite Real Language, I will confess a certain (perverse?) fondness for C++: it provides a decent level of expressiveness and some useful abstractions while still compiling nicely to native binaries, as well as simple interoperation with the C substrate that underlies all modern consumer operating systems. Because of this fondness and the ready availability of tools and programmers—the last thing you want to do in a corporate environment is write code in a language that only you understand or which depends on one “reference implementation” compiler, unless you value the type of “job security” that comes from maintaining your code for all eternity—I often end up implementing engineering tools as C++ libraries.

In general, these expose interfaces in the form of C++ classes, like so:

// In FluxCapacitor.h
#include <string>

class FluxCapacitor {
    public:
        FluxCapacitor(double flux_intensity);
        int capacitate(double jigawatts, const std::string& name);
        ~FluxCapacitor();
};

Now, while it is possible to expose C++ classes from DLLs under Windows, it’s generally Not A Good Idea, and in my experience it’s much simpler and easier to expose a “C-style” interface to the same functionality. Such an interface might look like so:

// In FluxCapacitorDll.h

// only need forward declaration to use FluxCapacitor*
class FluxCapacitor;

extern "C" {
    FluxCapacitor* get_flux_capacitor(double flux_intensity);
    int flux_capacitor_capacitate(FluxCapacitor* fc,
            double jigawatts, const char* name);
    void release_flux_capacitor(FluxCapacitor* fc);
}

We could then implement the “glue” code in a straightforward way. In essence, clients will treat FluxCapacitor* pointers as opaque handles. For pointers to C++ base classes, this will even yield the correct behavior for virtual functions, including destructors.

// In FluxCapacitorDll.cpp

#include "FluxCapacitor.h"
#include "FluxCapacitorDll.h"
#include <string>

FluxCapacitor* get_flux_capacitor(double flux_intensity)
{
    return new FluxCapacitor(flux_intensity);
}

int flux_capacitor_capacitate(FluxCapacitor* fc,
        double jigawatts, const char* name)
{
    return fc->capacitate(jigawatts, std::string(name));
}

void release_flux_capacitor(FluxCapacitor* fc)
{
    delete fc;
}

And so on. Of course, in the messy world of practice, a few other issues will pop up (calling conventions, DLL name-export machinery, exception safety), but I’ll come back to those (in fact I’ll present my trusty basic #defines for Windows DLLs with MinGW or MSVC). In any case, once our DLL is compiled and linked, we’ll need to expose the functionality to Excel and VBA. This is accomplished through the use of VBA Declare statements.

The best reference I’ve found on the topic is the final edition (1997!) of Bruce McKinney’s Hardcore Visual Basic, available free on the web at http://vb.mvps.org/hardcore/. Chapter 2 explains how to write VBA (or VB5 or VB6) Declare statements, matching VB data types to C data types.

Of course, this correspondence is platform-specific (we’re not in ISO C++ anymore, Toto!), but then so are DLLs, VBA, two’s-complement, 8-bit bytes, and so forth. In any case, the information in McKinney’s book is correct for both current MinGW gcc/g++ and MSVC under 32-bit Windows. I have not had the opportunity to write for 64-bit Windows yet, so the determination of corresponding types between 64-bit Office VBA (I’m actually pretty sure this doesn’t exist) and object code emitted for x86_64 by MSVC or gcc is left as an exercise for the reader.

Returning to our previous example, an appropriate VBA module might look like:

' n.b. under x86 Windows sizeof(void*) == 4,
' corresponding to the VB6/VBA Long type
Public Declare _
Function get_flux_capacitor Lib "FluxCapacitor.dll" _
    (ByVal flux_intensity As Double) _
As Long

' n.b. under x86 Windows sizeof(int) == 4,
' corresponding to the VB Long type
' also, in Declare statements, ByVal String
' parameters will be passed as pointers to
' null-terminated strings (i.e. char*)
' VB natively uses COM BSTRs, which are like
' Pascal strings.
' this hack has been in VB for a long, long time
Public Declare _
Function flux_capacitor_capacitate Lib "FluxCapacitor.dll" _
    (ByVal fc As Long, ByVal jigawatts As Double, _
     ByVal name As String) _
As Long

' n.b. C/C++ void functions
' must be declared as Subs in VB
Public Declare _
Sub release_flux_capacitor Lib "FluxCapacitor.dll" (ByVal fc As Long)

We can now use these functions from VBA, or—under certain conditions—from Excel formulas. Usage would look like:

Function DoItWithAFluxCapacitor(intensity As Double) As Long
    Dim fluxCapHandle As Long

    fluxCapHandle = get_flux_capacitor(intensity)
    DoItWithAFluxCapacitor = flux_capacitor_capacitate(fluxCapHandle, _
        1.21, "Dr. Emmett Brown")
    release_flux_capacitor fluxCapHandle
End Function

That’s really all there is to it, save for some final messy practical considerations. First, the signatures for functions exported from the DLL need to be modified to reflect the platform-specific calling convention (stdcall) necessary to be called from VBA, and to alert the compiler/linker that the functions should be exported (or imported, when calling from other C or C++ code). With appropriate preprocessor macros this can be made to work nearly automatically across MinGW and MSVC.

Additionally, I’ve left out error handling—and specifically exception handling—in my DLL interface functions. While Windows allows exceptions to be thrown from DLL functions, it’s fairly flaky (for example, MinGW and MSVC compiled code use different exception mechanisms) and most DLL clients (e.g. VBA) won’t be able to catch them anyway. Therefore, I usually catch C++ exceptions from my libraries and instead return “error codes”: a throwback to the “bad old days” perhaps, but reliable.

Depending upon linker settings and the presence or absence of a .def file, the DLL created may or may not export undecorated names. If you don’t want to use a .def file or force the export of undecorated names, it may be necessary to use the Alias option in your VBA Declare statements, like so:

Public Declare _
Function flux_capacitor_capacitate Lib "FluxCapacitor.dll" _
Alias "_flux_capacitor_capacitate@16" _
    (ByVal fc As Long, ByVal jigawatts As Double, _
     ByVal name As String) _
As Long

The exported names can be determined by inspecting the DLL; for MSVC the convention for extern "C" __stdcall functions is _func@total_size_of_arguments_in_bytes where argument types less than 4 bytes in size are promoted to 4 bytes.

Finally, it’s worth noting that there’s one big drawback to this technique: the client (VBA, etc) programmer is now responsible for ensuring that she properly manages objects returned from the library: she must be sure to release the handles to library objects when finished using them, especially if their destructors do non-trivial resource cleanup work.

I’ve uploaded some expanded sketches of the header and implementation files excerpted earlier, including my usual set of preprocessor macros for DLL building under MSVC or MinGW.

While I’ve illustrated the case of using a C++ library, the general technique is applicable to any language which provides a mechanism to be called from C. These days, that’s a very large number of languages. My next post will look at the specific case of creating a simple C-style DLL wrapper for a .NET class library that doesn’t expose itself through COM, using C++/CLI.

If I Had to Classify Programming Languages…

At last a substantive post again, although I’m going to cheat a little. As I was looking through some old files earlier today, I came across a little document which I had begun to write last year.

At the time I’d recently fielded several questions from friends about programming: specifically, about programming languages. It’s a confusing world for novice (or even soon-to-be-novice) programmers, with lots of conflicting advice about which languages are worth learning, which are easiest to start with, and whether the differences between them even matter.

Couple this with a popular understanding among practicing programmers that borders on abysmal when it comes to evaluation strategies, type systems, abstraction paradigms, and even computational limitations of programming languages, and it’s really a minor miracle that beginners ever can come to understand any of this stuff at all.

And so filled with hubris I sat down to write about how I would classify programming languages, if somebody made me. I never quite finished, but the remaining empty sections hint at my conceptual outline.

If I were starting from scratch, I think I’d drop the “s and w words” (strongly and weakly typed) entirely: I’m no longer certain they mean anything at all. However, I can’t quite buy in to the notion that soundness of a type system is binary: “sound” or “unsound”. It seems that there must be some definable way in which, say, the C++ type system makes more guarantees than the C type system.

I’d like to finish this; I think I’ll start by reworking the current sections on polymorphism in such a way to include and meaningfully delineate everything from Python’s “duck typing” to Java “interfaces” to C++ templates to Haskell typeclasses and ML modules/functors. This will require some serious thinking though!

In any case, I’d love anyone—novice and expert alike—to take a look and share their thoughts and feedback. Maybe this can help make sense of some of the confusion.


Now, to tackle that AI final!

Eleven/Eleven/Eleven

It’s now been well over two years since my last post. I guess it’s safe to say my little blogging experiment didn’t quite pan out. I’ve changed jobs twice, moved to two new cities, and generally experienced a lot of life in the intervening time.

Frankly, in 2009, I quickly found there wasn’t that much to write about: I was working as a field engineer–long hours, middle of nowhere–and opportunities to “make things better” with computing were few and far between. I read a lot, and picked up a few new ideas in my limited spare time.

Now, everything has changed. I’m working as a reservoir engineer in my employer’s exploration and new-ventures group in an increasingly bustling metropolis of the central United States. Where previously my job mostly revolved around routine calculations and overseeing ongoing operations, now every day brings me face-to-face with:

  • uncertainty
  • optimization
  • massive data sets

It is, in brief, the perfect place for a “programming engineer”. I’ve been able to apply ideas from machine learning, artificial intelligence, and global optimization to analyze data sets, make recommendations, and develop predictive models. I’m also currently enrolled in the excellent online AI class and machine learning class offered by Stanford University.

I would like to start blogging again as the new year approaches: I have a lot of new ideas to share, and many opportunities to explain how leveraging computer science can make life a little easier in other technical fields of endeavor. Welcome back, and I hope you’ll stick with me as I strive to produce some interesting content.

Walking the Line

As a first real post, I’ve decided to throw out this little script to illustrate the sort of common problems that become much easier with a few programming tools hanging off your belt.

Here’s the situation: I’m currently working on an economic analysis of a proposed new compressor installation. This involved obtaining estimated costs for the construction of several thousand feet of new pipeline. In turn, this required us to accurately map the existing pipeline! This was accomplished by use of a handheld GPS unit—a process affectionately known as “walking the line.”

However, when we e-mailed the GPS coordinates to our GIS-operating counterparts, we were informed that coordinates in degrees-minutes form (e.g. 32° 37.912′)—the form provided by our GPS unit—were not acceptable for the mapping software in use.

We were, then, faced with the task of converting dozens of coordinate pairs in this form to decimal degrees (e.g. 32.63°). Our current data was stored in a Microsoft Excel spreadsheet, with a layout like so:

Latitude       Longitude      Comments
32* 37.912′ N  95* 02.599 W   Blah blah blah whatever.

Several options immediately presented themselves:

  • Do it by hand—manually extract the minutes from each lat or long, and use an Excel formula to do a simple divide-by-60 and add. To my mind, this represented what Larry Wall calls “false laziness.”
  • Work out by muddling through MS Excel’s on-line help whether string manipulation functions might exist within Excel to help me parse out the minute values, or write a VBA macro to do so. Um, no thanks.
  • Use the right tool for the job!

Needless to say, I chose door number three. In this case, I felt that the best tool for the job was the Perl programming language. Perl is a dynamic, interpreted programming language with very strong text processing capabilities integrated tightly with the syntax of the language. In my eyes, this—as well as various other quirks of the language—makes it ideal for “one-off” scripts like this one which “filter” text (“filter” is commonly used terminology, especially in the UNIX world, for programs which transform input text into output text according to given rules, usually on a line-by-line basis). I feel strongly that every engineer should be familiar with at least one language suitable for text processing and “scripting”—such as Perl, Python, or Ruby. But more on that later!

The first step was to extract our data from Excel’s opaque binary file format into something a little more plain-text. Luckily, Excel 2003 allows files to be saved in comma- or tab-delimited format, and Excel 2007 allows arbitrary delimiters. I happened to be using Excel 2003, so I saved my spreadsheet as a plain-text file containing a line of values, separated by tabs, for each row in the spreadsheet. I don’t particularly care for tabs as delimiters—visually, they’re hard to distinguish—so I opened this file in vim, my favorite editor (and one well worth investing the time to learn, in my opinion) and did a simple :%s/^I/:/g—problem solved. More “conventional” editors might call this something along the lines of “search and replace,” I suppose. In practice, of course, the choice of delimiter is arbitrary so long as it does not conflict with a character found in the data.

Finally, I could write a simple script to extract the “minutes” part of each lat/long and compute a decimal-degrees value. The script acts as a filter, reading delimited lines from standard input (or files specified as command-line arguments, thanks to Perl’s magic <> operator!) and writing delimited lines to standard output. dm-to-dec.pl is presented below in all its slapped-together, uncommented glory.

while (<>) {
    if (/(\d+)\* (\d+\.\d+)'? N:(\d+)\* (\d+\.\d+)'? W:(.+)/) {
        $dec_deg_lat = $1 + ($2 / 60);
        $dec_deg_long = ($3 + ($4 / 60)) * -1;
        print "$dec_deg_lat:$dec_deg_long:$5\n";
    } else {
        print $_;
    }
}

This post is not intended to be a primer on regular expressions, one of which forms the meat of the script (although they are one of the most useful programming topics to study for engineers, in my opinion). I’ll step through the pattern which is matched against each incoming row, though, to illustrate the power of regexes for this application.

Briefly, the /(\d+)\* (\d+\.\d+)'? N:(\d+)\* (\d+\.\d+)'? W:(.+)/ pattern matches the format of each row and extracts the relevant values. Reading left to right, it matches:

  • one or more digits [(\d+)], captured as $1;
  • an asterisk and a space;
  • one or more digits, a decimal point, and one or more digits [(\d+\.\d+)], captured as $2;
  • an optional minute sign ['?], a space, and the sequence N-colon;
  • one or more digits [(\d+)], captured as $3, followed by an asterisk and a space;
  • another digits-decimal point-digits group [(\d+\.\d+)], captured as $4;
  • an optional minute sign, a space, a W, and a colon;
  • and the rest of the line [(.+)], captured as $5.

We thus end up with the following variables:

Variable name  Contains
$1             Latitude, °N
$2             Latitude minutes
$3             Longitude, °W
$4             Longitude minutes
$5             Comments, etc.

Now, we can calculate the decimal degrees latitude as $dec_deg_lat = $1 + ($2 / 60); and the decimal degrees longitude as $dec_deg_long = ($3 + ($4 / 60)) * -1;. Finally, we’ll output the filtered row: print "$dec_deg_lat:$dec_deg_long:$5\n";

Now, wasn’t that easier than doing it the long way?

Welcome

Welcome to Between Worlds!

I’m Derrick Turk, your host, and I have never blogged before in my life.
In fact, I’ve never even wanted to. Crippled from early adolescence with an allergy to popularity and mainstream acceptance, I’d written off blogs as just another stupid self-indulgent fad in a history full of stupid self-indulgent fads.

I’ve realized now that blogs are here to stay—and I’ve realized that though there may be a growing chorus of air-headed morons echoing in the ether, there are also those out there with something more interesting to say.

I’ve got something to say. You get to be the judge of which category it falls into.

This is a blog based on an idea that’s been percolating in my head since I was in school, way back in aught-seven or aught-eight. I am, as you may have gathered, something of a self-taught programmer. I am also a gainfully employed (for the time being—fingers crossed!) engineer. We engineers, though, are woefully shortchanged in our educations when it comes to computing. That is to say, that though we are taught to run certain computer programs, for the most part nobody bothers to teach us how to use a computer, much less why on earth we should care to.

The truth is that nobody shows us how we can use these fantastic devices to make our lives easier. We muddle through “Hello World” in yet another proprietary “programming language” that amounts to little more than a LAPACK wrapper. We are subjected to sponsored inanities of the worst degree. And by the end of it, after the 137th attempt to click-and-drag a wire onto the right point in LabVIEW(tm), or melting our brains trying to make a VBA macro do something—anything!—a lot of us just give up on programming as a useful tool.

Try to block those memories now. There’s a world out there they didn’t show us in school. Let me take you on a little tour…
