Sublime Software: May 2005

Sunday, May 15, 2005

Generic Programming in C++ part 3a: Palindromes

If you've been to an interview where you've been asked to write some C++ code, there's a fair chance that you'll have come across the "isPalidrome" problem.

It goes something like this: A palidrome is a group of letters that is the same written backwards or forwards. The challenge is to write a function that tests whether a given string is a palindrome.

A typical solution may look something like this:

bool isPalindrome(const std::string &str)
{
    std::string::const_reverse_iterator itEnd = str.rbegin()++;
    for (std::string::const_iterator itStart = str.begin(); itStart < itEnd.base(); ++itStart, ++itEnd)
    {
        if (*itStart != *itEnd)
        {
            return false;
        }
    }    return true;
}

(Actually, if you're in an interview you may have forgotten to start at before the one-past-end iterator and you may not have used a reverse iterator or perhaps forgot the call to base when comparing iterators and reverse iterators. Those kind of things you can be forgiven!)

That solution works fine. Use two iterators, one pointing to the first and one to the last characters (which, as mentioned above is one before that returned by end). Compare the characters and move the iterators toward the middle of the string. If any comparison fails, the string is not a palindrome. Finish when the start iterator is past the end (note that this takes into consideration both finishing conditions ie when there is an even or an odd number of characters in the string).

[There are other algorithms but we’ll stick with this one for now.]

We can do better. Why not write it generically? It may not be obvious to you, but std::string is a container. We can apply this same algorithm for any other container. Consider writing reusable functions generically – especially after ‘getting them right’ non-generically.

template <typename RaIterator>
bool
isPalindrome(RaIterator begin, RaIterator end)
{
    for (--end; begin < end; ++begin, --end)
    {
        if (*begin != *end)
        {
            return false;
        }
    }
    return true;
}

Something has changed; we’re assuming that two random access iterators have been passed in. We need random access iterators because of the less than operator, which is only provided on these kind of iterators. And you’ll not that we’ve used a bidirectional operation – the decrement operator – instead of using a reverse iterator. That’s to avoid weird syntax – asking the user to supply two different kinds of iterators is unusual (and prone to error!) and you won’t see that done in any standard algorithm.

Usage is slightly different, now we have to pass in stringname.begin() and stringname.end() instead of just stringname. But again that’s consistent with the Way Of The Standard. (It also has an advantage - we can operate on slices of a container.)

So how is this better? Well, if we have a vector of characters we can use this function. Or if we have a vector of longs. Or a deque of longs. Or an array of longs. We could even have a vector of strings that form a sentence and use this function to determine if the sentence is a palindrome!

You get the picture. Generic is better.

Unfortunately if we have a list or set of chacters we can’t use this function – you’ll get an ugly compilation error. Why? Because lists only provide bidirectional iterators. And bidirectional iterators don’t give us that lovely less than operator. So any container that provides anything less than random access iterators will not compile. Bugger! I wonder if we can get around that limitation…

Tune in to part b where I improve our isPalindrome function further, so we can use it with lists!

[I’ll admit that thinking at this level of abstraction is difficult in an interview situation but when you start thinking generically you’ll find that it won’t be too much of a stretch. Disclaimer: In the last interview I sat I wrote the first example…]

Thursday, May 12, 2005

Spicks & Specks

Last night a friend (Heya Age! When are you going to start your own blog?) and I went and watched the recording of two episodes of the ABC’s music trivia show Spicks and Specks.

Gotta say, we both had a great time (though two episodes was bordering on too long) and I can recommend anyone go see it. You’ll have to grab the phone number (I can’t remember it and they don’t have it on the website, sheesh!) at the end of the show to register interest for tickets, which are free. Filming is at ABC studios in Elsternwick.

Adam Hills was a terrific host that earned a great deal of respect from us just because he was so honest, approachable, funny and real.

Myf and Alan were also damn funny – I lost it when Alan broke his chair! You’ll see what I mean in six weeks or so… The guests were also great – I don’t want to name them and spoil the surprise but one half of Lano & Woodley was on and in fine form!

Adam and the team really indulge the crowd, playing up on heaps of gags that will never make it to air. Thanks everyone at the ABC for a terrific show.

Anyways, if you don’t watch the show it’s great fun and is on Wednesday nights at 8:30. Enjoy!

Generic programming in C++ part 2b: A better example with CComSafeArray and standard containers

In the last post I asked the question "How do you normally copy containers". Let’s explore that.

A typical way would be to use the standard algorithm copy. (Note that there are many ways to copy containers, some are even more efficient, but this one suits the purposes of my explanation)

std::list<long> listOfLongs;

// Fill list here

std::vector<long> vectorOfLongs;
std::copy(listOfLongs.begin(), listOfLongs.end(), std::back_inserter(vectorOfLongs));

copy takes a range and an output iterator. The range is iterated over and each element is copied to the output iterator – after each copy the output iterator is incremented. It’s expected that the ‘thing’ that the ouptut iterator points to can hold all of the elements in the range.

For a vector (or, in fact, any automatically-sized container) this often isn’t what you want. Instead, you often want elements to be added to the container. But how should the algorithm add elements to a container? In fact how can an algorithm add elements to a container when, I’m sure you remember, algorithms don’t know anything about containers! Algorithms only know how to use iterators.

This may not be crystal clear yet so I’m going to make a point of it. Algorithms only know about iterators. They know nothing about containers. This is intentional and A Good Thing. Why? Because by enforcing that limitation algorithms can work for any container you provide – assuming you also provide iterators for that container (and that the iterators meet the requirements of the algorithm).

Now, we were trying to figure out how an algorithm can add elements to a container. The C++ way to do that is via an inserter which, in standardese, is a form of iterator adaptor. An inserter provides an iterator interface for a container. The algorithm ‘sees’ an iterator. But when the algorithm tries to assign to that iterator, the inserter kicks in and does something different. That ‘something different’ depends on what inserter you’re using but ultimately it (usually) adds an element to the container. It also does nothing when the iterator is incremented. If you think about that for a second that’s exactly what we were trying to do.

So how does the inserter know how to add elements to the container? Doesn’t that mean that it needs to know about the specific container type? No. Well, kind of. Inserters are templated on the container type. If we look at the example above, we’re using back_inserter. That particular inserter requires that the supplied container has a push_back method. Conveniently, vector does. Had the inserter been front_inserter, which requires push_front, compilation would have failed because vectors can’t insert elements at the front (efficiently).

So what does all this have to do with CComSafeArray? CComSafeArray is (almost) a container and we’re going to write an inserter to allow us to use them in algorithms when an output iterator is required. That means that we’ll be able to use the standard copy algorithm to copy elements from any container to a CComSafeArray.

template <typename Container>
class add_iterator : public std::iterator<std::output_iterator_tag, void, void, void, void>
{
protected:
    Container& container;

public:
    explicit add_iterator(Container & c) : container(c) {}

    template <typename ValueType>
    add_iterator<Container>&
    operator=(const ValueType& value)
    {
        container.Add(value);
        return *this;
    }

    add_iterator<Container>& operator*()
    {
        return *this;
    }

    add_iterator<Container>& operator++()
    {
        return *this;
    }

    add_iterator<Container>& operator++(int)
    {
        return *this;
    }
};

template <typename Container>
inline add_iterator<Container> add_inserter(Container& c)
{
    return add_iterator<Container>(c);
}

OK, so what’s going on here? An inserter has two parts. There is a class that is the iterator and does all the grunt work. There’s also a templated convenience function that creates and returns an instance of the iterator. That’s a fairly common idiom in the standard (see make_pair) and permits nicer syntax by taking advantage of the fact that templated functions can infer types. We mentioned back_inserter and front_inserter before, these are actually convenience functions that create an instance of the iterator classes back_insert_iterator and front_insert_iterator respectively. Without those convenience functions the syntax would be less appealing and more prone to error. For example, the code above would look like this:

std::copy(listOfLongs.begin(), listOfLongs.end(), std::back_insert_iterator<vector<long> >(vectorOfLongs));

So that explains the add_inserter function. Digging deeper, the add_iterator is actually pretty straightforward. It publically inherits from iterator which is a standard library class that defines a number of useful typedefs that all iterators are required to define. It’s just a convenience class saving you from defining a few things yourself. Assigning does what we described above, that is, it just calls Add (which must exist) on the container (you’ll notice that Add is the method that CComSafeArray provides to append elements). Incrementing (and dereferencing) are no-ops.

I glossed over the fact that assignment operator is a templated function. This usually isn’t necessary because containers are meant to expose the type that they contain. If you dig into any of the standard library provided iterators you’ll find that the type that the assignment operator takes is known. Normally we could have defined the assignment operator like so:

add_iterator<Container>&
operator=(const Container::value_type& value)
{
    container.Add(value);
    return *this;
}

Containers are meant to provide a definition for value_type.

Unfortunately, CComSafeArray doesn’t (hence the reason it’s almost a container) so we have to make the assignment operator a function template. No big problem, it just means that if you try to assign the wrong type via the inserter then your error message may be fairly cryptic – something like: “cannot convert type X to CComSafeArray<T>.Add() where T is Y”.

You may also have picked up that this inserter isn’t actually tied to CComSafeArrays. If you have any other containers that use an Add method to add elements then this inserter will work just as well for them.

And that’s it! We’ve harnessed one of the great strengths of the STL design – extensibility – and created an inserter, a form of iterator adaptor, to a container-like class so that we can make use of existing standard algorithms. Sweet!

Stay tuned for Part 3 where we give the age-old interview question “isPalindrome” a generic workover!

Friday, May 06, 2005

Generic programming in C++ part 2a: A better example with CComSafeArray and standard containers

Shortly after posting the original entry I thought of a better way to create a CComSafeArray with standard containers. Fifteen minutes later - much less time than it took to write the blog post! - I had working code.

Because I want to get some sleep now I'll have to post the code later. In the meantime I want you to think about it!

OK, you want a hint? How do you normally copy containers - say from a list to a vector?

In part 2b I'll show you how to write very similar code that works for CComSafeArrays.

Thursday, May 05, 2005

Happy Birthday Dave

Happy birthday Dave Winer! Thanks for keeping the simple things simple and for consistently demonstrating the importance of maintaining your credibility in the face of adversity.

Tuesday, May 03, 2005

Generic programming in C++: An example with CComSafeArray and standard containers

For those of you developing in C++ with COM you’ll be familiar with the all-too-common SAFEARRAY. You’re probably also familiar with CComSafeArray which is a useful class to wrap up the rather raw SAFEARRAY and aid in memory management.

This blog entry isn’t meant as a primer for SAFEARRAY and CComSafeArray usage, there are plenty of good sites for that.

What I would like to talk about is using generic programming for day-to-day tasks.

What is generic programming? Good question, tough to answer. In C++, generic programming is writing code that works for many types by coding solutions based on concepts. A concept defines what the supplied abstract datatype are required to provide in order to get the job done.

Confusing? Initially it may well be. How about we look at a real (albeit trivial) example:

template <typename T>
T 
max (T a, T b)
{
    return (b < a) ? a : b;
}

In the code above, max is a function that takes in two items (of the same type) and returns the largest of the two. The above code can be used for any T provided it supports the less-than operator. [In standard-library lingo T (the abstract datatype) must satisfy the concept of “Less Than Comparable”.]

This code is maximally reusable and, as such, is A Good Thing. It’s an example of generic programming because, in and of itself, max doesn’t know about T and thus works generically for any type (that supports the less-than operator). We can write max once and it’ll work for all types. Beautiful.

In my experience, occasions to create generic code like max occur more frequently than most developers realize. I believe it’s because it takes awhile to become accustomed to “seeing” generic solutions. For example, consider if our generic version of max didn’t exist but we needed a function to compare two integers. It would be all too easy to write:

int
max (int a, int b)
{
    return (b < a) ? a : b;
}

Then, when we have longs to compare, or floats, we could copy and paste the code and change “int” to “long” or “float”. Avoid such code! Duplicating code is evil and should be avoided. If you notice yourself copy and pasting code regularly it may be an indicator that you could benefit from generic coding techniques.

Here’s a real world example I came across the other day at work.

We use SAFEARRAYS and CComSafeArrays often. We also need to manipulate them in various ways so we often covert them to standard containers as they have a wide variety of algorithms that we can employ. Usually we have to convert them back (to pass back over a COM API). Someone had noticed this and written a conversion function:

// Remember to delete the returned pointer
SAFEARRAY *
CreateSafeArrayFromVector(std::vector<long> vec)
{
    CComSafeArray<long> sa;
    for (std::vector<long>::iterator it = vec.begin(); it != vec.end(); ++it)
    {
        sa.Add(*it);
    }
    return sa.Detach();
}

In fact there were a whole bevy of these conversion functions. CreateSafeArrayFromVector that worked for floats. For doubles. On unsigned longs. We also had to work with std::lists so CreateSafeArrayFromList was born, duplicating all the vector functionality. Remember that much of the code is practically identical, we’re just adding things to a safe array! What a mess.

A single generic function could (and has) replaced it all:

template <typename FwdIterator>
CComSafeArray<typename std::iterator_traits<FwdIterator>::value_type>
CreateCComSafeArray(FwdIterator first, FwdIterator last)
{
    CComSafeArray<typename std::iterator_traits<FwdIterator>::value_type> sa;
    while (first != last)
    {
        sa.Add(*first++);
    }
    return sa;
}

Much cleaner! How is it used?

std::vector<T> handles;
//...
CComSafeArray saHandles = CreateCComSafeArray(handles.begin(), handles.end());

You’ll notice a few changes. I chose to use CComSafeArrays because there’s no obvious reason not to. It increases type safety and decreases the chance of memory leaks. You’ll see that we pass in iterators to the function now. That allows abstraction from the specific container type. If you’re familiar with using the standard library you’ll notice that this is a fundamental tenet in algorithms – all of them do the same thing. There’s an important point here, if you’re passing containers in your function declarations you’re not programming generically. Instead, create a templated function and expect iterators to be passed in. Tricky to grasp at first but so clean when you bend your brain around it (repeat after me, “algorithms operate on iterators!”).

You’ll also see something about iterator_traits. What the? Without going in to too much detail (again, it’ll have to be another article), iterator_traits is supplied by the standard library to provide compile-time information about the supplied iterator. std::iterator_traits<T>::value_type resolves to the type that T points to. [There’s a whole bunch of things that iterator_traits provides.]

I’ve called the type “FwdIterator” because I require the iterator to support the post increment operator – the concept of “Forward Iterator” defines this requirement.

But don’t get too wrapped up in the lingo. Read the code like this:

“Give me two iterators pointing to the start and end of a range”

“Create a CComSafeArray containing the same type of objects as the iterators point to”

“Add elements to the CComSafeArray from the iterators and return the CComSafeArray when done”

If you can grasp the high level ideas then the syntax comes relatively easy.

In the future I’ll talk about the pair function, FillFromCComSafeArray. See if you can write it yourself!

Relevant links:

SGI’s standard library documentation

Boost’s Generic Programming Techniques

David Muser on Generic Programming

Sunday, May 01, 2005

Gallery: FAQ Item C.22

I use the wonderful gallery to host my photos.

Every so often I upgrade it or perform some sort of maintenance on it. And everytime I have to search for a particular link that tells me how to configure the software so that I can view it from inside my LAN and from the Internet. I want to preserve the linkso I don't lose it again!

It's item C.22 of the FAQ.

I do recommend the software though ngallery and coppermine look quite good too.

What do you use to host your photos (and BTW I'm not interested in any online facility, flickr is great and all but I want to host them on my own PC thanks very much)? :)

Sublime Software