Tuesday, November 17, 2009

Standard algorithms and boost::ptr_vector

I did something bad the other day.

OK, I can't tell if it was bad. In another environment, it would have been bad, but since this was C++, perhaps it was OK. I was in the situation where I had a boost::ptr_vector, and I wanted to use a standard algorithm on it. Specifically, I wanted to use std::partition to separate the objects that were still "alive" from those that were "dead" (where alive and dead are domain concepts in our application). The complexity here is that ptr_vector is a crazy container.

Most containers deal with a specific type T. You add Ts to the container. Dereferencing an iterator gives you a T&. It's generally assumed that a container operates on a single type, and the standard algorithms make this assumption.

The ptr_vector, on the other hand, appears to be two containers at once. Semantically, it's analogous to a std::vector<managed_ptr_type<T> >. It is intended that, by adding a pointer to the ptr_vector, the ptr_vector takes ownership of the lifetime of the memory at the end of the pointer. So, it is a container of pointers. On the other hand, when iterating a ptr_vector, it appears that it is a container of Ts.

In my case, I wanted to rearrange my ptr_vector. In particular, I wanted to partition the pointers into those whose object was still "alive", and those whose object was "dead". Since a ptr_vector is semantically a container of pointers, it made sense that I should apply std::partition to the ptr_vector. However, ptr_vector::iterator removes a level of indirection: instead of iterating T*, it iterates T&.

In fact, ptr_vector doesn't seem to provide any ways to rearrange the pointers once they are put into the container. Sure, you can mutate the object on the end of the pointer. You could operate at that level. But there doesn't appear to be a safe way to treat the ptr_vector as a container of pointers.

Fortunately, ptr_vector provides a back door. Its iterators support a base() method, which will return an iterator over T* (instead of an iterator over T&). This allows us to treat the ptr_vector as a container of pointers, and to use standard algorithms to manipulate those pointers. Now, this is not without peril. While it seems to be OK to rearrange the pointers, it wouldn't be safe to change the set of pointers. I wouldn't trust using something like std::remove_if, because it might leave garbage in the container after it is done. The container might contain duplicate pointers. Some pointers might get dropped completely. If the container then goes out of scope, it will try to delete these pointers multiple times, which would be a bad thing. It might also fail to delete some pointers, because they were overwritten (and not preserved elsewhere in the container).

This whole thing felt like the best solution possible, while at the same time leaving a lot to be desired. I felt like I was violating the encapsulation of the ptr_vector. I suppose this is one of those cases for which they put in the base() methods on the iterators. Additionally, I don't see any clear way that they could do better. For example, I think an assumption of ptr_vector is that a given pointer only occurs inside it at most once. The standard algorithms don't necessarily respect this assumption; see my commentary on remove_if in the previous paragraph. The standard algorithms, in some cases, expect more freedom than ptr_vector can provide. This disconnect is unfortunate, but not without reason.

An important first step to helping with this problem would be to add methods to ptr_vector (and its siblings) that allow you to treat it as a container of pointers. You could add, remove, and re-arrange the container using these methods. In addition, they could maybe provide specializations of some of the standard algorithms for each container. This is difficult for third party developers to do, since the actual type of a ptr_vector::iterator is implementation defined. The boost guys can cleanly provide a specialization of std::partition for this kind of iterator, but I can't. Now, this isn't perfect. It would help with the standard algorithms, but not third-party algorithms. Still, it would be a great step in the right direction.

So, did I do something bad, or did I do something necessary?

3 comments:

Marc said...

std::partition does work with boost ptr_containers. Try this:

#include <cassert>
#include <algorithm>
#include <numeric>

#include <boost/bind.hpp>
#include <boost/ptr_container/ptr_vector.hpp>

using namespace std;
using namespace boost;

bool IsEven(int i)
{
return i % 2 == 0;
}

int main()
{
typedef ptr_vector<int> IntPtrs;
IntPtrs ints;

for (int i=1; i <= 10; ++i)
ints.push_back(new int(i));

IntPtrs::iterator i = partition(ints.begin(), ints.end(), IsEven);

assert(accumulate(ints.begin(), i, true, bind(logical_and<bool>(), _1, bind(IsEven, _2))));
assert(!accumulate(i, ints.end(), false, bind(logical_or<bool>(), _1, bind(IsEven, _2))));

return 0;
}


The downside is that it works on a reference to the pointed at item, not the pointer, so it will be much slower than a simple pointer swap if your objects are expensive to copy, and it won't work at all if your objects disabled copying altogether.

Using std::remove_if is flat out wrong, see Meyers' Effective STL item 9 for why (Items 32 and 33 are also relevant). It is an easy one to mess up, and I've been confused by it too many times.

The short answer is, use the container's erase or remove member function if it is provided, otherwise use the erase-remove idiom. Since ptr_containers provide remove_if like functionality directly through the erase member function, you should use that.

Dan said...

Using std::partition on a ptr_container is also incorrect if the pointers themselves are meaningful. For example, if I have a pointer to Bob and a pointer to Alice, and I put these into the ptr_vector, then applying a std::partition to the container will have the effect that Alice is now a guy and Bob is now a chick. This isn't what I was intending.

I was more lamenting that I can't apply standard algorithms to the ptr_vector container to manipulate the order of the pointers stored within. I hear "standard algorithms are great; you should use them." I also hear "boost is great; you should use it." So I try to use them both together, and I run into issues, and it's a shame. I could write my own ptr_vector_partition function, and I could make it work, but I would prefer to use std::partition.

I was only using remove_if as an example of an algorithm that wouldn't work correctly on a ptr_vector. I hadn't read Effective STL, but I think I was poorly explaining what point 33 says. You are correct, I would instead want to use the built-in method erase_if. But I wasn't trying to use remove_if, I was trying to use partition, and it caused problems, and it made me sad.

Anonymous said...

Hi,

you need a better ptr_vector than Boost's: http://www.codeproject.com/KB/stl/ptr_vecto.aspx