Last Updated: Sept 14, 1997
© Copyright 1997 Steve Hickman
In reviewing all of the mentioned changes to STL below, please note that ALL of the changes to the code were, where possible, made in such a way that the modified templates work as before OR in shared memory. There were some places where this was not done - primarily due to the complications that didn't seem justifiable to fix at the time.
First, it should be noted that there are at least 3 fundemental weaknesses in the Standard Template Library: 1) It cannot handle shared memory; 2) It is not thread safe; and 3) There is no way to keep track of whether or not any given iterator is valid at a particular point in time. This port of STL fixes the first two problems , and attempts to fix the third.
1) Handling shared memory
The main focus of this part consists of the creation of a shared memory allocator template, shallocater(T), to replace the default local memory allocator that STL ships with. This allocator behaves the same as the default allocator except that it allocates chunks of shared memory.
In addition to shallocator, I created the shpointer(T) shared memory pointer template. This is a replacement for the standard language pointer types. The reason for this template is that we cannot store normal pointers in shared memory due to the possibility that a given shared memory segment will be attached at a different location in the address space for each process. shpointer(T) is a self-referencing offset pointer replacement. By that I mean that the offset stored in any instance of shpointer (T) needs to be added to the the 'this' pointer for the shpointer(T) instance to get the addres of what the shpointer(T) points to.
Some additional notes on the STL port:
A) The shvector.h file exists only because of three routines
unitialized_copy()
unitialized_fill()
unitialized_fill_n()
in the original (T)header file that don't dereference iterators in a way convenient to pointer(T). Because of this, I had to create copies of these routines called
shunitialized_copy()
shunitialized_fill()
shunitialized_fill_n()
in "shmemory.h" and adjust the related calls in "shvector.h". Otherwise we get duplicate definition errors if I just try to redefine the original routines.
These changes will work in the general case - if they are added to the official memory.h file, we can then get rid of shvector.h and simply use normal vectors with a shared memory allocator.
B) map.h and set.h were modified only in the addition of another template parameter to map, multimap, set and multiset. This additional parameter, rep_type, is used to indicate the type of the internal btree representation. The default value of rep_type is the value that is used with the default allocator. Replacing the allocator and/or the rep_type can place the stored data and/or the internal tree in shared memory. A diff of the files should make this obvious.
C) deque.h was modified to add another template parameter to deque. This additional parameter, InternalAllocator, is used to determine where the deque internal structure storage will take place. The default value of InternalAllocator is the value used with the default value of Allocator. Substituting new values for Allocator and/or InternalAllocator can place the stored data and/or the internal data structures in shared memory. The diff should make the changes obvious. Note that this a more general version of deque.h as opposed to the shared memory only copy mentioned in my last status report.
D) list.h became shlist.h. This could not be treated the same was as map.h, set.h or deque.h because there are several internal data types that need their own allocator. Since the types aren't defined outside of the list template, there is no way to pass them in via template parameters. Because of this, shlist is a modified version of list that uses shallocator for allocating internal data structures.
A few other changes were necessary to accomodate use of shpointer(T). There were deliberate casts to (bool) or (T *) in several places to make sure the appropriate conversion were selected by the compiler. Note that for ordinary pointers, these casts are at worst redundant. For shpointer(T), the casts call methods of shpointer(T) that return the appropriate values. In addition, operator *() in the original was replaced by two such methods, one const and one non-const. The original version of this was just sloppy programming.
Note also that by moving the actual definition of shlist_node down a few lines, I was able to convert void_pointer in list_node to link_type in shlist_node. This change should be made in the original, since it just makes sense to keep the pointer types consistent rather than cast constantly.It turns out the change was necessary to accomodate shpointer(T) pointers, because, as true objects, they cannot be automatically cast from type to type as needed.
E) tree.h became shtree.h for much the same reasons as list->shlist. Note that the definitions of rb_tree_node and rb_tree_node_buffer were moved down a few lines in shtree.h for the same reasons as list_node in list (see above). shtree.h required some of the same (T *) and (bool) casts that shlist did. These casts should be in the original code for the same reasons. I also replaced operator *() in shtree the same way I did in shlist for the same reasons.
It should be noted that there are several places in the code where default allocators instances are created and used in default constructors. The current implementation of the shared memory segment allocator requires it to keep track of the segment it is allocating from. Calling the default constructor of an allocator means that thisinformation does not get passed to that instance. This will cause problems only if there is an attempt to actually use the allocator.Places where this occurs have been marked. So far no attempt has been made to use them, so the code works as is.
F) string.h required only the change of the data_ pointer to the Allocator::pointer type and the addition of casts wherever data_ was used to insure that the actual pointer value is obtained when needed.
2) Thread safety
There is some documentation to suggest that newer versions of STL, in particular STL 2 from SGI, are thread safe. This is true only in a limited sense. STL has been modified by Stepanov et al. to provide a different allocator for each thread and class. This insures that different threads do not attempt to use the same memory for their collections. This is fine as far as it goes, but it does not allow threads to share data/collections which can be updated by all.
Communications with Hans Boehm at SGI indicates that Stepanov et al. do not intend to provide any additional thread safety mechanisms. Their view is that to provide such would place an undue performance burden on users that do not need that capability. In addition, they feel that such mechanisms should be provided external to any particular collection because there are some operations that cannot be made "atomic" internal to the collection.
While there will always be sequences of operations on any set of variables that are not naturally atomic (and must therefore use locking external to the operations themselves to create thread safety), there is value to providing locking internal to the collections themselves. In addition, my solution does not add a performance burden to those who do not need it.
The primary part of the solution is the use of counting locks. Lock.h defines Lock and affiliated classes. These locks use reference counts to keep track of the number of time the lock has been locked. Actual system calls to lock/unlock only occur when the count goes 0->1 or 1->0 or when the type of lock becomes more stringent.
In addition to Lock, the affiliated Guard class handles automatic locking/unlocking on scope boundaries. By creating a Guard instance on the stack when entering scope, we can be certain that the Guard instance will be destroyed when we exit the scope, thus unlocking the lock that the Guard constructor locked.
The Guard class makes it easy to modify the existing collection methods. Each method now requires only a single additional line - the Guard constructor. Further, the only methods that actually need this line are those public methods that either alter the collection or refer to something that can be altered. We assume that locking has already been done before any non public method is called (since it must be called from a public method.) Note that the use of counting locks means that even when public methods are nested, system calls are minimized.
In order to minimize the impact on users that do not want or need locking, LockType has been added as a template parameter to each of the templates. The default value for this template parameter is a NullLock class with null Guards that do nothing. This way, users who do not specifically ask for locking are not burdened by it.
In addition to the template parameter, the collection and associated iterators also store the lock index. The lock index is the index into a vector of lock structures in local memory for each process. Although the vector is in local memory, the same index is used irrespective of any particular process. The reason is this: each lock structure holds information about the state of the lock FOR A PARTICULAR PROCESS. For example, we can have multiple processes that all have read locks on the same collection. Each process can use the lock index of that collection to determine which lock structure IN ITS LOCAL ADDRESS SPACE to use for storing info about its lock status for that collection. Note that because the vector of locks is automatically generated, the identifiers associated with the locks are the same for all lock structures with the same index. This insures that the locks actually work the way they are supposed to.
Note that I also added locking to STL collection constructors. In order to avoid race conditions or special knowledge regarding who is actually initializing the memory for a given instance, AND avoid any specialized bookkeeping to determine which process/thread shouldbe doing the init, it makes more sense so simply build in guards that prevent multiple processes from modifying the same memory simultaneously.This enables all processes referring to objects to tree them uniformly in terms of calling constructors. This does not eliminate the race condition on memory allocation (which could result multiple instances at different locations when all we really want is one with references in multiple processes.)
In order to solve the memory allocation race condition, I resorted to a RegSegment class. This class, derived from the Segment class, keeps an internal registry of all significant objects that have been allocated in that Segment. Since the location of the map is well known, it serves as an anchor point for all processes attempting to use a given segment. Then, to avoid race conditions on allocations, we simply lock the segment, search the registry, create and insert intothe registry if needed, then unlocking the segment.
The only remaining issue is the creation & initialization of the segment, the registry, and the segment lock. Since creating and attaching a segment generate different return codes, I simply added a condition variable so that the processes that do not create the segment must wait on that variable to change before continuing, and the process that creates the segment changes the condition variable as its last step of segment initialization.
Note that the addition of the lock index and the Lock template parameter is a change beyond those mentioned in the section on shared memory. This change affects all the files listed in that section.
3) Iterator validation
While the STL doc notes the conditions under which iterators become
invalid, it does not provide a mechanism for determining if any particular iterator is invalid. This is an irritant to anyone who wants to keep an iterator around for a while. It can be potentially fatal in a multithreaded or multiprocessing environment where readers and writers run asynchronously.
The solution to this problem varies a little depending on the conditions which cause iterators to become invalid. It turns out that, except for vectors, the remaining STL collections all have the same conditions: an iterator only becomes invalid when the node it points to is deleted. This remains true regardless of the number of times the collection is modified. (Vectors have some additional ways to invalidate iterators).
Given this, it turns out that there is a simple solution to SOME of the problems of invalid iterators. What I did was add an ID field to the node structure internal to each collection and a matching ID field to the iterators for that collection. Whenever an iterator is assigned to a node, it copies the node's ID into its ID field. Nodes are assigned an ID when they are allocated, and the ID is set to 0 when the node is deleted. Checking for a valid iterator is then as simple as checking to see if the iterator ID matches the ID of the node it is pointing at.
The reason this works is because the allocator associated with any
collection maintains a free node list of nodes that have been deleted and are available for reuse (rather than returning the memory to the system). As a result, we are guaranteed that the pointer stored in the iterator
Note that the addition of the refID to iterators, and ID to the nodes, as well as the valid() method and the various checks are all in addition to the changes mentioned in the other sections. These changes affect all the files mentioned in the other sections.
A more complete solution to the invalid iterator problem is available from Cay Horstmann.
4) Eliminating static variables
Static variables are evil where shared memory is concerned. When dealing with multiple processes, statics do not live up to their purpose of providing a single global reference to some value. Instead, each process has its own set of statics.
STL uses static variables in several places. Because statics are local, the values they hold cannot be shared between programs. While not all statics need to be shared, some of the STL assumes that the statics it creates are available to the entire program. They are, but they aren't available to other copies of the program.
It turns out with STL, the only static variable we need be concerned with is NIL. Since there would be a new static instance for each different instantiation of the template, it is not a great stretch to simply create a different instance of NIL for each collection. The cost of creation and destruction of one instance in the collection should be much less than the cost of repeated valuewise comparisons (which would probably be the other alternative). However, part of the reason NIL is static is that there is a set of static routines that use it. If NIL is not static, then the routines can't be static either.
While the original version of STL rb_tree has many static routines that make direct reference to NIL, the Rogue Wave implementation, which I am using as a base, has encapsulated references to NIL in a single static routine isNil(). It turns out they have also added code for WIN16 that gets around having exact pointer matches. It turns out this same code can be used in a slightly modified form to work in shared memory.
5) Use of references
There are a number of places where pass by value semantics are used rather than pass by reference. With the change from vanilla pointers to shpointer(T) instances the copying can become significant. These will be replaced with pass by reference semantics where possible. Note that this will necessitate changing the items passed by reference to const (to get the equivalent lack of side effects).
An additional issue with references is the scattered use of const methods to return non const references. This is a no-no because a non const reference can be used as an lvalue to modify an instance. Modifying the instance or any of its contents OR making it possible to do so violates the spirit of the const contract. Consequently, places where this occurred were replaced with 2 copies of the method - a const method returning a const reference and a non const method returning a non const reference. The compile will be able to pick the one it wants.
6) Automatically generated methods
The C++ compiler automatically generates a number of methods for each class if they aren't defined in the class. Among these are the copy constructor and assignment operator. The automatically generated versions do bitwise copying. Bitwise copying has a downside. It probably won't do what you want when an instance contains pointers. It definitely won't work at all when using self relative pointers/offsets.(The first issue is left to the reader. The second is because the offset gets copied, but the base address (this pointer) is now different, so the offset puts us into uncharted territory.)
Upshot: eschew automatically generated methods. Create your own so the compiler does exactly what you want
7) Allocators
It should be noted here that the current DEC version of STL has per collection allocators. When collections are in shared memory, there are multiple processes using that allocator to get and put memory. This is OK because it is just a matter of adjusting the free list associated with that COLLECTION. Since the free list is associated with a collection rather than a process, there is no worry about one process allocating memory that another attempts to free.
Last Revised