Next: Operator Overload Choices
Up: Design Trade-offs
Previous: C vs C++ interface
Contents
The approach that Python, Java and other languages take with strings is
to make them immutable (unchangeable). This technique, often called
interning, stores each unique string once in a data structure (typically
a hash table) and represents strings as references into this data
structure. Under this approach, if a programmer sets x = "Hello" and
y = "Hello", there is only one copy of "Hello" in memory and both x and
y point to it.
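The x and y example can be sketched in C++ roughly as follows. This is a minimal illustration, not how any particular language runtime implements it: InternPool, IString and internExample are hypothetical names, and the hash table is assumed to be a std::unordered_set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical intern pool: each distinct string value is stored once.
class InternPool {
public:
    // Returns a pointer to the single stored copy of `text`,
    // inserting it first if it is not already present.
    const std::string* intern(const std::string& text) {
        // insert() hashes `text`; if an equal string already exists,
        // nothing is added and the iterator refers to the existing element.
        auto result = pool_.insert(text);
        return &*result.first;
    }

    std::size_t size() const { return pool_.size(); }

private:
    // Pointers to elements of std::unordered_set stay valid when the
    // table rehashes, so handing out raw pointers is safe while the
    // pool itself is alive.
    std::unordered_set<std::string> pool_;
};

// An immutable string is represented as a reference into the pool.
using IString = const std::string*;

// Mirrors the x/y example from the text.
inline void internExample() {
    InternPool pool;
    IString x = pool.intern("Hello");
    IString y = pool.intern("Hello");
    assert(x == y);            // both refer to the same stored copy
    assert(pool.size() == 1);  // only one "Hello" is held in the pool
}
```

Note that this sketch never removes entries from the pool; deciding when a stored string can be released is a separate problem.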
The advantages of immutable strings are:
- Copying and comparing strings is very fast, because we only need to
copy or compare a reference.
- In cases where many strings are set to the same value, an
immutable string approach can save memory.
- In a multi-threaded environment, immutable strings offer a safe
paradigm for sharing common data between threads. In other words, if
two strings in two threads happen to contain the same data, immutable
strings allow that data to be safely represented once in memory.
- As information is discovered about an immutable string (such as the
results of countTokens(), atoi(), etc.), this information can
potentially be cached in case the same information is required again.
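As a rough illustration of the reference-copy and caching points, here is a small C++ sketch. ImmutableString, IStringRef, makeIString, scans() and cachingExample are hypothetical names invented for this example; countTokens() is assumed to count whitespace-separated tokens, which the text does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical immutable string record. The text never changes after
// construction, so anything computed from it can be remembered.
class ImmutableString {
public:
    explicit ImmutableString(std::string text) : text_(std::move(text)) {}

    const std::string& text() const { return text_; }

    // Counts whitespace-separated tokens, scanning the text only once.
    // Not synchronized: sharing across threads would need the cache guarded.
    std::size_t countTokens() const {
        if (!cached_) {
            std::istringstream in(text_);
            std::string token;
            std::size_t n = 0;
            while (in >> token) ++n;
            tokenCount_ = n;
            cached_ = true;
            ++scans_;
        }
        return tokenCount_;
    }

    // How many times the text was actually scanned (for demonstration).
    int scans() const { return scans_; }

private:
    const std::string text_;
    mutable bool cached_ = false;
    mutable std::size_t tokenCount_ = 0;
    mutable int scans_ = 0;
};

// Copying or comparing an IStringRef touches only the reference.
using IStringRef = std::shared_ptr<const ImmutableString>;

inline IStringRef makeIString(std::string text) {
    return std::make_shared<ImmutableString>(std::move(text));
}

inline void cachingExample() {
    IStringRef a = makeIString("one two three");
    IStringRef b = a;               // copies the reference, not the characters
    assert(a == b);                 // compares the reference, not the characters
    assert(a->countTokens() == 3);
    assert(b->countTokens() == 3);  // answered from the cache
    assert(a->scans() == 1);        // the text was scanned only once
}
```

The cached value is safe to reuse only because the text can never change; with a mutable string, every modification would have to invalidate it.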
There are also many disadvantages, however:
- Modifying a string in any way requires a new copy to be created
and positioned in the data structure. The cost of all of this copying
can easily add up.
- A form of reference counting or garbage collection is required to
track ownership.
- Immutable strings have to exist on the heap. Heavy string
modification will lead to a lot of heap-thrashing.
- Every time a new version of a string is created, it has to be
positioned in the data structure. This requires looking at string
data to determine how to position it. You may save time comparing
immutable strings for equality, but you pay extra in creation of these
strings, even if you never plan to copy or compare them.
- Immutable strings only offer advantages when working with
themselves. Interaction with outside data structures (such as char*)
introduces additional problems and offers no performance benefit.
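The modification and positioning costs listed above can be made concrete with a short C++ sketch. It is illustrative only: Pool, IString, intern and append are hypothetical names, and the pool is assumed to be a std::unordered_set that never releases entries.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical pool of unique strings, backed by a hash table.
using Pool = std::unordered_set<std::string>;
using IString = const std::string*;

// Positioning a string in the pool means hashing its characters (and
// comparing them on a hit), even if the caller never compares strings.
// Passing a char* here also builds a temporary std::string first.
inline IString intern(Pool& pool, const std::string& text) {
    return &*pool.insert(text).first;
}

// "Modifying" an immutable string: the original is left untouched, a
// full new copy is built, and that copy is then positioned in the pool.
inline IString append(Pool& pool, IString base, const std::string& suffix) {
    assert(base != nullptr);
    std::string copy = *base;   // copies every existing character
    copy += suffix;             // may reallocate on the heap
    return intern(pool, copy);  // hash, lookup, and possibly another copy
}
```

For example, appending ", world" to an interned "Hello" leaves "Hello" in the pool and adds a second, longer entry; building a long string one piece at a time this way leaves every intermediate version behind until something reclaims it.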
My final take is that immutable strings offer some advantages and some
disadvantages, but tend to be more at home in a garbage-collected
environment where the programmer has less control over how code
executes.
2007-05-05