Few days ago I decided to check what is really the fastest method to iterate over strings in C++. As a string class I chose string class from STL as it is very popular and provides a couple of ways to iterate it. So how can one iterate over an std::string
?
- By using indexes. E.g.
str[i]
and runningi
from zero to the length of the string. - By using the
at
method.string::at(size_t pos)
provides similar interface to indexes with the exceptions that it checks whether the given position is past the end of the string and then throws an exception. One may see it as the safe version of the regular index. - Treating the string as a sequence of characters and and iterate over it using iterators.
- Using
string::c_str()
to get a pointer to a regular C string representation of the string stored in thestd::string
and treating it as array, e.g. using indexes to go over it. - The last way to iterate over the string is to get a pointer to a C string representation using
string::c_str()
and advancing the pointer itself to iterate over the string.
The third method is the native method of iterating over objects in STL, and like the last two it can’t be used if the iteration changes the string itself (e.g. inserting or deleting characters). The first and second method are similar to the fourth (treating the pointer to the C string as an array), except that they aren’t so problematic as the latter when changing the string. The second method is the safest as it’s the only one that does range checks and throws exception if trying to access positions which are outside the string.
To benchmark and find out which method is the fastest method to iterate over a string I’ve created a huge string of random characters ranging from ‘a’ to ‘z’ and five executables, each one implementing one of the above iteration methods to do a simple task
(count the number of occurrences of each letter). The string is fifty million characters long which, as the longer the string the less important the overhead becomes.
The executables for the benchmark of every version were compiled with the default setting of g++
(without optimization as the compiler might change the iteration methods when optimizing). The benchmark executables where timed by using the time
command and redirecting the executables output to /dev/null
. The tests were run both on 64bit Gentoo (with 1 GB RAM) and on 32bit Kubuntu (with 512 MB RAM), to make sure the overall results (which method it better not the runtime itself) isn’t system depended.
Now to the result itself. After running the benchmark couple of times I came up with the following conclusions: Don’t use iterators for string iteration. Iterators came last on every test and usually times up to three times more than the slowest method besides it. Iterators may be the native STL way to iterate over STL containers but it provides very slow way to so. While iterators can be called STL pointers as the work and behave much the same way pointers do, they didn’t preform even close to pointers. Iterating using pointers came up as the fastest way to iterate over a string with a small margin over string::at() and indexes (both over usual C strings representation and std::string). When inspecting the rest of the methods one may find that string::at() and the indexes came up with very close timings, but on most tests the indexes over std::string where faster. It is obvious why they should preform better than string::at(), as they don’t do range checks as the latter does. For some reason I got that iterating using indexes over the std::string directly is faster than iterating over the the C string representation (by very minor margin) and that the C string is timing roughly the same as the string::at().
To conclude the benchmark pointers are the fastest way to iterate by a small margin, but using the std::string indexes and string::at is preferable in my opinion as the performance difference isn’t that big and the latter methods provide safer way (that can handle string manipulation that may cause the string data to be copied to another place in the memory) than the pointers. The indexes over C string representation suffer the same disadvantages as the pointers but don’t operate as fast. Stay away from iterators at all costs! Iterators suffer similar disadvantages as pointers do (regarding string manipulation) and don’t give anything in return except for horrible run time.
My timings
Here is the output of one of the runs of the benchmark. The results where typical to almost all of the test. To try the benchmark on your computer, see instructions bellow.
guy@Guy_Computer ~/temp/stringbenchmark $ ./benchmark iteration using indexes real 0m0.690s user 0m0.644s sys 0m0.048s iteration using indexes over C string real 0m0.752s user 0m0.720s sys 0m0.028s iteration using string::at() real 0m0.762s user 0m0.708s sys 0m0.036s iteration using pointers over C string real 0m0.642s user 0m0.604s sys 0m0.036s iteration using iterators real 0m2.323s user 0m2.272s sys 0m0.052s
Running the Tests on Your System
If you want you can run these test on your system to see the exact results. To do so download the tar archive and go to the directory you downloaded it into. Open the command line on this directory and execute:
tar -zxvf stringbenchmark.tar.gz
cd stringbenchmark
make
./benchmark
You will see the test results printed on your screen. While exact timings might differ from run to run, the overall trends should be clear.