I was banging my head against a wall trying to figure out why this implementation of an operator << overload for printing vectors with cout was causing segfaults if used with empty vectors.

template<typename T>
ostream& operator << (ostream& out, const vector<T>& v) {
    out << '[';
    for(auto p=v.begin(); p<v.end()-1; ++p) {
        out << *p << ", ";
    }
    if(!v.empty()) {
        out << v.back();
    }
    out << ']'; 
    return out;
}

After some debugging I realized that the loop condition was returning true for empty vectors so I tried to fix it several ways until landing on this solution with the help of ChatGPT.

template<typename T>
ostream& operator << (ostream& out, const vector<T>& v) {
    out << '[';
    for(auto p=v.begin(); p+1<v.end(); ++p) {
        out << *p << ", ";
    }
    if(!v.empty()) {
        out << v.back();
    }
    out << ']'; 
    return out;
}

Notice that, in the problem case of an empty vector, the first evaluation of the loop condition essentially changes from v.begin() < v.end()-1 to v.begin()+1 < v.end(). I tested the solution and found that it works fine now. I asked ChatGPT why this works, and it basically told me that I should respect the valid range of vector iterators: [v.begin(), v.end()]. Yet I find it curious that v.begin()+1 results in correct behavior, since for an empty vector v.begin()==v.end(), so v.begin()+1 is outside the valid range and should be undefined behavior.

I asked again about the difference between going before v.begin() and going past v.end(), and the response boiled down to this: it's OK to do the latter but not the former, since we are not dereferencing the iterator. But by that logic, wouldn't either be fine? Shouldn't both be undefined behavior? Does this kind of thing apply to other types of containers that support iterators? Is container.end()+1 valid or not for arithmetic and comparison purposes?

Asked Jan 26 at 21:23 by AldoGP5; edited Jan 27 at 8:45 by Jarod42.
  • My approach is usually to print out the first item, if any, with no comma and then loop through the subsequent items, p<v.end(), and print the comma before the item. Others prefer looping over the whole container and using an if statement in the loop to print the comma if not at the end. – user4581301 Commented Jan 26 at 21:26
  • Tactical note about ChatGPT: Remember that it's just a language model and the language it's modelling is not C++. All it does is string together statistically likely tokens without any understanding of what those tokens mean. – user4581301 Commented Jan 26 at 21:39
  • You are right, going beyond end is as wrong as going before begin. The reason why you see different behaviour is that undefined behaviour is undefined. So even doing what you want is one possible allowed behaviour, but you cannot rely on it. – gerum Commented Jan 26 at 21:39
  • Sidenote: Overloading operator<< is fine for user-defined types, but your overload accepts any T. – Ted Lyngmo Commented Jan 26 at 21:44
  • "So I tried to fix it several ways..." You didn't try a very simple solution like this? All you had to do is 1) check for an empty vector, and 2) check if the iterator is not the begin() iterator to output a comma. – PaulMcKenzie Commented Jan 26 at 23:07

4 Answers

Both approaches can work if the vector is first tested for emptiness. Otherwise, you risk undefined behavior when adding to or subtracting from the iterators. Outputting the first element before the loop seems more readable; inside the loop, the comma then precedes each value's output.

If you can use the ranges library, a range-based for loop works similarly, using std::views::drop to skip the first element in the loop.

Here is a simplified demonstration and testing code:

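#include <iostream>
#include <ranges>
#include <vector>

auto main() -> int {
   std::vector vec{1, 2, 3};
#if 0
   vec.clear();   // enable to test the empty case
#endif

   // Variant 1: separator after every element except the last.
   // vec.end() - 1 is valid here only because vec is non-empty.
   if (!vec.empty()) {
      for (auto v = vec.begin(); v != vec.end() - 1; v++) {
         std::cout << *v << ", ";
      }
      std::cout << vec.back() << '\n';
   }

   // Variant 2: first element, then separator before each remaining one.
   // vec.begin() + 1 is valid here only because vec is non-empty.
   if (!vec.empty()) {
      std::cout << vec[0];
      for (auto v = vec.begin() + 1; v != vec.end(); v++) {
         std::cout << ", " << *v;
      }
      std::cout << '\n';
   }

   // If you can use ranges (C++20)
   if (!vec.empty()) {
      std::cout << vec[0];

      for (auto v : vec | std::views::drop(1)) {
         std::cout << ", " << v;
      }
      std::cout << '\n';
   }
}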

Update 1 & 2: I couldn't resist trying the new std::print from C++23, and I added a simple std::println in response to a comment. I didn't try it with Clang in Compiler Explorer because it wouldn't work on my system with GCC. Duh!

Live code:

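// Requires C++23: #include <print>, <ranges>, <vector>
namespace vws = std::views;

if (!vec.empty()) {
    std::print("{}", vec[0]);
    if (vec.size() > 1) {
        // Only print the separator and the rest if there is a rest.
        std::print(", {:n}", vec | vws::drop(1));
    }
    std::println("");
}

std::println("{:n}", vec); // :n removes the surrounding []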
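The first-element-then-loop pattern from this answer can also be wrapped in a function, so that the emptiness check and the iterator arithmetic live in one place. The following is only a sketch: format_vector is a hypothetical name, not something from the question or the answer, and it returns a std::string so that the result is easy to inspect.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (name is an assumption): formats a vector as "[a, b, c]".
template <typename T>
std::string format_vector(const std::vector<T>& v) {
    std::ostringstream out;
    out << '[';
    if (!v.empty()) {
        auto it = v.begin();
        out << *it;                        // first element, no separator
        for (++it; it != v.end(); ++it) {  // ++it is valid because v is non-empty
            out << ", " << *it;
        }
    }
    out << ']';
    return out.str();
}
```

Because the iterator is only advanced after the emptiness check, it never leaves the range [v.begin(), v.end()].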

Shouldn't both be undefined behavior?

Yes, both are undefined behavior. The strange thing about undefined behavior is that anything can happen. Anything. The program could crash. It could erase your hard drive. It could appear to work. Anything.

The fact that you got the result you wanted does not conclusively rule out undefined behavior.

Is container.end()+1 valid or not for arithmetic and comparison purposes?

It is not valid.

Possibly where ChatGPT got confused is that container.end() is safe to use as long as you do not de-reference it. This is close to what it told you, but not quite. And to some extent, that is what generative AI is designed to do – take language as input, find "close" connections, and generate, allowing minor variations in the details. This is helpful in that the same input can produce different results each time it is used. On the other hand "close" is not so helpful when details matter, as they do in programming.

With the current state of AI, you can leverage AI to get pointed in the right direction, but don't expect the AI to get the details correct. If something from an AI seems to not make sense, there is a good chance that it does not make sense.
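If the goal is to keep the shape of the question's loop, one option is to express the condition as a distance between two valid iterators instead of forming v.end()-1 or p+1. The sketch below only counts how many separators such a loop would print; count_separators is a hypothetical name introduced for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (name is an assumption): counts the ", " separators
// the question's loop is meant to print, without ever forming
// v.begin() - 1 or v.end() + 1.
template <typename T>
std::size_t count_separators(const std::vector<T>& v) {
    std::size_t n = 0;
    // v.end() - p subtracts two iterators that are both inside
    // [v.begin(), v.end()], so it is well defined even for an empty
    // vector, where it is simply 0.
    for (auto p = v.begin(); v.end() - p > 1; ++p) {
        ++n;
    }
    return n;
}
```

The loop stops while p still refers to the last element (or immediately, for an empty vector), so no out-of-range iterator is ever created.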

You're indeed running into what is, technically speaking, undefined behaviour. In practice, what happens is determined by the implementation of this iterator, not by its interface and documentation. Since the implementation could change under the hood in any way, you as a user shouldn't rely on it. This applies to any out-of-range iterator arithmetic, even if you don't dereference the result. Just follow the usual programming rule, checking for validity before any operation that could be invalid, and you're safe.

Based on your results, we can even guess at what the implementation of the iterator looks like. It could hold an unsigned integer (or, as is typical in common standard libraries, a raw pointer, which behaves similarly here), with 0 corresponding to begin() and size() corresponding to end(). In that case, incrementing it and comparing with size() gives correct results, barring overflow. However, decrementing 0 underflows, and comparing the value stored in the valid iterator p with the maximum unsigned integer always gives true.

Note that this is similar to using plain indexes: for(int i = 0; i < v.size()-1; i++) is effectively an endless loop for an empty v, because v.size() is unsigned, so v.size()-1 wraps around to the maximum value, and i is converted to unsigned for the comparison. The same happens if i is unsigned. This is a famous bug, although not of the same kind, since the wraparound is defined by the standard and follows from the return type of size().

Yes, v.begin() + 1 and v.end() - 1 both give undefined behaviour if v is an empty container (i.e. if v.begin() == v.end()).

Personally, I'd strongly recommend against defining such an operator<<() in the first place. If you must have such a function, give it a different (distinct) name.

There are numerous explanations/interpretations online of why the standard does not specify such an operator function for standard containers (not least of which is that it allows containers to have elements that lack an operator<<()) and explanations of why it is a bad idea to roll your own.

But - if you have the bit in your teeth and really insist on having such an operator despite such advice - a simpler fix for your code would be (also removing reliance on using namespace std or other usings that have similar net effect)

template<typename T>
std::ostream& operator << (std::ostream& out, const std::vector<T>& v)
{
    out << '[';

    auto end = v.end();

    for(auto p = v.begin(); p != end; ++p)
    {
        out << *p;
        if (p + 1 != end)   // ok since p != end
            out << ", ";
    }
    out << ']';
    return out;
}
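Following the earlier suggestion to use a distinctly named function rather than operator<<, the same loop can be written for any container with forward iterators, which also touches on the question about other container types. This is a sketch under assumptions: print_container and to_text are hypothetical names, and std::next replaces p + 1 because iterators of containers such as std::list do not support operator+.

```cpp
#include <iterator>
#include <list>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical named function (an assumption, not from the answer):
// prints any container with forward iterators as "[a, b, c]".
template <typename Container>
void print_container(std::ostream& out, const Container& c) {
    out << '[';
    for (auto p = std::begin(c); p != std::end(c); ++p) {
        out << *p;
        // std::next(p) is valid here because p != end; it yields at most end.
        if (std::next(p) != std::end(c)) {
            out << ", ";
        }
    }
    out << ']';
}

// Small wrapper (also hypothetical) that captures the output as a string.
template <typename Container>
std::string to_text(const Container& c) {
    std::ostringstream out;
    print_container(out, c);
    return out.str();
}
```

The same rule applies to every standard container: advancing an iterator is only valid while it is not already equal to end(), and the loop condition guarantees that before std::next is called.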
