Created attachment 11475 [details] Test program documenting the problem. When extracting doubles from stringstream the character following the double affects if extraction works or not (see attached code example). If the double is followed by a space or underscore extraction works, if a letter follows extraction fails. It seems to be related to the extraction of the double performed by num_get::do_get(). If I read the standard correctly (ISO/IEC 14882:2003(E), 22.2.2.1.2 num_get virtual functions [lib.facet.num.get.virtuals]; sorry no 2011 at hand) this should work independently of the character following the double. Tested with: $ c++ --version Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn) Target: x86_64-apple-darwin13.0.0 Thread model: posix Steps to reproduce: 1. Compile the attached test program (c++ --stdlib=libc++ streamtest.cxx) 2. Run the attached test program Actual result: '-4.9' converted to -4.9 '-4.9 X' converted to -4.9 '-4.9_' converted to -4.9 '-4.9d' failed to convert to double '-4.9X' failed to convert to double Expected result: (produced with c++ --stdlib=libstdc++ streamtest.cxx) '-4.9' converted to -4.9 '-4.9 X' converted to -4.9 '-4.9_' converted to -4.9 '-4.9d' converted to -4.9 '-4.9X' converted to -4.9 Build Date & Platform: 2013-11-01 on Mac OS 10.9
I did some additional testing; I tested all the characters from ' ' to '~' (so, testing all the strings from "-4.9 " to "-4.9~". The only ones that failed were were the ones that ended with certain alpha characters (note: lower case failed also; same as upper case) Clearly, there's some hex stuff going on here, but I don't know what's up with 'I', 'N', 'P', and 'X'. '-4.9A' failed to convert to double '-4.9B' failed to convert to double '-4.9C' failed to convert to double '-4.9D' failed to convert to double '-4.9E' failed to convert to double '-4.9F' failed to convert to double '-4.9G' converted to -4.9 '-4.9H' converted to -4.9 '-4.9I' failed to convert to double '-4.9J' converted to -4.9 '-4.9K' converted to -4.9 '-4.9L' converted to -4.9 '-4.9M' converted to -4.9 '-4.9N' failed to convert to double '-4.9O' converted to -4.9 '-4.9P' failed to convert to double '-4.9Q' converted to -4.9 '-4.9R' converted to -4.9 '-4.9S' converted to -4.9 '-4.9T' converted to -4.9 '-4.9U' converted to -4.9 '-4.9V' converted to -4.9 '-4.9W' converted to -4.9 '-4.9X' failed to convert to double '-4.9Y' converted to -4.9 '-4.9Z' converted to -4.9
Created attachment 11785 [details] Test case showing another condition of it happening Displays both inputs as "'hello' 'world' 0", instead of "'hello' 'abcdefworld' 0"
*** Bug 19354 has been marked as a duplicate of this bug. ***
Another example (from 19354): #include <iostream> #include <sstream> int main() { std::stringstream ss; double f; ss << "abc"; ss >> f; if(ss.fail()) { std::cout << "Failed!" << std::endl; ss.clear(); } std::cout << "Position: " << ss.tellg() << std::endl; } libstdc++ prints 0, libc++ prints 3
I have also filed <rdar://17766732> as this occurs in shipping OS X versions.
I think the behaviour is correct for a-f and x, via this route: 1. operator>> skips space and sends the result on to num_get 2. num_get accumulates as many characters from the set "0123456789abcdefxABCDEFX+-" (+ decimal separator) as it can (Stage 2). 3. num_get sends the result to strtold (Stage 3). 4. strtold decides there's a valid floating-point constant with nonsense at the end (according to C99 spec). 5. This means an error for num_get (Stage 3 still). As for the others, I think they're wrong, but they're understandable according to what strtold accepts: + 'I' for INF + 'N' for NAN + 'P' for C99 floating-point hex constants: "0x1ff.5p3a". Presumably libc++ is taking some shortcut in the required steps.
Well, not any kind of shortcut it turns out. It's an explicit choice in locale.cpp: const char __num_get_base::__src[33] = "0123456789abcdefABCDEFxX+-pPiInN"; Everything else seems to pretty logically follow.
(In reply to comment #6) > 5. This means an error for num_get (Stage 3 still). You may be interested in the conversation between myself and Marshall in another bug (http://llvm.org/bugs/show_bug.cgi?id=19611). The conversation here seems to be related.
(In reply to comment #6) > As for the others, I think they're wrong, but they're understandable > according to what strtold accepts: My reading of `do_get()` stage 2 (22.4.2.1.2 in the C++11 n3337) is that this is correct behaviour. > If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1. If so, it is accumulated.
Hmm. I think it might be undefined behaviour in the standard itself (does that mean reading the pdf itself might invoke nasal demons?). That sentence assumes that "c" has been properly initialised, but it looks suspiciously like there's no check the std::find did actually succeed.
*** Bug 22372 has been marked as a duplicate of this bug. ***
LWG issue #2381 (http://cplusplus.github.io/LWG/lwg-active.html#2381) also reads on this.
I have a similar problem like this where I am parsing floats from strings using istringstream. Parsing "10.0f" as a float with value 10.0 works with libstdc++ but not with libc++. Looks like stdc++ just stops reading characters after invalid characters like 'f' are reached and converts string till invalid character as float. Where as libc++ just fails here. As per standard, look like libstdc++ is right here, so wanted to know whether libc++ intentionally fails here or it is a bug? example: std::istringstream stream(str); stream >> (*value); where str is const "std::string &" and value is "float *"
Arun, what do you mean by "As per standard, look like libstdc++ is right here"? Duncan commented earlier that this seems to be correct per the standard in comment #9. If you disagree, please state your argument.
Hi, I was talking about my interpretation of some reference documentation given below. As per some documentation, character reading is stopped by numeric extractor once invalid character is reached, so may be that f should be ignored. This could also vary across platforms as you are getting different results with your sample. Here is the reference: http://www.cplusplus.com/reference/istream/istream/operator%3E%3E/ <-"...Then (if good), it calls num_get::get (using the stream's selected locale) to perform both the extraction and the parsing operations..." http://www.cplusplus.com/reference/locale/num_get/get/ <- "...The function stops reading characters from the sequence as soon as one character cannot be part of a valid numerical expression (or end is reached)..." I had faced this problem a year back when I was solving one chromium bug - https://bugs.chromium.org/p/angleproject/issues/detail?id=1046 and we finally added a workaround in our code itself.
I just ran into this on FreeBSD (10.3). The code was like: std::istringstream ss("123abc"); double retVal; ss >> retVal; bool result = ss.eof(); The result (eof) is false on GNU libstdc++, but true on LLVM libc++. The former seems correct, by my reading of LWG #2381, particularly the example: http://cplusplus.github.io/LWG/lwg-active.html#2381 Is that the consensus?
I propose the following fix: https://reviews.llvm.org/D26920
(In reply to comment #15) > http://www.cplusplus.com/reference/locale/num_get/get/ > <- "...The function stops reading characters from the sequence as soon as > one character cannot be part of a valid numerical expression (or end is > reached)..." Emphasis added to the key part of that statement: """ ... The function stops reading characters from the sequence as soon as one character cannot be part of *A* valid numerical expression (or end is reached)... """ As Tim already mentioned above, a-f *CAN* be part of *A* valid numerical expression (eg: 0xbadf00d), so they are accumulated. libc++ seems to be following the standard correctly. libstdc++ seems to have deviated from the standard in a way that makes practical sense. Given its history, one might argue that the behavior (also as seeming expected / relied upon by multiple projects) might establish a defacto standard and may indeed be preferable to the codified standard. However, we are not here to argue about which is *better*, we are here to determine which is *correct*, and as I read it, the libc++ implementation is *correct*. This does, however, represent a binary compatibility regression between the two runtimes, and libstdc++'s implementation is potentially more desirable to clients. I would suggest that if we want to pursue this change, we should get clarification added to the C++ standard to indicate if that "a" should indeed be "the".
(In reply to comment #18) > I would suggest that if we want to pursue this change, we > should get clarification added to the C++ standard to indicate if that "a" > should indeed be "the". Marshall Clow already submitted LWG #2381. The proposed example clarifies that the validation is sensitive to the context of "the" expression.
*** Bug 36099 has been marked as a duplicate of this bug. ***
I think you are misreading this. As noted by Duncan, the key phrase is: > If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by Stage 1. If so, it is accumulated. But you are ignoring the words "the next". They imply context dependence. If the intent of the author of that sentence was to ignore the context of the number you are currently parsing then the words "the next" wouldn't be there. Further, your current implementation doesn't conform to your interpretation of the specification either. If it did, then you would accept any arbitrary sequence of digits, plus/minus signs and decimal points but this doesn't seem to be the case. For example, under your interpretation "21-42" should be consumed as a single run of characters that are "allowed as the next character of an input field" and then rejected as an invalid double. If libc++ did this I suspect it would break a lot of software.
I recently found this bug report after discovering that both libstdc++ and libc++ have their own issues reading doubles/floats from a stream (fwiw, my notes and examples from that exercise are documented at https://github.com/tardate/LittleCodingKata/blob/master/cpp/DoubleTrouble/README.md). Am I correct in understanding the current state of the bug is basically stalled, pending approval of LWG #2381, or perhaps activity on C++20 that might supersede it? I'm a little concerned that LWG #2381 as it stands may not fully resolve the issues, as it appears limited to adding "p" to numeric character set and providing a clarifying example of how a hex float should be accumulated. Specifically, it doesn't address (for example) failing cases such as "abc" as mentioned on Bug 19354 - merged as a duplicate - which remain subject to interpretation and potentially requiring further clarification to get right. I do agree with the intent of LWG#2381, which I think covers two main points. To paraphrase: 1. Conversion should follow the same rules as strtold, meaning that it terminates at the first character that is not "... allowed as the next character..." (even it is from the set of valid numeric characters) 2. All characters not used in a valid conversion are left on the stream (i.e. no magic discards per current libc++ implementation) This is I believe "common sense" - and basically the libstdc++ approach (notwithstandanding its other issues) - but not the current libc++ implementation. But I don't think it is possible to conclude one or other is correct under the C++17 standard, because as it stands the spec is ambiguous and incomplete on these points. Looking at C++20 draft n4791, 27.4.2.1.2 does not yet seem to reflect LWG #2381 - it's basically the same as C++17, and still has the ambiguities and gaps and so still arguably contradicts LWG#2381. It seems that without clarity on the spec front, I doubt if this bug can be resolved - is that the general concensus? Is there hope for successfully addressing the specification issue at least in C++20? NB: if my reading of LWG#2381 is correct per the two points above, from my (very inexpert) reading of the code it appears that the libc++ implementation could be "fixed" with minor surgery: * if `__num_get_float` fails, then `__do_get_floating_point` should rollback the stream pointer to unwind all the characters extracted in stage 2 * if `__num_get_float` can perform a partial conversion (i.e. not all of the characters extracted in stage 2): * treat this as success, return the extracted value * and pass back the position in the stream that it was converted up to (`__p2`) - so `__do_get_floating_point` can adjust the stream position accordingly.