WebDialog encoding bug found!
-
In Windows 7, p(params) returns non-UTF-8:
"lid1,\xC3\xA9lan"
and console shows:
return using add_action_callback lid1 = élan
return using get_element_value => élan
-
Which one is affected appears to have changed between v8 and v2013, and it's a different charset;
it now causes the encoding error message ONLY if you try to combine strings...
so v2014 can't print the last line the same way v2013 does.
@juantxo, does it throw an encoding error if you comment out the if/else like this:
#if (Sketchup.version.to_i > 13) && ( not param_val == elm_val)
#  @dlg2.execute_script("translation3.textContent='Error: #<Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8>'")
#else
  @dlg2.execute_script("translation3.textContent='#{param_val}' + ' == ' + '#{elm_val}'")
#end
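That error can be reproduced in plain Ruby 2.x, outside SketchUp. A minimal sketch, assuming (as the Windows p(params) output suggests) that the callback value arrives labeled ASCII-8BIT while get_element_value returns a UTF-8 string; the method names here are hypothetical stand-ins, not SketchUp API:

```ruby
# encoding: UTF-8

# Hypothetical stand-ins for the two values in the dialog callback:
# the same bytes, but different encoding labels.
def sample_param_val
  "élan".b   # raw UTF-8 bytes labeled ASCII-8BIT (like the callback value)
end

def sample_elm_val
  "élan"     # labeled UTF-8 (like get_element_value in Ruby 2.0)
end

# Interpolating both non-ASCII strings into one string is what raises
# Encoding::CompatibilityError (ASCII-8BIT and UTF-8).
def combine(param_val, elm_val)
  "translation3.textContent='#{param_val}' + ' == ' + '#{elm_val}'"
end

# Relabeling (not converting) the binary string removes the mismatch,
# assuming its bytes really are UTF-8.
def combine_fixed(param_val, elm_val)
  combine(param_val.dup.force_encoding('UTF-8'), elm_val)
end
```

force_encoding only changes the label, so this is only correct when the bytes are already UTF-8, which \xC3\xA9 is.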
-
@juantxo said:
In Windows 7 p(params) returns non utf-8
"lid1,\xC3\xA9lan"
On Mac, p(params) returns
"lid1,\xE2\x88\x9A\xC2\xA9lan"
Why?
-
@driven said:
@juantxo said:
In Windows 7 p(params) returns non utf-8
"lid1,\xC3\xA9lan"
On Mac, p(params) returns
"lid1,\xE2\x88\x9A\xC2\xA9lan"
Why?
We are seeing the raw 8-bit byte sequences of the string representations and the chaos that results when Ruby and SU try to convert character encodings without knowing for sure what they had.
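To see exactly which bytes you have, rather than how p or puts chooses to render them, it helps to print the encoding label and the raw bytes separately. A small sketch; hex_dump is a hypothetical helper name:

```ruby
# encoding: UTF-8

# Hypothetical helper: show a string's encoding label and its raw bytes in hex.
def hex_dump(str)
  "#{str.encoding.name}: " + str.bytes.map { |b| format('%02X', b) }.join(' ')
end

word = "élan"
puts hex_dump(word)     # é is the two bytes C3 A9 in UTF-8
puts word.length        # number of characters
puts word.bytesize      # number of bytes
puts word.b.inspect     # same bytes labeled ASCII-8BIT, escaped the way p shows them
```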
-
I didn't know about the ASCII-8BIT encoding issue before, but it's only the cherry on the cake.
Along with the issues we have found in the other WebDialog discussions (URL encoding, URL length limit, dropped backslashes), I recommend not transferring user input or arbitrary text through action callbacks, only simple method names/identifiers with a limited character range. Then one can use get_element_value to fetch user input of any arbitrary character range.
-
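A sketch of the "identifiers only" approach aerilius recommends: the callback only ever receives a plain ASCII element id, and the user's text is read separately. SAFE_CALLBACK_PARAM and safe_callback_param? are hypothetical names, and the allowed character range is an assumption you would adjust to your own ids:

```ruby
# Hypothetical guard: accept only plain identifiers as action-callback params.
SAFE_CALLBACK_PARAM = /\A[A-Za-z][A-Za-z0-9_\-]*\z/

def safe_callback_param?(param)
  # =~ rather than match? so this also runs on Ruby 2.0 (SketchUp 2014)
  !(SAFE_CALLBACK_PARAM =~ param.to_s).nil?
end

# Inside SketchUp it might be used like this (not runnable outside SketchUp):
#   dlg.add_action_callback("submit") do |dialog, field_id|
#     next unless safe_callback_param?(field_id)
#     text = dialog.get_element_value(field_id)
#   end
```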
@aerilius said:
...Then one can use get_element_value to fetch user input of any arbitrary character range.
Except, get_element_value fails pre v2013 and get_element_value works.
I had a few old scripts I was trying to update, and was unable to figure out the issue...
I think this shows it was broken to begin with and got broken 'differently' with a fix...
I'm more inclined to look at using 'unicodeEscape' in the JS before retrieving by either method...
john
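If the JS side escapes everything to \uXXXX sequences before sending (john's 'unicodeEscape' idea; that name is his, not a built-in JS function), the Ruby side only ever sees ASCII and has to turn the escapes back into characters. A sketch of that decoding step; decode_js_escapes is a hypothetical name, and it assumes the backslashes survive the trip, which the dropped-backslash issue may prevent:

```ruby
# encoding: UTF-8

# Hypothetical helper: turn JavaScript-style \uXXXX escapes back into characters.
# Consecutive escapes are treated as one run of UTF-16 code units, so surrogate
# pairs such as \uD83D\uDC7D combine into a single character.
def decode_js_escapes(text)
  # Assumes any non-escaped text is already UTF-8; the escapes themselves are ASCII.
  utf8 = text.to_s.dup.force_encoding('UTF-8')
  utf8.gsub(/(?:\\u\h{4})+/) do |run|
    units = run.scan(/\\u(\h{4})/).flatten.map { |hex| hex.to_i(16) }
    # Pack as big-endian 16-bit units and transcode; lone surrogates become U+FFFD.
    units.pack('n*').force_encoding('UTF-16BE').encode('UTF-8', invalid: :replace)
  end
end
```

Decoding whole runs via UTF-16BE, rather than each escape on its own, is what keeps surrogate pairs intact.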
-
@driven said:
@aerilius said:
...Then one can use get_element_value to fetch user input of any arbitrary character range.
Except, get_element_value fails pre v2013 and get_element_value works.
I had a few old scripts I was trying to update, and was unable to figure out the issue...
I think this shows it was broken to begin with and got broken 'differently' with a fix...
I'm more inclined to look at using 'unicodeEscape' in the JS before retrieving by either method...
john
It's possible to get deeply confused trying to sort this out.
Here's what I have observed:
The UTF-8 byte sequence for élan is \xC3\xA9lan (that is, UTF-8 encodes é as the two-byte sequence \xC3\xA9).
On Mac, get_element_value returns this byte sequence exactly in both SU8 and 2014. However, because Ruby 1.8 thinks it is a 5-character string in ASCII-8BIT, it treats \xC3 as Ã and \xA9 as ©, the extended ASCII interpretations of these bytes. Ruby 2.0 happily assumes it is UTF-8 and gets it correct.
The action callback parameters are handled differently and inconsistently between Windows and Mac.
On the Mac, in SU8 the action callback also returns the original 5-byte UTF-8 sequence, but somehow when Ruby prints it as a string, it gets it right. Further, it believes this string and the one from get_element_value (that prints as Ã©lan) are equal. This makes no sense to me...
But in 2014, the original 5-byte UTF-8 string is somehow transmuted into the 8-byte string \xE2\x88\x9A\xC2\xA9lan. Note that \xE2\x88\x9A is the UTF-8 for the square-root sign √ and \xC2\xA9 is UTF-8 for ©. So it appears the translation to the copyright sign is an attempt to handle misreading the UTF-8 as ASCII-8BIT, but I don't know where that square-root byte sequence came from since, as I mentioned above, \xC3 is Ã in ASCII-8BIT, which is \xC3\x83 in UTF-8. It's as if the callback processing code has an incorrect implementation of the transcoding.
And then, as juantxo reports, SU8 on Windows 7 returns the 5 UTF-8 bytes unconverted, yet again somehow manages to make sense of it in both cases.
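One hypothesis that matches the Mac 2014 bytes exactly: the five UTF-8 bytes were decoded as Mac OS Roman and then re-encoded as UTF-8. In Mac OS Roman, byte 0xC3 is √ (U+221A) and 0xA9 is ©, whereas in Windows-1252 (and Latin-1) 0xC3 is Ã. The sketch below reproduces both readings in plain Ruby 2.x and shows the reverse step; whether SketchUp's callback code actually does this is an assumption, and misdecode/undo_misdecode are hypothetical names:

```ruby
# encoding: UTF-8

# Hypothetical: decode UTF-8 bytes with the wrong single-byte charset,
# then re-encode the result as UTF-8 (a "double transcoding").
def misdecode(utf8_text, wrong_charset)
  utf8_text.b.force_encoding(wrong_charset).encode('UTF-8')
end

# Reverse it; only valid if you know which wrong charset was applied.
def undo_misdecode(garbled, wrong_charset)
  garbled.encode(wrong_charset).force_encoding('UTF-8')
end

mac = misdecode("élan", 'macRoman')      # => "√©lan"
win = misdecode("élan", 'Windows-1252')  # => "Ã©lan"
p mac.b                                  # the byte sequence reported from the Mac 2014 callback
p undo_misdecode(mac, 'macRoman')        # => "élan"
```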
-
@slbaumgartner said:
... Further, it believes this string and the one from get_element_value (that prints as Ã©lan) are equal. This makes no sense to me...
I created this confusion... I don't check for equality, it's a 'puts'; it should be:
@dlg2.execute_script("translation3.textContent='#{param_val}' + ' == ' + '#{elm_val}' + ' is ' + '#{param_val == elm_val}'")
which returns
√©lan == élan is false
sorry I'll get my coat...
john
-
Sorry, I didn't update my profile.
In Windows SketchUp 2013 p(params) returns
"lid1,élan"
so works fine. (I think in SU8 also)
The problem is in Windows SU2014, which returns a non-UTF-8 string.
-
@juantxo said:
Sorry, I didn't update my profile.
I suspected that...
Which locale do you use?
Sketchup.get_locale
en-US? It may have a bearing.
john
-
Yes, en-US.
-
@driven said:
@slbaumgartner said:
... Further, it believes this string and the one from get_element_value (that prints as Ã©lan) are equal. This makes no sense to me...
I created this confusion... I don't check for equality, it's a 'puts'; it should be:
@dlg2.execute_script("translation3.textContent='#{param_val}' + ' == ' + '#{elm_val}' + ' is ' + '#{param_val == elm_val}'")
which returns
√©lan == élan is false
sorry I'll get my coat...
john
Nah, I should know enough to check your code. With a real test, equality fails in all cases (as it should!).
-
@slbaumgartner said:
Nah, I should know enough to check your code. With a real test, equality fails in all cases (as it should!).
it's the tangents that count...
now you can write a proper test...
john
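A "proper test" mostly needs to separate two questions: are the bytes the same, and are the encoding labels the same? In Ruby 2.x, == returns false for byte-identical strings that both contain non-ASCII bytes but carry incompatible labels (ASCII-8BIT vs UTF-8), so a plain == can't tell that case apart from genuinely different text like √©lan. A sketch; compare_strings is a hypothetical name:

```ruby
# encoding: UTF-8

# Hypothetical diagnostic: compare two strings as text, as raw bytes,
# and as text after relabeling both as UTF-8.
def compare_strings(a, b)
  {
    labels:        [a.encoding.to_s, b.encoding.to_s],
    equal:         a == b,
    same_bytes:    a.bytes == b.bytes,
    equal_as_utf8: a.dup.force_encoding('UTF-8') == b.dup.force_encoding('UTF-8')
  }
end

p compare_strings("élan".b, "élan")  # same bytes, different labels
p compare_strings("√©lan", "élan")   # genuinely different bytes
```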
-
The text that displays in your test says "UFT-8" in many places. It is UTF-8.
(The word "Format" is at the end: Unicode Transformation Format.) (nag, nag)
-
@slbaumgartner said:
But in 2014, the original 5-byte UTF-8 string is somehow transmuted into the 8-byte string \xE2\x88\x9A\xC2\xA9lan. ... It's as if the callback processing code has an incorrect implementation of the transcoding.
YES, agree.
To me it looks like it IS UTF-8, but Ruby thinks it is some other encoding, and doubly transcodes* it into UTF-8 AGAIN.
P.S. Isn't "transmute" a math term?
-
@driven said:
except, get_element_value fails pre v2013 and get_element_value works.
? It does? Are you sure?
-
@dan rathbun said:
To me it looks like it IS UTF-8, but Ruby thinks it is some other encoding, and doubly transcodes* it into UTF-8 AGAIN.
That's similar to the problem we have with __FILE__, $LOAD_PATH, $LOADED_FEATURES and ENV: UTF-8 byte sequences aren't labeled with the correct encoding.
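For cases like these, where the bytes are already UTF-8 and only the label is wrong, the fix is to relabel rather than convert: force_encoding changes the label and leaves the bytes alone, while encode would transcode them a second time. A sketch; label_as_utf8 is a hypothetical name, and it only relabels when the bytes are valid UTF-8:

```ruby
# encoding: UTF-8

# Hypothetical helper: relabel a string as UTF-8 without touching its bytes,
# but only if those bytes are valid UTF-8; otherwise return it unchanged.
def label_as_utf8(str)
  return str if str.encoding == Encoding::UTF_8
  candidate = str.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : str
end
```

For example, $LOAD_PATH.map { |path| label_as_utf8(path) } gives relabeled copies without modifying $LOAD_PATH itself.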
-
@dan rathbun said:
@slbaumgartner said:
But in 2014, the original 5-byte UTF-8 string is somehow transmuted into the 8-byte string \xE2\x88\x9A\xC2\xA9lan. ... It's as if the callback processing code has an incorrect implementation of the transcoding.
YES, agree.
To me it looks like it IS UTF-8, but Ruby thinks it is some other encoding, and doubly transcodes* it into UTF-8 AGAIN.
P.S. Isn't "transmute" a math term?
I was thinking more of alchemy
-
@tt_su said:
? It does? Are you sure?
It certainly looks like it...
I cleaned up my initial code as per the Nanny's and the Professor's prodding...
and these are the new images...
and the new script:
# encoding: UTF-8
def show_problem
  @lang_hash = {'lid1'=>"élan"} # I'm using a hash because it's what I use in my plugin...
  @dlg2 = UI::WebDialog.new("Problem_Main", false, "main_prob", 700, 500, 600, 0, false)
  html = %Q(
<!DOCTYPE html>
<html>
<head>
<title>Problem_Main</title>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body>
<p>Tested using Sketchup.version #{Sketchup.version}</p>
<p>Tested on #{RUBY_PLATFORM =~ /(darwin)/ ? ((%x(sw_vers).sub(/ProductName:/,'').sub(/ProductVersion:/,'').sub(/BuildVersion:/,'_'))) : 'windows'}</p>
<p>click into the box and hit return or change to use another non ASCII-8BIT string...</p>
<input id="#{@lang_hash.keys[0]}" value="#{@lang_hash.values[0]}" type="text" onkeydown="if (event.keyCode == 13) trans_L8();" title="type text and then 'Enter'">
<p>below is the return using 'get_element_value'</p>
<h4 id="translation1"><!-- return appears here --></h4>
<p>below is the return using 'add_action_callback' </p>
<h4 id="translation2"><!-- return appears here --></h4>
<p>below is the return of the query 'add_action_callback' == 'get_element_value'</p>
<h4 id="translation3"><!-- return appears here --></h4>
<script type="text/javascript" charset="UTF-8">
function trans_L8() {
  window.location = 'skp:trans_L8@'+#{@lang_hash.keys[0]}.id+','+(#{@lang_hash.keys[0]}.value);
}
</script>
</body>
</html>
)
  @dlg2.set_html html
  RUBY_PLATFORM =~ /(darwin)/ ? @dlg2.show_modal() : @dlg2.show()
  @dlg2.add_action_callback("trans_L8") {|dialog, params|
    param_id = params.split(',')[0].to_s
    param_val = params.split(',')[1].to_s
    callback = 'return using add_action_callback ' # + param_id.to_s + ' = ' + param_val.to_s # commented out to avoid the error
    puts callback
    p(params)
    elm_val = (@dlg2.get_element_value(param_id)).to_s
    element_value = 'return using get_element_value => ' + elm_val
    puts element_value
    @dlg2.execute_script("translation1.textContent='#{(elm_val)}'")
    @dlg2.execute_script("translation2.textContent='#{param_val}'")
    @dlg2.execute_script("translation3.textContent='#{elm_val == param_val}'")
  }
end
show_problem
# load("/Users/johns_iMac/Library/Application Support/SketchUp 2014/SketchUp/Plugins/jcb_ViewPortResize/dev/show_encoding_issue.rb")
# load("[add your path]/show_encoding_issue.rb")
And a question: is there code to get the Windows operating system details?
john
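On the OS-details question, a sketch using only core Ruby that picks a command by RUBY_PLATFORM, like the test script does. On Windows it shells out to cmd's built-in VER command; on the Mac it parses sw_vers into a hash instead of chaining sub calls. Both helper names are hypothetical, and the exact text returned is whatever the OS prints:

```ruby
# Hypothetical helpers (not part of the SketchUp API).

# Parses "Key:<tab>Value" lines, e.g. the ProductName/ProductVersion/BuildVersion
# lines printed by macOS sw_vers, into a Hash.
def parse_key_value_lines(text)
  text.each_line.each_with_object({}) do |line, info|
    key, value = line.split(':', 2)
    info[key.strip] = value.strip if value
  end
end

def os_details
  case RUBY_PLATFORM
  when /darwin/
    info = parse_key_value_lines(`sw_vers`)
    "#{info['ProductName']} #{info['ProductVersion']} (#{info['BuildVersion']})"
  when /mswin|mingw|cygwin/
    # cmd's VER prints e.g. "Microsoft Windows [Version 6.1.7601]" on Windows 7 SP1
    `cmd /c ver`.strip
  else
    RUBY_PLATFORM
  end
end
```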
-
A CLUE perhaps...
Using JS code points for my input, 'get_element_value' has switched all the separators...
return using add_action_callback
"lid1,/uD83D/uDC7D/u20AC/u00A3/u0061/u0009"
return using get_element_value => \uD83D\uDC7D\u20AC\u00A3\u0061\u0009
so I modified my code:
@dlg2.add_action_callback("trans_L8") {|dialog, params|
  param_id = params.split(',')[0].to_s
  param_val = params.split(',')[1].to_s
  param_val_sub = params.split(',')[1].gsub('/','\\').to_s
  p param_val_sub
  callback = 'return using add_action_callback ' # + param_id.to_s + ' = ' + param_val.to_s # commented out to avoid the error
  puts callback
  p(params)
  elm_val = (@dlg2.get_element_value(param_id)).to_s
  element_value = 'return using get_element_value => ' + elm_val
  puts element_value
  @dlg2.execute_script("translation1.textContent='#{elm_val}'")
  @dlg2.execute_script("translation2.textContent='#{param_val}'")
  @dlg2.execute_script("translation3.textContent='#{param_val_sub}'")
}
and this is the result
john
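Since backslashes (the /u vs \u separators) are exactly what gets mangled on the way through the skp: URL, another option is to keep the payload pure ASCII with no backslashes: percent-encode the value on the JS side, e.g. 'skp:trans_L8@' + el.id + ',' + encodeURIComponent(el.value), and decode it in Ruby. A sketch of the Ruby side, assuming Ruby 2.x (SketchUp 2014 ships the uri standard library; Ruby 1.8 in SU8 has no URI.decode_www_form_component) and assuming SketchUp passes the %XX sequences through unchanged; parse_encoded_callback is a hypothetical name:

```ruby
require 'uri'

# Hypothetical: params arrives as "id,<encodeURIComponent(value)>".
# encodeURIComponent turns commas into %2C, so splitting on the first comma
# is safe even if the user typed commas (a plain split(',') would truncate them).
def parse_encoded_callback(params)
  id, encoded = params.to_s.split(',', 2)
  # decode_www_form_component returns a UTF-8 string by default.
  # It also turns '+' into a space; encodeURIComponent sends '+' as %2B.
  value = URI.decode_www_form_component(encoded.to_s)
  [id, value]
end
```

For example, parse_encoded_callback("lid1,%C3%A9lan") would give ["lid1", "élan"], with the value labeled UTF-8.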