WebDialog encoding bug found!
-
@driven said:
@aerilius said:
...Then one can use get_element_value to fetch user input of any arbitrary character range.
except, get_element_value fails pre v2013 and get_element_value works.
I got a few old scripts I was trying to update, and was unable to figure out the issue...
I think this shows it was broken to begin and got broken 'differently' with a fix...
I'm more inclined, to look at using 'unicodeEscape' in the js... before retrieving by either method...
john
It's possible to get deeply confused trying to sort this out.
Here's what I have observed:
The UTF-8 byte sequence for élan is \xC3\xA9lan (that is, UTF-8 encodes é as the two-byte sequence \xC3\xA9).
On Mac, get_element_value returns this byte sequence exactly in both SU8 and 2014. However, because Ruby 1.8 thinks it is a 5-character string in ASCII-8BIT, it treats \xC3 as Ã and \xA9 as ©, the extended ASCII interpretations of these bytes. Ruby 2.0 happily assumes it is UTF-8 and gets it correct.
The action callback parameters are handled differently and inconsistently between Windows and Mac.
On the Mac, in SU8 the action callback also returns the original UTF-8 5-byte sequence, but somehow when Ruby prints it as a string, it gets it right. Further, it believes this string and the one from get_element_value (that prints as Ã©lan) are equal. This makes no sense to me...
But in 2014, the original 5-byte UTF-8 string is somehow transmuted into the 8-byte string \xE2\x88\x9A\xC2\xA9lan. Note that \xE2\x88\x9A is the UTF-8 for the square-root sign √ and \xC2\xA9 is UTF-8 for ©. So it appears the translation of the copyright sign is an attempt to handle misreading of the UTF-8 as ASCII-8BIT, but I don't know where that square-root sign byte sequence came from since, as I mentioned above, \xC3 is Ã in ASCII-8BIT, which is \xC3\x83 in UTF-8. It's as if the callback processing code has an incorrect implementation of the transcoding.
And then, as juantxo reports, SU8 on Windows 7 returns the 5 UTF-8 bytes unconverted, yet again somehow manages to make sense of it in both cases.
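One reading that fits the 8-byte sequence, offered as an assumption and not something confirmed in this thread: in the classic Mac OS Roman code page, byte 0xC3 is √ and byte 0xA9 is ©. The sketch below (plain Ruby 2.0 or later, no SketchUp needed) takes the 5 UTF-8 bytes of élan, relabels them under three encodings, and transcodes each result to UTF-8 so the byte sequences can be compared with the ones quoted above.

```ruby
# Sketch: the 5 UTF-8 bytes of "élan", reinterpreted under other labels.
# ASSUMPTION: MacRoman is only a guess at what the 2014 callback applied.
raw = "\xC3\xA9lan".b   # binary (ASCII-8BIT) copy: 5 bytes, no text meaning

as_utf8   = raw.dup.force_encoding("UTF-8")                        # relabel only, bytes untouched
via_latin = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")   # misread as Latin-1, then transcode
via_mac   = raw.dup.force_encoding("macRoman").encode("UTF-8")     # misread as MacRoman, then transcode

[as_utf8, via_latin, via_mac].each do |s|
  # Print the text, its byte count, and the bytes in hex for comparison.
  puts "#{s} (#{s.bytesize} bytes): #{s.bytes.map { |b| '%02X' % b }.join(' ')}"
end
```

If the MacRoman guess is right, the last line printed matches the \xE2\x88\x9A\xC2\xA9lan sequence described above, which would explain the square-root sign without needing a broken transcoder, only a wrong source label.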
-
@slbaumgartner said:
... Further, it believes this string and the one from get_element_value (that prints as Ã©lan) are equal. This makes no sense to me...
I create this confusion... I don't check for equality it's a 'puts', it should be
@dlg2.execute_script("translation3.textContent='#{param_val}' + ' == ' + '#{elm_val}' + ' is ' + '#{param_val == elm_val}'")
which returns
√©lan == élan is false
sorry I'll get my coat...
john
-
Sorry, I didn't update my profile.
In Windows SketchUp 2013 p(params) returns
"lid1,élan"
so works fine. (I think in SU8 also)
The problem is in Windows SU2014, which returns a non-UTF-8 string.
-
@juantxo said:
Sorry, I didn't update my profile.
I suspected that...
which locale do you use
Sketchup.get_locale
en-US? it may have a bearing?
john
-
Yes, en-US.
-
@driven said:
@slbaumgartner said:
... Further, it believes this string and the one from get_element_value (that prints as Ã©lan) are equal. This makes no sense to me...
I create this confusion... I don't check for equality it's a 'puts', it should be
@dlg2.execute_script("translation3.textContent='#{param_val}' + ' == ' + '#{elm_val}' + ' is ' + '#{param_val == elm_val}'")
which returns
√©lan == élan is false
sorry I'll get my coat...
john
Nah, I should know enough to check your code :oops: . With a real test, equality fails in all cases (as it should!).
-
@slbaumgartner said:
Nah, I should know enough to check your code :oops: . With a real test, equality fails in all cases (as it should!).
it's the tangents that count...
now you can write a proper test...
john
-
The text that displays in your test says UFT-8 in many places. It is UTF-8.
(The word Format is at the end of Unicode Transformation Format.) (nag, nag)
-
@slbaumgartner said:
But in 2014, the original 5-byte UTF-8 string is somehow transmuted into the 8-byte string \xE2\x88\x9A\xC2\xA9lan. ... It's as if the callback processing code has an incorrect implementation of the transcoding.
YES, agree.
To me it looks like it IS UTF-8, but Ruby thinks it is some other encoding, and doubly transcodes* it into UTF-8 AGAIN.
- P.S. - isn't transmute a math term ?
-
@driven said:
except, get_element_value fails pre v2013 and get_element_value works.
? It does? Are you sure?
-
@dan rathbun said:
To me it looks like it IS UTF-8, but Ruby thinks it is some other encoding, and doubly transcodes* it into UTF-8 AGAIN.
That's similar to a problem we have with __FILE__, $LOAD_PATH, $LOADED_FEATURES and ENV: the UTF-8 byte sequences aren't labeled with the correct encoding.
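When the bytes are already UTF-8 and only the label is wrong, relabeling is enough and no transcoding should happen. A minimal sketch of that idea follows; `relabel_as_utf8` is a hypothetical helper name, not part of Ruby or the SketchUp API, and it assumes that "valid as UTF-8" is a good enough test for "really is UTF-8".

```ruby
# Hypothetical helper: relabel a string whose bytes are already UTF-8 but
# whose encoding tag says something else. The bytes are never changed.
def relabel_as_utf8(str)
  candidate = str.dup.force_encoding(Encoding::UTF_8)
  # Fall back to the original string if the bytes are not valid UTF-8.
  candidate.valid_encoding? ? candidate : str
end

# The UTF-8 bytes of "élan", wrongly tagged as Windows-1252.
mislabeled = "\xC3\xA9lan".dup.force_encoding("Windows-1252")
fixed = relabel_as_utf8(mislabeled)

puts mislabeled.encoding  # the wrong label is still on the original
puts fixed.encoding       # the copy carries the UTF-8 label
puts fixed.length         # character count once the label is right
```

Calling `encode("UTF-8")` on the mislabeled string instead would transcode it a second time, which is the kind of double conversion suspected above.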
-
@dan rathbun said:
@slbaumgartner said:
But in 2014, the original 5-byte UTF-8 string is somehow transmuted into the 8-byte string \xE2\x88\x9A\xC2\xA9lan. ... It's as if the callback processing code has an incorrect implementation of the transcoding.
YES, agree.
To me it looks like it IS UTF-8, but Ruby thinks it is some other encoding, and doubly transcodes* it into UTF-8 AGAIN.
- P.S. - isn't transmute a math term ?
I was thinking more of alchemy
-
@tt_su said:
? It does? Are you sure?
It certainly looks like it...
I cleaned up my initial code as per the Nanny's and the Professor's prodding...
and these are the new images...
and the new script
# encoding: UTF-8
def show_problem
  @lang_hash = {'lid1'=>"élan"} # I'm using a hash because it's what I use in my plugin...
  @dlg2 = UI::WebDialog.new("Problem_Main", false,"main_prob", 700, 500, 600, 0, false)
  html = %Q(
<!DOCTYPE html>
<html>
<head>
  <title>Problem_Main</title>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body>
  <p>Tested using Sketchup.version #{Sketchup.version}</p>
  <p>Tested on #{RUBY_PLATFORM =~ /(darwin)/ ? ((%x(sw_vers).sub(/ProductName:/,'').sub(/ProductVersion:/,'').sub(/BuildVersion:/,'_'))) : 'windows'}</p>
  <p>click into the box and hit return or change to use another non ASCII-8BIT string...</p>
  <input id="#{@lang_hash.keys[0]}" value="#{@lang_hash.values[0]}" type="text" onkeydown="if (event.keyCode == 13) trans_L8();" title="type text and then 'Enter'">
  <p>below is the return using 'get_element_value'</p>
  <h4 id="translation1"><!-- return appears here --></h4>
  <p>below is the return using 'add_action_callback' </p>
  <h4 id="translation2"><!-- return appears here --></h4>
  <p>below is the return of the query 'add_action_callback' == 'get_element_value'</p>
  <h4 id="translation3"><!-- return appears here --></h4>
  <script type="text/javascript" charset="UTF-8">
    function trans_L8() {
      window.location = 'skp:trans_L8@'+#{@lang_hash.keys[0]}.id+','+(#{@lang_hash.keys[0]}.value);
    }
  </script>
</body>
</html>
  )
  @dlg2.set_html html
  RUBY_PLATFORM =~ /(darwin)/ ? @dlg2.show_modal() : @dlg2.show()
  @dlg2.add_action_callback("trans_L8") {|dialog, params|
    param_id = params.split(',')[0].to_s
    param_val = params.split(',')[1].to_s
    callback = 'return using add_action_callback ' # + param_id.to_s + ' = ' + param_val.to_s # commented out to avoid the error
    puts callback
    p(params)
    elm_val = (@dlg2.get_element_value(param_id)).to_s
    element_value = 'return using get_element_value => ' + elm_val
    puts element_value
    @dlg2.execute_script("translation1.textContent='#{(elm_val)}'")
    @dlg2.execute_script("translation2.textContent='#{param_val}'")
    @dlg2.execute_script("translation3.textContent='#{elm_val == param_val}'")
  }
end
show_problem
# load("/Users/johns_iMac/Library/Application Support/SketchUp 2014/SketchUp/Plugins/jcb_ViewPortResize/dev/show_encoding_issue.rb")
# load("[add your path]/show_encoding_issue.rb")
and a question? Is there code to get the Windows operating system details?
john
-
a CLUE perhaps...
Using js codepoints for my input, 'get_element_value' has switched all the separators...
return using add_action_callback "lid1,/uD83D/uDC7D/u20AC/u00A3/u0061/u0009"
return using get_element_value => \uD83D\uDC7D\u20AC\u00A3\u0061\u0009
so I modified my code
@dlg2.add_action_callback("trans_L8") {|dialog, params|
  param_id = params.split(',')[0].to_s
  param_val = params.split(',')[1].to_s
  param_val_sub = params.split(',')[1].gsub('/','\\').to_s
  p param_val_sub
  callback = 'return using add_action_callback ' # + param_id.to_s + ' = ' + param_val.to_s # commented out to avoid the error
  puts callback
  p(params)
  elm_val = (@dlg2.get_element_value(param_id)).to_s
  element_value = 'return using get_element_value => ' + elm_val
  puts element_value
  @dlg2.execute_script("translation1.textContent='#{elm_val}'")
  @dlg2.execute_script("translation2.textContent='#{param_val}'")
  @dlg2.execute_script("translation3.textContent='#{param_val_sub}'")
}
and this is the result
john
-
So it seems there are in fact two issues...
1: return having separators reversed
2: encoding being mis-read
I added the
.gsub('/','\\')
to both types of return, so it works on v8 or v13/v14.
I added js to convert the string to UNICODE on keydown, before it is retrieved by SU...
and the full workaround...
# encoding: UTF-8
def show_problem
  @lang_hash = {'lid1'=> %Q(élan ümlet)} # I'm using a hash because it's what I use in my plugin...
  @dlg2 = UI::WebDialog.new("Problem_Main", false,"main_prob", 700, 500, 600, 0, false)
  html = %Q(
<!DOCTYPE html>
<html>
<head>
  <title>Problem_Main</title>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body>
  <p>Tested using Sketchup.version #{Sketchup.version}</p>
  <p>Tested on #{RUBY_PLATFORM =~ /(darwin)/ ? ((%x(sw_vers).sub(/ProductName:/,'').sub(/ProductVersion:/,'').sub(/BuildVersion:/,'_'))) : 'windows'}</p>
  <p>click into the box and hit return or change to use another non ASCII-8BIT string...</p>
  <input id="#{@lang_hash.keys[0]}" value="#{@lang_hash.values[0]}" type="text" onkeydown=" if (event.keyCode == 13) this.value=unicodeLiteral(this.value);trans_L8();" title="type text and then 'Enter'">
  <p>below is the return using 'get_element_value'</p>
  <h4 id="translation1"><!-- return appears here --></h4>
  <p>below is the return using 'add_action_callback' </p>
  <h4 id="translation2"><!-- return appears here --></h4>
  <h4 id="translation3"><!-- return appears here --></h4>
  <script type="text/javascript" charset="UTF-8">
    function trans_L8() {
      window.location = 'skp:trans_L8@'+#{@lang_hash.keys[0]}.id+','+(#{@lang_hash.keys[0]}.value);
    }
    /* Creates a uppercase hex number with at least length digits from a given number */
    function fixedHex(number, length){
      var str = number.toString(16).toUpperCase();
      while(str.length < length)
        str = "0" + str;
      return str;
    }
    /* Creates a unicode literal based on the string */
    function unicodeLiteral(str){
      var i;
      var result = "";
      for( i = 0; i < str.length; ++i){
        /* You should probably replace this by an isASCII test */
        if(str.charCodeAt(i) > 126 || str.charCodeAt(i) < 32)
          result += "\\\\" + "u" + fixedHex(str.charCodeAt(i),4);
        else
          result += str[i];
      }
      return result;
    }
  </script>
</body>
</html>
  )
  @dlg2.set_html html
  RUBY_PLATFORM =~ /(darwin)/ ? @dlg2.show_modal() : @dlg2.show()
  @dlg2.add_action_callback("trans_L8") {|dialog, params|
    param_id = params.split(',')[0].to_s
    param_val = params.split(',')[1].gsub('/',"\\").to_s
    callback = 'return using add_action_callback ' # + param_id.to_s + ' = ' + param_val.to_s # commented out to avoid the error
    puts callback
    p(params)
    elm_val = (@dlg2.get_element_value(param_id)).gsub('/',"\\").to_s
    element_value = 'return using get_element_value => ' + elm_val
    puts element_value
    @dlg2.execute_script("translation1.textContent='#{elm_val}'")
    @dlg2.execute_script("translation2.textContent='#{param_val}'")
  }
end
show_problem
# load("/Users/johns_iMac/Library/Application Support/SketchUp 2014/SketchUp/Plugins/jcb_ViewPortResize/dev/show_encoding_issue.rb")
# load("[add your path]/show_encoding_issue.rb")
BIG question is will this work on other PCs, or would I need a platform conditional?
In review,
I think the http header has the incorrect encoding and as SU is the server it must come from there...
The internal 'separator' variations are still a mystery...
john
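The workaround leaves Ruby holding literal \uXXXX text, which JavaScript turns back into characters when the string is passed through execute_script. If the Ruby side also needs the real characters, something like the sketch below could decode them. `decode_unicode_literals` is a hypothetical helper, not part of the script above; it assumes the input is a UTF-8 or plain ASCII string that already uses backslashes (so the gsub('/', "\\") step, where needed, has been applied first).

```ruby
# Hypothetical helper: turn literal \uXXXX escapes (as produced by the
# JavaScript unicodeLiteral function) back into UTF-8 characters.
# Runs of consecutive escapes are decoded together, so a UTF-16 surrogate
# pair such as \uD83D\uDC7D becomes one character. A lone surrogate is
# not handled and will raise during the encode step.
def decode_unicode_literals(str)
  str.gsub(/(?:\\u[0-9A-Fa-f]{4})+/) do |run|
    # Collect the 16-bit code units of this run of escapes.
    units = run.scan(/\\u([0-9A-Fa-f]{4})/).flatten.map { |hex| hex.hex }
    # Pack them as big-endian UTF-16 and transcode to UTF-8.
    units.pack("n*").force_encoding("UTF-16BE").encode("UTF-8")
  end
end

# Single quotes keep the backslashes literal, as they arrive from the dialog.
puts decode_unicode_literals('\u00E9lan \u00FCmlet')
puts decode_unicode_literals('\uD83D\uDC7D\u20AC\u00A3')
```

Text without escapes passes through unchanged, so the same call can be used on both return paths.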
-
@driven said:
... and a question? Is there code to get the Windows operating system details?
Get the default system Encoding:
Encoding::find("filesystem")
or
Encoding::find("locale")
(On my machine it returns the
#<Encoding:Windows-1252>
object reference.)
If you want the Windows version:
%x[ver]
On my machine it returns:
"Microsoft Windows [Version 6.1.7601]"
XP is 5.1
Vista is 6.0
Win7 is 6.1
What else do you want to know ?
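For the "Tested on" line in the dialog, the `ver` text could be reduced to a short name. The sketch below is only an illustration: `windows_name` and `WINDOWS_NAMES` are hypothetical names, the table holds just the three versions listed above, and it assumes the output contains the English word "Version" (a localized Windows may print something else).

```ruby
# Hypothetical lookup table, limited to the versions listed in this thread.
WINDOWS_NAMES = { "5.1" => "XP", "6.0" => "Vista", "6.1" => "Win7" }.freeze

# Hypothetical helper: map the text returned by the Windows `ver` command
# to a short name. Returns nil when no "Version x.y" is found.
def windows_name(ver_output)
  match = ver_output.match(/Version (\d+\.\d+)/)
  return nil unless match
  WINDOWS_NAMES.fetch(match[1], "unknown (#{match[1]})")
end

puts windows_name("Microsoft Windows [Version 6.1.7601]")
```

On a PC the argument would come from %x[ver]; passing the text in as a parameter keeps the helper testable on any platform.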
-
@dan rathbun said:
What else do you want to know ?
does my last script run on your PC?
if you swap these in, can you show me a screen shot?
a more complex input...
@lang_hash = {'lid1'=> %Q(élan 勢い Schwung импульс)} # I'm using a hash because it's what I use in my plugin...
and the Windows versioning...
<p>Tested on #{RUBY_PLATFORM =~ /(darwin)/ ? ((%x(sw_vers).sub(/ProductName:/,'').sub(/ProductVersion:/,'').sub(/BuildVersion:/,'_'))) : %x[ver]}</p>
also, did you try it without the fix? did you get both returns on v2014?
john
-
I had not actually run the code at all, before.
Here goes:
@driven said:
..., did you try it without the fix? did you get both returns on v2014?
On SU2014, without the fix:
load "test/WebDialog_param_encoding_bug.rb"
true
return using add_action_callback lid1 = élan
return using get_element_value => élan
- After hitting return the elements are NOT replaced in the WebDialog because element.textContent= does not work on MSIE.
You need a platform code branch and use element.innerText= on PC.
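A platform branch of that kind might look like the sketch below. `js_set_text` is a hypothetical helper, and the use of document.getElementById and the quote escaping are illustrative choices that are not taken from the test script. Backslashes are deliberately left alone so that \uXXXX escapes in the value are still interpreted by JavaScript.

```ruby
# Hypothetical helper: build the JavaScript string for execute_script,
# choosing the text property by platform.
# ASSUMPTION: anything that is not darwin is treated as a PC running MSIE.
def js_set_text(element_id, value, platform = RUBY_PLATFORM)
  property = platform =~ /darwin/ ? "textContent" : "innerText"
  # Escape single quotes so the value cannot end the JS string early.
  escaped = value.gsub("'") { "\\'" }
  "document.getElementById('#{element_id}').#{property}='#{escaped}'"
end

puts js_set_text("translation1", "Schwung", "i386-mingw32")
puts js_set_text("translation1", "Schwung", "x86_64-darwin13")
```

In the dialog code the result would be passed to @dlg2.execute_script in place of the hand-built strings.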
On SU2014, WITH the fix, AFTER hitting return:
load "test/WebDialog_param_encoding_fix.rb"
true
return using add_action_callback "lid1,\\u00E9lan \\uF8FF \\u00FCmlet"
return using get_element_value => \u00E9lan \uF8FF \u00FCmlet
First setting:
Encoding::default_internal="UTF-8"
UTF-8
.. has no effect (no difference.)
- After hitting return the elements are NOT replaced in the WebDialog because
-
With the two changes on SU2014.
Before hitting ENTER:
.. but when I click in the text control, and hit the END key, the ruby console shows:
return using add_action_callback "lid1,\xC3\xA9lan \xE5\x8B\xA2\xE3\x81\x84 Schwung \xD0\xB8\xD0\xBC\xD0\xBF\xD1\x83\xD0\xBB\xD1\x8C\xD1\x81"
return using get_element_value => élan 勢い Schwung импульс
After hitting ENTER:
.. and then the Ruby console shows:
return using add_action_callback "lid1,\\u00E9lan \\u52E2\\u3044 Schwung \\u0438\\u043C\\u043F\\u0443\\u043B\\u044C\\u0441"
return using get_element_value => \u00E9lan \u52E2\u3044 Schwung \u0438\u043C\u043F\u0443\u043B\u044C\u0441
-
thanks dan
@unknownuser said:
You need a platform code branch and use element.innerText= on PC.
innerText works on mac, so I'll just change that...
can you run with the tweak...
and see if they show in the dialog...
the extra // in your console return confuses me. I had to add more to escape ruby escaping javascript...
john