WebDialog encoding bug found!

Developers' Forum

  • driven
    last edited by driven 31 Mar 2014, 14:56

    I have started a new topic because, although I suspect this affects more than just WebDialogs, and possibly PCs as well, I can only test it on my Mac...

    I'll start with a game of 'spot the difference' [screenshots: v8 with the 'fixed' code, v13, v14],
    then the script:

    # encoding: UTF-8
    def show_problem
      @lang_hash = {'lid1'=>"élan"}
      @dlg2 = UI::WebDialog.new("Problem_Main", false, "main_prob", 700, 500, 600, 0, false)
      html = %Q(
    <html>
      <head>
      <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
      </head>
      <body>
        <p>mac's like UFT-8, on Sketchup.version #{Sketchup.version}</p>
        <p>HTML likes UFT-8, and this html has meta content="text/html; charset=utf-8" http-equiv="Content-Type"</p>
        <p>Ruby 2 likes UFT-8, and this script has the magic comment # encoding: UTF-8</p>
        <p>so where is this encoding error coming from?</p>
        <p>Use élan or change to use another non ASCII-8BIT string...</p>
        <p>click into the box and hit return</p>
        <input id="#{@lang_hash.keys[0]}" value="#{@lang_hash.values[0]}" type="text" onkeydown="if (event.keyCode == 13) trans_L8();" title="type text and then 'Enter'">
        <p>below should be return using add_action_callback in UFT-8</p>
        <h4 id="translation1"><!-- return appears here --></h4>
    
        <p>below should be return using get_element_value in UFT-8</p>
        <h4 id="translation2"><!-- return appears here --></h4>
    
        <p>both should appear on next line... UNLESS one is not UFT-8</p>
        <h4 id="translation3"><!-- return appears here --></h4>
    
        <script>
          // Send "id,value" to the Ruby callback trans_L8 through the skp: protocol.
          function trans_L8() {
            var el = document.getElementById('#{@lang_hash.keys[0]}');
            window.location = 'skp:trans_L8@' + el.id + ',' + el.value;
          }
        </script>
      </body>
    </html>
    )
    
      @dlg2.set_html html
    
      # Register the callback before the dialog is shown.
      @dlg2.add_action_callback("trans_L8") {|dialog, params|
        # params arrives as "id,value"; split only once so commas in the value survive.
        param_id  = params.split(',', 2)[0].to_s
        param_val = params.split(',', 2)[1].to_s
        #
        #param_convert_val = params.split(',')[1].force_encoding("UTF-16LE").encode("UTF-8").to_s
        #
        elm_val = @dlg2.get_element_value(param_id).to_s
        callback = 'return using add_action_callback ' + param_id + ' = ' + param_val
        element_value = 'return using get_element_value => ' + elm_val
        puts callback
        puts element_value
        @dlg2.execute_script("translation1.textContent='#{param_val}'")
        @dlg2.execute_script("translation2.textContent='#{elm_val}'")
        if (Sketchup.version.to_i > 13) && (param_val != elm_val)
          # In SU 2014, interpolating both values into one string raises, so show that message instead.
          @dlg2.execute_script("translation3.textContent='Error: #<Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8>'")
        else
          @dlg2.execute_script("translation3.textContent='#{param_val}' + ' == ' + '#{elm_val}' + ' is ' + '#{param_val == elm_val}'")
        end
      }
    
      # show_modal on the Mac, show elsewhere.
      RUBY_PLATFORM =~ /darwin/ ? @dlg2.show_modal : @dlg2.show
    end
    show_problem
    # load("[add your path]/show_encoding_issue.rb")
    
    

    I have the script saved as a file and load it from the 'Ruby Console'.
    I get an error in ALL of these versions of SU;
    however, it's a different error...

    EDITED to fix a minor error and update the images...

    It depends on which one you use: .get_element_value or .add_action_callback.

    I really think this is a bug... but maybe my test is flawed [it happens]

    Can people check my code and test it to confirm?

    john

    another tweak as the cracks emerge... still the bug remains...

    learn from the mistakes of others, you may not live long enough to make them all yourself...

    • slbaumgartner
      last edited by 31 Mar 2014, 16:21

      John,

      I haven't tested 1.8 yet, but I can reproduce your results in 2014. By the way, your code has param_val and elm_val swapped in the lines that put them into translation1 and translation2, which makes the display misleading. The bug in 2014 is that the action callback is returning the parameters ASCII-8BIT encoded even though everything else is UTF-8.

      • driven
        last edited by 31 Mar 2014, 16:38

        Cheers Steve,

        I fixed it and re-ran on three SU versions...

        now I have three different results from the identical script...

        go figure...

        john

        • slbaumgartner
          last edited by 31 Mar 2014, 16:41

          I managed to fire up SU8 and I also confirm your results there (given the correction to which value displays where). What is happening is that the action callback converts its values to ASCII-8BIT when sending them from javascript to Ruby. In SU8 (Ruby 1.8) this works because that's what Ruby 1.8 expects. In SU 2014, it causes an issue because Ruby 2.0 expects UTF-8 and receives ASCII-8BIT. You can print out the encodings for the values in 2.0 to confirm that is what is happening.

          Steve
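To make that check concrete, here is a minimal plain-Ruby (2.x) sketch. The helper describe_string is hypothetical, and the two sample values are stand-ins modelled on the reports in this thread, not captured from SketchUp. Inside the trans_L8 callback you would call it on param_val and elm_val and puts the result to the Ruby Console.

```ruby
# encoding: UTF-8

# Hypothetical helper: summarise a string's encoding label and raw bytes so the
# action-callback value and the get_element_value value can be compared.
def describe_string(label, str)
  hex = str.bytes.map { |b| format('\\x%02X', b) }.join
  "#{label}: encoding=#{str.encoding}, valid=#{str.valid_encoding?}, bytes=#{hex}"
end

# Stand-ins (assumed): what the callback appears to deliver in SU 2014,
# and what get_element_value returns.
param_val = "\xC3\xA9lan".b
elm_val   = "élan"

puts describe_string('param_val', param_val)
puts describe_string('elm_val', elm_val)
```

If both lines show the same bytes but different encoding labels, the problem is the label, not the data.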

          • slbaumgartner
            last edited by 31 Mar 2014, 16:45

            To further the analysis: get_element_value does not do any conversion. Since you loaded the JavaScript side on the Mac with UTF-8, that makes 2014 happy but messes up SU8!

            • Juantxo
              last edited by 31 Mar 2014, 16:50

              In Windows 7, p(params) returns a non-UTF-8 string:
              "lid1,\xC3\xA9lan"
              and the console shows:
              return using add_action_callback lid1 = élan
              return using get_element_value => élan

              • driven
                last edited by 31 Mar 2014, 16:59

                Which one is affected appears to have changed between v8 and v2013, and it's a different charset.

                It now causes the encoding error ONLY if you try to combine strings...

                so v2014 can't print the last line the same way v2013 does.
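A small plain-Ruby (2.x) reproduction of that 'only when combining' behaviour, with stand-in values (assumed, not captured from SketchUp): an ASCII-only template can absorb the ASCII-8BIT bytes, but putting non-ASCII binary bytes and non-ASCII UTF-8 text into one string raises Encoding::CompatibilityError.

```ruby
# encoding: UTF-8

param_val = "\xC3\xA9lan".b   # stand-in for the ASCII-8BIT value from the action callback
elm_val   = "élan"            # stand-in for the UTF-8 value from get_element_value

# An ASCII-only template can absorb the binary bytes without error; the result
# simply takes the binary encoding. This is why the translation1 line still runs.
script1 = "translation1.textContent='#{param_val}'"
puts script1.encoding

# Mixing non-ASCII binary bytes with non-ASCII UTF-8 text in one string fails,
# as on the translation3 line.
begin
  script3 = "translation3.textContent='#{param_val}' + ' == ' + '#{elm_val}'"
  puts script3
rescue Encoding::CompatibilityError => e
  puts "#{e.class}: #{e.message}"
end
```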

                @Juantxo, does it throw an encoding error if you comment out the check, like this?

                    #if (Sketchup.version.to_i > 13) && (param_val != elm_val)
                    #  @dlg2.execute_script("translation3.textContent='Error: #<Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8>'")
                    #else
                    @dlg2.execute_script("translation3.textContent='#{param_val}' + ' == ' + '#{elm_val}'")
                    #end
                

                • driven
                  last edited by 31 Mar 2014, 17:05

                  @juantxo said:

                  In Windows 7 p(params) returns non utf-8
                  "lid1,\xC3\xA9lan"..

                  On the Mac, p(params) returns
                  "lid1,\xE2\x88\x9A\xC2\xA9lan"

                  Why?

                  • slbaumgartner
                    last edited by 31 Mar 2014, 17:21

                    @driven said:

                    @juantxo said:

                    In Windows 7 p(params) returns non utf-8
                    "lid1,\xC3\xA9lan"..

                    ON mac p(params)
                    "lid1,\xE2\x88\x9A\xC2\xA9lan"

                    why?

                    We are seeing the raw 8-bit byte sequences of the string representations and the chaos that results when Ruby and SU try to convert character encodings without knowing for sure what they had.

                    • Aerilius
                      last edited by 31 Mar 2014, 17:33

                      I didn't know about the ASCII-8BIT encoding issue before, but it's only the cherry on the cake.

                      Along with the issues we have found in the other WebDialog discussions (URL encoding, URL length limit, dropped backslashes), I recommend not transferring user input or arbitrary text through action callbacks, only simple method names/identifiers with a limited character range. Then one can use get_element_value to fetch user input of any arbitrary character range.
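A rough sketch of that pattern in plain Ruby, assuming the page only ever sends an element id through the callback. The constant SAFE_ID and the method read_field are hypothetical; the fetch lambda stands in for dialog.get_element_value(id) so the logic can run outside SketchUp. Since the thread also reports get_element_value misbehaving in SU8, this narrows the problem rather than removing it.

```ruby
# encoding: UTF-8

# Hypothetical whitelist: only short ASCII identifiers are accepted as callback
# payloads, so URL encoding, length limits and charset conversion have nothing to mangle.
SAFE_ID = /\A[A-Za-z][A-Za-z0-9_]{0,63}\z/

# Returns the text of element +id+, or nil when +id+ is not a safe identifier.
# +fetch+ stands in for dialog.get_element_value(id) inside a SketchUp callback.
def read_field(id, fetch)
  id = id.to_s
  return nil unless id.ascii_only? && id =~ SAFE_ID
  fetch.call(id)
end

# In SketchUp this might be wired up roughly as (untested sketch):
#   dlg.add_action_callback("read_field") { |dialog, id|
#     text = read_field(id, ->(i) { dialog.get_element_value(i) })
#   }
# with the page calling window.location = 'skp:read_field@lid1'.

# Stand-alone usage: a hash plays the role of the dialog's input fields.
fields = { 'lid1' => 'élan' }
fetch  = ->(id) { fields[id] }

p read_field('lid1', fetch)
p read_field('lid1,élan', fetch)
```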

                      • driven
                        last edited by 31 Mar 2014, 17:47

                        @aerilius said:

                        ...Then one can use get_element_value to fetch user input of any arbitrary character range.

                        Except get_element_value fails pre-v2013, and add_action_callback works there.

                        I have a few old scripts I was trying to update, and was unable to figure out the issue...

                        I think this shows it was broken to begin with, and got broken 'differently' with a fix...

                        I'm more inclined to look at using 'unicodeEscape' in the JS... before retrieving by either method...

                        john
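One way to follow that idea, sketched under assumptions: suppose the page's JavaScript replaces every non-ASCII character, and every backslash, with a \uXXXX escape (for example by looping over the string with charCodeAt) before sending it, so only ASCII crosses the bridge. The Ruby side then only has to turn those escapes back into UTF-8. The helper decode_js_escapes is hypothetical and assumes the escapes are UTF-16 code units, as JavaScript produces them.

```ruby
# encoding: UTF-8

# Hypothetical decoder for "\uXXXX" escapes added on the JavaScript side.
# Consecutive escapes are decoded together so that UTF-16 surrogate pairs
# (characters outside the Basic Multilingual Plane) come out correctly.
# An unpaired surrogate raises Encoding::InvalidByteSequenceError.
def decode_js_escapes(str)
  # After escaping, the payload should be pure ASCII, so labelling it UTF-8 is safe.
  text = str.dup.force_encoding(Encoding::UTF_8)
  text.gsub(/(?:\\u[0-9a-fA-F]{4})+/) do |run|
    units = run.scan(/[0-9a-fA-F]{4}/).map(&:hex)   # UTF-16 code units
    units.pack('n*').force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
  end
end

params = 'lid1,\u00e9lan'.b   # an escaped payload as it might reach the callback
param_id, param_val = decode_js_escapes(params).split(',', 2)
puts "#{param_id} = #{param_val}"
```

Escaping backslashes too (as \u005c) keeps a user who literally types "\u00e9" from being decoded by mistake.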

                        • slbaumgartner
                          last edited by 31 Mar 2014, 20:58

                          @driven said:

                          @aerilius said:

                          ...Then one can use get_element_value to fetch user input of any arbitrary character range.

                          Except get_element_value fails pre-v2013, and add_action_callback works there.

                          I have a few old scripts I was trying to update, and was unable to figure out the issue...

                          I think this shows it was broken to begin with, and got broken 'differently' with a fix...

                          I'm more inclined to look at using 'unicodeEscape' in the JS... before retrieving by either method...

                          john

                          It's possible to get deeply confused trying to sort this out!
                          Here's what I have observed:

                          The UTF-8 byte sequence for élan is \xC3\xA9lan (that is, UTF-8 encodes é as the two-byte sequence \xC3\xA9).

                          On the Mac, get_element_value returns exactly this byte sequence in both SU8 and 2014. However, because Ruby 1.8 treats it as a 5-character ASCII-8BIT string, it shows \xC3 as Ã and \xA9 as ©, the extended-ASCII (Latin-1) interpretations of these bytes. Ruby 2.0 happily assumes it is UTF-8 and gets it right.
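The byte-level story above can be checked in a Ruby 2.x console. This sketch assumes the 8-bit reading is ISO-8859-1 (Latin-1), one common 'extended ASCII'; Ruby 1.8 itself has no encoding labels, so this only imitates what a byte-per-character reading displays.

```ruby
# encoding: UTF-8

word = "élan"

# UTF-8 stores é as two bytes, so the 4-character word is 5 bytes long.
p word.length                                             # => 4
p word.bytesize                                           # => 5
p word.bytes.first(2).map { |b| format('%02X', b) }       # => ["C3", "A9"]

# Reading those same 5 bytes one byte per character, as Latin-1:
latin1_view = word.b.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts latin1_view    # \xC3 shows as Ã, \xA9 as ©
```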

                          The action callback parameters are handled differently and inconsistently between Windows and Mac.

                          On the Mac, in SU8 the action callback also returns the original UTF-8 5-byte sequence, but somehow when Ruby prints it as a string, it gets it right 😲 Further, it believes this string and the one from get_element_value (that prints as élan) are equal. This makes no sense to me...

                          But in 2014, the original 5-byte UTF-8 string is somehow transmuted into the 8-byte string \xE2\x88\x9A\xC2\xA9lan. Note that \xE2\x88\x9A is the UTF-8 for the square-root sign √ and \xC2\xA9 is the UTF-8 for ©. So it appears the translation of the copyright sign is an attempt to handle misreading the UTF-8 as ASCII-8BIT, but I don't know where the square-root byte sequence came from since, as I mentioned above, \xC3 is Ã in ASCII-8BIT, which is \xC3\x83 in UTF-8. It's as if the callback processing code has an incorrect implementation of the transcoding.

                          And then, as Juantxo reports, SU8 on Windows 7 returns the 5 UTF-8 bytes unconverted, yet again somehow manages to make sense of them in both cases 😮.

                          • driven
                            last edited by 31 Mar 2014, 21:08

                            @slbaumgartner said:

                            ... Further, it believes this string and the one from get_element_value (that prints as élan) are equal. This makes no sense to me...

                            I created this confusion... I don't check for equality, it's just a 'puts'; it should be

                                @dlg2.execute_script("translation3.textContent='#{param_val}' + ' == ' + '#{elm_val}' + ' is ' + '#{param_val == elm_val}'")
                            
                            

                            which returns
                            √©lan == élan is false

                            sorry I'll get my coat...

                            john

                            • Juantxo
                              last edited by 31 Mar 2014, 21:50

                              Sorry, I didn't update my profile.
                              In Windows SketchUp 2013, p(params) returns
                              "lid1,élan"
                              so it works fine (I think in SU8 also).
                              The problem is in Windows SU2014, which returns a non-UTF-8 string.
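If the Windows SU2014 callback really delivers correct UTF-8 bytes that are merely labelled ASCII-8BIT (which is what "lid1,\xC3\xA9lan" suggests), relabelling them is a possible workaround. A minimal sketch assuming Ruby 2.x; the helper utf8_label is hypothetical. It leaves the string untouched when the bytes are not valid UTF-8, and it cannot undo the Mac √© case, where the bytes themselves have already been changed.

```ruby
# encoding: UTF-8

# Hypothetical helper: relabel an ASCII-8BIT callback string as UTF-8
# when its bytes already form valid UTF-8; otherwise return it unchanged.
def utf8_label(str)
  return str unless str.encoding == Encoding::ASCII_8BIT
  candidate = str.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : str
end

params = "lid1,\xC3\xA9lan".b          # stand-in for what Windows SU2014 reportedly passes in
param_id, param_val = utf8_label(params).split(',', 2)
puts "#{param_id} = #{param_val}"      # now safe to combine with other UTF-8 text
```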

                              • driven
                                last edited by 31 Mar 2014, 21:55

                                @juantxo said:

                                Sorry, I didn't update my profile.

                                I suspected that...

                                Which locale do you use (Sketchup.get_locale)? en-US? It may have a bearing.

                                john

                                • Juantxo
                                  last edited by 31 Mar 2014, 22:01

                                  Yes, en-US.

                                  • slbaumgartner
                                    last edited by 31 Mar 2014, 23:20

                                    @driven said:

                                    @slbaumgartner said:

                                    ... Further, it believes this string and the one from get_element_value (that prints as élan) are equal. This makes no sense to me...

                                    I create this confusion... I don't check for equality it's a 'puts', it should be

                                        @dlg2.execute_script("translation3.textContent='#{param_val}' + ' == ' + '#{elm_val}' + ' is ' + '#{param_val == elm_val}'")
                                    

                                    which returns
                                    √©lan == élan is false

                                    sorry I'll get my coat...

                                    john

                                    Nah, I should know enough to check your code 😳. With a real test, equality fails in all cases (as it should!).

                                    • driven
                                      last edited by 31 Mar 2014, 23:23

                                      @slbaumgartner said:

                                      Nah, I should know enough to check your code 😳. With a real test, equality fails in all cases (as it should!).

                                      it's the tangents that count...

                                      now you can write a proper test...

                                      john
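A sketch of what such a proper test might check, in plain Ruby 2.x with stand-in values modelled on the outputs reported in this thread (the helper compare_values is hypothetical). It separates two questions that a bare == mixes together: whether the bytes match, and whether Ruby considers the strings equal. In Ruby 2.x a non-ASCII ASCII-8BIT string never compares equal to a non-ASCII UTF-8 string, even when the bytes are identical, so the byte comparison is what tells the Windows and Mac cases apart.

```ruby
# encoding: UTF-8

# Hypothetical report comparing a callback value with a get_element_value value.
def compare_values(param_val, elm_val)
  {
    same_bytes:   param_val.bytes == elm_val.bytes,
    string_equal: param_val == elm_val,
    encodings:    [param_val.encoding.to_s, elm_val.encoding.to_s]
  }
end

elm_val = "élan"
p compare_values("\xC3\xA9lan".b, elm_val)               # Windows SU2014-style value
p compare_values("\xE2\x88\x9A\xC2\xA9lan".b, elm_val)   # Mac SU2014-style value
```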

                                      • Dan Rathbun
                                        last edited by 1 Apr 2014, 00:16

                                        The text that displays in your test says UFT-8 in many places. It is UTF-8.
                                        (The word Format comes at the end: Unicode Transformation Format.)

                                        (nag, nag) 😉

                                        I'm not here much anymore.

                                        • Dan Rathbun
                                          last edited by 1 Apr 2014, 05:02

                                          @slbaumgartner said:

                                          But in 2014, the original 5-byte UTF-8 string is somehow transmuted into the 8-byte string \xE2\x88\x9A\xC2\xA9lan. ... It's as if the callback processing code has an incorrect implementation of the transcoding.

                                          YES, agree.

                                          To me it looks like it IS UTF-8, but Ruby thinks it is some other encoding, and doubly transcodes* it into UTF-8 AGAIN.
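One reading that fits the Mac numbers exactly, offered as a hypothesis rather than a confirmed account of SketchUp's internals: the correct UTF-8 bytes for é (\xC3\xA9) are decoded once more as Mac OS Roman, the classic Mac 8-bit charset, and re-encoded to UTF-8. In Mac OS Roman, byte \xC3 is the square-root sign √ and \xA9 is ©, which would explain where the √ came from. Ruby 2.x can reproduce it with its built-in macRoman encoding:

```ruby
# encoding: UTF-8

utf8_bytes = "\xC3\xA9lan".b   # correct UTF-8 bytes for "élan", unlabelled

# Hypothesised extra step: read those UTF-8 bytes as Mac OS Roman
# and transcode them to UTF-8 a second time.
double = utf8_bytes.dup.force_encoding('macRoman').encode('UTF-8')

puts double     # √©lan, as seen in the Mac dialog
p double.b      # the same 8 bytes p(params) showed on the Mac

# If the hypothesis holds, undoing the step recovers the original text.
repaired = double.encode('macRoman').force_encoding('UTF-8')
puts repaired   # élan
```

The repair line is untested against SketchUp itself; it only shows that the transformation is reversible.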

                                          * P.S. - isn't transmute a math term?
