
    Ruby 2.0 __FILE__ contains incorrect encoding.

    18 Posts 6 Posters 3.5k Views 6 Watching
    • tt_su

      Just a heads-up on the usage of __FILE__.

      Given a username "Zé", where the user folder is C:/Users/Zé/:

      Using load or require with any path string derived from __FILE__ will cause an error similar to this:
      Error: #<LoadError: No such file or directory -- C:/Users/Zé/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/sketchup-stl/SKUI/embed_skui.rb> C:/Users/Zé/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/sketchup-stl/exporter.rb:14:in `load'

      (Full error: https://github.com/SketchUp/sketchup-stl/issues/134)

      The reason is that __FILE__ returns a string labelled with the wrong encoding; in my case #<Encoding:Windows-1252>.
      However, the string actually contains the bytes of a UTF-8 string, so you cannot convert it. That's why you see the odd characters there: Ruby tried to convert the string from Windows-1252 to UTF-8, which is wrong.
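A minimal sketch of that mix-up, assuming an illustrative path string whose bytes are UTF-8 but whose label says Windows-1252 (it is built by hand here, not taken from __FILE__): `encode` trusts the wrong label and garbles the text, while `force_encoding` only relabels the existing bytes.

```ruby
# Illustrative path: UTF-8 bytes for "Zé", but labelled Windows-1252,
# i.e. the mislabelling described above.
mislabelled = "C:/Users/Z\u00e9".dup.force_encoding("Windows-1252")

# Transcoding trusts the (wrong) label, so each of the two UTF-8 bytes
# of "é" is decoded as a separate Windows-1252 character.
garbled = mislabelled.encode("UTF-8")

# Relabelling keeps the bytes untouched and only changes the label.
relabelled = mislabelled.dup.force_encoding("UTF-8")

puts garbled
puts relabelled
```

That difference is why relabelling, rather than converting, is the repair when the bytes are already UTF-8.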

      Instead you have to force the encoding to be UTF-8.

      
      current_path = File.dirname(__FILE__)
      # Ruby 1.8 (SketchUp 2013 and older) strings have no force_encoding.
      if current_path.respond_to?(:force_encoding)
        current_path.force_encoding("UTF-8")
      end
      
      

      Attaching an example to reproduce the problem and work around it.


      encoding_test.zip
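The same workaround can be wrapped in a small helper. This is a hedged sketch, not an official recommendation (tt_su later asks people to hold off until it has been tested), and the name `utf8_path` is hypothetical. It only relabels when the bytes already form valid UTF-8, and otherwise trusts the label and transcodes.

```ruby
# Hypothetical helper: return a UTF-8 copy of a path string.
# Heuristic: if the bytes already form valid UTF-8 (the mislabelled case),
# only relabel them; otherwise trust the label and transcode. Genuine
# Windows-1252 text that happens to be valid UTF-8 would be misread.
def utf8_path(path)
  return path.dup unless path.respond_to?(:force_encoding) # Ruby 1.8
  candidate = path.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?
  path.encode(Encoding::UTF_8)
end

demo = utf8_path("C:/Users/Z\u00e9".dup.force_encoding("Windows-1252"))
puts demo.encoding
```

For example, `File.join(utf8_path(File.dirname(__FILE__)), "SKUI", "embed_skui.rb")` would give a UTF-8 path to pass to `require`.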

      • slbaumgartner

        This issue may be Windows-only, or limited to certain system configurations. The __FILE__ I get on my Mac is already UTF-8.

        Steve

        • Aerilius

          I don't understand why such encodings are not yet extinct. For me everything is UTF-8, except maybe things connected with Windows.
          I just tested on a Windows7 (6.1.7600 Ultimate, 32bit, locale: en-US, VirtualBox), and the encoding test succeeded perfectly in all cases.
          Could we narrow down under what circumstances it fails?

          • tt_su

            @slbaumgartner said:

            This issue may be Windows only or maybe only certain system configurations. The __FILE__ I get on my Mac is already UTF-8.

            Indeed, it's a Windows issue. It turns out that Ruby doesn't declare itself as a Unicode application despite its claimed Unicode support. And it doesn't always call the Unicode versions of the Windows file functions; in some scenarios it calls the legacy ASCII versions.

            We're working on mapping out where and when this happens.

            @aerilius said:

            I don't understand why such encodings are not yet extinct. For me everything is utf8, well except maybe things connected with Windows.
            I just tested on a Windows7 (6.1.7600 Ultimate, 32bit, locale: en-US, VirtualBox), and the encoding test succeeded perfectly in all cases.
            Could we narrow down under what circumstances it fails?

            Native Windows? Or Linux and Wine?
            What tests did you do?

            I've been working on an updated version of our diagnostics tool that includes encoding information for the environment data.

            So far I see __FILE__, ENV, $LOAD_PATH and $LOADED_FEATURES yielding inconsistent and sometimes incorrect encoding labels.
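A minimal sketch of that kind of check (not the actual diagnostics tool; both helper names are hypothetical). It tallies the encoding labels of strings from each source and flags strings labelled as something other than UTF-8 whose non-ASCII bytes are nevertheless valid UTF-8, which is the symptom from the first post. The heuristic can misfire on genuine legacy-encoded text that happens to be valid UTF-8.

```ruby
# Hypothetical helper: count the encoding labels in a list of strings.
def encoding_counts(strings)
  strings.each_with_object(Hash.new(0)) do |str, counts|
    counts[str.encoding.name] += 1
  end
end

# Hypothetical heuristic: non-ASCII bytes that form valid UTF-8 but carry
# a different label suggest the mislabelling discussed in this thread.
def mislabelled_utf8?(str)
  return false if str.encoding == Encoding::UTF_8 || str.ascii_only?
  str.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

sources = {
  "__FILE__"         => [__FILE__],
  "ENV"              => ENV.values,
  "$LOAD_PATH"       => $LOAD_PATH.map(&:to_s),
  "$LOADED_FEATURES" => $LOADED_FEATURES
}

sources.each do |name, strings|
  suspects = strings.count { |s| mislabelled_utf8?(s) }
  puts "#{name}: #{encoding_counts(strings).inspect}, suspect: #{suspects}"
end
```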

            • tt_su

              @aerilius said:

              I just tested on a Windows7 (6.1.7600 Ultimate, 32bit, locale: en-US, VirtualBox), and the encoding test succeeded perfectly in all cases.

              Succeeded in what sense? Did it succeed in all examples?
              In my testing it fails when you feed strings from __FILE__ directly into require or load, but if you force the encoding then it succeeds.

              I consider it a failure that you have to force the encoding, because that means Ruby is messing things up. If the strings were labelled with the proper encoding, Ruby should have been able to transcode them correctly.

              Another issue is that C extensions don't load from Unicode paths. 😞
              I found a workaround: calling the Win32 API to convert the folder part of the path into short format (DOS 8.3 style). That indicates that when loading .so files, Ruby calls the ASCII versions of the file functions.

              I'll come back with more detail once I've dug into this deeper. For now I just wanted to give you a heads-up so you know what's going on; we've suddenly started to get several reports of this.

              • Dan Rathbun

                @tt_su said:

                @aerilius said:

                I consider it a failure that you have to force the encoding. Because that means Ruby is messing up stuff. If the strings were marked with proper encoding it should have been able to correctly transpose the string.

                Well set the default bleepin' encodin' then... 😛

                Encoding_default_internal.png
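For reference, a minimal sketch of what setting the default internal encoding does (the helper name is hypothetical, and Windows-1252 stands in for a file's real encoding). When `Encoding.default_internal` is set, IO reads that specify only an external encoding are transcoded to it. It is a process-wide setting, so inside SketchUp it would affect every extension; the sketch restores the previous value, and the thread does not establish whether it helps with `__FILE__` itself.

```ruby
require 'tempfile'

# Hypothetical helper: read a file giving only its external encoding,
# while Encoding.default_internal is temporarily set to UTF-8.
def read_with_default_internal(path, external)
  previous = Encoding.default_internal
  Encoding.default_internal = Encoding::UTF_8
  # No internal encoding is given, so Ruby falls back to default_internal
  # and transcodes the data from `external` to UTF-8.
  File.read(path, encoding: external)
ensure
  Encoding.default_internal = previous # restore the process-wide setting
end

sample = Tempfile.new("latin")
sample.close
File.binwrite(sample.path, "Z\xE9".b) # "Zé" as Windows-1252 bytes
text = read_with_default_internal(sample.path, "Windows-1252")
puts text.encoding
```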

                I'm not here much anymore.

                • fredo6

                  Thomthom,

                  I use __FILE__ in my plugins (actually mostly in the top-level .rb files). I also use $LOAD_PATH in a few places.

                  Although many users seem to have no problem (but how many use a non-ASCII username?), it would be good to understand whether the problem is critical and requires an urgent fix.

                  What do you think?

                  Fredo

                  • Aerilius

                    Would it show something different when it fails? This is what I get:

                    
                    C:/Users/Administrator/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/encoding_test/hello.rb
                    > Hello World
                    require succeeded!
                    
                    > Hello World
                    load succeeded!
                    
                    Forcing UTF-8 encoding...
                    
                    require succeeded!
                    
                    

                    „Native Windows? Or Linux and Wine?“
                    If there were a Linux version of SU (nice!), or if SU2014 ran in Wine (nice!)... As it is, for all my bug reports I test on native Windows in a virtual machine. Linux is the luxury and comfort of a Mac with the freedom to work on any hardware. But I have to accept what I can get from SketchUp. I wish SketchUp would become kind of usable on high-DPI screens; all my other applications scale perfectly. I need patience; at least I'm still with SketchUp after years, because I'm addicted.

                    • Dan Rathbun

                      Ruby does not recognize the Windows-1258 (Vietnamese) system encoding when converting strings back to UTF-8. It is a known bug.

                      @unknownuser said:

                      SketchUcation Tools 2.5 (http://sketchucation.com/forums/viewtopic.php?f):
                      @kienhp said:

                      It does not run in SketchUp 2014, why?

                      Because Vietnamese encoding is not supported by Ruby yet.

                      Bug #7742: System encoding (Windows-1258) is not recognized by Ruby to convert back to UTF-8

                      One user changed his Regional and Language settings to US English to overcome this:
                      StackOverflow : Error installing Rubygems on ruby command prompt in Win7
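A hedged sketch for detecting this situation from Ruby (the helper name is hypothetical): it asks Ruby whether a converter path from the system locale encoding to UTF-8 exists. The answer depends on the Ruby version and the machine, so no particular output is assumed here.

```ruby
# Hypothetical helper: can Ruby transcode strings in `from` to UTF-8?
def transcodable_to_utf8?(from)
  source = Encoding.find(from.to_s) # raises ArgumentError for unknown names
  return true if source == Encoding::UTF_8
  # Raises Encoding::ConverterNotFoundError when no converter chain exists.
  Encoding::Converter.search_convpath(source, Encoding::UTF_8)
  true
rescue ArgumentError, Encoding::ConverterNotFoundError
  false
end

locale = Encoding.find("locale")
puts "Locale encoding: #{locale}"
puts "Convertible to UTF-8: #{transcodable_to_utf8?(locale)}"
```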

                      🤓


                      • tt_su

                        @aerilius said:

                        When it fails, it would show something different (?). This is what I get:

                        You are logged in as "Administrator"; the issue occurs when the username contains non-English characters.

                        • tt_su

                          @fredo6 said:

                          I use __FILE__ in my plugins (actually mostly in the top rb files). I also use $LOAD_PATH in a few places.

                          Although many users seem to have no problem (but how many do use a non-ascii username?), it is good to understand if the problem is critical and require an urgent fix.

                          What do you think?

                          We're still digging into this. The Ruby Unicode issues under Windows are a deep rabbit hole.
                          On one side, the wrong encoding is being returned for strings that contain UTF-8 data.
                          On the other side, Ruby C extensions don't load from paths with non-English characters.

                          Now, there might be cases where paths with non-English characters will load, if the characters fit the system code page, though we haven't gotten around to testing this yet. But on my machine, which is an English system with the Windows-1252 code page, the ASCII versions of the file functions fail on paths with non-English characters.

                          Other users, say Japanese users, might have a Japanese code page configured on their machine, and they might find that a Japanese username works for them, even though it fails for me.
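To compare machines, a small sketch (the helper name is hypothetical) can list the encodings Ruby picked up from the environment, using the special names that `Encoding.find` accepts. No particular values are assumed; they vary with the system's language settings.

```ruby
# Hypothetical helper: the encodings Ruby derived from the environment.
def system_encodings
  %w[locale filesystem external internal].each_with_object({}) do |name, info|
    enc = Encoding.find(name) # "internal" is nil unless default_internal is set
    info[name] = enc && enc.name
  end
end

system_encodings.each { |name, enc| puts "#{name}: #{enc.inspect}" }
```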

                          Encoding under Windows is a jumble. Under OS X it's all fine, because everything is UTF-8 by default, even the file functions.

                          As I mentioned, we're still digging into this and we'll come back with more info. I wouldn't necessarily start updating scripts right away. We don't know what's the best recommendation yet. But just so you are aware there is a known issue that's being investigated - and it can crop up differently from machine to machine.

                          • tt_su

                            @driven said:

                            file = __FILE__
                            file_encoding = file.encoding.name
                            puts file_encoding.to_s
                            if not file_encoding.valid_encoding?
                              puts "bad encoding, fixing"
                              data = File.open(file, (file_encoding + ':utf-8')).read
                            else
                              puts "nothing to fix???"
                              data = File.open(file).read
                            end
                            

                            I don't quite understand what your code snippet is doing here. It seems like you're mixing up the encoding of the filename string with the encoding of the File object.

                            And this: file_encoding + ':utf-8' - what is that doing? Appending UTF-8 to another encoding declaration?
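To illustrate the distinction drawn here, a minimal sketch (helper name hypothetical; a temporary file stands in for a plugin file) contrasting the label on a path string with the external encoding of an IO opened on that path. The two are independent of each other.

```ruby
require 'tempfile'

# Hypothetical helper: the label on a path string versus the external
# encoding Ruby uses to read that file's contents.
def path_vs_content_encoding(path, mode = "r")
  content_encoding = File.open(path, mode) { |io| io.external_encoding }
  { path: path.encoding, contents: content_encoding }
end

demo = Tempfile.new("enc_demo")
demo.close

# Plain "r": contents are read as Encoding.default_external.
p path_vs_content_encoding(demo.path)
# A mode string changes how the contents are read, not the path's label.
p path_vs_content_encoding(demo.path, "r:Windows-1252:UTF-8")
```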

                            @driven said:

                            So what's going on???

                            I don't know. 😕 Can you explain what you are testing with that snippet?

                            • driven

                              I have a test file encoded as Vietnamese (Windows) with CRLF line endings.
                              Using the Unix file command I can see...

                               
                              > %x(file "path../dev/file•enc•test.rb")
                              path../dev/file•enc•test.rb: ASCII text, with CRLF line terminators
                              

                              In SU...

                              load("path../dev/file•enc•test.rb")
                              UTF-8
                              nothing to fix???
                              true
                              

                              EDIT.... added missing notation [ 'r:' ]

                              file = __FILE__
                              file_encoding = file.encoding.name
                              puts file_encoding.to_s
                              # NB: file_encoding is the encoding's *name* (plain ASCII), so
                              # valid_encoding? is always true here and the else branch runs.
                              if not file_encoding.valid_encoding?
                                puts "bad encoding, fixing"
                                # specify both external and internal encodings
                                data = File.open(file, ('r:' + file_encoding + ':utf-8')).read
                              else
                                puts "nothing to fix???"
                                data = File.open(file).read
                              end
                              

                              So what's going on???

                              john

                              learn from the mistakes of others, you may not live long enough to make them all yourself...

                              • driven

                                @tt_su said:

                                I don't know. :? Can you explain what you are testing with that snippet?

                                It was missing a critical piece of notation... [r:] I modified the snippet, will retest later...
                                I was trying to set both external and internal encoding before the read...
                                based on this
                                http://stackoverflow.com/questions/8610100/changing-character-encoding

                                john


                                • Dan Rathbun

                                  @tt_su said:

                                  And this: file_encoding + ':utf-8' - what is that doing? Appending UTF-8 to another encoding declaration?

                                  NO. John is building a Ruby IO mode string (supported since Ruby 1.9).

                                  Many methods of IO and its subclasses (like File) accept it and call IO::new internally (for example IO.read, File.read, File.open, etc.).

                                  So if you read the doc for IO.new, you'll see that wherever you previously used just the file mode ("r", "w+", "rb", etc.), that argument can now have up to 3 sections:

                                  " *filemode* : *external_encoding* : *internal_encoding* "

                                  The online doc shows this example (among others.):

                                  open("transcoded.txt", "r:ISO-8859-1:UTF-8") do |io|
                                    puts "transcoded text:"
                                    p io.read
                                  end
                                  

                                  So the new mode string above is: "r:ISO-8859-1:UTF-8"

                                  Also, for better readability, many of these methods that take the 1-, 2- or 3-part string (separated by colons) can instead take hash arguments, like this:

                                  open(
                                    "transcoded.txt",
                                    :mode => "r",
                                    :external_encoding => "ISO-8859-1",
                                    :internal_encoding => "UTF-8"
                                  ) do |io|
                                    puts "transcoded text:"
                                    p io.read
                                  end
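A self-contained sketch of both spellings side by side (the helper name is hypothetical, and the demo file's contents, the single ISO-8859-1 byte for "é", are an assumption made for illustration). Either way Ruby reads the bytes as ISO-8859-1 and hands back UTF-8 text.

```ruby
require 'tempfile'

# Read a file two ways: with a "mode:external:internal" string and with
# the equivalent keyword options. Both transcode ISO-8859-1 to UTF-8.
def read_latin1_as_utf8(path)
  via_string = File.open(path, "r:ISO-8859-1:UTF-8") { |io| io.read }
  via_hash = File.open(path, mode: "r",
                             external_encoding: "ISO-8859-1",
                             internal_encoding: "UTF-8") { |io| io.read }
  [via_string, via_hash]
end

latin1 = Tempfile.new("transcoded")
latin1.close
File.binwrite(latin1.path, "Z\xE9".b) # "Zé" in ISO-8859-1 (0xE9 is é)

from_string, from_hash = read_latin1_as_utf8(latin1.path)
puts from_string == from_hash # prints true
```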
                                  

                                  In some cases you can use an or'ed encoding string, like: "BOM|UTF-8"
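A minimal sketch of the "BOM|UTF-8" form (helper name hypothetical, demo file contents assumed): Ruby checks for a UTF-8 byte order mark, skips it if present, and reads the rest as UTF-8.

```ruby
require 'tempfile'

# Read a file as UTF-8, dropping a leading UTF-8 byte order mark if present.
def read_skipping_bom(path)
  File.open(path, "r:BOM|UTF-8") { |io| io.read }
end

bom_file = Tempfile.new("bom")
bom_file.close
# EF BB BF is the UTF-8 BOM, followed by the ASCII text "hello".
File.binwrite(bom_file.path, "\xEF\xBB\xBFhello".b)

text = read_skipping_bom(bom_file.path)
puts text.bytesize # prints 5: the three BOM bytes were skipped
```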

                                  But really, y'all should read the extensive info now listed under IO::new() AND the extensive information at the top of the Encoding class page.

                                  💭


                                  • fredo6

                                    @tt_su said:

                                    As I mentioned, we're still digging into this and we'll come back with more info. I wouldn't necessarily start updating scripts right away. We don't know what's the best recommendation yet. But just so you are aware there is a known issue that's being investigated - and it can crop up differently from machine to machine.

                                    OK. I'll wait.

                                    But can you confirm that your suggested fix is safe anyway?

                                    Fredo

                                    • tt_su

                                      @fredo6 said:

                                      But can you confirm that your suggested fix is safe anyway.

                                      That's why I'm asking people to hold off. We need to perform some testing.

                                      • driven

                                        I started a new encoding bug topic in case it's unrelated...
                                        http://sketchucation.com/forums/viewtopic.php?f=180&t=57074#p518534
                                        can you all have a look?
                                        john

