Ruby 2.0 __FILE__ contains incorrect encoding.

Developers' Forum
18 Posts 6 Posters 3.5k Views
  • T Offline
    tt_su
    last edited by 28 Mar 2014, 09:42

    Just a heads up on the usage of __FILE__.

    Given a username "Zé" where the user folder is C:/Users/Zé/

    Using load or require based on any string from __FILE__ will cause an error similar to this:
    Error: #<LoadError: No such file or directory -- C:/Users/Zé/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/sketchup-stl/SKUI/embed_skui.rb> C:/Users/Zé/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/sketchup-stl/exporter.rb:14:in `load'

    (Full error: https://github.com/SketchUp/sketchup-stl/issues/134 )

    The reason is that __FILE__ returns a string with an incorrect encoding label, in my case: #<Encoding:Windows-1252>
    The string does contain the bytes of a UTF-8 string, however, so you cannot convert it. That's why you see the odd characters there: Ruby tried to convert the string from Windows-1252 to UTF-8, which is wrong.

    Instead you have to force the encoding to be UTF-8.

    
    current_path = File.dirname(__FILE__)
    if current_path.respond_to?(:force_encoding)
      current_path.force_encoding("UTF-8")
    end
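
To see why transcoding fails where relabeling works, here is a minimal sketch that simulates the situation in plain Ruby. The path and variable names are illustrative assumptions; the mislabeled string is built by hand here rather than read from `__FILE__`.

```ruby
# Assumed example path; "\u00E9" is "é", stored as the UTF-8 bytes C3 A9.
utf8_path = "C:/Users/Z\u00E9/Plugins"

# Simulate the bug: same UTF-8 bytes, but labeled as Windows-1252.
mislabeled = utf8_path.dup.force_encoding("Windows-1252")

# Transcoding reads each byte as a separate Windows-1252 character,
# so the two bytes of "é" become "Ã" and "©".
transcoded = mislabeled.encode("UTF-8")

# Relabeling leaves the bytes untouched and only corrects the label.
relabeled = mislabeled.dup.force_encoding("UTF-8")

puts transcoded              # C:/Users/ZÃ©/Plugins
puts relabeled               # C:/Users/Zé/Plugins
puts relabeled == utf8_path  # true
```

`encode` changes the bytes, while `force_encoding` only changes the label, which is why only the latter helps when the bytes are already correct.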
    
    

    Attaching an example to reproduce the problem and the workaround.


    encoding_test.zip

    • S Offline
      slbaumgartner
      last edited by 28 Mar 2014, 12:29

      This issue may be Windows only or maybe only certain system configurations. The __FILE__ I get on my Mac is already UTF-8.

      Steve

      • A Offline
        Aerilius
        last edited by 28 Mar 2014, 15:23

        I don't understand why such encodings are not yet extinct. For me everything is utf8, well except maybe things connected with Windows.
        I just tested on a Windows7 (6.1.7600 Ultimate, 32bit, locale: en-US, VirtualBox), and the encoding test succeeded perfectly in all cases.
        Could we narrow down under what circumstances it fails?

        • T Offline
          tt_su
          last edited by 28 Mar 2014, 17:20

          @slbaumgartner said:

          This issue may be Windows only or maybe only certain system configurations. The __FILE__ I get on my Mac is already UTF-8.

          Indeed, it's a Windows issue. It turns out that Ruby isn't declaring itself as a Unicode application despite its claimed support. And it's not always calling the Unicode versions of the Windows file functions; in some scenarios it calls the legacy ASCII versions.

          We're working on mapping out where and when this happens.

          @aerilius said:

          I don't understand why such encodings are not yet extinct. For me everything is utf8, well except maybe things connected with Windows.
          I just tested on a Windows7 (6.1.7600 Ultimate, 32bit, locale: en-US, VirtualBox), and the encoding test succeeded perfectly in all cases.
          Could we narrow down under what circumstances it fails?

          Native Windows? Or Linux and Wine?
          What tests did you do?

          I've been working on an updated version of our diagnostics tool to include the encoding data for the environment data.

          So far I see __FILE__, ENV, $LOAD_PATH and $LOADED_FEATURES yield inconsistent and sometimes incorrect encoding labels.
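
A rough way to check this on your own machine is to tally the encoding labels of those strings. This is only a sketch, not the diagnostics tool mentioned above; `encoding_summary` is a hypothetical helper name.

```ruby
# Hypothetical helper: counts how many strings carry each encoding label.
def encoding_summary(strings)
  summary = Hash.new(0)
  strings.each do |s|
    next unless s.is_a?(String) # skip non-string entries, if any
    summary[s.encoding.name] += 1
  end
  summary
end

report = {
  "__FILE__"         => encoding_summary([__FILE__]),
  "ENV values"       => encoding_summary(ENV.values),
  "$LOAD_PATH"       => encoding_summary($LOAD_PATH),
  "$LOADED_FEATURES" => encoding_summary($LOADED_FEATURES)
}

report.each { |label, counts| puts "#{label}: #{counts.inspect}" }
```

A label that differs from what the bytes actually contain will not show up here as an error. The tally only reveals which labels Ruby assigned, so mixed results are a hint to look closer.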

          • T Offline
            tt_su
            last edited by 28 Mar 2014, 17:26

            @aerilius said:

            I just tested on a Windows7 (6.1.7600 Ultimate, 32bit, locale: en-US, VirtualBox), and the encoding test succeeded perfectly in all cases.

            Succeeded in what sense? Did it succeed in all examples?
            In my testing it fails when you feed strings from __FILE__ directly into require or load. But if you force the encoding then it succeeds.

            I consider it a failure that you have to force the encoding. Because that means Ruby is messing up stuff. If the strings were marked with the proper encoding it should have been able to correctly transcode the string.

            Another issue is that C extensions don't load from Unicode paths. 😞
            I found a workaround where I was able to call the Win32 API to convert the folder part of the path into the short format (DOS 8.3 style), which indicates that for loading the SO files Ruby calls the ASCII versions of the file functions.

            I'll come back with more detail once I've dug into this deeper. But for now I just wanted to give the heads up so you know what's going on. We've suddenly started to get several reports of this now.

            • D Offline
              Dan Rathbun
              last edited by 28 Mar 2014, 19:15

              @tt_su said:

              I consider it a failure that you have to force the encoding. Because that means Ruby is messing up stuff. If the strings were marked with the proper encoding it should have been able to correctly transcode the string.

              Well set the default bleepin' encodin' then... 😛

              Encoding_default_internal.png

              I'm not here much anymore.

              • F Offline
                fredo6
                last edited by 28 Mar 2014, 21:20

                Thomthom,

                I use __FILE__ in my plugins (actually mostly in the top rb files). I also use $LOAD_PATH in a few places.

                Although many users seem to have no problem (but how many use a non-ASCII username?), it would be good to understand whether the problem is critical and requires an urgent fix.

                What do you think?

                Fredo

                • A Offline
                  Aerilius
                  last edited by 28 Mar 2014, 22:00

                  When it fails, it would show something different (?). This is what I get:

                  
                  C:/Users/Administrator/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/encoding_test/hello.rb
                  > Hello World
                  require succeeded!
                  
                  > Hello World
                  load succeeded!
                  
                  Forcing UTF-8 encoding...
                  
                  require succeeded!
                  
                  

                  „Native Windows? Or Linux and Wine?“
                  If you had a Linux version of SU (nice!), if SU2014 ran in Wine (nice!), otherwise for all my bug reports I test on a native Windows in a virtual machine. Linux is the luxury and comfort of a Mac with the freedom to work on any hardware. But I have to accept what I can get from SketchUp. I wish SketchUp would become kind of usable on higher-dpi screens. All my other applications scale perfectly. I need patience, at least I'm still with SketchUp after years because I'm addicted.

                  • D Offline
                    Dan Rathbun
                    last edited by 29 Mar 2014, 01:35

                    Ruby does not support Windows-1258 encoding. It is a known bug.

                    From the SketchUcation Tools 2.5 topic (http://sketchucation.com/forums/viewtopic.php?f ):

                    @kienhp said:

                    Not run in skechUp 2014, why?

                    Because Vietnamese encoding is not supported by Ruby yet.

                    Bug #7742: System encoding (Windows-1258) is not recognized by Ruby to convert back to UTF-8

                    One user changed his Regional and Language settings to US English to overcome this:
                    StackOverflow: Error installing Rubygems on ruby command prompt in Win7

                    🤓

                    I'm not here much anymore.

                    • T Offline
                      tt_su
                      last edited by 29 Mar 2014, 10:30

                      @aerilius said:

                      When it fails, it would show something different (?). This is what I get:

                      You are logged in as "Administrator". The issue occurs when the username contains non-English characters.

                      • T Offline
                        tt_su
                        last edited by 29 Mar 2014, 10:37

                        @fredo6 said:

                        I use __FILE__ in my plugins (actually mostly in the top rb files). I also use $LOAD_PATH in a few places.

                        Although many users seem to have no problem (but how many use a non-ASCII username?), it would be good to understand whether the problem is critical and requires an urgent fix.

                        What do you think?

                        We're still digging into this. The Ruby Unicode issues under Windows are a deep rabbit hole.
                        On one side there is the wrong encoding being returned for strings that contain UTF-8 data.
                        On the other side there are Ruby C extensions that don't load from paths with non-English characters.

                        Now, there might be cases where paths with non-English characters will load, if the system code page fits, though we haven't got around to testing this yet. But on my machine, which is an English system with the Windows-1252 code page, ASCII calls to file functions fail to load paths with non-English characters.

                        Now, other users, say Japanese users, might have a Japanese code page configured for their machine and they might find that having a Japanese username works for them, even though it fails for me.

                        Encoding under Windows is a jumble. Under OSX it's all fine because it's all UTF-8 by default, even the file functions.

                        As I mentioned, we're still digging into this and we'll come back with more info. I wouldn't necessarily start updating scripts right away. We don't know what's the best recommendation yet. But just so you are aware there is a known issue that's being investigated - and it can crop up differently from machine to machine.

                        • T Offline
                          tt_su
                          last edited by 29 Mar 2014, 10:43

                          @driven said:

                          file = __FILE__
                          file_encoding = file.encoding.name
                          puts file_encoding.to_s
                          if not file_encoding.valid_encoding?
                          puts "bad encoding, fixing"
                          data = File.open(file, (file_encoding + ':utf-8')).read
                          else
                          puts "nothing to fix???"
                          data = File.open(file).read
                          end
                          

                          I don't quite understand what your code snippet is doing here. It seems like you're mixing up the encoding of the filename string with the encoding of the File object.

                          And this: file_encoding + ':utf-8' - what is that doing? Appending UTF-8 to another encoding declaration?
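
To make the distinction concrete, here is a small sketch showing that the encoding label on a path string and the encoding applied to a file's contents are independent. The temp file and variable names are assumptions made for illustration only.

```ruby
require "tempfile"

# Create an empty temporary file just to have a real path to work with.
tmp = Tempfile.new("enc_test")
tmp.close
path = tmp.path

# 1. The path is an ordinary String with its own encoding label.
path_label    = path.encoding.name
path_is_valid = path.valid_encoding? # called on the path, not on the label name

# 2. The encoding used for the file's contents is chosen when opening it,
#    regardless of how the path string is labeled.
content_encoding = File.open(path, "r:ISO-8859-1") { |io| io.external_encoding }

puts "path label:       #{path_label}"
puts "path bytes valid: #{path_is_valid}"
puts "content encoding: #{content_encoding}"

tmp.unlink
```

Note that `valid_encoding?` only tells you whether the bytes are legal for the current label. A UTF-8 path mislabeled as Windows-1252 can still report `true`, because nearly every byte is a valid Windows-1252 character.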

                          @driven said:

                          so whats going on???

                          I don't know. 😕 Can you explain what you are testing with that snippet?

                          • D Offline
                            driven
                            last edited by 29 Mar 2014, 11:54

                            I have a test file encoded in Vietnamese (Windows) with CRLF
                            using unix I can see ...

                             
                            > %x(file "path../dev/file•enc•test.rb")
                            path../dev/file•enc•test.rb: ASCII text, with CRLF line terminators
                            

                            In SU...

                            load("path../dev/file•enc•test.rb")
                            UTF-8
                            nothing to fix???
                            true
                            

                            EDIT.... added missing notation [ 'r:' ]

                            file = __FILE__
                            file_encoding = file.encoding.name
                            puts file_encoding.to_s
                            if not file_encoding.valid_encoding?
                            puts "bad encoding, fixing"
                            # specify both external and internal encodings
                            data = File.open(file, ('r:' + file_encoding + ':utf-8')).read
                            else
                            puts "nothing to fix???"
                            data = File.open(file).read
                            end
                            

                            so whats going on???

                            john

                            learn from the mistakes of others, you may not live long enough to make them all yourself...

                            • D Offline
                              driven
                              last edited by 29 Mar 2014, 11:59

                              @tt_su said:

                              I don't know. :? Can you explain what you are testing with that snippet?

                              It was missing a critical piece of notation... [r:] I modified the snippet, will retest later...
                              I was trying to set both external and internal encoding before the read...
                              based on this
                              http://stackoverflow.com/questions/8610100/changing-character-encoding

                              john

                              learn from the mistakes of others, you may not live long enough to make them all yourself...

                              • D Offline
                                Dan Rathbun
                                last edited by 29 Mar 2014, 19:10

                                @tt_su said:

                                And this: file_encoding + ':utf-8' - what is that doing? Appending UTF-8 to another encoding declaration?

                                NO. John is building a Ruby v2+ IO Mode String.

                                Many methods of IO and its subclasses (like File) use it, because they transparently call IO::new within themselves. (Examples are IO.read, File.read, File.open, etc.)

                                So if you read the doc for IO.new, you'll see that wherever you previously used just the filemode ("r", "w+", "rb", etc.), that argument can now have up to 3 sections:

                                "filemode:external_encoding:internal_encoding"

                                The online doc shows this example (among others):

                                open("transcoded.txt", "r:ISO-8859-1:UTF-8") do |io|
                                  puts "transcoded text:"
                                  p io.read
                                end
                                

                                So the new mode string above is: "r:ISO-8859-1:UTF-8"

                                Also for better readability, many of these methods that take that 1|2|3 part string (separated by colons,) can instead, take hash arguments, like this:

                                open(
                                  "transcoded.txt", 
                                  :mode => "r",
                                  :external_encoding => "ISO-8859-1",
                                  :internal_encoding => "UTF-8"
                                ) do |io|
                                  puts "transcoded text:"
                                  p io.read
                                end
                                

                                In some cases you can use an or'ed encoding string, like: "BOM|UTF-8"
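
As a runnable check of the two forms above, the following sketch writes a few ISO-8859-1 bytes to a temporary file and reads them back both ways. The file name and sample text are assumptions; only the mode string and the option keys come from the IO.new documentation.

```ruby
require "tempfile"

# Write "café" as raw ISO-8859-1 bytes (é is the single byte E9).
tmp = Tempfile.new("transcoded")
tmp.close
File.binwrite(tmp.path, "caf\u00E9".encode("ISO-8859-1"))

# Form 1: three-part mode string "filemode:external:internal".
from_mode_string = File.open(tmp.path, "r:ISO-8859-1:UTF-8") { |io| io.read }

# Form 2: the same settings passed as hash options.
from_options = File.open(
  tmp.path,
  :mode => "r",
  :external_encoding => "ISO-8859-1",
  :internal_encoding => "UTF-8"
) { |io| io.read }

puts from_mode_string.encoding         # UTF-8
puts from_mode_string == from_options  # true

tmp.unlink
```

Both calls transcode the file's contents while reading. Neither affects the encoding label of the path string passed as the first argument.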

                                But really y'all should read the extensive info now listed under IO::new() AND the extensive information at the top of the Encoding class page.

                                💭

                                I'm not here much anymore.

                                • F Offline
                                  fredo6
                                  last edited by 29 Mar 2014, 22:04

                                  @tt_su said:

                                  As I mentioned, we're still digging into this and we'll come back with more info. I wouldn't necessarily start updating scripts right away. We don't know what's the best recommendation yet. But just so you are aware there is a known issue that's being investigated - and it can crop up differently from machine to machine.

                                  OK. I'll wait.

                                  But can you confirm that your suggested fix is safe anyway.

                                  Fredo

                                  • T Offline
                                    tt_su
                                    last edited by 31 Mar 2014, 09:01

                                    @fredo6 said:

                                    But can you confirm that your suggested fix is safe anyway.

                                    That's why I ask people to hold off. We need to perform some testing.

                                    • D Offline
                                      driven
                                      last edited by 31 Mar 2014, 15:03

                                      I started a new encoding bug topic in case it's unrelated...
                                      http://sketchucation.com/forums/viewtopic.php?f=180&t=57074#p518534
                                      can you all have a look?
                                      john

                                      learn from the mistakes of others, you may not live long enough to make them all yourself...
