Ruby 2.0 __FILE__ contains incorrect encoding.
-
Just a heads up on the usage of __FILE__.
Given a username "Zé" where the user folder is
C:/Users/Zé/
Using load or require based on any string from __FILE__ will cause an error similar to this:
Error: #<LoadError: No such file or directory -- C:/Users/ZÃ©/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/sketchup-stl/SKUI/embed_skui.rb>
C:/Users/Zé/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/sketchup-stl/exporter.rb:14:in `load'
(Full error: https://github.com/SketchUp/sketchup-stl/issues/134)
The reason is that __FILE__ returns a string with an incorrect encoding label, in my case: #<Encoding:Windows-1252>
The string does contain the bytes of a UTF-8 string, however, so you cannot convert it. That's why you see the odd characters there: Ruby tried to convert the string from Windows-1252 to UTF-8, which is wrong. Instead you have to force the encoding to be UTF-8.
current_path = File.dirname(__FILE__)
if current_path.respond_to?(:force_encoding)
  current_path.force_encoding("UTF-8")
end
Attaching example to reproduce and work around.
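To make the difference between converting and forcing concrete, here is a minimal sketch. It assumes a hand-built path string that holds the UTF-8 bytes of the username but carries a Windows-1252 label, which imitates what __FILE__ is reported to return above. The path itself is only an illustration.

```ruby
# UTF-8 bytes for "Zé" (0xC3 0xA9), deliberately mislabeled as Windows-1252
# to imitate the string described above.
mislabeled = "C:/Users/Z\xC3\xA9".b.force_encoding("Windows-1252")

# encode trusts the wrong label and transcodes each byte separately,
# which yields the odd characters seen in the error message.
converted = mislabeled.encode("UTF-8")

# force_encoding leaves the bytes alone and only corrects the label.
relabeled = mislabeled.dup.force_encoding("UTF-8")

puts converted
puts relabeled
```

The converted string has grown by two bytes because each original byte was treated as its own Windows-1252 character, while the relabeled string has exactly the same bytes as the input.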
-
This issue may be Windows only or maybe only certain system configurations. The __FILE__ I get on my Mac is already UTF-8.
Steve
-
I don't understand why such encodings are not yet extinct. For me everything is utf8, well except maybe things connected with Windows.
I just tested on a Windows7 (6.1.7600 Ultimate, 32bit, locale: en-US, VirtualBox), and the encoding test succeeded perfectly in all cases.
Could we narrow down under what circumstances it fails?
-
@slbaumgartner said:
This issue may be Windows only or maybe only certain system configurations. The __FILE__ I get on my Mac is already UTF-8.
Indeed, it's a Windows issue. It turns out that Ruby isn't declaring itself as a Unicode application despite its claimed support. And it's not always calling the Unicode versions of the Windows file functions - it's calling the legacy ASCII versions in some scenarios.
We're working on mapping out where and when this happens.
@aerilius said:
I don't understand why such encodings are not yet extinct. For me everything is utf8, well except maybe things connected with Windows.
I just tested on a Windows7 (6.1.7600 Ultimate, 32bit, locale: en-US, VirtualBox), and the encoding test succeeded perfectly in all cases.
Could we narrow down under what circumstances it fails?
Native Windows? Or Linux and Wine?
What tests did you do?
I've been working on an updated version of our diagnostics tool to include the encoding data for the environment data.
So far I see __FILE__, ENV, $LOAD_PATH and $LOADED_FEATURES yield inconsistent and sometimes incorrect encoding labels.
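For anyone who wants to check their own machine, a rough sketch of such a survey might look like the following. This is not the diagnostics tool mentioned above; encoding_tally is a hypothetical helper that just counts the encoding labels found in each source.

```ruby
# Hypothetical helper: counts how many strings carry each encoding label.
def encoding_tally(strings)
  strings.each_with_object(Hash.new(0)) do |string, tally|
    tally[string.encoding.name] += 1
  end
end

sources = {
  "__FILE__"         => [__FILE__],
  "ENV values"       => ENV.values,
  "$LOAD_PATH"       => $LOAD_PATH.map(&:to_s),
  "$LOADED_FEATURES" => $LOADED_FEATURES
}

sources.each do |label, strings|
  puts "#{label}: #{encoding_tally(strings).inspect}"
end
```

More than one label within a single source, or a label that does not match the actual bytes, would be the kind of inconsistency described above.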
-
@aerilius said:
I just tested on a Windows7 (6.1.7600 Ultimate, 32bit, locale: en-US, VirtualBox), and the encoding test succeeded perfectly in all cases.
Succeeded in what sense? Did it succeed in all examples?
In my testing it fails when you feed strings from __FILE__ directly into require or load. But if you force the encoding then it succeeds.
I consider it a failure that you have to force the encoding, because that means Ruby is messing stuff up. If the strings were marked with the proper encoding, Ruby should have been able to transcode them correctly.
Another issue is that C extensions don't load from Unicode paths.
I found a workaround where I was able to call the Win32 API to convert the folder part of the path into short format (DOS 8.3 style) - which indicates that for loading the SO files Ruby calls the ASCII version of the file functions.
I'll come back with more detail once I've dug into this deeper. But for now I just wanted to give the heads up so you know what's going on. We've suddenly started to get several reports of this now.
-
Thomthom,
I use __FILE__ in my plugins (actually mostly in the top rb files). I also use $LOAD_PATH in a few places.
Although many users seem to have no problem (but how many use a non-ASCII username?), it would be good to understand whether the problem is critical and requires an urgent fix.
What do you think?
Fredo
-
When it fails, it would show something different (?). This is what I get:
C:/Users/Administrator/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/encoding_test/hello.rb
> Hello World
require succeeded!
> Hello World
load succeeded!
Forcing UTF-8 encoding...
require succeeded!
"Native Windows? Or Linux and Wine?"
If you had a Linux version of SU (nice!), or if SU2014 ran in Wine (nice!), I would test there; otherwise, for all my bug reports I test on native Windows in a virtual machine. Linux is the luxury and comfort of a Mac with the freedom to work on any hardware. But I have to accept what I can get from SketchUp. I wish SketchUp would become kind of usable on higher-dpi screens. All my other applications scale perfectly. I need patience; at least I'm still with SketchUp after years because I'm addicted.
-
Ruby does not support Windows-1258 encoding. It is a known bug.
@unknownuser said:
SketchUcation Tools 2.5 (http://sketchucation.com/forums/viewtopic.php?f)
@kienhp said:
Not run in skechUp 2014, why?
Because Vietnamese encoding is not supported by Ruby yet.
Bug #7742: System encoding (Windows-1258) is not recognized by Ruby to convert back to UTF-8
One user changed his Regional and Language settings to US English to overcome this:
StackOverflow: Error installing Rubygems on ruby command prompt in Win7
-
@aerilius said:
When it fails, it would show something different (?). This is what I get:
You are logged in as "Administrator" - the issue occurs when the username contains non-English characters.
-
@fredo6 said:
I use __FILE__ in my plugins (actually mostly in the top rb files). I also use $LOAD_PATH in a few places.
Although many users seem to have no problem (but how many use a non-ASCII username?), it would be good to understand whether the problem is critical and requires an urgent fix.
What do you think?
We're still digging into this. The Ruby Unicode issues under Windows are a deep rabbit hole.
On one side there is the wrong encoding being returned for strings that contain UTF-8 data.
On the other side there are Ruby C extensions that don't load from paths with non-English characters.
Now, there might be cases where paths with non-English characters will load, if the system code page fits, though we've not got around to testing this yet. But on my machine, which is an English system with the Windows-1252 code page, ASCII calls to file functions fail to load paths with non-English characters.
Now, other users, say Japanese users, might have a Japanese code page configured for their machine and they might find that having a Japanese username works for them - even though it fails for me.
Encoding under Windows is a jumble. Under OSX it's all fine because it's all UTF-8 by default - even the file functions.
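As a rough illustration of what "the code page fits" could mean, the sketch below only checks whether a name can be represented in a given code page at all. It does not call any Windows file function, and representable? is a hypothetical helper, so treat it as a way to reason about the problem rather than a test of the actual loading behaviour.

```ruby
# Hypothetical helper: true if every character of text exists in code_page.
def representable?(text, code_page)
  text.encode(code_page)
  true
rescue Encoding::UndefinedConversionError
  false
end

western_name  = "Z\u00E9"        # "Zé"
japanese_name = "\u65E5\u672C"   # two kanji characters

puts representable?(western_name, "Windows-1252")
puts representable?(japanese_name, "Windows-1252")
puts representable?(japanese_name, "Windows-31J")
```

A Japanese name has no representation in Windows-1252 but does in the Japanese code page (Windows-31J), which is consistent with the guess that the same path could work on one machine and fail on another.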
As I mentioned, we're still digging into this and we'll come back with more info. I wouldn't necessarily start updating scripts right away. We don't know what's the best recommendation yet. But just so you are aware there is a known issue that's being investigated - and it can crop up differently from machine to machine.
-
@driven said:
file = __FILE__
file_encoding = file.encoding.name
puts file_encoding.to_s
if not file_encoding.valid_encoding?
  puts "bad encoding, fixing"
  data = File.open(file, (file_encoding + ':utf-8')).read
else
  puts "nothing to fix???"
  data = File.open(file).read
end
I don't quite understand what your code snippet is doing here. It seems like you're mixing up the encoding of the filename string with the encoding of the File object.
And this:
file_encoding + ':utf-8'
- what is that doing? Appending UTF-8 to another encoding declaration?
@driven said:
so whats going on???
I don't know. Can you explain what you are testing with that snippet?
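A small sketch may help separate the two things that seem to be mixed up here. It assumes a hand-built path string with one invalid byte, and a throwaway temp file standing in for the script being read; neither comes from the snippet under discussion.

```ruby
require "tempfile"

# A path whose bytes are not valid UTF-8 (lone 0xE9) but which is labeled UTF-8.
path = "C:/Users/Z\xE9".b.force_encoding("UTF-8")

# The encoding *name* is itself a plain ASCII string such as "UTF-8".
encoding_name = path.encoding.name

name_check = encoding_name.valid_encoding?   # tests the word "UTF-8"
path_check = path.valid_encoding?            # tests the path's own bytes

# The encoding used for a file's contents is chosen when the file is opened
# and is independent of the encoding label on the filename string.
content_encoding = Tempfile.create("encoding_demo") do |tmp|
  File.open(tmp.path, "r:ISO-8859-1") { |io| io.external_encoding }
end

puts "name_check=#{name_check} path_check=#{path_check}"
puts "content_encoding=#{content_encoding}"
```

Calling valid_encoding? on the encoding name will always report a valid string, which may be why the snippet prints "nothing to fix???" regardless of the path.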
-
I have a test file encoded in Vietnamese (Windows) with CRLF.
Using Unix I can see...
> %x(file "path../dev/fileā¢encā¢test.rb")
path../dev/fileā¢encā¢test.rb: ASCII text, with CRLF line terminators
In SU...
load("path../dev/fileā¢encā¢test.rb")
UTF-8
nothing to fix???
true
EDIT.... added missing notation [ 'r:' ]
file = __FILE__
file_encoding = file.encoding.name
puts file_encoding.to_s
if not file_encoding.valid_encoding?
  puts "bad encoding, fixing"
  # specify both external and internal encodings
  data = File.open(file, ('r:' + file_encoding + ':utf-8')).read
else
  puts "nothing to fix???"
  data = File.open(file).read
end
so whats going on???
john
-
@tt_su said:
I don't know. Can you explain what you are testing with that snippet?
It was missing a critical piece of notation... [r:] I modified the snippet, will retest later...
I was trying to set both external and internal encoding before the read...
based on this
http://stackoverflow.com/questions/8610100/changing-character-encoding
john
-
@tt_su said:
And this:
file_encoding + ':utf-8'
- what is that doing? Appending UTF-8 to another encoding declaration?
NO. John is building a Ruby v2+ IO Mode String.
IO and its subclasses (like File) use it in many of their methods, which transparently call IO::new within themselves. (Examples are IO.read, File.read, File.open, etc.)
So if you read the doc for IO.new, you'll see that wherever you previously used just the filemode ("r", "w+", "rb", etc.), that argument can now have up to 3 sections:
"filemode:external_encoding:internal_encoding"
The online doc shows this example (among others.):
open("transcoded.txt", "r:ISO-8859-1:UTF-8") do |io|
  puts "transcoded text:"
  p io.read
end
So the new mode string above is:
"r:ISO-8859-1:UTF-8"
Also for better readability, many of these methods that take that 1|2|3 part string (separated by colons,) can instead, take hash arguments, like this:
open(
  "transcoded.txt",
  :mode => "r",
  :external_encoding => "ISO-8859-1",
  :internal_encoding => "UTF-8"
) do |io|
  puts "transcoded text:"
  p io.read
end
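If you want to try both forms without having a transcoded.txt lying around, here is a self-contained sketch. It assumes a throwaway temp file holding "Zé" as ISO-8859-1 bytes; with_bytes_on_disk is a hypothetical helper written only for this example.

```ruby
require "tempfile"

# Hypothetical helper: writes raw bytes to a temp file and yields its path.
def with_bytes_on_disk(bytes)
  Tempfile.create("encoding_demo") do |tmp|
    tmp.binmode
    tmp.write(bytes)
    tmp.flush
    yield tmp.path
  end
end

# "Zé" in ISO-8859-1 is two bytes: 0x5A 0xE9.
latin1_bytes = "Z\xE9".b

# Three-part mode string: filemode:external_encoding:internal_encoding
via_mode_string = with_bytes_on_disk(latin1_bytes) do |path|
  File.open(path, "r:ISO-8859-1:UTF-8") { |io| io.read }
end

# The same request spelled out as options.
via_options = with_bytes_on_disk(latin1_bytes) do |path|
  File.open(path, :mode => "r",
                  :external_encoding => "ISO-8859-1",
                  :internal_encoding => "UTF-8") { |io| io.read }
end

p via_mode_string.encoding
p via_mode_string == via_options
```

Both reads hand back a UTF-8 string, so the single byte 0xE9 on disk arrives in Ruby as the two-byte UTF-8 sequence for "é".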
In some cases you can use an or'ed encoding string, like:
"BOM|UTF-8"
But really y'all should read the extensive info now listed under IO::new() AND the extensive information at the top of the Encoding class page.
-
@tt_su said:
As I mentioned, we're still digging into this and we'll come back with more info. I wouldn't necessarily start updating scripts right away. We don't know what's the best recommendation yet. But just so you are aware there is a known issue that's being investigated - and it can crop up differently from machine to machine.
OK. I'll wait.
But can you confirm that your suggested fix is safe anyway?
Fredo
-
@fredo6 said:
But can you confirm that your suggested fix is safe anyway?
That's why I ask people to hold off. We need to perform some testing.
-
I started a new encoding bug topic in case it's unrelated...
http://sketchucation.com/forums/viewtopic.php?f=180&t=57074#p518534
can you all have a look?
john