Encode in 2014 again.
-
My function reads strings from a CSV file saved as UTF-8 without BOM; all my .rb and HTML files are UTF-8 too.
I am using File.open pathAttrFile, "r:utf-8" to read the CSV file, which contains Simplified Chinese text. My problem is that calling split directly on the string that was read does not work. I have to use fileline.force_encoding("ISO-8859-1").split "," instead, and then it works. I do not know the reason. Can anyone explain it? Thanks.
fileAttr = File.open pathAttrFile, "r:utf-8"
arrayFileLines = fileAttr.readlines
arrayFileLines.each do |fileline|
  attrs = fileline.force_encoding("ISO-8859-1").split ","
  .....
Also, if I need to pass the ISO-8859-1 string to HTML, it has to be converted back to UTF-8 with .encode("UTF-8"), guarded by if defined?(Encoding).
These conversions are not very convenient. Does anyone have a simpler way to address the issue?
Thanks.
-
Try setting
Encoding::default_internal= "UTF-8"
It has not been set (it's nil on startup) because it makes no difference on PC, which is a Ruby Core bug.
I still think it should be set just as external is:
Encoding::default_external returns #<Encoding:UTF-8>
Also make sure you have a meta tag in the head of your HTML file that sets the encoding ("charset=").
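As a rough sketch of the suggestion above (not taken from the original posts), the defaults can be inspected and set like this. Note that these defaults only control how Ruby transcodes IO; a file opened with "r:utf-8" is tagged as UTF-8 without its bytes being validated.

```ruby
# Remember what Ruby started with; default_internal is nil unless something sets it.
previous_external = Encoding.default_external
previous_internal = Encoding.default_internal

# Set both defaults to UTF-8, as suggested in the post above.
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

puts "external: #{previous_external.inspect} -> #{Encoding.default_external.inspect}"
puts "internal: #{previous_internal.inspect} -> #{Encoding.default_internal.inspect}"
```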
-
Thanks for your reply. Encoding::default_internal = "UTF-8" and Encoding::default_external = "UTF-8" are now set in my entry file main.rb and it works: the Simplified Chinese characters are displayed correctly. However, the split method still does not work as expected. Below are the code and the error log. What is the problem with the split method? Thanks.
def getUsersInfoFromFile(fileName, path)
  arrayAttributes = Array.new
  pathAttrFile = Sketchup.find_support_file fileName, path
  if (pathAttrFile)
    fileAttr = File.open pathAttrFile, "r:utf-8"
    arrayFileLines = fileAttr.readlines
    arrayFileLines.each do |fileline|
      arrayFileLine = fileline.split ","
      arrayAttributes << arrayFileLine
    end
    fileAttr.close
  end
  return arrayAttributes
end
Also, when arrayFileLines = fileAttr.readlines is changed to arrayFileLines = fileAttr.encode("UTF-8").readlines, the problem remains, while the split error goes away when fileAttr.encode("ISO-8859-1").readlines is used. Can you explain? Thanks again.
Error:
#<ArgumentError: invalid byte sequence in UTF-8>
C:/Users/tc/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/CRSD/RequireFiles/UserManager/UserManager.rb:130:in `split'
C:/Users/tc/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/CRSD/RequireFiles/UserManager/UserManager.rb:130:in `readLines'
C:/Users/tc/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/CRSD/RequireFiles/UserManager/UserManager.rb:138:in `getUsersInfoFromFile'
C:/Users/tc/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/CRSD/RequireFiles/UserManager/UserManager.rb:27:in `checkLogin'
C:/Users/tc/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/CRSD/RequireFiles/UserManager/UserManager.rb:97:in `block in login'
C:/Users/tc/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/CRSD/RequireFiles/UserManager/UserManager.rb:106:in `call'
C:/Users/tc/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/CRSD/RequireFiles/UserManager/UserManager.rb:106:in `show_modal'
C:/Users/tc/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/CRSD/RequireFiles/UserManager/UserManager.rb:106:in `login'
C:/Users/tc/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/CRSD/RequireFiles/Main.rb:7:in `<top (required)>'
C:/remove/remove_SketchUp/Tools/RubyStdLib/rubygems/core_ext/kernel_require.rb:45:in `require'
C:/remove/remove_SketchUp/Tools/RubyStdLib/rubygems/core_ext/kernel_require.rb:45:in `require'
C:/Users/tc/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/CRSD.rb:1:in `<top (required)>'
-
I found a solution, but I do not understand why it works:
fileline.force_encoding("ISO-8859-1").encode("utf-8", replace: nil)
Can anyone explain it? And does anyone have a better solution?
Thanks.
-
Can you share your file? The error indicates you are opening a file in UTF-8 mode without the file actually being UTF-8. Are you sure it's UTF-8 encoded?
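To illustrate the point, here is a minimal sketch. It assumes the CSV was really saved in GBK, a common legacy encoding for Simplified Chinese; the thread never shows the file, so that is only a guess, and the sample text is made up. It shows why tagging such bytes as UTF-8 gives an invalid string, why ISO-8859-1 "works" (every byte is a valid ISO-8859-1 character, so split has nothing to object to), and how decoding with the real encoding recovers the text.

```ruby
# Hypothetical sample: two Chinese fields separated by a comma, stored as GBK
# bytes (an assumption about what the CSV actually contains).
gbk_bytes = "\u4E2D\u6587,\u7528\u6237".encode("GBK").b

# Opening with "r:utf-8" only tags the bytes as UTF-8; it does not validate them.
as_utf8 = gbk_bytes.dup.force_encoding("UTF-8")
utf8_ok = as_utf8.valid_encoding?

# Every byte is a valid ISO-8859-1 character, so split stops complaining,
# but the fields are still the raw bytes, not readable Chinese.
as_latin1 = gbk_bytes.dup.force_encoding("ISO-8859-1")
latin1_fields = as_latin1.split(",")

# Decoding with the encoding the bytes were written in recovers the text.
decoded_fields = gbk_bytes.dup.force_encoding("GBK").encode("UTF-8").split(",")

puts "valid as UTF-8? #{utf8_ok}"
puts "ISO-8859-1 field count: #{latin1_fields.length}"
puts "decoded fields: #{decoded_fields.inspect}"
```

Calling valid_encoding? on a line right after reading it is a cheap way to confirm whether the file is what the open mode claims it is.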
-
It's not the RB file that is wrongly encoded; that is properly encoded as UTF8-without-BOM anyway.
BUT related files like the CSV [and HTML?] are also best when similarly encoded, and they are not. Changing the encoding of the string read in v2014 back and forth is what makes it acceptable.
Note that the encoding calls will break for users of earlier SketchUp versions, so you need to check that Encoding is defined before using them.
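A minimal sketch of such a guard follows. The helper name tag_as_utf8 is made up for illustration. Earlier SketchUp versions shipped Ruby 1.8, which has no Encoding class, so the helper hands the string back untouched there.

```ruby
# Hypothetical helper: returns a copy of +text+ tagged as UTF-8 when the
# running Ruby supports encodings, and the original string otherwise.
def tag_as_utf8(text)
  if defined?(Encoding) && text.respond_to?(:force_encoding)
    # dup so the caller's string is not modified in place
    text.dup.force_encoding("UTF-8")
  else
    text
  end
end

puts tag_as_utf8("abc").encoding if defined?(Encoding)
```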
Also, writing/reading a file in the main folders could have permission issues in versions earlier than v2014, AND it will break if the user has a custom folder setup.
-
"BUT related files like the CSV [and HTML?] are also best when similarly encoded - but they are not."
I have used Notepad++ to change the files' encoding to UTF-8 without BOM.
"also best when similarly encoded - but they are not": how can I make them similarly encoded?
-
Sorry, you misunderstand me.
'Similarly encoded' simply means encoded as UTF8-without-BOM, which you say you have now done?
-
I have set all files to UTF-8 without BOM,
but the error remains:
Error: #<ArgumentError: invalid byte sequence in UTF-8>
C:/Users/tc/AppData/Roaming/SketchUp/SketchUp 2014/SketchUp/Plugins/CRSD/RequireFiles/UserManager/UserManager.rb:24:in `split'
Can anyone help? Thanks.
-
Try this version...
I have marked changes with a final ###
Also try replacing the Chinese with some Western text, in case the issue is there?
-
Thanks to all. The problem is solved.
The solution was to use UltraEdit's convert function (convert ASCII to UTF-8 (Unicode editing)). Originally I used Notepad++ to convert to UTF-8 without BOM, and it seems that did not work as expected.
Now the split function works correctly without any force_encoding, but input coming from HTML still has to be converted using force_encoding("UTF-8").
Thanks again to TIG.
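For anyone who cannot re-save the file, here is a rough sketch of a reader that validates the bytes and falls back to a legacy encoding instead of forcing ISO-8859-1. The helper name read_csv_rows and the GBK fallback are my own assumptions, not something confirmed in this thread, and it needs Ruby 1.9 or later (SketchUp 2014+).

```ruby
# Hypothetical helper: reads a comma-separated file and returns an array of
# field arrays. The bytes are checked as UTF-8 first; if they are not valid,
# they are decoded from +fallback+ (GBK here is only an assumption).
def read_csv_rows(path, fallback: "GBK")
  raw = File.binread(path)
  text = raw.dup.force_encoding("UTF-8")
  unless text.valid_encoding?
    text = raw.dup.force_encoding(fallback)
              .encode("UTF-8", invalid: :replace, undef: :replace)
  end
  # Drop a leading BOM if the editor wrote one, then split each line.
  text.sub(/\A\uFEFF/, "").each_line.map { |line| line.chomp.split(",") }
end

# Usage, with pathAttrFile as in the code earlier in the thread:
#   arrayAttributes = read_csv_rows(pathAttrFile)
```

This is a plain split on commas, like the original code, so it does not handle quoted fields.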