[Code] UnicodeEx - (0.2.0a) Sketchup + Character Encoding
-
@dan rathbun said:
Well... the book says "$-K Sets the multibyte coding system for strings and regular expressions."
hmm... when does Ruby 1.8 ever treat strings as multibyte? From all my testing I found it to always treat strings as sets of single bytes. Though, please enlighten me if I'm incorrect - as that would be very interesting.
For treating strings I've been using pack('U*') and unpack('U*') - then using the source code for the original String methods to recreate them in Unicode.
-
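A minimal sketch of the pack('U*') / unpack('U*') approach described above. The helper names (u_length, u_reverse, u_slice) are made up for illustration and are not taken from UnicodeEx; the sketch assumes the strings hold valid UTF-8 and that the source file is saved as UTF-8.

```ruby
# Hypothetical helpers (illustrative names, not from UnicodeEx).
# Assumes the input strings hold valid UTF-8 bytes.

# Number of characters (code points) rather than bytes.
def u_length(str)
  str.unpack('U*').length
end

# Reverse by code point so multibyte sequences stay intact.
def u_reverse(str)
  str.unpack('U*').reverse.pack('U*')
end

# Character-based substring: start and len count code points, not bytes.
def u_slice(str, start, len)
  (str.unpack('U*')[start, len] || []).pack('U*')
end

word = "Półka"
puts u_length(word)       # 5
puts u_reverse(word)      # akłóP
puts u_slice(word, 1, 2)  # ół
```

Note that this works on code points, so a base letter followed by a separate combining accent would still be split by u_reverse.
-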
@thomthom said:
Example, for the Kernel32 function FindFirstFile I must call FindFirstFileW directly, because trying to call FindFirstFile will use FindFirstFileA. At least in SU7.0. I have not tried this after 7.1.
This is not something that is caused by SU or Ruby... this is a Windows 'thang'. (Unless Ruby is somehow screwin' it up...)
Sounds like you have an ANSI Windows version. Windows is 'supposed' to map the FindFirstFile call to either the ANSI version of the function (FindFirstFileA) or to the Wide version (FindFirstFileW) based on if the UNICODE flag is set at compile time.
The MSDN website mentions 'extra' files are needed for Unicode support on Windows.
I thought (maybe I'm wrong) that most foreign-sold Windows versions were specially compiled as Unicode versions.
But like I said, I was having similar problems; it seemed like the Wide versions were being called for me instead of the ANSI versions. This is strange...
-
@thomthom said:
hmm... when does Ruby 1.8 ever treat strings as multibyte? From all my testing I found it to always treat strings as sets of single bytes. Though, please enlighten me if I'm incorrect -
Looks like you're right, in that respect (referencing your testing.)
Maybe there's a hidden single/multi-byte flag or switch [for strings] setting we don't know about...
-
@dan rathbun said:
This is not something that is caused by SU or Ruby... this is a Windows 'thang'. (Unless Ruby is somehow screwin' it up...)
Sounds like you have an ANSI Windows version. Windows is 'supposed' to map the FindFirstFile call to either the ANSI version of the function (FindFirstFileA) or to the Wide version (FindFirstFileW) based on if the UNICODE flag is set at compile time.
No. That is set per application. If I had an ANSI version of Windows I'd have some big problems with my other applications.
-
@dan rathbun said:
Maybe there's a hidden single/multi-byte flag or switch [for strings] setting we don't know about...
From all I read on this - multibyte support in String wasn't added until 1.9.
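A small sketch of the distinction being discussed, assuming the source file is saved as UTF-8. As far as I know, the one place Ruby 1.8 does look at whole UTF-8 characters is the regex engine when the u option is given; plain byte-level access sees only single bytes.

```ruby
# Assumes this file is saved as UTF-8.
word = "Półka"

# Byte view: what byte-oriented String handling works with.
byte_count = word.unpack('C*').length

# Character view: the /u option makes the regex match UTF-8 characters.
chars = word.scan(/./u)

puts byte_count     # 7
puts chars.length   # 5
```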
-
Some further reading from MS:
http://msdn.microsoft.com/en-us/library/dd374089%28VS.85%29.aspx - General Overview
http://msdn.microsoft.com/en-us/library/dd317766%28VS.85%29.aspx - Describes the A vs W
http://msdn.microsoft.com/en-us/library/dd317748%28VS.85%29.aspx - Character sets in file names
-
@thomthom said:
@dan rathbun said:
This is not something that is caused by SU or Ruby... this is a Windows 'thang'. (Unless Ruby is somehow screwin' it up...)
Sounds like you have an ANSI Windows version. Windows is 'supposed' to map the FindFirstFile call to either the ANSI version of the function (FindFirstFileA) or to the Wide version (FindFirstFileW) based on if the UNICODE flag is set at compile time.
No. That is set per application. If I had an ANSI version of Windows I'd have some big problems with my other applications.
To elaborate:
http://en.wikibooks.org/wiki/Windows_Programming/Unicode#Windows_Implementation
Applications need to define UNICODE
#define UNICODE
before including the windows headers - where the compiler then decides to map the API call to the A or W version.
-
@thomthom said:
@thomthom said:
@dan rathbun said:
This is not something that is caused by SU or Ruby... this is a Windows 'thang'. (Unless Ruby is somehow screwin' it up...)
Sounds like you have an ANSI Windows version. Windows is 'supposed' to map the FindFirstFile call to either the ANSI version of the function (FindFirstFileA) or to the Wide version (FindFirstFileW) based on if the UNICODE flag is set at compile time.
No. That is set per application. If I had an ANSI version of Windows I'd have some big problems with my other applications.
To elaborate:
http://en.wikibooks.org/wiki/Windows_Programming/Unicode#Windows_Implementation
Applications need to define UNICODE
#define UNICODE
before including the windows headers - where the compiler then decides to map the API call to the A or W version.
Thinking about it: it might not be SketchUp that fails to define UNICODE. That would be odd, considering SU deals with UTF-8 internally and can open SU models with Unicode characters.
But I'm guessing it's the Ruby binaries that aren't compiled with that flag.
-
What I am trying to achieve is to make Ruby create this file:
file = File.new('C:\Półka\Test.xml',"w")
instead of stopping a script execution with an error:
Error: #<Errno::ENOENT: No such file or directory - C:\Półka\Test.xml>
For the time being, IO operations on a file are not a problem for me. Is there a way to convert the 'C:\Półka\Test.xml' string into something that will be recognized by Windows?
Thanks
Tomasz
-
But that is an IO error. You're trying to create a new file with Unicode characters in the path.
You won't get around it by converting the string with the Unicode path to a different encoding - because the file is located under the folder named "Półka" and that's where you need to tell Windows to look. Which means you need to give a Unicode string - which the Ruby IO methods don't handle.
What you need is to call the Unicode APIs that create a file.
-
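The W functions take NUL-terminated UTF-16LE strings, so the first step in calling them from Ruby is converting the UTF-8 path. A minimal sketch of that conversion only; utf8_to_wide is a hypothetical helper name, and the actual API call (for example to CreateFileW through whatever Windows API binding is available) is left out because it depends on the Ruby build in use.

```ruby
# Hypothetical helper: convert a UTF-8 string into a NUL-terminated
# UTF-16LE byte string, the form the wide ("W") Windows functions expect.
# Assumes the input holds valid UTF-8.
def utf8_to_wide(utf8)
  units = []
  utf8.unpack('U*').each do |cp|
    if cp > 0xFFFF
      # Outside the Basic Multilingual Plane: encode as a surrogate pair.
      cp -= 0x10000
      units << (0xD800 | (cp >> 10))
      units << (0xDC00 | (cp & 0x3FF))
    else
      units << cp
    end
  end
  units << 0          # terminating NUL code unit
  units.pack('v*')    # 16-bit little-endian code units
end

# Assumes this file is saved as UTF-8.
wide = utf8_to_wide('C:\Półka\Test.xml')
puts wide.length      # 36 (17 characters + NUL, 2 bytes each)
```

The result is a binary buffer meant to be handed to a W function as a pointer argument, not a string to print or to pass to File.new.
-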
@thomthom said:
... Which means you need to give a Unicode string - which the Ruby IO methods don't handle.
What you need is to call the Unicode APIs that create a file.
OK I agree with that.
It's the File and Dir classes that STILL seem to have problems on Windows, even for Ruby ver 1.9.1
see this bug report
(I'd think the easiest solution would be to add a new parameter to many of the File and Dir class methods, ie "ANSI|UNICODE" for the mswin32 edition, that would give ruby coders a 'high-level' ruby way of forcing which API call to use, [ie: Ansi or Wide] without having to resort to direct API calls.)
By the way several people have created unicode libraries (extensions) for string and character.. also iconv is mentioned.
An old (2005) unicode library, this may be obsolete
A list of extensions or gems at rubyforge for unicode and unidecode
-
@thomthom said:
But that is an IO error. You're trying to create a new file with Unicode characters in the path.
Can a file be created through WIN32ole.so and returned as a Ruby variable and could all writing to the file go through that extension?
-
I have no experience with .so files.