[Code] PCFileTools
-
Took a quick look.
It's a good start.
First thoughts:
- The methods should act the same as the native Ruby methods, so that they can be a replacement (via aliasing within an Author's namespace.)
- They need to be wrapped within some library namespace (TIG::Lib, SKX::WIN, etc.)
- The containers need to be classes, not modules.
- They need to be subclasses of the Ruby baseclasses, so that they inherit all the methods that you will not be overriding.
So, as an example, let's say you put them in SKX::WIN. They would need to be declared:
class SKX::WIN::Dir < ::Dir
  ### the overridden methods
end#class

class SKX::WIN::File < ::File
  ### the overridden methods
end#class
Within an author's namespace, he could use a local constant File that points at your subclasses:
module Author
  class FancyTool
    File = SKX::WIN::File
    # now he can use it as he's always done.

    # he can always refer to the base class,
    # by using the toplevel scope prefix, thus:
    ::File.basename(pathstring)

    # or if he's done using the PC specific class,
    # he can make the local constant point at
    # the base class again, by reassignment:
    File = ::File
    #
    # or by removing the local constant, thus:
    remove_const(:File)
  end#class
end#module
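A minimal runnable sketch of the local-constant idea, assuming a Ruby 1.9+ interpreter. The SKX::WIN::File body here is only a placeholder (it overrides nothing Windows-specific), and the FancyTool methods which_file and toplevel_file are hypothetical helpers added just to show which object each name resolves to.

```ruby
module SKX
  module WIN
    # Placeholder subclass: a real version would override exist?, basename, etc.
    # Everything not overridden is inherited from ::File.
    class File < ::File
      def self.patched?
        true
      end
    end
  end
end

module Author
  class FancyTool
    # Local constant: shadows ::File only inside this namespace.
    File = SKX::WIN::File

    # Hypothetical helpers, used only to show how the names resolve.
    def self.which_file
      File      # resolves to Author::FancyTool::File while it exists
    end

    def self.toplevel_file
      ::File    # the base class, always reachable via the :: prefix
    end
  end
end

puts File.equal?(::File)                                # prints true: toplevel untouched
puts Author::FancyTool.which_file.patched?              # prints true: the subclass
puts Author::FancyTool.which_file.basename("/tmp/a.rb") # prints a.rb: inherited method

# Removing the local constant makes File resolve to ::File again.
Author::FancyTool.send(:remove_const, :File)
puts Author::FancyTool.which_file.equal?(::File)        # prints true
```

Because the subclass inherits from ::File, any method TIG does not override still behaves exactly like the native one.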
-
Here's a better conditional load wrapper:
$LOAD_MSGS = '' unless defined?($LOAD_MSGS)

unless RUBY_PLATFORM =~ /(mswin|mingw)/
  msg = "Error: #{File.basename(__FILE__)}: loads PC only subclasses!\n"
  msg<< "This platform (#{RUBY_PLATFORM}) is not recognized."
  puts(msg)
  $LOAD_MSGS << msg
else # running on Windows
  begin
    require('win32ole.so')
  rescue LoadError => e
    msg = "Error: #<LoadError: in #{File.basename(__FILE__)}.>\n"
    msg<< "The shared library file: 'win32ole.so' could not "
    msg<< "be found using the $LOAD_PATH search array.\n"
    msg<< "Please install the 'win32ole.so' where it can be found.\n"
    puts(msg)
    $LOAD_MSGS << msg
  else
    # The 'win32ole.so' file WAS loaded...
    #
    ### define subclasses HERE
    #
    unless file_loaded?(File.basename(__FILE__))
      file_loaded(File.basename(__FILE__))
    end
  end# win32ole required
  #
end#unless on PC
The ### define subclasses HERE line can actually load the definition file (to save indents):
require('skx/win/file_dir_defs.rb')
and the file can have an exit clause at the top (just in case):
exit(2) unless defined?(WIN32OLE)

module SKX; end
module SKX::WIN; end

class SKX::WIN::Dir < ::Dir
  ### method defs ...
end#class

class SKX::WIN::File < ::File
  ### method defs ...
end#class
-
Dan
I'm sure that the 'use' of the core of the code can be made better, although I'd be wary of overwriting a base class?
A lot of this is 'above my pay grade'... I've kept everything separated for testing, so it still works provided you substitute PCFile.exist?(filepath) for File.exist?(filepath) and so on... What I'm asking initially is for developers to test these new/replacement PCFile methods and see if they are buggy and/or work as hoped.
The biggest bind with Ruby 1.8 File on PCs is that many operations like File.exist?() can return false when you know the file really does exist. These 'failures' are caused by UTF-8 characters in the filepath string [typically these are 'accented letters' in FR/DE/ES/PT/NO etc., but results from Chinese users etc. would be great too...]. I've tried to write simple code using pack/unpack and equivalent Win32OLE methods that correctly return 'true' when the filepath exists and contains such characters... There are many other failures with File that I hope I have now trapped with these equivalent PCFile methods. I've had to make slightly different read/write code to get around some Win32OLE vagaries etc., and I can't currently see how to do the equivalent of 'binmode' etc. Any suggestions/additions gratefully received... At this stage I really just want feedback on the efficacy of these methods...
We can then debate later how the finalized 'fixed' methods are shoehorned into Ruby... -
It's UTF-8 (not UFT-8, BTW.)
-
WTF... Doh! My fingers often type in the wrong order.
-
The main point I'm trying to get across... is future implementation.
I.e., how scripters will wish to use unicode extensions to File, Dir and String. The goal is to write cross-platform plugins.
Ex:
MAC = ( RUBY_PLATFORM =~ /(darwin)/ ? true : false ) unless defined?(MAC)
WIN = ( not MAC ) unless defined?(WIN)

module Author
  class FancyTool
    if WIN
      require('skx/win/File')
      File = SKX::WIN::File
    end
    # Now he can use File as he's always done.
    #
    # On Mac, there is no local constant "File", and the
    # call is evaluated to the object that is referenced
    # by the toplevel constant "File".
    #
    # But on PC, within this namespace ONLY, the call
    # is evaluated to the object pointed to by the
    # local (constant) reference "File", or fully
    # qualified: "Author::FancyTool::File",
    # which does not touch ::File (aka: Object::File)
  end#class
end#module
There is no way scripters will want to change all calls to methods of class File or Dir into platform conditional statements, like:
filename = ( WIN ? PCFile.basename(pathstr) : File.basename(pathstr) )
or (worse):
filename = if WIN
  PCFile.basename(pathstr)
else
  File.basename(pathstr)
end
-
@tig said:
A lot of this is 'above my pay grade'...
No doubt that goes for any single ONE rubyist. Something like this needs to be a group project.
Others have done some work in this area already: ThomThom on wide strings (which also need to be addressed, because many of the base class String methods will garble unicode strings.)
Dan Berger's "win32-api" toolkit plays a bit with creating a WideString class. (I think he considers it beta.)
Also, we should not ignore the extended class Pathname (which is actually a wrapper class, not a String subclass.)
@tig said:
We can then debate later how the finalized 'fixed' methods are shoehorned into Ruby...
OK, I made my point, on this issue.
The only thing to add is to state the obvious. There are really only 2 alternatives.
My suggestion is the least invasive, with the least responsibility. (Authors are left to decide to use the extension on a namespace by namespace basis.)
The other option is to actually redefine any Ruby base classes that will use unicode strings, on the Windows platform(s).
This would mean taking responsibility for string functionality of ALL plugins (running on Windows.)
Not what (I think) you would wish to do. Nor any project group. (Not without compensation of some kind, at least... and a lot of time to devote to the "cause.")
Even the Ruby Core guys are taking forever to implement unicode support. Enough said...
-
@tig said:
... although I'd be wary of overwriting base class ?
I didn't say anyone should. I said they should be subclasses of the base class. If they are, Ruby would not let anyone make the subclass overwrite the baseclass, because it would be a circular reference.
EDIT: I tested this at the console. Ruby does not check for circular references using the C-side = operator. Anyway, it does not actually create a circular reference. The reference to the original baseclass becomes un-identified by any constant, but the object can still be got via the superclass() method.
Making a LOCAL constant point at an object (in this case a Class definition) does not overwrite anything.
Now obviously, if some stupid 'newb' types File = SKX::WIN::File in the Ruby Console, or in an unwrapped script, THAT global constant that references the base class definition object is changed (and that affects all scripts that are also using it.) So, in order to set it back [without restarting], it might be a good idea to have a 'secret' reference to the base class definition, kept "out of sight."
module Ruby
  Refs = {}
  # base classes
  Object.constants.sort.each {|ref|
    obj = Object.class_eval "#{ref}"
    Refs[ref.to_s]= obj if obj.class==(Class)
  }
  # base modules
  [Comparable,Enumerable,Errno,FileTest,GC,
   Kernel,Marshal,Math,ObjectSpace,Process].each {|obj|
    Refs[obj.name]= obj
  }
  Refs['TOPLEVEL_BINDING']= TOPLEVEL_BINDING
  Refs.freeze

  def self.reset_object_ref(ref)
    if ref.is_a?(String) || ref.is_a?(Symbol)
      ref = ref.to_s
    elsif ref.is_a?(Module) # includes Class
      ref = ref.name
    else
      return nil
    end
    if Refs.has_key?(ref)
      begin
        eval( "#{ref} = ObjectSpace._id2ref(#{Refs[ref].object_id})", TOPLEVEL_BINDING )
      rescue
        return false
      else
        return true
      end
    end
    return false
  end
end#module
###
EDIT: I added a reset method to the example, just for kicks.
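A smaller runnable sketch of the same 'secret reference' idea, assuming Ruby 1.9+. It uses hypothetical classes Gadget/FancyGadget instead of ::File, so running it does not disturb the real base class, and it swaps the eval/_id2ref trick for Object.send(:remove_const, ...) plus Object.const_set. It also shows the EDIT point: after the toplevel constant is clobbered, the original class is still reachable through superclass().

```ruby
# Hypothetical stand-ins for a base class (like ::File) and its subclass.
class Gadget; end
class FancyGadget < Gadget; end

module RefKeeper
  # Captured at load time, before anyone can clobber the toplevel constant.
  REFS = { :Gadget => Gadget }.freeze

  # Point the toplevel constant back at the saved class object.
  # Returns true on success, false if no reference was saved for that name.
  def self.reset_object_ref(name)
    name = name.to_sym
    return false unless REFS.key?(name)
    Object.send(:remove_const, name) if Object.const_defined?(name, false)
    Object.const_set(name, REFS[name])
    true
  end
end

# Simulate the 'newb' typing `Gadget = FancyGadget` at the console
# (done via remove_const/const_set to avoid the "already initialized" warning).
Object.send(:remove_const, :Gadget)
Object.const_set(:Gadget, FancyGadget)
puts Gadget.equal?(FancyGadget)    # prints true: the constant is clobbered
puts FancyGadget.superclass.name   # prints Gadget: the original is still reachable

RefKeeper.reset_object_ref(:Gadget)
puts Gadget.equal?(FancyGadget.superclass)  # prints true: restored
```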
@tig said:
I've kept everything separated for testing - so it still works provided you substitute PCFile.exist?(filepath) for File.exist?(filepath) and so on...
You won't find many who are willing to go through their code and search-and-replace "File." with "PCFile.". Besides, you haven't provided aliases for the methods you did not implement, which turns a simple editor search-and-replace into a tedious manual edit session.
You might consider overriding the method_missing callback whilst you still have them as custom modules:
# defined on the module itself (self.), so it catches PCFile.xxx calls
def self.method_missing( sym, *args, &block )
  if File.respond_to?(sym)
    File.method(sym).call(*args, &block)
  else
    raise(NoMethodError,"undefined method `#{sym.to_s}' for #{self.name}:#{self.class.name}",caller)
  end
end#def
and similar for PCDir:
def self.method_missing( sym, *args, &block )
  if Dir.respond_to?(sym)
    Dir.method(sym).call(*args, &block)
  else
    raise(NoMethodError,"undefined method `#{sym.to_s}' for #{self.name}:#{self.class.name}",caller)
  end
end#def
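A runnable sketch of that delegation, assuming Ruby 1.9.2+ (for respond_to_missing?, which 1.8 does not have). PCFileDemo is a hypothetical stand-in for PCFile that implements just one method itself; every other module-level call falls through to ::File, so the unimplemented methods no longer need hand-written aliases.

```ruby
module PCFileDemo
  # The one method this demo "implements" itself (placeholder body).
  def self.exist?(path)
    ::File.exist?(path)
  end

  # Forward any other module-level call (PCFileDemo.basename, ...) to ::File.
  def self.method_missing(sym, *args, &block)
    if ::File.respond_to?(sym)
      ::File.public_send(sym, *args, &block)
    else
      super   # raises the usual NoMethodError
    end
  end

  # Keep respond_to? honest about the forwarded methods.
  def self.respond_to_missing?(sym, include_private = false)
    ::File.respond_to?(sym) || super
  end
end

puts PCFileDemo.basename("/tmp/report.txt")   # prints report.txt (forwarded)
puts PCFileDemo.respond_to?(:extname)         # prints true
```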
-
@tig said:
... and I can't see currently how to do the equivalent of 'binmode' etc. Any suggestions/additions gratefully received..
Ya know.. it's weird that the Core guys made this switch without a way to test later IF the stream was IN binmode or not. I checked the docs, and the Ruby 1.8.7 branch is still the same.
BUT... in the 1.9.x trunk, they have added a binmode?() boolean query method. (See the online docs for the IO class. They also added the methods binread and binwrite, special open methods.)
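A small sketch of those 1.9+ calls, assuming a modern Ruby and the standard tempfile library; binary_roundtrip is a hypothetical helper name. It writes raw bytes with File.binwrite, reads them back with File.binread (a binary, ASCII-8BIT string with no newline translation), and queries binmode? on handles opened in "rb" and "r" modes.

```ruby
require "tempfile"

# Write raw bytes, read them back, and report [bytes, encoding, rb binmode?, r binmode?].
def binary_roundtrip(bytes)
  Tempfile.create("binmode_demo") do |tmp|
    tmp.close                                  # only the path is needed
    File.binwrite(tmp.path, bytes.pack("C*"))  # opens in binary mode, writes, closes
    data = File.binread(tmp.path)              # ASCII-8BIT string, bytes untouched
    rb = File.open(tmp.path, "rb") { |f| f.binmode? }
    r  = File.open(tmp.path, "r")  { |f| f.binmode? }
    [data.bytes, data.encoding, rb, r]
  end
end

p binary_roundtrip([0, 255, 13, 10])
```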
Of course, they have added a bunch of options, to read and write in several encodings. -
What effect do you imagine WIN32OLE.codepage = WIN32OLE::CP_UTF8 would have over CP_ACP (ANSI?/ASCII?), which seems to be the default? -
Ok TIG.. a general question:
most all of the methods are doing this:
arg = arg.unpack("U*").map{|c|c.chr}.join
which seems to convert a UTF-8 string (if it is one,) to an ANSI string, before passing it to Windows FSO methods that take and return Unicode strings ...
1) correct ??
2) and why ??
-
I found that if I didn't make that change to the string, then tests of UTF-8 strings like PCFile.exist?(path) do not work properly and return 'false' when it should be 'true'... just like the File.exist?(path) version; BUT making that change to the string before testing it seems to return correct results: consistently 'true' when it should be 'true' and 'false' when it should be 'false'. For a simple ANSI character string it works fine either way [the unpack/join has no effect], but if you test with a UTF-8 string with accented characters [perhaps obtained from a UI.openpanel()] that is unpack/joined etc., then you can see the difference between what the File.. and PCFile.. versions return...
There's probably a more elegant way to do this... BUT it seems to work the way I've bodged it together, so now perhaps we can think of better ways of achieving the same difference... -
That seems to indicate that the FSO methods are doing ANSI comparisons (perhaps by default.)
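To see what arg.unpack("U*").map{|c|c.chr}.join actually produces, here is a small sketch, assuming Ruby 1.9+ (for code points under 256, unpack("U*") and Integer#chr behave the same way in 1.8). Each Unicode code point becomes one byte, which is exactly ISO-8859-1 (Latin-1); that byte layout largely coincides with the Western European ANSI code page (Windows-1252), which would fit accented European names working. Integer#chr raises RangeError for anything above 255, so Chinese or Japanese characters cannot be represented by this conversion at all. The helper name utf8_to_single_bytes is hypothetical.

```ruby
# The conversion used in PCFile, wrapped in a hypothetical helper.
def utf8_to_single_bytes(str)
  str.unpack("U*").map { |c| c.chr }.join
end

accented  = "caf\u00e9"                   # "cafe" with e-acute, as UTF-8
converted = utf8_to_single_bytes(accented)

p accented.bytes                          # prints [99, 97, 102, 195, 169] (e-acute is 2 bytes)
p converted.bytes                         # prints [99, 97, 102, 233] (one byte per code point)
p converted.bytes == accented.encode("ISO-8859-1").bytes  # prints true: same as Latin-1

begin
  utf8_to_single_bytes("\u65e5\u672c")    # Japanese "Nihon": code points 26085, 26412
rescue RangeError => e
  puts "RangeError: #{e.message}"         # a code point > 255 has no single-byte form
end
```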
-
So, does PCFile.exist?() return true for a file with, for example, Japanese characters? -
Would the Japanese Kanji chars be in the UTF-16 set?
Ya know, we are all back to the ol' String encoding problem, really. I thought about using Dan Berger's String subclass(es) WideString or whatever he called them, but it seems like it would be cumbersome, unless they converted themselves automatically, similar to how Numerics use coerce().
Currently the interpreter always makes ANSI strings from " " and ' ' literals (and their % delimiter equivalents.) I wonder if it's possible to create a %u function that creates UTF-8 strings, and maybe a %U that creates UTF-16?
(Are these defined in Kernel, or are they C-side interpreter functions?) (Just throwing issues in the air, "musing out loud.")
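For comparison, Ruby 1.9+ answers part of the %u/%U question differently: string literals take the source file's encoding (UTF-8 by default since Ruby 2.0), the "\u" escape inserts a Unicode code point, and String#encode converts to UTF-16 on demand. A short sketch, assuming Ruby 2.0+; it also shows that the Kanji asked about do fit in UTF-16 (one 16-bit unit each).

```ruby
name  = "caf\u00e9"             # \u escape: code point U+00E9
kanji = "\u65e5\u672c"          # two Kanji, both in the Basic Multilingual Plane

puts name.encoding              # prints UTF-8
puts name.length                # prints 4 (characters)
puts name.bytesize              # prints 5 (bytes; 1.8's String#length counted bytes)

wide = kanji.encode("UTF-16LE") # UTF-16, little-endian
puts wide.bytesize              # prints 4: one 16-bit unit per Kanji
puts wide.encode("UTF-8") == kanji  # prints true: the round trip is lossless
```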
-
The underlying problem in Ruby 1.8 under Windows is that it calls the A versions of the file functions instead of the W versions. When a function such as FileFunction is used in C/C++, it compiles to FileFunctionA or FileFunctionW depending on whether UNICODE is defined.
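A sketch of why the A/W distinction matters, assuming Ruby 1.9+ and using Windows-1252 as a stand-in for a Western European ANSI code page (the real ANSI code page depends on the system locale). An "A" function receives bytes in that code page, which has no slots for Kanji; a "W" function such as GetFileAttributesW or CreateFileW receives a NUL-terminated UTF-16LE buffer, which can hold them. The helpers to_ansi and to_wide are hypothetical, and nothing here actually calls the Windows API.

```ruby
# Hypothetical helper: bytes as an "A" function would want them (Western code page).
def to_ansi(path)
  path.encode("Windows-1252")
end

# Hypothetical helper: buffer as a "W" function would want it (UTF-16LE + NUL).
def to_wide(path)
  (path + "\0").encode("UTF-16LE")
end

p to_ansi("caf\u00e9.skp").bytes     # e-acute fits in one ANSI byte (233)

begin
  to_ansi("\u65e5\u672c.skp")         # Kanji: no mapping in Windows-1252
rescue Encoding::UndefinedConversionError => e
  puts "A path fails: #{e.class}"
end

wide = to_wide("\u65e5\u672c.skp")
p wide.bytesize                       # 7 UTF-16 units (6 characters + NUL) x 2 bytes = 14
```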
I was thinking that a C extension that forcefully calls the FileFunctionW variants would be sure to work, as the system would be doing all the work. Meddling with the string in Ruby is quite likely to cause data to be lost or corrupted. -
It only works for UTF-8 [i.e. 'European' accented characters etc.]; the more complex Chinese/Japanese names return false when it should be true.
However, if we have a way of resolving one, hopefully the other will follow... -
http://www.danielstutzman.com/2011/04/how-to-write-unicode-filenames-in-ruby-1-8-6 uses Win32API and iconv ...
-
BTW.. if interested: this is the extended lib module FileUtils from Ruby v1.8.6-p287
module FileUtils (Ruby v1.8.6-p287) -
from the old Pick-axe Book:
@unknownuser said:
Strings are stored as sequences of 8-bit bytes,[For use in Japan, the jcode library supports a set of operations of strings written with EUC, SJIS, or UTF-8 encoding. The underlying string, however, is still accessed as a series of bytes.] and each byte may contain any of the 256 8-bit values, including null and newline. The substitution mechanisms in Table 18.2* on page 203 allow nonprinting characters to be inserted conveniently and portably.
* refers to the table of \ (backslash) codes
So it seems (in my mind) that since SketchUp sets $KCODE to UTF8 when it loads the interpreter, we may not actually have as much of a problem on the Ruby side as I thought. So we have a choice...
1) A pure-Ruby patch that accesses the system calls (for File functions) via WIN32OLE or WIN32API (the .so libraries.)
2) A compiled C patch, ie: "cut out" the C code files that define the classes IO, Dir and File (perhaps also FileTest), and recompile with either UNICODE #defined, or the C function calls changed explicitly to the wide versions. These would be ".so" files, and they would redefine the old methods. (What happens on the C-side when you re-define a C function that has already been defined? Do the C functions that the new Ruby wrappers call need to be renamed as well?)