Unicode, UTF8 and Ruby

    Didier Bur
    last edited by 2 Mar 2008, 13:59

    Hi all,
    Has anyone ever had to deal with Unicode and UTF8 string conversions?
    This is driving me mad.
    We have in French some special letters like à,é,è,ç,ê, and so on, and many other languages have their own character sets as well.

    When I try to retrieve the name of a material (material.display_name) in a script, I have to translate it first with a LanguageHandler object and the materials.strings local file.
    This returns a string, from the prepared $mat_strings I have built with LanguageHandler.
    Of course sometimes french materials names have special characters in them: béton, plâtre, etc.
    When the script sends such strings to an output file, these characters are NOT converted, for instance the string "Matière 1" is output as "MatiÃ¨re 1".
    I've searched through several ruby forums and it appears there is no easy method to get the correct translation. iconv library is a pain.
    Can anyone point me to the right direction, or has an idea ?

    DB
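A minimal sketch of the symptom described above, assuming a modern Ruby (1.9 or later) with `String#encode`, which may not match the interpreter embedded in SketchUp at the time. The helper names `misread_as_cp1252` and `write_cp1252` and the file name `materials.txt` are hypothetical. The first helper shows what a Windows-1252 reader displays when handed raw UTF-8 bytes; the second transcodes while writing, so the file holds one byte per accented letter.

```ruby
require "tmpdir"

# Show what a Windows-1252 reader displays when given the raw UTF-8 bytes.
def misread_as_cp1252(utf8_text)
  utf8_text.dup.force_encoding("Windows-1252").encode("UTF-8")
end

# Write text to a file, transcoding it to Windows-1252 on the way out.
def write_cp1252(path, text)
  File.open(path, "w:Windows-1252") { |f| f.puts(text) }
end

name = "Matière 1"
puts misread_as_cp1252(name)   # the two UTF-8 bytes of "è" show up as two characters

Dir.mktmpdir do |dir|
  path = File.join(dir, "materials.txt")   # hypothetical output file
  write_cp1252(path, name)
  p File.binread(path).bytes               # "è" is now the single byte 232 (0xE8)
end
```

Characters that have no Windows-1252 equivalent would make the write raise an encoding error, so a real exporter would need to decide how to handle them.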

      Gaieus
      last edited by 2 Mar 2008, 14:19

      When I was translating Fredo's bezier spline rb, I was asked to use backslashes before these special characters (like \é for é). Or were they slashes?

      And I guess you know this - to make your scripts work with French menus etc. so I suppose this is not the problem...

      Gai...

        Didier Bur
        last edited by 2 Mar 2008, 14:31

        I was not aware of the backslash thing. So I'll try that first. Thanks Gaieus.
        Functions like: str=str.gsub(/(è)/, 'e') also work, but they are not universal and they suppress the accentuation.

        DB

          Didier Bur
          last edited by 2 Mar 2008, 16:02

          MMmmmm, slash and backslash don't work either.
          And using special characters in the ruby code generates errors when loading 👿

          DB

            fredo6
            last edited by 2 Mar 2008, 18:43

            @didier bur said:

            Hi all,
            When I try to retrieve the name of a material (material.display_name) in a script, I have to translate it first with a LanguageHandler object and the materials.strings local file.
            This returns a string, from the prepared $mat_strings I have built with LanguageHandler.
            Of course sometimes french materials names have special characters in them: béton, plâtre, etc.
            When the script sends such strings to an output file, these characters are NOT converted, for instance the string "Matière 1" is output as "MatiÃ¨re 1".

            Didier,

            I am unclear about where the French strings come from in your example. Is it from a file, or from a constant definition?

            As Gaieus mentioned, in Ruby it is safer to put a backslash before any character which is not straight ASCII, like many accented characters.
            So, to define a constant:
            Text = "b\éton"
            and not
            Text = "béton"
            Otherwise you may get an error when loading the script (but not always)

            This also works from and to the Ruby Console
            Now, I don't know what happens when reading and writing from files, as I never tried.
            Could you attach your files so that I can try?

            Thanks

            Fredo

            PS: The only thing I noticed concerns the dialog boxes, where you have a different encoding and decoding of the accented characters, which makes the == comparison fail. This seems to be due to the fact that SketchUp uses Windows SDK dialog boxes, which have a different encoding method.
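As an alternative way to keep a script's source pure ASCII, here is a small sketch using Ruby's standard string escapes instead of a bare backslash. It assumes Ruby 1.9 or later for `\u` and a UTF-8 source file; the constant names are made up for the example, and none of this is confirmed to behave the same in SketchUp's embedded interpreter of the time.

```ruby
# Code point escape: U+00E9 is "é".
TEXT_ESCAPED = "b\u00E9ton"

# Byte escapes: 0xC3 0xA9 are the two UTF-8 bytes of "é".
TEXT_BYTES = "b\xC3\xA9ton"

puts TEXT_ESCAPED
puts TEXT_ESCAPED == TEXT_BYTES   # both literals spell the same UTF-8 string
p TEXT_ESCAPED.bytes              # [98, 195, 169, 116, 111, 110]
```

Either form loads regardless of how the editor saved the file, since the source itself contains only ASCII characters.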

              Didier Bur
              last edited by 2 Mar 2008, 18:59

              Hello Fredo,
              The problem does not come from a file. I have material names to retrieve from a SketchUp model, in order to re-export them to an Excel sheet. When you retrieve the material name for a face f, f.material returns, for example, a string s "béton". When you write this string to the Excel file, for example with fichier.puts(s), you do not get "béton" but "bÃ©ton", because the accented characters are encoded on 2 bytes instead of one. And Ruby has no method to convert UTF8 to Unicode.
              I am forced to write a function like this one:

              def ocr_change_name(str)
                # strip spaces, punctuation and symbols
                str = str.gsub(/([ -#;'"$£=()|{}&+<>,;@-])/, '')
                # replace French accented characters with unaccented ones
                str = str.gsub(/(à)/, 'a')
                str = str.gsub(/(â)/, 'a')
                str = str.gsub(/(é)/, 'e')
                str = str.gsub(/(è)/, 'e')
                str = str.gsub(/(ê)/, 'e')
                str = str.gsub(/(ë)/, 'e')
                str = str.gsub(/(î)/, 'i')
                str = str.gsub(/(ï)/, 'i')
                str = str.gsub(/(ô)/, 'o')
                str = str.gsub(/(ù)/, 'u')
                str = str.gsub(/(ç)/, 'c')
                # return the cleaned string explicitly
                str
              end

              But that only works for French, not for other languages. What a pain...

              DB
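For the "only works for French" limitation, one language-independent approach is Unicode decomposition: split each accented letter into a base letter plus combining marks, then drop the marks. The sketch below assumes Ruby 2.2 or later (for `String#unicode_normalize`) and UTF-8 input; `strip_accents` is a hypothetical name, and this is a swapped-in technique rather than anything used in the thread.

```ruby
# Remove accents by decomposing characters (NFD) and deleting the
# combining marks (Unicode category Mn) that the decomposition exposes.
def strip_accents(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts strip_accents("béton, plâtre, Matière 1")
puts strip_accents("façade à côté")
```

Letters that are not a base letter plus an accent (for example "ø" or "ß") do not decompose and pass through unchanged, so a lookup table may still be needed for those.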

                todd burch
                last edited by 2 Mar 2008, 21:14

                UTF8 doesn't work with the SU Ruby API. I figured out this sad bit of news when I wrote the 3DTextTool.

                UTF8 works in Ruby just fine.

                Google knows. They've known for the past several maintenance updates.

                Todd

                  TIG Moderator
                  last edited by 2 Mar 2008, 22:53

                  I came upon this somewhere... Don't know if it has any ideas that help ?


                  US-ASCII.rb

                  TIG

                    fredo6
                    last edited by 3 Mar 2008, 20:26

                    @didier bur said:

                    Hello Fredo,
                    The problem does not come from a file. I have material names to retrieve from a SketchUp model, in order to re-export them to an Excel sheet. When you retrieve the material name for a face f, f.material returns, for example, a string s "béton". When you write this string to the Excel file, for example with fichier.puts(s), you do not get "béton" but "bÃ©ton", because the accented characters are encoded on 2 bytes instead of one. And Ruby has no method to convert UTF8 to Unicode.

                    Then, with the explanation from Todd, I understand why I had problems with the dialog boxes, as Windows does support UTF8.

                      Didier Bur
                      last edited by 3 Mar 2008, 21:21

                      @unknownuser said:

                      Don't know if it has any ideas that help

                      TIG, it seems the "register" method is missing. Apparently not a standard method...

                      DB

                        TIG Moderator
                        last edited by 3 Mar 2008, 22:09

                        But couldn't we (you!) use the pack / unpack tricks to convert between the two encodings?

                        TIG
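The pack / unpack idea can be sketched roughly as follows. `unpack("U*")` reads a UTF-8 string as a list of code points and `pack("C*")` writes each one back as a single byte, which gives Latin-1 for code points up to 255; the reverse direction swaps the two directives. The method names are hypothetical, and the range check is an addition for the example.

```ruby
# UTF-8 string -> Latin-1 bytes (one byte per character).
def utf8_to_latin1(str)
  code_points = str.unpack("U*")
  too_big = code_points.find { |cp| cp > 255 }
  raise ArgumentError, "U+%04X has no Latin-1 equivalent" % too_big if too_big
  code_points.pack("C*")
end

# Latin-1 bytes -> UTF-8 string.
def latin1_to_utf8(bytes)
  bytes.unpack("C*").pack("U*")
end

latin1 = utf8_to_latin1("béton")
p latin1.bytes                 # [98, 233, 116, 111, 110]
puts latin1_to_utf8(latin1)    # round-trips back to the UTF-8 string
```

Both directives also exist in Ruby 1.8, which is why this trick was attractive where `iconv` was unavailable or awkward.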

                          thomthom
                          last edited by 28 Jun 2009, 22:22

                          @unknownuser said:

                          UTF8 doesn't work with the SU Ruby API. I figured out this sad bit of news when I wrote the 3DTextTool.

                          UTF8 works in Ruby just fine.

                          Google knows. They've known for the past several maintenance updates.

                          Todd

                          That was my first problem when I first tried to write ruby plugins; writing in UTF-8. From doing websites I've grown into the custom of using UTF-8 to account for most languages. I figured that I was doing something wrong and meant to go back and have another look at some point. So, essentially UTF-8 is no-go? And this is due to the SU API - not Ruby?

                          Thomas Thomassen — SketchUp Monkey & Coding addict
                          List of my plugins and link to the CookieWare fund

                            Didier Bur
                            last edited by 29 Jun 2009, 10:29

                            Good advice, thanks TIG 👍

                            DB

                              TIG Moderator
                              last edited by 29 Jun 2009, 11:09

                              Yes. You can't use FileTest.exist?(Sketchup.active_model.path) if the file name contains unicode characters. The 'path' that SUp reports looks OK, with say ascii_chr=233 for 'é'; however, FileTest sees the 'é' as unicode and so returns false. Although they both 'look' the same, the character encoding is different.
                              My clunky fix only works on the top-most file (or folder) containing the unicode parts, as Dir.entries(dir) falls over if there are accents earlier in the path...
                              It can't be beyond the wit of man to take 'Sketchup.active_model.path' and encode it as unicode in a way that would match the Ruby built-ins like FileTest.exist?(path) or Dir.entries(dir)... however it is beyond the wit of me... 😕

                              TIG
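One way to approach the mismatch described above is to test the path under both byte interpretations. This is only a sketch under stated assumptions: it needs Ruby 1.9 or later, the names `candidate_paths` and `path_exists_any_encoding?` are hypothetical, and it is not the `file_found?` method mentioned later in the thread. The existence check is passed in as a parameter so the logic can be tried without touching the disk.

```ruby
# Return the path as given, plus re-encoded variants of the same name.
def candidate_paths(path)
  raw = path.dup.force_encoding(Encoding::BINARY)
  candidates = [path]

  # Treat the bytes as Latin-1 and re-encode them as UTF-8.
  candidates << raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

  # Treat the bytes as UTF-8 and re-encode them as Latin-1, when possible.
  as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  if as_utf8.valid_encoding?
    begin
      candidates << as_utf8.encode(Encoding::ISO_8859_1)
    rescue EncodingError
      # Some character has no Latin-1 form; skip this variant.
    end
  end
  candidates
end

# True if any variant of the path exists according to the given check.
def path_exists_any_encoding?(path, exists = File.method(:exist?))
  candidate_paths(path).any? { |candidate| exists.call(candidate) }
end

puts path_exists_any_encoding?(__FILE__)
```

Whether the operating system compares file names byte for byte depends on the platform, so the result on Windows, macOS and Linux may differ.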

                                thomthom
                                last edited by 29 Jun 2009, 11:25

                                I think that I couldn't even get UTF-8 scripts to run... I'll have a look at Ruby + SU + UTF. Wonder if Ruby has some nice encoding methods.
                                Seeing how there are many scripts that use localisation, it'd be very nice to have UTF-8.
                                .SKP has a weird combination of UTF-8 and regular ASCII. Seems that it wasn't originally UTF-8 and it was later added. Maybe we're running into problems due to this.

                                Thomas Thomassen — SketchUp Monkey & Coding addict
                                List of my plugins and link to the CookieWare fund

                                  TIG Moderator
                                  last edited by 29 Jun 2009, 11:54

                                  233.chr ### a plain ascii é
                                  é
                                  233.chr+233.chr ### 2 number plain ascii é make éé
                                  éé
                                  195.chr ### a plain ascii capital A with a tilde
                                  Ã
                                  169.chr ### a plain ascii the (c)opyright symbol
                                  ©
                                  195.chr+169.chr ### BUT these 2 number ascii codes added together = one unicode é that looks like an ascii é !!!
                                  é

                                  ??? go figure ???

                                  TIG
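The puzzle above has a byte-level explanation: 233 is the Latin-1 byte (and the Unicode code point) for "é", while 195 and 169 are the two bytes UTF-8 uses to store that same code point. A small sketch, assuming Ruby 1.9 or later where strings carry an encoding; the helper names are hypothetical.

```ruby
# Build a string from raw byte values and tag it as UTF-8.
def utf8_from_bytes(*byte_values)
  byte_values.pack("C*").force_encoding(Encoding::UTF_8)
end

# Decode a single Latin-1 byte value into a UTF-8 string.
def utf8_from_latin1_byte(byte_value)
  byte_value.chr.encode(Encoding::UTF_8, Encoding::ISO_8859_1)
end

e = utf8_from_bytes(195, 169)
puts e                              # a single "é"
p e.bytes                           # [195, 169]
p e.unpack("U*")                    # [233]
p utf8_from_latin1_byte(233) == e   # true
```

So the two console results look alike because they are the same character stored in two different encodings.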

                                    thomthom
                                    last edited by 29 Jun 2009, 12:19

                                    UTF-8 only uses two (or more) bytes for some of the characters. For the basic Latin characters it uses 1 byte, identical to normal ASCII.

                                    Thomas Thomassen — SketchUp Monkey & Coding addict
                                    List of my plugins and link to the CookieWare fund
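The variable width mentioned above is easy to check in a modern Ruby (1.9 or later, assumed here). `utf8_width` is a hypothetical helper that reports how many bytes one character takes in UTF-8.

```ruby
# Number of bytes a character occupies once encoded as UTF-8.
def utf8_width(char)
  char.encode(Encoding::UTF_8).bytesize
end

["a", "é", "€", "😀"].each do |char|
  puts "#{char}: #{utf8_width(char)} byte(s)"
end
```

Plain ASCII letters take one byte, accented Latin letters two, and other symbols three or four.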

                                      TIG Moderator
                                      last edited by 1 Jul 2009, 13:58

                                      Didier et al...

                                      After more than a year and a bit...

                                      typical usage: file_found?(Sketchup.active_model.path)

                                      returns true if the file is found,

                                      even with accented unicode characters in name/path,

                                      e.g. qualisé.skp

                                      EDIT: see here for latest file... http://forums.sketchucation.com/viewtopic.php?p=169225#p169225

                                      TIG

                                        TIG Moderator
                                        last edited by 1 Jul 2009, 14:01

                                        file_found?(path), which fixes the false negatives that 'FileTest.exist?(path)' returns when the SUp Ruby path is ascii but the returned filepath is unicode (even with accented characters), is updated and moved here... http://forums.sketchucation.com/viewtopic.php?p=169225#p169225

                                        TIG
