
    Splitting strings around 2 parameters

    Developers' Forum
    • Chris Fullmer

      Hi, I really have not mastered strings, so this is probably a pretty beginner question. I want to parse a string taken from an HTML page and find everything between the <span> and </span> tags. How would I do that? Here is an example of the kind of string I could be parsing:

      <dl class="apireference"> <dt id="copyright"><span class="myclass">I want all this text.  All of it as a single string.</span><span class="version">SketchUp 6.0+</span></dt>
      

      Any thoughts? Thanks folks,

      Chris

      Lately you've been tan, suspicious for the winter.
      All my Plugins I've written

      • TIG (Moderator)

        txt1="<dl class=\"apireference\"> <dt id=\"copyright\"><span class=\"myclass\">I want all this text.  All of it as a single string.</span><span class=\"version\">SketchUp 6.0+</span></dt>"
        puts txts1=txt1.split("<span class=\"myclass\">")
        puts txt2=txts1[1]###***
        puts txts2=txt2.split("</span>")
        puts txt3=txts2[0]
        
        

        split breaks the string at the designated text and returns the pieces as an array.
        ***If you are not sure which item of 'txts1' holds your text, you can test each item until you find one that doesn't start with a '<'; that one is your 'string':

        txt2=""
        txts1.each{|txt|
          if not txt=~/^[<]/
            txt2=txt
            break
          end#if
        }
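The two-step split above can be wrapped in a small helper. This is only a sketch: the method name `text_between` and its parameters are made up for illustration and are not part of the thread's code. It assumes the opening tag is known exactly, attributes included.

```ruby
# Hypothetical helper (not from the thread): returns the text between
# the first open_tag and the next close_tag, or nil if either is missing.
def text_between(html, open_tag, close_tag)
  after_open = html.split(open_tag, 2)[1]     # everything after the first open_tag
  return nil if after_open.nil?               # open_tag was not found
  return nil unless after_open.include?(close_tag)
  after_open.split(close_tag, 2)[0]           # everything before the next close_tag
end

html = '<dt id="copyright"><span class="myclass">I want all this text.</span>' \
       '<span class="version">SketchUp 6.0+</span></dt>'
puts text_between(html, '<span class="myclass">', '</span>')
```

Passing a limit of 2 to split keeps everything after the first delimiter in one piece, so later tags in the string do not change which array item holds the text.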
        

        TIG

        • thomthom

          Or a RegEx: http://ruby-doc.org/core/classes/String.html#M000812

          ` > match = str.scan(/<span(?:\s+.*?)>(.*?)<\/span>/)
          [["I want all this text. All of it as a single string."], ["SketchUp 6.0+"]]

          > match[0]
          ["I want all this text. All of it as a single string."]
          > match[1]
          ["SketchUp 6.0+"]

          > str1 = match[0][0]
          "I want all this text. All of it as a single string."
          > str2 = match[1][0]
          "SketchUp 6.0+"`

          The matches can also be accessed with a block:
          > str.scan(/<span(?:\s+.*?)>(.*?)<\/span>/) { |match| p match[0] }
          "I want all this text. All of it as a single string."
          "SketchUp 6.0+"
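For a flat list of strings rather than nested arrays, the scan result can be flattened. The sketch below uses a slightly different pattern from the one in the post (an assumption on my part, not thomthom's regex): `[^>]*` also accepts a bare `<span>` with no attributes, and the `m` flag lets the captured text span line breaks.

```ruby
str = '<dl class="apireference"> <dt id="copyright">' \
      '<span class="myclass">I want all this text.  All of it as a single string.</span>' \
      '<span class="version">SketchUp 6.0+</span></dt>'

# Variation on the pattern above: matches <span> with or without attributes.
SPAN_RE = /<span\b[^>]*>(.*?)<\/span>/m

# With one capture group, scan returns an array of one-element arrays;
# flatten turns that into a plain array of strings.
spans = str.scan(SPAN_RE).flatten

spans.each { |text| puts text }
```

A regex like this is fine for small, predictable snippets such as the one in the question; it is not a general HTML parser and will not cope with nested span tags.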

          Thomas Thomassen - SketchUp Monkey & Coding addict
          List of my plugins and link to the CookieWare fund

          • TIG (Moderator)

            Your last method to extract all of the strings is very elegant compared to my clumsy hack... however, I do find the construction of the RegEx test somewhat difficult - after many unsuccessful tests my quick 'hack' looked more appealing - but now you've made an example the 'crib' is there... 😄

            TIG

            • thomthom

              RegEx are a pain to learn IMO. I started meddling with them when I was doing web design, since you need to do a lot of string processing. For a long time I created my regexes on a hit-and-miss basis, but I've slowly managed to get a better grasp of them. There are still many features of the system I don't know how to use, but I know enough to sniff out and extract basic data.

              A very nice tool to use for testing regex expressions is this: http://www.rubular.com/
              Live update as you modify the expression and you have that quick reference at the bottom to jog your memory.

              • TIG (Moderator)

                Thanks for the site - useful... 🤓

                TIG

                • Chris Fullmer

                  Awesome guys! Thanks so much, I'll be working these into my script later today. Thanks again,

                  Chris

                  • Dan Rathbun

                    easier to understand:

                    
                    # htstr would be the html you grab
                    htstr='<dl class="apireference"> <dt id="copyright"><span class="myclass">I want all this text.  All of it as a single string.</span><span class="version">SketchUp 6.0+</span></dt>'
                    #
                    # replace first html tag with <***>
                    s1=htstr.sub('<span class="myclass">','<***>')
                    #
                    # replace second html tag with <***>
                    s2=s1.sub('</span>','<***>')
                    #
                    # now split using your custom <***> delimiter
                    # and take the second array element [1]
                    apistr=s2.split('<***>')[1]
                    #
                    # >> I want all this text.  All of it as a single string.
                    
                    

                    it could be condensed into a one-liner method:

                    
                    def grabAPI( htstr )
                      return htstr.sub('<span class="myclass">','<***>').sub('</span>','<***>').split('<***>')[1]
                    end #
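Another compact option, offered as a sketch rather than as Dan's method, is `String#[]` with a regular expression and a capture-group index. The name `grab_api` is hypothetical; the pattern assumes the opening tag is written exactly as in the sample string.

```ruby
# Hypothetical one-liner: str[regexp, 1] returns the first capture group
# of the first match, or nil when the pattern does not match.
def grab_api(htstr)
  htstr[/<span class="myclass">(.*?)<\/span>/, 1]
end

htstr = '<dl class="apireference"> <dt id="copyright">' \
        '<span class="myclass">I want all this text.  All of it as a single string.</span>' \
        '<span class="version">SketchUp 6.0+</span></dt>'
puts grab_api(htstr)
```

Unlike the sub/split chain, this does not depend on the placeholder `<***>` being absent from the input.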
                    
                    

                    I'm not here much anymore.

                    • Chris Fullmer

                      That's awesome, thanks Dan! I'm going to play with this tonight. String parsing is not my favorite thing currently, but you guys are making it bearable.

                      Chris

                      • Dan Rathbun

                        Here's another example using substrings specified by range offsets:
                        (I dup'd the string just in case, because I'm slicing off the first unused part.)

                        
                        def grabAPI( htstr )
                          temp = htstr.dup
                          temp.slice!(0..temp.index('<span class="myclass">')+21)
                          return temp[0..(temp.index('</span>')-1)]
                        end #
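The `+21` above is the length of the opening tag minus one. A variation that computes the offsets from the tag itself, so the tag can change without recounting characters, might look like this. The method name `grab_between` and its default arguments are assumptions for illustration only.

```ruby
# Hypothetical variation: derive the offsets from the tag lengths
# instead of hard-coding them. Returns nil if either tag is missing.
def grab_between(htstr, open_tag = '<span class="myclass">', close_tag = '</span>')
  start = htstr.index(open_tag)
  return nil if start.nil?
  start += open_tag.length                  # first character after the opening tag
  finish = htstr.index(close_tag, start)    # search only after the opening tag
  return nil if finish.nil?
  htstr[start...finish]                     # exclusive range stops before close_tag
end

sample = '<dt><span class="myclass">I want all this text.</span>' \
         '<span class="version">SketchUp 6.0+</span></dt>'
puts grab_between(sample)
```

Because it only reads from the string with a range, no `dup` is needed here.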
                        
                        

                        • Chris Fullmer

                          OK, this is remarkably painful, but still somehow keeping me amused. I stay up late every night trying to figure out how to parse this text. Thanks to everyone who is chiming in.

                          New question. What is this error?

                          (eval):62: warning: string pattern instead of regexp; metacharacters no longer effective

                          I am getting it for 2 different lines of code:

                          temp_info_str_array.sub(" ", "") if temp_info_str_array[0] == 32
                          and
                          temp_str = str.split("***")
                          In the first one I just wanted to remove the first character of the string if it is a space. The second one seems pretty simple: just split a string at the *** delimiter. But each of these lines seems to be throwing that error, and I'm not exactly sure what it means. I'm guessing I'm just doing something wrong. Any ideas what it is?

                          Chris

                          • thomthom

                            Not an error, just warning that your match pattern is not a regex.
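As far as I can tell, Ruby 1.8 prints this warning when a String pattern contains characters that would be special in a regex (such as `*`), to point out that they are matched literally. One way to make the intent explicit, sketched here with made-up sample values, is to pass a Regexp literal with the characters escaped.

```ruby
# Sample values are illustrative only.
str = "first***second"

# Escaped Regexp literal instead of the String "***".
parts = str.split(/\*\*\*/)

text = " leading space"
# sub returns a new string and leaves the receiver alone, so the result
# has to be assigned (or sub! used) for the change to stick.
trimmed = text.sub(/\A /, "")

p parts
p trimmed
```

Note that the first line in the question calls `sub` without assigning the result, so even without the warning it would not change the original string.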

                            • TIG (Moderator)

                              temp_info_str_array.sub!(/\A /,'') should remove just the first white-space, or try
                              temp_info_str_array.strip! to remove all leading and trailing white-space
                              str.lstrip! to remove all leading white-space
                              str.rstrip! to remove all trailing white-space
                              str.slice!() to remove the specified portion(s) of the string,
                              e.g. str.slice!(0) removes the first character, also
                              str.chomp! typically to remove the trailing \n etc
                              str.chop! to remove the last character
                              etc etc there are very many 'string' methods
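A quick side-by-side of the methods listed above, using a made-up sample string. The non-bang forms are used here because they return a modified copy, which makes the results easy to compare.

```ruby
raw = "  padded text \n"      # illustrative sample value

left    = raw.lstrip          # leading whitespace removed
right   = raw.rstrip          # trailing whitespace (including "\n") removed
both    = raw.strip           # both ends trimmed
chomped = raw.chomp           # only the trailing newline removed
chopped = raw.chop            # last character removed, whatever it is

# The bang versions modify the string in place and return nil when
# nothing changed, so their return value should not be chained blindly.
unchanged = "abc".dup.strip!

rest = "xyz".dup
rest.slice!(0)                # removes the first character in place

p left, right, both, chomped, chopped, unchanged, rest
```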

                              TIG

                              • Dan Rathbun

                                @chris fullmer said:

                                temp_info_str_array.sub(" ", "") if temp_info_str_array[0] == 32

                                The if condition has an error, should be:
                                ... if temp_info_str_array[0] == 32.chr

                                but as TIG said, temp_info_str_array.lstrip! is much easier.

                                • thomthom

                                  @dan rathbun said:

                                  The if condition has an error, should be:
                                  ... if temp_info_str_array[0] == 32.chr

                                  Nope - not under Ruby 1.8.

                                  > "string"[0]
                                  115
                                  > "string"[0,1]
                                  s

                                  This was changed in 1.9 though.

                                  • Dan Rathbun

                                    @thomthom said:

                                    @dan rathbun said:

                                    The if condition has an error, should be:
                                    ... if temp_info_str_array[0] == 32.chr

                                    Nope - not under Ruby 1.8.
                                    > "string"[0]
                                    115
                                    > "string"[0,1]
                                    s

                                    I stand corrected. (Confused it with Pascal for a minute there.)
                                    I always think of Strings as Arrays of Char, where a subscript should return the character at that index.
                                    So for Ruby I'd probably need to do: " string"[0..0]==32.chr
                                    It's just kinda weird.

                                    @thomthom said:

                                    This was changed in 1.9 though.

                                    What did they change it to?

                                    EDIT: n/m, I see they changed it to the way I expected it to work, and added the String.ord method to return the ASCII ordinal. That's the way it should work! Like:
                                    " string"[0].ord==32 >> true # in ver1.9.x
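A check that behaves the same under both versions can avoid single-integer indexing altogether. The sketch below uses the `[start, length]` form shown earlier in the thread; the method name `starts_with_space?` is hypothetical.

```ruby
# Hypothetical helper: true when the first character is a space.
# str[0, 1] is a (start, length) slice, which returns a one-character
# String under Ruby 1.8 as well as 1.9 and later.
def starts_with_space?(str)
  str[0, 1] == " "
end

text = " string"
text = text.lstrip if starts_with_space?(text)
puts text
```

In practice calling `lstrip` unconditionally gives the same result, since it simply returns an unchanged copy when there is no leading whitespace.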

                                    • thomthom

                                      I got caught on this the first time I tried to extract characters at indexes as well, being used to PHP. And it really is counter-intuitive the way Ruby 1.8 works.

                                      • Dan Rathbun

                                        @thomthom said:

                                        And it really is counter-intuitive the way Ruby 1.8 works.

                                        Agree! .. but at least they are revising Ruby to correct things to the way they should be.
