Splitting strings around 2 parameters

TIG

Your last method to extract all of the strings is very elegant compared to my clumsy hack... However, I do find constructing the regex test somewhat difficult; after many unsuccessful tests my quick 'hack' looked more appealing. But now that you've made an example, the 'crib' is there...

thomthom

Regexes are a pain to learn, IMO. I started meddling with them when I was doing web design, since you need to do a lot of string processing there. For a long time I built my regexes on a hit-and-miss basis, but slowly I've managed to get a better grasp of them. There are still many features of the system I don't know how to use, but I know the basics needed to sniff out and extract simple data.

A very nice tool for testing regular expressions is this: http://www.rubular.com/
It updates live as you modify the expression, and there's a quick reference at the bottom to jog your memory.
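For comparison with the substitution approaches later in the thread, here is a minimal regex sketch. It assumes the same sample HTML used below and that the wanted text is the content of the first `<span class="myclass">` element; `extract_myclass` is a hypothetical helper name, not the earlier method TIG referred to. The non-greedy `(.*?)` is what stops the capture at the first `</span>`.

```ruby
# Hypothetical helper: return the text inside the first
# <span class="myclass">...</span>, or nil if there is no match.
def extract_myclass(html)
  # (.*?) is non-greedy, so the capture stops at the first </span>;
  # the /m flag lets . match newlines too, in case the text spans lines.
  m = html.match(%r{<span class="myclass">(.*?)</span>}m)
  m && m[1]
end

html = '<dl class="apireference"> <dt id="copyright"><span class="myclass">I want all this text.  All of it as a single string.</span><span class="version">SketchUp 6.0+</span></dt>'
puts extract_myclass(html)
# prints: I want all this text.  All of it as a single string.

# scan with a capture group collects every span's text, not just the first
p html.scan(%r{<span class="[^"]*">(.*?)</span>}).flatten
# prints: ["I want all this text.  All of it as a single string.", "SketchUp 6.0+"]
```

Pasting the same pattern into Rubular is a quick way to see what each part of it matches.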

TIG

Thanks for the site - useful...

Chris Fullmer

Awesome, guys! Thanks so much, I'll be working these into my script later today. Thanks again,

Chris

Dan Rathbun

Here's an approach that's easier to understand:


# htstr would be the html you grab
htstr='<dl class="apireference"> <dt id="copyright"><span class="myclass">I want all this text.  All of it as a single string.</span><span class="version">SketchUp 6.0+</span></dt>'
#
# replace first html tag with <***>
s1=htstr.sub('<span class="myclass">','<***>')
#
# replace second html tag with <***>
s2=s1.sub('</span>','<***>')
#
# now split using your custom <***> delimiter
# and take the second array element [1]
apistr=s2.split('<***>')[1]
#
# >> I want all this text.  All of it as a single string.

It could be condensed into a one-liner method:


def grabAPI( htstr )
  return htstr.sub('<span class="myclass">','<***>').sub('</span>','<***>').split('<***>')[1]
end #
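The same two-delimiter split can also be sketched with `String#partition` (available in Ruby 1.9 and later), which splits around the first occurrence of a separator and so avoids inventing a placeholder like `<***>`. `grab_between` is a hypothetical name used only for illustration.

```ruby
# Hypothetical helper: return the text between the first occurrence of
# open_tag and the next occurrence of close_tag, or nil if either is missing.
def grab_between(str, open_tag, close_tag)
  # partition returns [before, separator, after]; separator is "" if not found
  _before, found, rest = str.partition(open_tag)
  return nil if found.empty?
  inner, found, _after = rest.partition(close_tag)
  return nil if found.empty?
  inner
end

htstr = '<dl class="apireference"> <dt id="copyright"><span class="myclass">I want all this text.  All of it as a single string.</span><span class="version">SketchUp 6.0+</span></dt>'
puts grab_between(htstr, '<span class="myclass">', '</span>')
# prints: I want all this text.  All of it as a single string.
```

Because the close tag is only searched for in the text after the open tag, a `</span>` that appears earlier in the HTML cannot confuse it.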

Chris Fullmer

That's awesome, thanks Dan! I'm going to play with this tonight. String parsing is not my favorite thing currently, but you guys are making it bearable.

Chris

Dan Rathbun

Here's another example using substrings specified by range offsets:
(I dup'd the string just in case, because I'm slicing off the unused first part.)


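def grabAPI( htstr )
  open_tag = '<span class="myclass">'
  temp = htstr.dup
  # slice off everything up to and including the opening tag
  # (open_tag.length - 1 == 21, the offset of its closing '>')
  temp.slice!(0..temp.index(open_tag) + open_tag.length - 1)
  # keep everything before the first closing </span>
  return temp[0..(temp.index('</span>') - 1)]
end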
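A variation on the range-offset idea: `String#index` accepts a starting offset as its second argument, so both positions can be found in the original string without `dup` and `slice!`. This is a minimal sketch; `grab_by_offsets` is a hypothetical name, and returning `nil` when a tag is missing is an assumption about how you would want failures handled.

```ruby
# Hypothetical helper: find both tags by offset instead of slicing a copy.
# Returns nil if either tag is missing.
def grab_by_offsets(htstr, open_tag = '<span class="myclass">', close_tag = '</span>')
  start = htstr.index(open_tag)
  return nil unless start
  start += open_tag.length               # first character after the opening tag
  finish = htstr.index(close_tag, start) # only search after the opening tag
  return nil unless finish
  htstr[start...finish]                  # exclusive range: stops before close_tag
end

htstr = '<dl class="apireference"> <dt id="copyright"><span class="myclass">I want all this text.  All of it as a single string.</span><span class="version">SketchUp 6.0+</span></dt>'
puts grab_by_offsets(htstr)
# prints: I want all this text.  All of it as a single string.
```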

Chris Fullmer

OK, this is remarkably painful, but it's still somehow keeping me amused. I stay up late every night trying to figure out how to parse this text. Thanks to everyone who is chiming in.

New question. What is this error?

(eval):62: warning: string pattern instead of regexp; metacharacters no longer effective

I am getting it for 2 different lines of code:

temp_info_str_array.sub(" ", "") if temp_info_str_array[0] == 32
and
temp_str = str.split("***")
In the first one I just wanted to remove the first character of the string if it is a space. The second one seems pretty simple: just split a string at the *** delimiter. But each of these lines seems to be throwing that error, and I'm not exactly sure what it means. I'm guessing I'm just doing something wrong. Any ideas what it is?

Chris

thomthom

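It's not an error, just a warning that your match pattern is a String, not a Regexp: characters such as the * in "***" are matched literally instead of acting as regex metacharacters.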
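To illustrate the difference the warning is about: when `split` or `sub` is given a String, it is matched literally, so `"***"` means three asterisks (which is what is wanted here). In a Regexp, `*` is a quantifier and has to be escaped, either by hand or with `Regexp.escape`. A minimal sketch with a made-up sample string:

```ruby
str = "alpha***beta***gamma"

# A String pattern is matched literally: each "*" is just an asterisk.
p str.split("***")                             # => ["alpha", "beta", "gamma"]

# In a Regexp, "*" is a quantifier, so the asterisks must be escaped...
p str.split(/\*\*\*/)                          # => ["alpha", "beta", "gamma"]

# ...or escaped for you by Regexp.escape.
p str.split(Regexp.new(Regexp.escape("***")))  # => ["alpha", "beta", "gamma"]
```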

TIG

temp_info_str_array.gsub!(/^ /,'') should remove a single leading space (note that ^ matches at the start of every line, so on a multi-line string it strips one leading space per line), or try:
temp_info_str_array.strip! to remove all leading and trailing whitespace
str.lstrip! to remove all leading whitespace
str.rstrip! to remove all trailing whitespace
str.slice!() to remove (and return) the specified portion(s) of the string,
e.g. str.slice!(0) removes the first character; also
str.chomp! typically to remove a trailing \n (or \r\n)
str.chop! to remove the last character
etc. etc.; there are very many String methods
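A quick sketch of the methods listed above, run under Ruby 1.9 or later (where `slice!(0)` returns a one-character String). The bang (`!`) versions modify the string in place and return `nil` when nothing changed; the versions without the bang, such as `sub` and `strip`, return a new string, so a bare `temp_info_str_array.sub(" ", "")` leaves the original untouched. This uses `\A` rather than `^` so that only the very start of the string can match.

```ruby
s = "  padded text  "
p s.strip            # => "padded text"   (new string; s is unchanged)
p s.lstrip           # => "padded text  "
p s.rstrip           # => "  padded text"

t = " leading space"
t.sub!(/\A /, "")    # \A anchors at the very start of the string
p t                  # => "leading space"
p t.sub!(/\A /, "")  # => nil, because there was nothing left to remove

u = "hello\n"
u.chomp!             # removes the trailing newline
p u                  # => "hello"
u.chop!              # removes the last character
p u                  # => "hell"
first = u.slice!(0)  # removes and returns the first character
p first              # => "h"
p u                  # => "ell"
```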

Dan Rathbun

@chris fullmer said:

temp_info_str_array.sub(" ", "") if temp_info_str_array[0] == 32

The if condition has an error, should be:
... if temp_info_str_array[0] == 32.chr

But as TIG said, temp_info_str_array.lstrip! is much easier.

thomthom

@dan rathbun said:

The if condition has an error, should be:
... if temp_info_str_array[0] == 32.chr

Nope - not under Ruby 1.8.

"string"[0]    # => 115
"string"[0,1]  # => "s"

This was changed in 1.9 though.

Dan Rathbun

@thomthom said:

@dan rathbun said:

The if condition has an error, should be:
... if temp_info_str_array[0] == 32.chr

Nope - not under Ruby 1.8.
"string"[0]    # => 115
"string"[0,1]  # => "s"
I stand corrected. (I was confusing it with Pascal for a minute there.)
I always think of Strings as arrays of Char, so a subscript should return the character at that index.
So in Ruby 1.8 I'd probably need to do: " string"[0..0] == 32.chr
It's just kinda weird.

@thomthom said:

This was changed in 1.9 though.
What did they change it to?

EDIT: n/m I see they changed it to the way I expected it to work.
And they added the String#ord method to return the ASCII ordinal. That's the way it should work! Like:
" string"[0].ord == 32   # => true in Ruby 1.9.x
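Since `str[0]` returns an Integer in Ruby 1.8 but a one-character String in 1.9, one check that reads the same under both is to compare a substring instead of a single index, as in the `[0..0]` version above. A minimal sketch; `leading_space?` is a hypothetical helper, and only the 1.9+ behaviour is actually exercised when this runs on a current Ruby.

```ruby
# Hypothetical helper: true if str begins with a space.
# str[0, 1] (like str[0..0]) returns a one-character String in Ruby 1.8
# as well as in 1.9+, unlike str[0].
def leading_space?(str)
  str[0, 1] == " "
end

p leading_space?(" string")   # => true
p leading_space?("string")    # => false
p leading_space?("")          # => false; ""[0, 1] is "" rather than nil

# Ruby 1.9+: str[0] is a String, and String#ord gives its code point
p " string"[0].ord == 32      # => true
p " string".start_with?(" ")  # => true
```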

thomthom

I got caught by this too, the first time I tried to extract characters by index, being used to PHP. The way Ruby 1.8 works really is counter-intuitive.

Dan Rathbun

@thomthom said:

And it really is counter-intuitive the way Ruby 1.8 works.

Agreed! But at least they're revising Ruby to make things work the way they should.