Ruby "split" when reading lines from a file
-
Hi,
I used to use:
```ruby
b = file.gets()
c = b.split(" ")
```
to read lines from a text data file, but in the following lines there is no space between two values when the second value is negative:
```
0.208646094863E+000-0.513313617099E-001-0.915625940530E-003 0.180604789668E+000-0.240882759294E-001-0.872767876563E-003
```
How do I split them now?
Thanks in advance
Cean
-
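Try this; it does not matter whether the numbers or exponents are positive or negative:
```ruby
b = file.gets()
# Fixed column ranges for the sample line above: the first value is 19
# characters wide, each following value is 20 (sign or space + 19 characters).
x = b[0..18].strip.to_f
y = b[19..38].strip.to_f
z = b[39..58].strip.to_f
```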
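If the file really is written in fixed-width columns, String#unpack can cut every field in one call instead of one index range per value. This is a minimal sketch, assuming the widths visible in the sample line (19 characters for the first value, then 20 per value with a sign or space in the first column) and six values per line; `parse_fixed_width` and `FIELD_FORMAT` are hypothetical names, not something from this thread.

```ruby
# ASSUMPTION: first field is 19 characters, the next five are 20 characters
# each (one column for the sign or a space, as in the sample line).
FIELD_FORMAT = "a19" + "a20" * 5

# Hypothetical helper: cut one line into fixed-width fields and convert them.
def parse_fixed_width(line)
  fields = line.chomp.unpack(FIELD_FORMAT)  # "aN" grabs N raw characters
  fields.reject(&:empty?).map(&:to_f)       # skip missing fields; to_f ignores a leading space
end

line = "0.208646094863E+000-0.513313617099E-001-0.915625940530E-003 0.180604789668E+000-0.240882759294E-001-0.872767876563E-003"
p parse_fixed_width(line)
```

If a line is shorter than six fields, the missing ones come back as empty strings from `unpack` and are dropped rather than turned into 0.0.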
It is also safer to read the whole file into an array, then iterate the array:
```ruby
filepath = 'dir/dir2/filename.dat'
b = IO.readlines(filepath) # b is now an Array of text lines
b.each_with_index do |line, index|
  # process the current line
end
```
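Filling in the loop above, here is a minimal sketch that reads all lines with IO.readlines and pulls the numbers out of each line with a regular expression (a tightened version of the pattern discussed later in this thread, using `[-+]` for the sign). `read_values` is a hypothetical helper, and the Tempfile only stands in for 'dir/dir2/filename.dat' so the example runs on its own.

```ruby
require 'tempfile'

# One number in scientific notation, e.g. -0.513313617099E-001
NUMBER = /[-+]?\d\.\d*E[-+]?\d\d\d/

# Hypothetical helper: read the whole file into an array of lines first,
# then turn each line into an array of Floats.
def read_values(filepath)
  rows = []
  IO.readlines(filepath).each_with_index do |line, index|
    values = line.scan(NUMBER).map(&:to_f)
    if values.empty?
      warn "line #{index + 1}: no values found, skipped"
      next
    end
    rows << values
  end
  rows
end

# Demo: write two sample lines to a temporary file and read them back.
demo_rows = nil
Tempfile.create(['demo', '.dat']) do |f|
  f.puts "0.208646094863E+000-0.513313617099E-001-0.915625940530E-003"
  f.puts " 0.180604789668E+000-0.240882759294E-001-0.872767876563E-003"
  f.flush
  demo_rows = read_values(f.path)
end
p demo_rows
```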
-
For something like this I like to use regular expressions. They let you explicitly capture the pattern you're looking for, regardless of slight variations in the file. Here is how I would parse the line into [float, int] pairs; note that this still works even if the values are separated by white space. The pattern matches an optional + or -, then a digit, a decimal point, then more digits until it finds the E, then again an optional + or - and exactly 3 digits. The String#scan method finds every match in the string and puts it into an array, and because the pattern has groups delimited by parentheses, each match is split naturally into a [num, exp] pair; all that's left is to convert them to numerical quantities. Here is the code.
```ruby
def parse_line(line)
  # accepts a string of numbers in scientific notation
  # returns an array of [float, int] pairs
  pat = /([-|\+]?\d\.\d*)E([-|\+]?\d\d\d)/
  return line.scan(pat).map { |num, exp| [num.to_f, exp.to_i] }
end
```
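To see how the groups in the pattern produce the pairs, here is a small sketch comparing String#scan with and without capture groups on the first part of the sample line (the variable names are just for illustration):

```ruby
# With capture groups, String#scan returns one array of captures per match.
grouped = /([-|\+]?\d\.\d*)E([-|\+]?\d\d\d)/
# Without groups, it returns the whole matched text of each match.
plain = /[-|\+]?\d\.\d*E[-|\+]?\d\d\d/

line = "0.208646094863E+000-0.513313617099E-001-0.915625940530E-003"

pairs = line.scan(grouped)
p pairs
# => [["0.208646094863", "+000"], ["-0.513313617099", "-001"], ["-0.915625940530", "-003"]]
p line.scan(plain)
# => ["0.208646094863E+000", "-0.513313617099E-001", "-0.915625940530E-003"]
```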
-
CB, how about doing the conversion within the method and returning an [x, y, z] array?
```ruby
def parse_line(line)
  # accepts a string of numbers in scientific notation
  # returns an [x, y, z] array of Floats
  pat = /([-|\+]?\d\.\d*)E([-|\+]?\d\d\d)/
  val = line.scan(pat).map { |num, exp| [num.to_f, exp.to_i] }
  # Use SketchUp's Array class extended instance methods
  # .x(), .y() and .z() to get the 3 members of the val array.
  return [
    val.x[0] * (10**val.x[1]),
    val.y[0] * (10**val.y[1]),
    val.z[0] * (10**val.z[1])
  ]
end
```
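Array#x, #y and #z only exist inside SketchUp. Outside SketchUp, a minimal plain-Ruby sketch of the same idea can destructure the first three converted values instead; `parse_xyz` and `PAIR` are hypothetical names, and if a line holds fewer than three values the missing coordinates come back as nil.

```ruby
# Same [mantissa, exponent] pattern as above.
PAIR = /([-|\+]?\d\.\d*)E([-|\+]?\d\d\d)/

# Hypothetical plain-Ruby version: no SketchUp Array extensions needed.
def parse_xyz(line)
  pairs = line.scan(PAIR).map { |num, exp| [num.to_f, exp.to_i] }
  # mantissa * 10**exponent for the first three values on the line
  x, y, z = pairs.first(3).map { |num, exp| num * 10.0**exp }
  [x, y, z]
end

p parse_xyz("0.208646094863E+000-0.513313617099E-001-0.915625940530E-003")
```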
-
@dan rathbun said:
CB.. how about doing the conversion within the method, and returning an [x,y,z] array ?
It certainly can be done; I just didn't presume that the three numbers were coordinate values, or that a line would only ever have three values. Under those assumptions, though, I would do it a little differently, by doing the conversion inside the map block, which tightens up the code even more than my original.
```ruby
def parse_line(line)
  # accepts a string of numbers in scientific notation
  # returns an array of Floats
  pat = /([-|\+]?\d\.\d*)E([-|\+]?\d\d\d)/
  # value = mantissa * 10**exponent (not mantissa**exponent)
  return line.scan(pat).map { |num, exp| num.to_f * 10.0**exp.to_i }
end
```
You could also convert everything to numerical values first and then stride through the array in steps of three if the values are not neatly grouped one point per line. I figured there are times when scientific notation may be the preferred output, so I chose that route for the example code. In any event, the regular expression is at the heart of the function, and the rest is easily tailored to various needs.
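Striding can be sketched with Enumerable#each_slice: collect every value from every line into one flat array, then cut it into groups of three. This assumes the values form consecutive x, y, z triples across the file; the two input lines are the sample line split at an arbitrary point to show values that are not neatly grouped per line, and `triples` is a hypothetical helper.

```ruby
NUMBER = /[-|\+]?\d\.\d*E[-|\+]?\d\d\d/

# Hypothetical helper: flatten all values, then group them three at a time.
# ASSUMPTION: the file holds consecutive x, y, z triples.
def triples(lines)
  values = lines.flat_map { |line| line.scan(NUMBER).map(&:to_f) }
  values.each_slice(3).to_a
end

lines = [
  "0.208646094863E+000-0.513313617099E-001-0.915625940530E-003 0.180604789668E+000",
  "-0.240882759294E-001-0.872767876563E-003"
]
p triples(lines)
```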
-
After looking at this again, it seems Ruby understands scientific notation natively (String#to_f parses "0.208646094863E+000" directly), so there is no need to split each match into num and exp; it can be captured as a single value. Here is the modification, which gives a noticeable increase in speed as well. In general I would expect the regular expression to be slower than index parsing, too.
```ruby
def parse_line(line)
  # accepts a string of numbers in scientific notation
  # returns an array of Floats
  pat = /[-|\+]?\d\.\d*E[-|\+]?\d\d\d/
  return line.scan(pat).map { |num| num.to_f }
end
```
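Whether the regular expression or index parsing is faster is easy to check with Ruby's Benchmark module. The sketch below times both approaches on the sample line; it assumes the fixed column widths of that sample for the `unpack` version, and the helper names are hypothetical. Timings depend on the machine and Ruby version, so no result is claimed here, only that both versions return the same values.

```ruby
require 'benchmark'

NUMBER = /[-|\+]?\d\.\d*E[-|\+]?\d\d\d/
# ASSUMPTION: fixed-width layout of the sample line (19 chars, then 20 per value).
FIELDS = "a19" + "a20" * 5

# Hypothetical helper: regular-expression parsing.
def by_regex(line)
  line.scan(NUMBER).map(&:to_f)
end

# Hypothetical helper: fixed-column parsing.
def by_columns(line)
  line.unpack(FIELDS).map(&:to_f)
end

line = "0.208646094863E+000-0.513313617099E-001-0.915625940530E-003 0.180604789668E+000-0.240882759294E-001-0.872767876563E-003"
n = 10_000

Benchmark.bm(8) do |bm|
  bm.report("regex")   { n.times { by_regex(line) } }
  bm.report("columns") { n.times { by_columns(line) } }
end
```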
-
I had heard of regular expressions before; this time I tried to understand them. Thanks.
-
@shirazbj said:
I had heard of regular expressions before; this time I tried to understand them. Thanks.
Regular expressions are simultaneously very useful and painful to work with. There is a great quote by Jamie Zawinski: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." I first ran into them in Dive Into Python 3, which gives a couple of simple case studies; although the language is slightly different, the regular expressions are basically the same.
Looking at the regular expression `/[-|\+]?\d\.\d*E[-|\+]?\d\d\d/`, the / on each end marks the start and end of the expression. The two `[-|\+]?` blocks each match an optional + or - sign; I would read this as "0 or 1 instances of + or -". (Strictly speaking, a | inside square brackets is a literal character rather than "or", so `[-+]?` is enough.) Then `\d\.\d*E` matches a single digit, a decimal point, then any number of digits up to an E, and after that we again have an optional sign followed by exactly 3 digits. So this pattern matches a single numerical value in the given string. The .scan method then constructs an array out of every match. From here you can wrap parts of the pattern in parentheses to create groups, which breaks each match up into its own array; that is what I had done originally, until I realized Ruby would parse scientific notation with the .to_f method.
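The explanation above can be made part of the pattern itself with Ruby's /x (extended) flag, which ignores whitespace and allows # comments inside a regular expression. This sketch uses `[-+]` instead of `[-|\+]` and shows the one case where the difference matters, a stray | in front of a number; `NUMBER` is a hypothetical constant name.

```ruby
# Extended-mode pattern: whitespace and comments inside it are ignored.
NUMBER = /
  [-+]?     # optional sign of the mantissa
  \d        # one digit before the decimal point
  \.        # a literal decimal point, since an unescaped . matches any character
  \d*       # any number of digits
  E         # the exponent marker
  [-+]?     # optional sign of the exponent
  \d\d\d    # exactly three exponent digits
/x

p "0.208646094863E+000-0.513313617099E-001".scan(NUMBER)
# => ["0.208646094863E+000", "-0.513313617099E-001"]

# The bracket from the thread also accepts a stray "|" as a sign:
p "|1.5E+000".scan(/[-|\+]?\d\.\d*E[-|\+]?\d\d\d/)  # => ["|1.5E+000"]
p "|1.5E+000".scan(NUMBER)                           # => ["1.5E+000"]
```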