Set.insert vs array << x unless array.include?(x)
-
@unknownuser said:
Tom,
very useful actually.
It really looks that the processsing in C is a lot faster than in Ruby.Thanks
Fredo
Set
class makes more processing in C?
What I'm wondering is, theSet
class in SU Ruby is not the sameSet
class you get in Ruby 1.8.0. TheSet
class in Ruby 1.8.0 includesEnumerable
and the set.rb looks to be a pure ruby class that mixes arrays and hashes. From my Ruby 1.8.0. installation, theSet
class seem to be a pure Ruby implementation. But maybe the Set class that comes with SU is a C implementation...? Maybe that's why they replaced it? -
I thought the speed difference was that the Set class used a more efficient Hash lookup as oppose to the Array functions that has to iterate the array every time... but that's just guesswork...
-
@thomthom said:
I thought the speed difference was that the Set class used a more efficient Hash lookup as oppose to the Array functions that has to iterate the array every time... but that's just guesswork...
You're right.
I personally useHash
whenever I want to store lists that have unique elements.
The list itself is obtained viaHash.values
.Maybe, as you are on this, you extend your benchmark to
Hash
and see how it compares withSet
.Fred
-
Great info Thom! I had played around with sets and arrays before and I was not impressed with sets. But I did not experiment with amount of unique objects, which apparently has an adverse effect on arrays. Thanks,
Chris
-
-
I'll be damned!
Very interesting Jernej....looks like I need to do some more testing of my script and possibly refactor again.
So while the
Array.include?
is dead slow - the overhead of hash look-up is still faster than just adding everything into one big pile and so a single filtering afterwards... -
looking at the .uniq! source code: http://ruby-doc.org/core/classes/Array.src/M002215.html
/* * call-seq; * array.uniq! -> array or nil * * Removes duplicate elements from _self_. * Returns <code>nil</code> if no changes are made (that is, no * duplicates are found). * * a = [ "a", "a", "b", "b", "c" ] * a.uniq! #=> ["a", "b", "c"] * b = [ "a", "b", "c" ] * b.uniq! #=> nil */ static VALUE rb_ary_uniq_bang(ary) VALUE ary; { VALUE hash, v, vv; long i, j; hash = ary_make_hash(ary, 0); if (RARRAY(ary)->len == RHASH(hash)->tbl->num_entries) { return Qnil; } for (i=j=0; i<RARRAY(ary)->len; i++) { v = vv = rb_ary_elt(ary, i); if (st_delete(RHASH(hash)->tbl, (st_data_t*)&vv, 0)) { rb_ary_store(ary, j++, v); } } RARRAY(ary)->len = j; return ary; }
-
Jernej: how about larger iterations and higher number of random values?
-
How about using Array.uniq! method:
Test 1
t=Time.now;a=[];10000000.times{r=rand(10);a<<r unless a.include?(r)};puts Time.now - t
Result: 12.297
t=Time.now;a=Set.new;10000000.times{a.insert(rand(10))};puts Time.now - t
Result: 15.719
t=Time.now;a=[];10000000.times{r=rand(10);a<<r};a.uniq!; puts Time.now - t
Result: 7.753Test 2
t=Time.now;a=[];100000.times{r=rand(10000);a<<r unless a.include?(r)};puts Time.now-t
Result: 40.97
t=Time.now;a=Set.new;100000.times{a.insert(rand(10000))};puts Time.now-t
Result: 0.377
t=Time.now;a=[];100000.times{r=rand(10000);a<<r};a.uniq!;puts Time.now-t
Result: 0.087 -
@thomthom said:
Jernej: how about larger iterations and higher number of random values?
t=Time.now;a=Set.new;10000000.times{a.insert(rand(10000))};puts Time.now - t
Result: 37.911
t=Time.now;a=[];10000000.times{r=rand(10000);a<<r};a.uniq!; puts Time.now - t
Result: 8.282Still a winner?
-
It's refactoring time!
Nice find!
-
That's all great (using .uniq!) until you start dealing with Point3d objects
In that case, always use Set. -
....or make all of your Point3d's into arrays so they will sort!/uniq! etc as arrays...
-
@tig said:
....or make all of your Point3d's into arrays so they will sort!/uniq! etc as arrays...
But is the overhead of converting the Point3d's into arrays and uniq! faster than using a Set?
-
Who knows ?
Time for you to do another test... -
I probably don't know what I am doing, but I ran the following test, and obtained the
attached results
. I typically usearray.push variable
, and don't understand the situations when the other examples might be used. Btw, when I applied the other forms to my app, it failed in ways that leave me to believe that those forms are data sensitive. Can anyone explaine to a Ruby beginner what's up?t=Time.now a=[] 100000.times do r=rand(10000) a<<r end a.uniq! puts Time.now-t
0.125
t=Time.now a=[] 100000.times do r=rand(10000) a.push r end a.uniq! puts Time.now-t
0.141
t=Time.now a=[] 100000.times do r=rand(10000) a.push r end puts Time.now-t
0.094
t=Time.now a=[] 100000.times do r=rand(10000) a<<r end puts Time.now-t
0.093
Advertisement