• Login
sketchucation logo sketchucation
  • Login
ℹ️ Licensed Extensions | FredoBatch, ElevationProfile, FredoSketch, LayOps, MatSim and Pic2Shape will require license from Sept 1st More Info

Set.insert vs array << x unless array.include?(x)

Scheduled Pinned Locked Moved Developers' Forum
19 Posts 7 Posters 10.7k Views 7 Watching
Loading More Posts
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • T Offline
    thomthom
    last edited by thomthom 20 Nov 2009, 12:04

    I was curious of the difference between Set.insert vs array << x unless array.include?(x) when you want a collection with a unique set of values.

    So I did some tests:

    Test 1
    A 10,000,000 times I tried to insert a number between 0-10.

    t=Time.now;a=[];10000000.times{r=rand(10);a<<r unless a.include?(r)};puts Time.now - t
    Result: 12.297

    t=Time.now;a=Set.new;10000000.times{a.insert(rand(10))};puts Time.now - t
    Result: 15.719

    The array was faster here by three seconds. But that's with ten million iterations!

    Test 2
    But, when your collection of unique values increases:
    t=Time.now;a=[];100000.times{r=rand(10000);a<<r unless a.include?(r)};puts Time.now - t
    Result: 45.129

    t=Time.now;a=Set.new;100000.times{a.insert(rand(10000))};puts Time.now - t
    Result: 0.391

    When your collection include a much higher number of elements - then the Array quickly starts to crumble. In this case, when the possible unique set was up to 100,000 the difference was 44.738 seconds!!!

    I tried this in some of my scrips which on occasions require a unique set of entites (like collecting all vertices) and the speed difference was immense on larger models.

    It doesn't take too many number of unique elements before Set becomes faster then array.include?.

    Scaling
    Just to illustrate how they scale:
    Array.include?
    t=Time.now;a=[];1000.times{r=rand(100);a<<r unless a.include?(r)};puts Time.now - t
    0.0
    t=Time.now;a=[];10000.times{r=rand(1000);a<<r unless a.include?(r)};puts Time.now - t
    0.485
    t=Time.now;a=[];100000.times{r=rand(10000);a<<r unless a.include?(r)};puts Time.now - t
    42.906
    t=Time.now;a=[];1000000.times{r=rand(100000);a<<r unless a.include?(r)};puts Time.now - t
    Took too long. I guess it'd take ~4000.0s

    Set.insert
    t=Time.now;a=Set.new;1000.times{a.insert(rand(100))};puts Time.now - t
    0.0
    t=Time.now;a=Set.new;10000.times{a.insert(rand(1000))};puts Time.now - t
    0.047
    t=Time.now;a=Set.new;100000.times{a.insert(rand(10000))};puts Time.now - t
    0.422
    t=Time.now;a=Set.new;1000000.times{a.insert(rand(100000))};puts Time.now - t
    4.937

    Conclusion (not!)
    %(#aaaaaa)[Use the Set class when you require a unique set of values - unless your set of unique values is very small.
    Set class is the rule - Array.include? is the exception.]

    Update
    None of the above is the fastest. A better way has appeared. Collect everything into an array and apply [ruby:2yy3xvjs].unique![/ruby:2yy3xvjs] at the end.
    http://forums.sketchucation.com/viewtopic.php?f=180&t=23760&p=222231#p222221

    Thomas Thomassen — SketchUp Monkey & Coding addict
    List of my plugins and link to the CookieWare fund

    1 Reply Last reply Reply Quote 0
    • T Offline
      thomthom
      last edited by 20 Nov 2009, 12:10

      Also, Set.include? is faster than Array.include?

      Thomas Thomassen — SketchUp Monkey & Coding addict
      List of my plugins and link to the CookieWare fund

      1 Reply Last reply Reply Quote 0
      • F Offline
        fredo6
        last edited by 20 Nov 2009, 18:00

        Tom,

        very useful actually.
        It really looks that the processsing in C is a lot faster than in Ruby.

        Thanks

        Fredo

        1 Reply Last reply Reply Quote 0
        • T Offline
          thomthom
          last edited by 20 Nov 2009, 18:07

          @unknownuser said:

          Tom,

          very useful actually.
          It really looks that the processsing in C is a lot faster than in Ruby.

          Thanks

          Fredo

          Set class makes more processing in C?
          What I'm wondering is, the Set class in SU Ruby is not the same Set class you get in Ruby 1.8.0. The Set class in Ruby 1.8.0 includes Enumerable and the set.rb looks to be a pure ruby class that mixes arrays and hashes. From my Ruby 1.8.0. installation, the Set class seem to be a pure Ruby implementation. But maybe the Set class that comes with SU is a C implementation...? Maybe that's why they replaced it?

          Thomas Thomassen — SketchUp Monkey & Coding addict
          List of my plugins and link to the CookieWare fund

          1 Reply Last reply Reply Quote 0
          • T Offline
            thomthom
            last edited by 20 Nov 2009, 18:09

            I thought the speed difference was that the Set class used a more efficient Hash lookup as oppose to the Array functions that has to iterate the array every time... but that's just guesswork...

            Thomas Thomassen — SketchUp Monkey & Coding addict
            List of my plugins and link to the CookieWare fund

            1 Reply Last reply Reply Quote 0
            • F Offline
              fredo6
              last edited by 20 Nov 2009, 20:08

              @thomthom said:

              I thought the speed difference was that the Set class used a more efficient Hash lookup as oppose to the Array functions that has to iterate the array every time... but that's just guesswork...

              You're right.
              I personally use Hash whenever I want to store lists that have unique elements.
              The list itself is obtained via Hash.values.

              Maybe, as you are on this, you extend your benchmark to Hash and see how it compares with Set.

              Fred

              1 Reply Last reply Reply Quote 0
              • Chris FullmerC Offline
                Chris Fullmer
                last edited by 20 Nov 2009, 21:10

                Great info Thom! I had played around with sets and arrays before and I was not impressed with sets. But I did not experiment with amount of unique objects, which apparently has an adverse effect on arrays. Thanks,

                Chris

                Lately you've been tan, suspicious for the winter.
                All my Plugins I've written

                1 Reply Last reply Reply Quote 0
                • T Offline
                  thomthom
                  last edited by 21 Nov 2009, 02:09

                  @unknownuser said:

                  Will have a look

                  Will have a look at that.

                  Thomas Thomassen — SketchUp Monkey & Coding addict
                  List of my plugins and link to the CookieWare fund

                  1 Reply Last reply Reply Quote 0
                  • T Offline
                    thomthom
                    last edited by 5 Feb 2010, 12:22

                    I'll be damned!
                    Very interesting Jernej.

                    ...looks like I need to do some more testing of my script and possibly refactor again.

                    So while the Array.include? is dead slow - the overhead of hash look-up is still faster than just adding everything into one big pile and so a single filtering afterwards...

                    Thomas Thomassen — SketchUp Monkey & Coding addict
                    List of my plugins and link to the CookieWare fund

                    1 Reply Last reply Reply Quote 0
                    • T Offline
                      thomthom
                      last edited by 5 Feb 2010, 12:24

                      looking at the .uniq! source code: http://ruby-doc.org/core/classes/Array.src/M002215.html

                      
                      /*
                       *  call-seq;
                       *     array.uniq! -> array or nil
                       *  
                       *  Removes duplicate elements from _self_.
                       *  Returns <code>nil</code> if no changes are made (that is, no
                       *  duplicates are found).
                       *     
                       *     a = [ "a", "a", "b", "b", "c" ]
                       *     a.uniq!   #=> ["a", "b", "c"]
                       *     b = [ "a", "b", "c" ]
                       *     b.uniq!   #=> nil
                       */
                      
                      static VALUE
                      rb_ary_uniq_bang(ary)
                          VALUE ary;
                      {
                          VALUE hash, v, vv;
                          long i, j;
                      
                          hash = ary_make_hash(ary, 0);
                      
                          if (RARRAY(ary)->len == RHASH(hash)->tbl->num_entries) {
                              return Qnil;
                          }
                          for (i=j=0; i<RARRAY(ary)->len; i++) {
                              v = vv = rb_ary_elt(ary, i);
                              if (st_delete(RHASH(hash)->tbl, (st_data_t*)&vv, 0)) {
                                  rb_ary_store(ary, j++, v);
                              }
                          }
                          RARRAY(ary)->len = j;
                      
                          return ary;
                      }
                      
                      

                      Thomas Thomassen — SketchUp Monkey & Coding addict
                      List of my plugins and link to the CookieWare fund

                      1 Reply Last reply Reply Quote 0
                      • T Offline
                        thomthom
                        last edited by 5 Feb 2010, 12:26

                        Jernej: how about larger iterations and higher number of random values?

                        Thomas Thomassen — SketchUp Monkey & Coding addict
                        List of my plugins and link to the CookieWare fund

                        1 Reply Last reply Reply Quote 0
                        • J Offline
                          Jernej Vidmar
                          last edited by 5 Feb 2010, 12:40

                          How about using Array.uniq! method:

                          Test 1
                          t=Time.now;a=[];10000000.times{r=rand(10);a<<r unless a.include?(r)};puts Time.now - t
                          Result: 12.297
                          t=Time.now;a=Set.new;10000000.times{a.insert(rand(10))};puts Time.now - t
                          Result: 15.719
                          t=Time.now;a=[];10000000.times{r=rand(10);a<<r};a.uniq!; puts Time.now - t
                          Result: 7.753

                          Test 2
                          t=Time.now;a=[];100000.times{r=rand(10000);a<<r unless a.include?(r)};puts Time.now-t
                          Result: 40.97
                          t=Time.now;a=Set.new;100000.times{a.insert(rand(10000))};puts Time.now-t
                          Result: 0.377
                          t=Time.now;a=[];100000.times{r=rand(10000);a<<r};a.uniq!;puts Time.now-t
                          Result: 0.087

                          1 Reply Last reply Reply Quote 0
                          • J Offline
                            Jernej Vidmar
                            last edited by 5 Feb 2010, 12:42

                            @thomthom said:

                            Jernej: how about larger iterations and higher number of random values?

                            t=Time.now;a=Set.new;10000000.times{a.insert(rand(10000))};puts Time.now - t
                            Result: 37.911
                            t=Time.now;a=[];10000000.times{r=rand(10000);a<<r};a.uniq!; puts Time.now - t
                            Result: 8.282

                            Still a winner?

                            1 Reply Last reply Reply Quote 0
                            • T Offline
                              thomthom
                              last edited by 5 Feb 2010, 12:49

                              It's refactoring time!

                              Nice find! 👍

                              Thomas Thomassen — SketchUp Monkey & Coding addict
                              List of my plugins and link to the CookieWare fund

                              1 Reply Last reply Reply Quote 0
                              • R Offline
                                RickW
                                last edited by 8 Feb 2010, 20:34

                                That's all great (using .uniq!) until you start dealing with Point3d objects 😄
                                In that case, always use Set.

                                RickW
                                [www.smustard.com](http://www.smustard.com)

                                1 Reply Last reply Reply Quote 0
                                • TIGT Offline
                                  TIG Moderator
                                  last edited by 8 Feb 2010, 20:40

                                  ....or make all of your Point3d's into arrays so they will sort!/uniq! etc as arrays...

                                  TIG

                                  1 Reply Last reply Reply Quote 0
                                  • T Offline
                                    thomthom
                                    last edited by 8 Feb 2010, 21:29

                                    @tig said:

                                    ....or make all of your Point3d's into arrays so they will sort!/uniq! etc as arrays...

                                    But is the overhead of converting the Point3d's into arrays and uniq! faster than using a Set?

                                    Thomas Thomassen — SketchUp Monkey & Coding addict
                                    List of my plugins and link to the CookieWare fund

                                    1 Reply Last reply Reply Quote 0
                                    • TIGT Offline
                                      TIG Moderator
                                      last edited by 8 Feb 2010, 22:31

                                      Who knows ?
                                      Time for you to do another test... 😉

                                      TIG

                                      1 Reply Last reply Reply Quote 0
                                      • honoluludesktopH Offline
                                        honoluludesktop
                                        last edited by 2 Feb 2011, 02:34

                                        I probably don't know what I am doing, but I ran the following test, and obtained the attached results. I typically use array.push variable, and don't understand the situations when the other examples might be used. Btw, when I applied the other forms to my app, it failed in ways that leave me to believe that those forms are data sensitive. Can anyone explaine to a Ruby beginner what's up?

                                        t=Time.now
                                        a=[]
                                        100000.times do r=rand(10000)
                                          a<<r
                                        end
                                        a.uniq!
                                        puts Time.now-t
                                        

                                        0.125

                                        t=Time.now
                                        a=[]
                                        100000.times do r=rand(10000)
                                          a.push r
                                        end
                                        a.uniq!
                                        puts Time.now-t
                                        

                                        0.141

                                        t=Time.now
                                        a=[]
                                        100000.times do r=rand(10000)
                                          a.push r
                                        end
                                        puts Time.now-t
                                        

                                        0.094

                                        t=Time.now
                                        a=[]
                                        100000.times do r=rand(10000)
                                          a<<r
                                        end
                                        puts Time.now-t
                                        

                                        0.093

                                        1 Reply Last reply Reply Quote 0
                                        • 1 / 1
                                        • First post
                                          Last post
                                        Buy SketchPlus
                                        Buy SUbD
                                        Buy WrapR
                                        Buy eBook
                                        Buy Modelur
                                        Buy Vertex Tools
                                        Buy SketchCuisine
                                        Buy FormFonts

                                        Advertisement