
    [code] Scraping API Docs

      Jim

      Example code for scraping the api docs pages. Might be useful for indexing or analysis...

      Use at your own risk - if you run it "too much," the Google machine will temporarily block you. Who knows what happens if you "abuse" it. I didn't look it up, but one should assume scraping is in violation of Google's Terms of Service.
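One way to lower the odds of being blocked is to pause between page requests. The following is only a minimal sketch: `Throttle` is a hypothetical helper that is not part of the script, and the 0.1 second interval is an arbitrary demo value (a real run would probably want a much longer pause). The block passed to `call` stands in for the actual page fetch.

```ruby
# Hypothetical helper (not in the original script): enforces a minimum
# pause between successive calls so requests are spread out over time.
class Throttle
  def initialize(min_interval)
    @min_interval = min_interval # seconds required between calls
    @last_call = nil
  end

  # Sleeps if the previous call was too recent, then runs the block
  # and returns its value.
  def call
    if @last_call
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - @last_call
      sleep(@min_interval - elapsed) if elapsed < @min_interval
    end
    @last_call = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  end
end

throttle = Throttle.new(0.1) # demo value only
# Stand-in for a page fetch; the real script would request a URL here.
results = %w[face edge].map { |name| throttle.call { "fetched #{name}" } }
puts results
```

In the script, the body of the `classes.each` loop would be the natural place to wrap the fetch in `throttle.call { ... }`.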

      Just run it once and direct output to a file:

      $ ruby scrape.rb > output.txt

      
      # Scrapes API docs for class names, method names, and method versions.
      require 'open-uri'
      require 'nokogiri'
      
      # Ctrl-C (on WinXP) sets a flag so the loops below can stop cleanly.
      trap("INT") {
          $stderr.puts "abort."
          @abort = true
      }
      
      base = "https://developers.google.com/"
      class_index_url = base + "sketchup/docs/classes"
      
      # URI.open is required on Ruby 3.0+, where Kernel#open no longer fetches URLs.
      page = Nokogiri::HTML(URI.open(class_index_url))
      
      classes = {}
      
      page.css(".columns a").each do |link|
          classes[link.text] = link['href']
          break if @abort
      end
      
      exit if @abort
      
      classes.each do |name, url|
          puts name
          loc = base + "sketchup/docs/ourdoc/" + name.downcase
          page = Nokogiri::HTML(URI.open(loc))
          page.css(".apireference").each do |elem|
              method_name    = elem.css(".itemname").text
              method_version = elem.css(".version").text
              puts "#{method_name},#{method_version}"
          end
          puts
          break if @abort
      end
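For the indexing or analysis step, the saved output can be read back into a hash. This is a rough sketch that assumes the format the script prints: a class name on its own line, then one `method,version` line per method, then a blank line. `parse_scrape_output` is a hypothetical name, and the sample text is made up for illustration, not real scraper output.

```ruby
# Hypothetical parser for the script's output. Assumes each class is a
# block of lines separated from the next block by a blank line.
def parse_scrape_output(text)
  index = {}
  text.split(/\n{2,}/).each do |block|
    lines = block.lines.map(&:strip).reject(&:empty?)
    next if lines.empty?
    class_name = lines.shift # first line of a block is the class name
    index[class_name] = lines.map do |line|
      # Split on the last comma so the version is always the final field.
      method_name, _, version = line.rpartition(",")
      { name: method_name, version: version }
    end
  end
  index
end

# Made-up sample in the same shape as the script's output.
sample = "Face\nFace.area,SketchUp 6.0+\nFace.edges,SketchUp 6.0+\n\n" \
         "Edge\nEdge.length,SketchUp 6.0+\n\n"

index = parse_scrape_output(sample)
index.each do |class_name, methods|
  puts "#{class_name}: #{methods.map { |m| m[:name] }.join(', ')}"
end
```

With a real run, `parse_scrape_output(File.read("output.txt"))` would be the equivalent call.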
      
      

      scrape.rb

