[code] Scraping API Docs
-
Example code for scraping the SketchUp API docs pages. It might be useful for indexing or analysis.
Use at your own risk: if you run it "too much," the Google machine will temporarily block you. Who knows what happens if you "abuse" it. I didn't look it up, but you should assume scraping violates Google's Terms of Service.
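If you do end up rerunning it, spacing out the requests should make a temporary block less likely. Here is a minimal sketch that was not part of the original post: `Throttle` is a hypothetical helper that enforces a minimum interval between calls, and the interval you pick is an assumption on your part, not a documented Google limit. The demo uses a fake "fetch" so it runs without network access.

```ruby
# Hypothetical helper (not from the original script): enforces a minimum
# delay between successive calls, e.g. between page fetches.
class Throttle
  def initialize(min_interval)
    @min_interval = min_interval # seconds between calls (assumed value)
    @last = nil                  # monotonic time of the previous call
  end

  # Sleeps until min_interval has passed since the last call, then yields
  # and returns the block's result.
  def call
    if @last
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - @last
      wait = @min_interval - elapsed
      sleep(wait) if wait > 0
    end
    @last = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  end
end

# Demo with a fake fetch: three calls spaced at least 0.2 s apart.
throttle = Throttle.new(0.2)
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = 3.times.map { |i| throttle.call { "page #{i}" } }
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
p results
```

In the scraper you would create one throttle up front and wrap each page fetch in it, e.g. `page = throttle.call { Nokogiri::HTML(URI.open(loc)) }`. A monotonic clock is used so that system clock changes don't shorten or lengthen the wait.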
Just run it once and direct output to a file:
```
$ scrape.rb > output.txt
```

```ruby
# Scrapes API docs for class names, method names, and method versions.
require 'open-uri'
require 'nokogiri'

# Ctrl-C (needed on WinXP): set a flag so the loops stop after the current page.
trap("INT") {
  $stderr.puts "abort."
  @abort = true
}

base = "https://developers.google.com/"
class_index_url = base + "sketchup/docs/classes"
# URI.open comes from open-uri; plain open(url) no longer fetches URLs on Ruby 3.0+.
page = Nokogiri::HTML(URI.open(class_index_url))

# Map each class name on the index page to its link.
classes = {}
page.css(".columns a").each do |link|
  classes[link.text] = link['href']
  break if @abort
end
exit if @abort

# For each class: print its name, then one "method,version" line per method,
# then a blank line.
classes.each do |name, url|
  puts name
  loc = base + "sketchup/docs/ourdoc/" + name.downcase
  page = Nokogiri::HTML(URI.open(loc))
  page.css(".apireference").each do |elem|
    method_name = elem.css(".itemname").text
    method_version = elem.css(".version").text
    puts "#{method_name},#{method_version}"
  end
  puts
  break if @abort
end
```
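The script writes one block per class: the class name on its own line, one `method,version` line per method, then a blank line. For indexing or analysis you would probably want that file back as a data structure. This is a sketch, not part of the original post: `parse_api_dump` is a hypothetical name, and the sample text is made up only to show the format.

```ruby
# Hypothetical helper: parses the scraper's output format
#   ClassName
#   method,version
#   ...
#   (blank line)
# into { "ClassName" => [["method", "version"], ...] }.
def parse_api_dump(text)
  api = {}
  current = nil
  text.each_line do |raw|
    line = raw.chomp
    if line.empty?
      current = nil                  # a blank line ends the current class block
    elsif current.nil?
      current = line                 # first line of a block is the class name
      api[current] = []
    else
      # Split on the first comma only; the version text may be empty.
      method_name, version = line.split(",", 2)
      api[current] << [method_name, version.to_s]
    end
  end
  api
end

# Made-up sample in the same shape as output.txt.
sample = <<~TXT
  Array
  Array.dot,SketchUp 6.0
  Array.x,SketchUp 6.0

  Camera
  Camera.fov,SketchUp 6.0

TXT

api = parse_api_dump(sample)
p api.keys
p api["Camera"]
```

For the real output you would call `parse_api_dump(File.read("output.txt"))`. Keying on blank lines mirrors the bare `puts` the scraper emits after each class.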