ruby - Triggering AJAX requests/responses before parsing a webpage with Mechanize/Nokogiri
I am parsing a website that contains customer feedback. I want to collect the name of each buyer and the feedback he or she has given.
My issue is that only a few feedbacks are shown on the first page. The next page is triggered by clicking a button, and the website responds via AJAX. How do I get the new feedbacks from the AJAX response into the Mechanize page object? I want to click the AJAX trigger button as many times as possible, for as many feedbacks as there are available.
My code looks like this:
    require 'mechanize'
    require 'nokogiri'

    log_file = "log_file.txt"
    log = File.open(log_file, 'w')

    www = "http://www.trustpilot.dk/review/www.fona.dk"
    agent = Mechanize.new
    page = agent.get(www)

    # Each review block on the first page
    reviews = page.search(".clear")
    reviews.each do |r|
      doc = Nokogiri::HTML::Document.parse(r.to_html)
      log << "####################### new review #######################\n\n"
      name = r.at_css(".profileinfo a").text.strip
      log << "customer name: #{name}\n"
      rating = doc.at("//meta[@itemprop = 'ratingvalue']/@content").to_s
      log << "rating: #{rating}\n\n"
    end

    log.close
FYI, the log file looks like this:
    ####################### new review #######################

    customer name: hans-oluf
    rating: 5

    ####################### new review #######################

    customer name: jørgen
    rating: 3

    ####################### new review #######################

    customer name: frederik
    rating: 4
The AJAX trigger should be in this piece of the source code:
    <div id="ajaxloader_1" class="ajaxpager">
      <div class="ajaxpagerlinkwrapper">
        <a class="button ajaxpagerlink" href="http://www.trustpilot.dk/review/www.fona.dk?page=2"> vis flere anmeldelser </a>
      </div>
    </div>
    <script type="text/javascript">
      $(document).ready (function() {
        // testing spilttest
        console.log("/domains/reviews?did=767");
        // element right before control
        var containerid = 'reviewcontainer';
        var container = containerid == '' ? $('#ajaxloader_1').prev() : $($.f('#{0}', containerid));
        var pager = new pager( 1, 25, 'nextpageloaded', 'ajaxloader_1', '/domains/reviews?did=767', 'page', '', container);
      });
    </script>
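Easy. There is no need to click the button: the pager script loads its data from /domains/reviews?did=767 with a page parameter. Keep making requests to:

    page = agent.get "http://www.trustpilot.dk/domains/reviews?did=767&page=#{page_number}"

incrementing page_number each time, until there's no more data.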
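The "keep requesting until there's no more data" loop could be sketched roughly as follows. This is only a sketch under assumptions: `each_page_until_empty` and its `max_pages` cap are hypothetical names introduced here, and it assumes the endpoint returns an HTML fragment with no reviews once you go past the last page, which the question does not confirm. The fetching is passed in as a block so the loop itself does not depend on the network.

```ruby
# Hypothetical helper (not part of Mechanize): calls the block with page
# numbers 1, 2, 3, ... and collects whatever it returns, stopping at the
# first empty (or nil) result. max_pages is a safety cap so an endpoint
# that never returns an empty page cannot make this loop forever.
def each_page_until_empty(max_pages: 100)
  results = []
  (1..max_pages).each do |page_number|
    items = yield(page_number)
    break if items.nil? || items.empty?
    results.concat(items)
  end
  results
end

# Possible usage with Mechanize (left as a comment because it needs the gem
# and network access; the selector is the one from the question's code, and
# it is an assumption that the AJAX response is parsed as an HTML page):
#
#   require 'mechanize'
#   agent = Mechanize.new
#   names = each_page_until_empty do |n|
#     page = agent.get("http://www.trustpilot.dk/domains/reviews?did=767&page=#{n}")
#     page.search(".profileinfo a").map { |a| a.text.strip }
#   end
```

If the endpoint signals the end some other way (an error status, or repeating the last page), the stop condition inside the block would need to change accordingly.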