Sunday, 24 February 2013
Thursday, 14 February 2013
Quick and dirty facebook feed parsing
So, there's this big discussion going on for my hobby group at the moment, and the main discussion has been going on in facebook - mainly because that's where I was first asked to set up a discussion and it took all of five minutes to get the page up and running.
However - now that discussions are progressing, there are a number of people *outside* the facebookiverse who have raised quite reasonable objections to the discussion happening there. Not everyone is on facebook, not everyone *wants* to be on facebook, and to be honest, a facebook group kinda sucks for searching and archiving really important discussions.
Thus it has been requested that I copy all the posts and comments to Someplace Else, to make them available for more general consumption.
At first I balked at this request - 24 posts and 250 comments to be individually copied/pasted??? Who has that kind of time?
Of course when I actually sat down to think about the problem seriously, it took far less time than I thought to solve it. So here's what I did, including the quick-and-dirty ruby script that will massage the facebook feed into something that resembles a readable format. It ain't pretty - but it'll pass for government work.
Step 1: get the feed from the API
I'll assume you are actually a member of the group that you're after. You will need to be.
You need to go to your group and get the group's ID from the URL.
The facebook API page is here: Facebook Graph API explorer
First you need to create an access token to get the data out of facebook. This is essentially the same as doing one of those "allow application to access my data" things that you click on when you add a new app. In this stage, you need to allow the Graph-API application to access *your* group-data, to prove that you have access to the feed of the group - and to allow it to fetch out all the posts for you.
- Click the "Get access token" button
- Select the "user_groups" checkbox
- Click the new "get access token" button
- Follow any prompts (if this is the first time using this API, you'll get the "allow access for this application" process - but it may not happen for subsequent attempts)
You should now have a long encoded token in the box at the top of the page.
Next up we need to tell it what feed to use. There's a drop-down labelled "GET" which you should leave as-is. In the text-box next to that, type in the ID of the group in a URL-format like this: /1234567890?fields=feed and then click "Submit". The "fields=feed" tells the API to actually go and fetch the feed of posts and comments.
At this stage, you should be able to see a huge hash full of posts and comments in the box on the right-hand side of the screen. Copy and paste that into a file.
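If you'd rather not copy and paste out of the explorer, the same request can probably be scripted. Here's a minimal sketch, assuming the Graph API at graph.facebook.com still answers the same "/<group id>?fields=feed" request that the explorer sends, and that the token is passed as an access_token parameter. The helper names feed_uri and save_feed are made up for this example, and the group ID and token are placeholders you'd supply yourself.

```ruby
require 'net/http'
require 'uri'

# Host of the Graph API; the path and query mirror what gets typed
# into the explorer's text-box.
GRAPH_HOST = "graph.facebook.com"

# Build the URL for a group's feed (hypothetical helper).
# group_id     - the ID taken from the group's URL
# access_token - the long token generated in the explorer
def feed_uri(group_id, access_token)
  query = URI.encode_www_form("fields" => "feed", "access_token" => access_token)
  URI::HTTPS.build(:host => GRAPH_HOST, :path => "/#{group_id}", :query => query)
end

# Fetch the feed and write the raw JSON to out_file (hypothetical helper).
# Raises if the API answers with anything other than a 2xx response.
def save_feed(group_id, access_token, out_file)
  response = Net::HTTP.get_response(feed_uri(group_id, access_token))
  raise "request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  File.open(out_file, 'w') { |f| f.write(response.body) }
end

# Example call (not run here, since it needs a real ID and token):
#   save_feed("1234567890", "your-token-here", "feed.json")
```

Either way, you end up with a file of JSON that the next step can chew on.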
Step 2: massage it into shape
Now you've got your feed data, you just need to play with it and spit it out into a nicer format. In our case, I decided to go for just a rough html format that showed what the posts were, what comments were attached, and who said what. My script is posted below - which can serve as a starting point for whatever you'd like to see done.
This script accepts the input filename and an optional output filename (or it just swaps the input file's extension for '.html'). It'll generate a really rough-and-ready html file that contains the posts and comments (with names and datetimes) plus some of the links (if present).
Enjoy...
#!/usr/bin/env ruby
require 'rubygems'
require 'json'
require 'date'

DATE_FORMAT = "%H:%M:%S %d-%m-%Y"

class Object
  def blank?
    self.nil? || (self.respond_to?(:empty?) && self.empty?)
  end

  def present?
    !self.blank?
  end
end

file_name = nil
new_file_name = nil
# if they've passed in the filename, use it
if ARGV && ARGV.length >= 1
  file_name = ARGV[0]
  # the output filename is the optional second argument
  new_file_name = ARGV[1] if ARGV.length >= 2
end
if file_name.blank?
  puts "usage: facebookfeed <file_name> [<outfile_name>]"
  exit(1)
end
puts "got file_name of: #{file_name}"
unless File.exist?(file_name)
  puts "file: #{file_name} does not exist"
  exit(1)
end

# munge up an html filename for the output file:
# strip the last extension (if any) and add '.html'
new_file_name ||= file_name.sub(/\.[^.\/]+\z/, '') + '.html'
if File.exist?(new_file_name)
  puts "output file: #{new_file_name} already exists, please supply another"
  puts "usage: facebookfeed <file_name> [<outfile_name>]"
  exit(1)
end

# parse json in file into ruby - preferably a hash
facebook_hash = JSON.parse(IO.read(file_name))
feed_data = facebook_hash["feed"]["data"]

if feed_data.present?
  File.open(new_file_name, 'w') do |outfile|
    puts "parsing #{feed_data.count} posts"
    sum = 0
    # posts without any comments may have no "comments" key at all
    feed_data.each { |post| sum += (post["comments"] || {})["count"].to_i }
    puts "with: #{sum} total comments"

    # "to" -> "data" may arrive as a list of recipients; take the first if so
    to_data = feed_data.first["to"]["data"]
    to_data = to_data.first if to_data.is_a?(Array)
    feed_name = to_data["name"]

    # html headers go here
    outfile.puts "<html>"
    outfile.puts "<head><title>#{feed_name}</title></head>"
    outfile.puts "<body>"
    outfile.puts "<h1>#{feed_name}</h1>"

    feed_data.each do |post|
      outfile.puts "<p>by <b>#{post["from"]["name"]}</b> at: <b>#{DateTime.parse(post["created_time"]).strftime(DATE_FORMAT)}</b></p>"
      if post["picture"].present?
        outfile.puts "<div style=\"float:left;\"><img src=\"#{post["picture"]}\" /></div>"
      end
      name_str = post["name"]
      name_str = "<a href=\"#{post["link"]}\">#{name_str}</a>" if post["link"].present?
      outfile.puts "<h2>#{name_str}</h2>"

      # one paragraph per line of the message (String#each is gone since
      # ruby 1.9, and a post may have no message at all)
      message = post["message"].to_s
      message.each_line do |para|
        outfile.puts "<p style=\"clear:both;\">#{para}</p>"
      end

      comments = post["comments"]
      if comments.present? && comments["count"].present? && comments["count"].to_i > 0
        outfile.puts "<h3>Comments</h3>"
        outfile.puts "<dl>"
        comments["data"].each do |comment|
          outfile.puts "<dt>by <b>#{comment["from"]["name"]}</b> at: <b>#{DateTime.parse(comment["created_time"]).strftime(DATE_FORMAT)}</b></dt>"
          outfile.puts "<dd>#{comment["message"]}</dd>"
        end
        outfile.puts "</dl>"
      end # any comments present
      outfile.puts "<hr />"
    end # each post

    # html footers go here
    outfile.puts "</body>"
    outfile.puts "</html>"
  end # with open outfile
end # feed data present
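One thing the script doesn't do is escape anything: names and messages go into the html exactly as people typed them, so a stray "<" or "&" in a comment can mangle the page. Here's a rough sketch of one way to handle that, using CGI.escapeHTML from ruby's standard library. The helpers h and comment_html are names I've invented for the example, and the sample feed is made up; it only imitates the feed -> data -> comments shape that the script reads.

```ruby
require 'cgi'
require 'json'

# Made-up sample in the same shape the script reads. The names and the
# text are invented for illustration only.
SAMPLE = <<-JSON
{"feed": {"data": [
  {"from": {"name": "Alice"},
   "message": "Is <b>this</b> ok?",
   "comments": {"count": 1, "data": [
     {"from": {"name": "Bob & co"}, "message": "1 < 2"}
   ]}}
]}}
JSON

# Escape a value before dropping it into html; nil becomes an empty string.
def h(text)
  CGI.escapeHTML(text.to_s)
end

# Render one comment as a dt/dd pair (timestamp left out for brevity).
def comment_html(comment)
  "<dt>by <b>#{h(comment["from"]["name"])}</b></dt>" \
  "<dd>#{h(comment["message"])}</dd>"
end

feed_data = JSON.parse(SAMPLE)["feed"]["data"]
feed_data.each do |post|
  puts "<p>#{h(post["message"])}</p>"
  # a post with no comments may have no "comments" key at all
  comments = post["comments"] || {}
  (comments["data"] || []).each { |comment| puts comment_html(comment) }
end
```

The same h(...) wrapper could go around each interpolated value in the script, at the cost of a little more clutter on already long lines.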
Friday, 8 February 2013
Link: What rails security means for your startup
If you hadn't already heard, Rails has a security vulnerability that affects all versions of Rails. This one is about XML-parsing of YAML strings.
This was followed by a second vulnerability in the JSON parser - again of YAML-parsed code.
So what does this all mean for all of us running Rails-based systems? Is this just a flash-in-the-pan issue that will fade away the moment it's out of the public eye? Or is it a herald of the coming apocalypse?
A really cogent overview of what the rails security issue means for your startup has been written by Patrick, and I strongly recommend you read it, and pass it on.
Amongst a number of useful overviews, it covers such things as "yeah, but we're not a high-profile site, nobody's going to attack us, right?" and concludes that the worst may not yet be past, and that:
You Should Be At Defcon 2 For Most Of February
Saturday, 2 February 2013
Link: Help Vampires: A Spotter’s Guide
Here's a great post on the ubiquitous "Help Vampire" who drains the life out of helpful communities...
Help Vampires: A Spotter’s Guide gives tips on how to spot, avoid and reform them for the future benefit of humanity...