Thursday 14 February 2013

Quick and dirty facebook feed parsing

So, there's this big discussion going on for my hobby group at the moment, and the main discussion has been going on in facebook - mainly because that's where I was first asked to set up a discussion and it took all of five minutes to get the page up and running.

However - now that discussions are progressing, there are a number of people *outside* the facebookiverse who have raised quite reasonable objections to the discussion happening there. Not everyone is on facebook, not everyone *wants* to be on facebook, and to be honest, a facebook group kinda sucks for searching and archiving really important discussions.

Thus it has been requested that I copy all the posts and comments to Someplace Else, to make them available for more general consumption.

At first I balked at this request - 24 posts and 250 comments to be individually copied/pasted??? Who has that kind of time?

Of course when I actually sat down to think about the problem seriously, it took far less time than I thought to solve it. So here's what I did, including the quick-and-dirty ruby script that will massage the facebook feed into something that resembles a readable format. It ain't pretty - but it'll pass for government work.

Step 1: get the feed from the API

I'll assume you are actually a member of the group that you're after. You will need to be.

You need to go to your group and get the group's ID from the URL.

The facebook API page is here: Facebook Graph API explorer

First you need to create an access token to get the data out of facebook. This is essentially the same as doing one of those "allow application to access my data" things that you click on when you add a new app. In this stage, you need to allow the Graph-API application to access *your* group-data, to prove that you have access to the feed of the group - and to allow it to fetch out all the posts for you.

  1. Click the "Get access token" button
  2. Select the "user_groups" checkbox
  3. Click the new "Get access token" button
  4. Follow any prompts (if this is the first time using this API, you'll get the "allow access for this application" process - but it may not happen for subsequent attempts)

You should now have a long encoded token in the box at the top of the page.

Next up we need to tell it what feed to use. There's a drop-down labelled "GET" which you should leave as-is. In the text-box next to that, type in the ID of the group in a URL-format like this: /1234567890?fields=feed and then click "Submit". The "fields=feed" tells the API to actually go and fetch the feed of posts and comments.
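
If you'd rather not copy and paste from the explorer, the same request can be sketched as a plain URL. This is only a rough sketch: the group ID and token are placeholders, and it assumes the Graph API still answers at graph.facebook.com with an access_token query parameter the way it did when I wrote this.

```ruby
require 'uri'

# Placeholders (assumptions): substitute your own group ID and access token.
group_id     = "1234567890"
access_token = "YOUR_ACCESS_TOKEN"

# Same path and query as typed into the explorer: /<group id>?fields=feed
query    = URI.encode_www_form("fields" => "feed", "access_token" => access_token)
feed_uri = URI::HTTPS.build(:host => "graph.facebook.com",
                            :path => "/#{group_id}",
                            :query => query)

puts feed_uri
# prints https://graph.facebook.com/1234567890?fields=feed&access_token=YOUR_ACCESS_TOKEN

# To actually fetch it (needs a real token and network access):
#   require 'net/http'
#   File.write("feed.json", Net::HTTP.get(feed_uri))
```

Building the URL with URI keeps the query string properly encoded, which matters because real tokens can contain characters that need escaping.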

At this stage, you should be able to see a huge hash full of posts and comments in the box to the right hand side of the screen. Copy and paste that into a file.
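
To give a feel for what that hash looks like, here's a cut-down sample in roughly the shape my script relies on. The names and messages are made up, and the real feed has many more fields per post, so treat this as an illustration rather than actual API output.

```ruby
require 'json'

# Made-up sample data (not real API output), trimmed to the keys the script reads.
sample = <<-JSON
{
  "feed": {
    "data": [
      { "from": { "name": "Alice" },
        "created_time": "2013-02-10T09:30:00+0000",
        "message": "First post",
        "comments": {
          "count": 1,
          "data": [
            { "from": { "name": "Bob" },
              "created_time": "2013-02-10T10:00:00+0000",
              "message": "A reply" }
          ]
        } },
      { "from": { "name": "Bob" },
        "created_time": "2013-02-11T08:00:00+0000",
        "message": "Second post" }
    ]
  }
}
JSON

posts = JSON.parse(sample)["feed"]["data"]

# Posts without any comments simply have no "comments" key, so guard for that.
comment_total = posts.inject(0) do |sum, post|
  sum + (post["comments"] ? post["comments"]["data"].to_a.length : 0)
end

puts "#{posts.length} posts, #{comment_total} comments"
# prints: 2 posts, 1 comments
```

A quick count like this is a handy sanity check that the file you pasted is complete JSON before you try to format it.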

Step 2: massage it into shape

Now you've got your feed data, you just need to play with it and spit it out into a nicer format. In our case, I decided to go for just a rough html format that showed what the posts were, what comments were attached, and who said what. My script is posted below - which can serve as a starting point for whatever you'd like to see done.

This script accepts the input filename and an optional output filename (if you leave that out, it just swaps the input file's extension for '.html'). It'll generate a really rough-and-ready html file that contains the posts and comments (with names and datetimes) plus some of the links (if present).
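
As a small illustration of those two details, here is how a default output name and a formatted timestamp might come out. The helper default_output_name is hypothetical (it isn't in the script itself), and the timestamp is an invented example in the format facebook uses.

```ruby
require 'date'

DATE_FORMAT = "%H:%M:%S %d-%m-%Y"

# Hypothetical helper, not part of the script: swap the extension for .html
def default_output_name(input_name)
  input_name.chomp(File.extname(input_name)) + ".html"
end

puts default_output_name("feed.json")
# prints: feed.html

# An invented timestamp in the ISO 8601 style found in the feed
stamp = DateTime.parse("2013-02-14T09:05:00+0000").strftime(DATE_FORMAT)
puts stamp
# prints: 09:05:00 14-02-2013
```

The times are printed in whatever offset the feed supplies (UTC in this sample), so adjust DATE_FORMAT or convert the zone if your readers care.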

Enjoy...


#!/usr/bin/env ruby

DATE_FORMAT = "%H:%M:%S %d-%m-%Y"

class Object
  def blank?
    self.nil? || (self.respond_to?(:empty?) && self.empty?)
  end
  def present?
    !self.blank?
  end
end

new_file_name = nil

# if they've passed in the filename, use it
file_name = ARGV[0]
# the second argument, if given, is the output filename
new_file_name = ARGV[1] if ARGV.length >= 2

if file_name.blank?
  puts "usage: facebookfeed <file_name> [<outfile_name>]"
  exit(1)
end
puts "got file_name of: #{file_name}"

# File.exists? is deprecated (and removed in Ruby 3.2); File.exist? works everywhere
unless File.exist?(file_name)
  puts "file: #{file_name} does not exist"
  exit(1)
end

# munge up an html filename for the output file
# (swap only the extension, so paths like ./feed.json or my.feed.json still work)
new_file_name ||= file_name.chomp(File.extname(file_name)) + '.html'


# refuse to overwrite an existing file
if File.exist?(new_file_name)
  puts "output file: #{new_file_name} already exists, please supply another"
  puts "usage: facebookfeed <file_name> [<outfile_name>]"
  exit(1)
end

# parse json in file into ruby - preferably a hash
require 'rubygems'
require 'json'
require 'date'
facebook_hash = JSON.parse(IO.read(file_name))

# the posts live under feed -> data; tolerate a response with no "feed" key
feed_data = (facebook_hash["feed"] || {})["data"]



if feed_data.present?
  File.open(new_file_name,'w') do |outfile|
    puts "parsing #{feed_data.count} posts"
    sum = 0
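    # posts with no comments have no "comments" key at all, so guard against nil
    feed_data.each {|post| sum += post["comments"]["count"].to_i if post["comments"].present? }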
    puts "with: #{sum} total comments"

    # assumption: "to" -> "data" is a list of recipients, with the group as its first entry
    feed_name = feed_data.first["to"]["data"].first["name"]

    # html headers go here
    outfile.puts "<html>"
    outfile.puts "<head><title>#{feed_name}</title></head>"
    outfile.puts "<body>"
    outfile.puts "<h1>#{feed_name}</h1>"

    feed_data.each do |post|
      outfile.puts "<p>by <b>#{post["from"]["name"]}</b> at: <b>#{DateTime.parse(post["created_time"]).strftime(DATE_FORMAT)}</b></p>"

      if post["picture"].present?
        outfile.puts "<div style=\"float:left;\"><img src=\"#{post["picture"]}\" /></div>"
      end
      name_str = post["name"]
      name_str = "<a href=\"#{post["link"]}\">#{name_str}</a>" if post["link"].present?
      outfile.puts "<h2>#{name_str}</h2>"

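      # message can be missing (e.g. link-only posts), and String#each is gone
      # since Ruby 1.9, so walk the text line by line explicitly
      message = post["message"].to_s
      message.each_line do |para|
        outfile.puts "<p style=\"clear:both;\">#{para.chomp}</p>"
      end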

      comments = post["comments"]

      if comments.present? && comments["count"].present? && comments["count"].to_i > 0
        outfile.puts "<h3>Comments</h3>"
        outfile.puts "<dl>"

        comments["data"].each do |comment|
          outfile.puts "<dt>by <b>#{comment["from"]["name"]}</b> at: <b>#{DateTime.parse(comment["created_time"]).strftime(DATE_FORMAT)}</b></dt>"
          outfile.puts "<dd>#{comment["message"]}</dd>"
        end
        outfile.puts "</dl>"
      end #any comments present
      outfile.puts "<hr />"
    end # each post

    # html footers go here
    outfile.puts "</body>"
    outfile.puts "</html>"
  end # with open outfile
end #feed data present
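
One thing the script doesn't do is escape anything: names and messages go into the html as-is, so a comment containing < or & can garble the page. A minimal sketch of a fix, assuming you're happy to lean on CGI.escapeHTML from ruby's standard library; the helper name h is my own invention here and not part of the script above.

```ruby
require 'cgi'

# Hypothetical helper: escape text before interpolating it into html.
# to_s means a missing (nil) message just becomes an empty string.
def h(text)
  CGI.escapeHTML(text.to_s)
end

puts "<dd>#{h('I <3 this & that')}</dd>"
# prints: <dd>I &lt;3 this &amp; that</dd>
```

You'd wrap each interpolated value in the script with it, for example h(comment["message"]), while leaving the tags you write yourself alone.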
