The Union Metrics Blog

Archive for March, 2012

Introducing the new TweetReach Pro Ultimate plan


We’re happy to announce a new TweetReach Pro plan level for our larger enterprise, agency and media customers – TweetReach Pro Ultimate! This plan level is perfect for anyone managing multiple products, clients or accounts.

Our most comprehensive and personalized plan level, TweetReach Pro Ultimate comes with:

  • 50 Trackers
  • Access to TweetReach Back, our 30-day complete historical archive
  • A dedicated account manager to help you get exactly the data you need
  • Unlimited snapshot reports
  • Unlimited users and projects
  • API access

With 50 Trackers in your account, you can monitor tweets about all of your campaigns, clients, products and events in real time. Each Tracker can monitor unlimited tweets about your topic, using up to 20 distinct search queries to be sure we’re finding all relevant tweets.

TweetReach Back is our new historical analytics option. If you missed an important event or weren’t able to set up a Tracker before campaign tweets went out, we can go back up to 30 days and analyze all tweets about your topic. This is a more comprehensive option than our simple snapshot report, with no tweet limits and in-depth metrics like you see in a Tracker. Ultimate subscribers have access to up to 24 hours of TweetReach Back analysis each month.

A dedicated account manager will be available to answer all of your questions, from setting up tweet tracking, to interpreting metrics, to helping you improve next time.

The TweetReach Pro Ultimate plan is $2,500 per month. You can subscribe to the Ultimate plan here, and if you have any questions at all, please let us know.

Written by Jenn D

March 28th, 2012 at 8:55 am

Posted in News


New snapshot reports now available to everyone!


We recently began rolling out a new look – and some new metrics – in our snapshot reports. As of today, all snapshot reports are now in the new format. Isn’t it so much nicer?

There’s more information about the new snapshot report below, including a few frequently asked questions and an explanation of the metrics and our calculations. Or, skip all that and run a new and improved report right now!

How much does the new report cost?

As always, the quick snapshot report (up to 50 tweets) is free. The full snapshot report (up to 1500 tweets from the past week) is $20. The price has not changed.

How is the new report different from the old report?

First, it looks different. Way different and way better. Second, we’ve added some new metrics (details on those below). We’ve moved a few things around, but we haven’t removed anything from the old version of the snapshot report. The new version is just smarter and prettier than ever before.

What new metrics are included in the new report?

There are three major new sections in the new version of the TweetReach snapshot report. They are the Activity, Top Contributors and Top Tweets sections, explained below. There’s a more detailed explanation of all the report metrics here.

  • Activity provides details about the tweets in this report, including a graphical timeline of when tweets were posted (times shown in UTC).
  • Top Contributors shows you the top three contributors – participants whose tweets appear in this report. You’ll see the highest contributor for each of three influence dimensions: highest exposure, most retweeted, and most mentioned.
  • Top Tweets shows the three most retweeted tweets in this report, with retweet counts for each.

Can I still see the old version of the report?

Yes, you can still access the old version. There’s a View Old Version link in the top right corner of the report.

So, how do I get one of these new reports?

Just give it a try: run a new TweetReach report for free right now!

Written by Jenn D

March 27th, 2012 at 1:40 pm

Posted in Features,News


Projects now available in TweetReach Pro


We’re excited to announce a new feature for TweetReach Pro subscribers – projects! Projects enable account holders to selectively share Trackers with their clients and colleagues, support multiple campaigns with one Pro subscription, and easily manage multiple users’ access.

You can use projects to:

  • Group related Trackers and snapshot reports together
  • Share select Trackers with clients or colleagues
  • Manage user access and permissions
  • Create guest access for one or more Trackers

There’s more detailed information about how to set up our new projects feature on our helpdesk. Projects are available in all TweetReach Pro plan levels.

Written by Jenn D

March 22nd, 2012 at 8:30 am

Posted in Features,News


Twitter and the Polls: Tracking the Republican Primaries with TweetReach


Here in the United States, we’re right in the middle of the Republican primaries as the country tries to decide who the GOP nominee for President will be in our election later this year. One of the more interesting conversations around the 2012 Presidential election is the relationship between what people say on Twitter and what they do at the polls. Can we use Twitter conversations to predict election winners? Or, if they can’t predict results, what can tweets tell us about how potential voters feel about the candidates?

With Super Tuesday approaching and the GOP candidate field still wide open, we’ve been tracking tweets about the six top candidates for the Republican Presidential nomination since January 1 – Newt Gingrich, Jon Huntsman, Ron Paul, Rick Perry, Mitt Romney, and Rick Santorum. From those tweets, we built an interactive visualization of how Twitter talks about the GOP candidates, and how that relates to poll numbers over time.

Check out our interactive Republican primary Twitter tracker here or click on the screenshot below.

To create this visualization, we’re using a set of TweetReach Pro Trackers to track Twitter conversation about each of the candidates, along with our API to update the visualization daily. In the visualization, we’ve mapped the number of unique Twitter users talking about a candidate to the y-axis, polling results to the x-axis, and tweet volume to the circle radius. Polling data is from RealClearPolitics.

Written by Jenn D

March 5th, 2012 at 11:55 am

Posted in Events,Trends


Exploring the Oscars with d3, Cassandra and the command line


This post is by Jerry Chen, our Lead Engineer. Look for more in-depth technical posts like these in our TweetReach Tech category.

Here at TweetReach we love data. But what we love more is making data understandable, useful and maybe even a little bit fun. When we saw all the amazing visualizations people have done with the d3.js library, we were inspired to do something with the millions of tweets that flow through our system every day. Fortunately, this amazing data-driven JavaScript framework does most of the heavy lifting and fluently speaks SVG and CSS. As a proof of concept, we put something together for the Grammys. It was a good first step, but we knew we could do much better for the Oscars.

And of course, we did! Check out the TweetReach Academy Awards Explorer.

On the other end of the stack, we were revisiting Apache Cassandra. Since we last took a look at the datastore, it graduated from the Incubator, got counters, hit a 1.0 version milestone, and continued to capture the hearts (and columns) of millions. We knew our chart data would be broken down by a time component, so this project would be a great fit for Cassandra.

After a few sketches about what to show and how to show it, we decided to capture tweets containing any mention of the Oscars, and then break them down by a few categories and nominees. For each minute, we would measure the tweet volume for each nominee, and provide a slider so the user could view the exact volume at a particular minute in time.

Academy Awards Explorer Whiteboard

The Starting Point

But first, how is datta formed?

From the beginning

Our journey begins, as with many things on the Internet, with text. We wrote Flamingo to consume the Twitter Streaming API (and later on, Gnip PowerTrack). Incoming tweets get appended to an event log, and Resque jobs are optionally scheduled based on subscriptions. Normally, we use the latter for our larger pipeline (which includes search, OLAP, contributor and reach calculations), but for this special project we fork the event log and stream it to a separate server.

For moving log files around, there’s Apache Flume, Facebook Scribe, and maybe even time-tested syslog (here’s a great post by Urban Airship), but in the spirit of getting the job done, we can get away with tailing over SSH (and maybe wrapping that in a screen session):

    nibbler$ tail -F /var/log/flamingod/events.log \
             | pv -l \
             | ssh -C parabox 'cat - >> /var/log/events.log'


(We use the capital -F flag for tail so it keeps following the file even if the symlink’s destination changes, and pv is a great utility which will be explained shortly.) Meanwhile, on the destination server, we employ tail again and stream the event log into a Ruby script that reads from STDIN and does the actual data insertion into Cassandra.

The schema is simple. For each tweet, we see if there are any matching terms. If there are matching terms, we extract the timestamp of the tweet, get it into its minute-resolution “time bucket” format (YYYYMMDDHHmm) and insert it into Cassandra. The schema ends up like this:
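
The layout implied by that description can be sketched as follows; the column family name is an assumption:

```
Volume (super column family)
  row key      : tracked term, e.g. "hugo"
  super column : minute time bucket, e.g. "201202241201"
  column name  : tweet ID packed as a 64-bit unsigned integer
  column value : empty (presence is the datum; counting columns gives per-minute volume)
```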

Optionally, we keep the available time buckets in a special super column called “index.” This is preferable to trying to list all the super columns under the row key. Thus, using the Ruby cassandra gem, an insertion looks like the following:
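
Here is a minimal sketch of the insertion shape. The FakeClient stands in for the cassandra gem’s client, mimicking its insert(column_family, row_key, mutation) signature (real code would use something like client = Cassandra.new('Keyspace', '127.0.0.1:9160')); the keyspace, empty column values and tweet ID are assumptions:

```ruby
# i64 packs a 64-bit unsigned integer big-endian, so tweet IDs
# compare bytewise as column names.
def i64(n)
  [n >> 32, n & 0xFFFFFFFF].pack('NN')
end

# Stand-in for the cassandra gem's client, so the call shape is visible
# without a live cluster.
class FakeClient
  attr_reader :rows
  def initialize
    @rows = Hash.new { |h, k| h[k] = {} }
  end

  def insert(cf, row_key, mutation)
    mutation.each { |sc, cols| (@rows[[cf, row_key]][sc] ||= {}).merge!(cols) }
  end
end

client   = FakeClient.new
term     = 'hugo'                   # row key: the tracked term
bucket   = '201202241201'           # super column: minute time bucket
tweet_id = 172_838_470_123_456_789  # hypothetical tweet ID

# Record the tweet under its term and time bucket...
client.insert(:volume, term, bucket => { i64(tweet_id) => '' })
# ...and note the bucket in the special "index" super column.
client.insert(:volume, term, 'index' => { bucket => '' })
```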



where i64() is a function that packs a 64-bit unsigned integer, in this case the tweet ID.

To get the volume at a given minute, count the columns:

    >> client.count_columns(:volume,"hugo","201202241201",:count=>MAX_COLUMNS)


The default :count is 100, so if the actual count is greater than 100, it’ll get capped. I’ve set MAX_COLUMNS to something high like 999999.

Streaming Insertions with Ruby

The actual processing task is straightforward, but the script is optimized to do the least amount of work possible. This is the key to high throughput: don’t waste your time, and if you can correctly get away with skipping a line, get away with it. Based on the nominees/terms we’re filtering for, we define the group of regular expressions to match against, and then combine them, e.g. [/hugo/, /artist/] becomes /hugo|artist/. Using the combined regular expression as a first pass means not having to parse JSON unless we absolutely must.
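
A sketch of that first-pass trick, with illustrative patterns (the real lists were per-nominee) and a hypothetical matching_patterns helper:

```ruby
require 'json'

# Per-nominee patterns (terms here are illustrative).
patterns = [/hugo/i, /artist/i]
# One combined pattern for a cheap first pass over the raw line.
combined = Regexp.union(patterns)

def matching_patterns(line, combined, patterns)
  # Most lines match nothing, so skip JSON parsing entirely for them.
  return nil unless line =~ combined
  tweet = JSON.parse(line)
  patterns.select { |p| tweet['text'] =~ p }
end

raw = '{"text":"Loved Hugo tonight","created_at":"Thu Mar 06 10:26:58 +0000 2008"}'
matching_patterns(raw, combined, patterns)  # => [/hugo/i]
```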

The crux of the code uses tweet.created_at (e.g., "Thu Mar 06 10:26:58 +0000 2008") to determine the time bucket, e.g. "200803061026". Since consecutive tweets are likely to be close in time, and perhaps in the same time bucket, we take the substring of created_at timestamp up to the minute and memoize the time bucket. In other words, if both the current and last tweets had created_at strings beginning with "Thu Mar 06 10:26", then skip parsing the timestamp and reuse the last time bucket. While this may seem like a micro-optimization, it’s with this mindset that we can maintain a processing rate of hundreds of tweets per second.
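
That memoization can be sketched like this; the class and constant names are ours, not the production script’s:

```ruby
require 'time'

# Memoize the minute-resolution time bucket: consecutive tweets usually
# fall in the same minute, so we can skip Time.parse for most lines.
class Bucketer
  MINUTE_PREFIX = 16  # length of "Thu Mar 06 10:26" in created_at

  def initialize
    @last_prefix = nil
    @last_bucket = nil
  end

  def bucket(created_at)  # e.g. "Thu Mar 06 10:26:58 +0000 2008"
    prefix = created_at[0, MINUTE_PREFIX]
    # Same minute as the previous tweet: reuse the parsed bucket.
    return @last_bucket if prefix == @last_prefix
    @last_prefix = prefix
    @last_bucket = Time.parse(created_at).utc.strftime('%Y%m%d%H%M')
  end
end

b = Bucketer.new
b.bucket("Thu Mar 06 10:26:58 +0000 2008")  # => "200803061026"
b.bucket("Thu Mar 06 10:26:59 +0000 2008")  # same prefix: no parse, same bucket
```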

How do we measure performance? We could use Ruby’s Benchmark module and measure timing between various points. For a larger picture by way of throughput, we write the insertion script to consume STDIN and combine it with the incredibly handy utility called Pipe Viewer, which provides information like throughput about anything that’s being piped:

    $ pv -l event.log | ruby insert.rb
    26.3k 0:01:27 [303.4/s ] [===============>              ] 0:01:30


In this example, the -l flag tells pv to count lines rather than bytes, and it keeps track of lines seen, the duration and the rate. So far, 26.3k lines have been processed at a rate of about 303 lines per second, and pv estimates about a minute and a half left.

It also works in streaming mode, which is how we use it with a live stream of tweets:

    $ tail -F event.log | pv -l | ruby insert.rb
    26.3k 0:01:27 [303.4/s ] [                <=>                   ]


Meeting in the Middle

Once we have the data in Cassandra, how do we get it out and onto a webpage? If we’re in Ruby, a sane stack might be a Sinatra or Rails app that serves well-formatted JSON right from Cassandra. Given the static nature and finite data set of the visualization though, it was easier to write a script to generate JSON — nay, pure JavaScript! — that provided the series data in a global variable.

While JavaScript code generation in Ruby may seem inelegant, sacrilege or downright insane, working with static files meant being able to initially populate the frontend with dummy data and figure out the format required by d3. In parallel, we determined how to retrieve the correct data from Cassandra and, finally, generated it in the format needed by d3. Luckily we had last year’s dataset to work with as well, which became integral in the testing and sanity check step.
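
The generation step can be sketched like this; the global name, file name and data shape are all assumptions:

```ruby
require 'json'

# Per-minute counts per nominee, as they might come back out of Cassandra
# via count_columns (numbers are made up).
series = {
  'hugo'   => { '201202241201' => 142, '201202241202' => 198 },
  'artist' => { '201202241201' => 87,  '201202241202' => 121 },
}

# Emit a static .js file defining a global the d3 frontend can read directly.
js = "var oscarSeries = #{JSON.generate(series)};"
File.write('series.js', js)
```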

All in all, it was a whirlwind expedition with two great pieces of open source — Cassandra and d3 — the latter of which deserves its own blog post. Cassandra took a hearty portion of memory but barely broke a sweat handling both insertions and queries.

The finished product. Click to try it out.

Oh and by the way, if you want to build visualizations like this or wrangle terabytes of data, we’re hiring!

Written by jerry

March 2nd, 2012 at 11:02 am