Where Is The World's Data Being Stored?
An interesting infographic made by the people over at @Mozy, showing us the main data hubs in the world. Nothing new here, and I'm glad about that; I don't like surprises, especially in this field. ;-)
...well, not of violence, we have plenty of that, but of Social Networking. This thing's been around since the Internet began, aeons ago :D. And Mark Suster from Caltech made a very nice presentation for all of us lazy bloggers to share with you. So here it is:
A great presentation done by Euan Semple and Alan Moore. You need to check this out! ;-)
I always wanted to take part in such an event, and now my dream has come true. It's one of those things you can't help telling everyone around you about: how cool it will be, and so on. But enough talk; I'm a man of few words (especially when more are needed). Suffice it to say I'm in select company: only 500 others will be attending.
In case you're curious about what will be going on, here it is:
9:00 AM Conference Check In Opens & Caffeine
Check in at the Venue / Lobby of the Crystal Palace. Grab a coffee and a bite to eat before heading into the first session.
10:00 AM Auditorium Doors Open
Join the hosts in the Auditorium for a non-threatening "get to know you" warm-up. Think tai chi for the brain.
10:15 AM TEDxBucharest Session 1 TODAY IS A GOOD DAY TO CHANGE YOUR PERSPECTIVE
Musical performance for a good day. Featuring TEDTalks and live talks from Catalin Stefanescu, Radu Tatucu, Eric Weiner and Adrian Stoica.
11:30 AM Take a Breather!
Hot coffee, flavored tea, cold water and delicious sweets, all waiting for you while you take time to grasp the ideas and meet the audience. Enjoy: yes, you are at TEDxBucharest, and it is a good day!
12:15 PM TEDxBucharest Session 2 TODAY IS A GOOD DAY TO CHANGE YOUR THINKING
Energy from Drum Cafe followed by live talks from Adrian Bejan, Andrei Rosu, Raluca Ioana van Staden and Magnus Scheving.
1:30 PM Feed Your Tummy
TEDxBucharest wants you to stay energized, so plan to enjoy lunch with your new friends. Search for people with similar passions; discover people who could become your new partners in meaningful conversation. Check out the gifts shelf, as well as Bloggers Alley, to keep those conversations going.
3:00 PM TEDxBucharest Session 3 TODAY IS A GOOD DAY TO CHANGE YOUR LIFE
Energy from Drum Cafe followed by live talks from Mihai Panaitescu, Nik Halik and Arnoud Raskin.
4:30 PM Take a Breather!
No doubt by now your mind is on overload. Take a minute to stretch your legs, share some ideas, and check out what’s going on in the lobby. Don’t forget to get a snack and a fresh drink.
5:00 PM TEDxBucharest Session 4 TODAY IS A GOOD DAY TO CHANGE YOUR SURROUNDINGS
Head over to the amazing performance of KOTKI Visuals and live talks from Roland Hermann and Oana Pellea.
Make tomorrow a good day!
This Wednesday, Pete Warden will make Friend, Fan page and name data from hundreds of millions of Facebook users available to the academic research community. It's a move Facebook must have seen coming, one that many in the data-centric community have been calling on the company itself to make for years, and an event complicated by Facebook's recent privacy policy changes, which have muddied the waters of right and wrong but made even more data available for outside analysis.
If what people call Web 2.0 was all about creating new technologies that made it easy for everyday people to publish their thoughts, social connections and activities, then the next stage of innovation online may be services like recommendations, self and group awareness, and other features made possible by software developers building on top of the huge mass of data that Web 2.0 made public. It's a very exciting future, and Warden is about to fire one of the earliest big shots in that direction.
Nerds in Space: Social Graph Analysis For Solving Large-Group Problems
Warden studied Computer Vision at college in the U.K., then got into game development. After moving to L.A., he spent six years building graphics drivers for the original PlayStation and the Xbox. Then he started his own independent business, where, thankfully, he open-sourced much of his work (something he still does today).
When he found out that running his own business wasn't going to work with his immigration status, he was fortunate to have already caught Apple's eye with the software he had been releasing to the public. Apple bought his company in order to bring him on board, and the proceeds of that small sale are now sustaining his next project, since he has gone independent again.
After spending five years at Apple struggling to navigate the maze of people and connections and types of expertise in order to get the information he needed, Warden decided to go independent and build a company that solved exactly that kind of problem. "I can't think of a better big company to work for, but it was still a big company," he says. "It was hard to find the right people to talk to, whether for particular expertise or for contacts at external companies." And so Warden left Apple to build a company that would use social graph analysis to solve problems like that. He called the company Mailana, a play on "mail analysis" since he was initially focused on email social graph analysis.
We've written here a number of times about Mailana's tool that analyzes the social graph of any Twitter user. Enter the username of someone on Twitter and Mailana will show you which 20 other people the user has exchanged the largest number of reciprocal public @ replies with. Find someone interesting or important? Mailana's Twitter analyzer will tell you who they most regularly interact with. See, for example, The Inner Circles of 10 Geek Rockstars on Twitter.
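Mailana's actual ranking code isn't described here, but the idea of an "inner circle" built from reciprocal @ replies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is a list of (sender, recipient) pairs, one per public @ reply, and the function name inner_circle and its scoring rule (only two-way pairs count, scored by total replies in both directions) are hypothetical choices, not Mailana's method.

```python
from collections import Counter


def inner_circle(user, replies, top_n=20):
    """Rank the people `user` has exchanged the most public @ replies with.

    Hypothetical scoring rule: a pair counts only if both people have
    replied to each other at least once; the score is the total number
    of replies exchanged in both directions.
    """
    sent = Counter()      # replies from `user` to each other person
    received = Counter()  # replies from each other person to `user`
    for sender, recipient in replies:
        if sender == user and recipient != user:
            sent[recipient] += 1
        elif recipient == user and sender != user:
            received[sender] += 1

    scores = {
        other: sent[other] + received[other]
        for other in sent
        if received[other] > 0  # keep only two-way conversations
    }
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Made-up example data: (sender, recipient) for each @ reply.
replies = [
    ("alice", "bob"), ("bob", "alice"), ("alice", "bob"),
    ("alice", "carol"),                  # one-way, so carol is excluded
    ("dave", "alice"), ("alice", "dave"),
]
print(inner_circle("alice", replies))  # [('bob', 3), ('dave', 2)]
```

Requiring replies in both directions is what separates a real conversation partner from someone the user merely broadcasts at; a real tool might weight recency or normalize by activity as well.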
Pulling Down the Facebook Social Graph
Now Warden is about to unveil a much larger project in the same vein. For the past six months he's been crawling public profile pages on Facebook; he now has more than 215 million of them indexed, updated about once a month. When he began he used the Web crawling service 80legs, but over time he had to build his own crawling infrastructure.
When I talked to him this afternoon, he had already begun uploading 100 GB of user data to his server to make it available for academic research starting Wednesday. Warden says he's removed identifying profile URLs but kept names, locations, Fan page lists and partial Friends lists. All those fields are just waiting to be analyzed and cross-referenced. That's one very rich resource.
Yesterday Warden posted some of his own initial observations from the data on his personal blog. Those included:
- In almost every state in the Southern U.S., God is the most popular Fan page among Facebook users. Among people in the L.A., San Francisco and Nevada regions? "God hardly makes an appearance on the fan pages, but sports aren't that popular either," Warden writes. "Michael Jackson is a particular favorite, and San Francisco puts Barack Obama in the top spot." In the Oregon and Idaho region? Starbucks is number one.
- In the Mormon-influenced areas of Utah and Eastern Idaho, the most popular Fan pages are The Book of Mormon, Glenn Beck and the vampire book Twilight, which was authored by a Mormon.
- The bulk of Warden's posted analysis yesterday was about location networks. People in the western U.S. tend to have Facebook friends all over the country; people in the southern U.S. tend to mostly be friends with people who have remained in the same area.
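Warden's own code isn't shown here, so the following is only a rough Python sketch of the two kinds of aggregate behind the observations above: the most popular Fan page per region, and the share of friendships that stay within a region. The profile layout (a dict keyed by user id, with hypothetical "region", "fan_pages" and "friends" fields) is an assumption made for illustration.

```python
from collections import Counter, defaultdict


def top_fan_page_by_region(profiles):
    """Most common Fan page in each region.

    profiles: {user_id: {"region": str, "fan_pages": [str], "friends": [user_id]}}
    (a hypothetical layout).
    """
    counts = defaultdict(Counter)
    for p in profiles.values():
        # set() so a duplicated page on one profile is counted once.
        counts[p["region"]].update(set(p["fan_pages"]))
    return {region: c.most_common(1)[0][0] for region, c in counts.items() if c}


def local_friend_share(profiles):
    """For each region, the fraction of friend links that stay inside that region."""
    local = Counter()
    total = Counter()
    for p in profiles.values():
        for friend_id in p["friends"]:
            friend = profiles.get(friend_id)
            if friend is None:
                continue  # partial Friends lists: skip friends we never crawled
            total[p["region"]] += 1
            if friend["region"] == p["region"]:
                local[p["region"]] += 1
    return {region: local[region] / total[region] for region in total}
```

A region whose local share is close to 1 matches the "friends with people who stayed in the same area" pattern; a lower share means friend lists are spread across the country.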
Taking a Deeper Look
These observations are interesting, but they are only the beginning of what's possible. Name, location, friends and interests are great data points to analyze. Warden has written a program that will estimate gender as well, based on names. All these data points can be cross-referenced with outside data, too. Members of Facebook's own staff did this kind of analysis when they compared user last names to U.S. Census data, which allowed them to estimate changes in Facebook's racial composition over time based on the likelihood of people with particular last names to report a particular racial background.
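Neither Warden's gender program nor Facebook's Census comparison is spelled out here, but both can be approached the same way: look up each name in a table of name-conditional probabilities and average those probabilities to get an expected composition for the whole population. A minimal Python sketch, assuming such a table exists; the function estimate_composition and the probabilities in the example are hypothetical placeholders, not real Census or name statistics.

```python
from collections import defaultdict


def estimate_composition(names, prob_table):
    """Expected share of each category across a population of names.

    prob_table maps a lower-cased name to {category: probability}, e.g. a
    surname -> P(category | surname) table. Names not in the table are
    skipped. Returns (shares, number_of_matched_names).
    """
    totals = defaultdict(float)
    matched = 0
    for name in names:
        dist = prob_table.get(name.strip().lower())
        if dist is None:
            continue  # unknown name: no information, leave it out
        matched += 1
        for category, p in dist.items():
            totals[category] += p
    if matched == 0:
        return {}, 0
    return {c: s / matched for c, s in totals.items()}, matched


# Made-up probabilities, for illustration only.
table = {
    "john": {"male": 0.99, "female": 0.01},
    "mary": {"male": 0.01, "female": 0.99},
    "taylor": {"male": 0.5, "female": 0.5},
}
shares, matched = estimate_composition(["John", "Mary", "Taylor", "Zyx"], table)
print(matched, shares)
```

Averaging probabilities rather than assigning each person a single label keeps ambiguous names from skewing the total; comparing such estimates across monthly snapshots would give the kind of change-over-time view described above.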
"I'm mostly thinking 'What do I try first?'," Warden says. "There's so many interesting ways to slice the data - especially as I'm starting to get changes over time. I'm also trying to map out political networks in aggregate; how polarized the fans of particular politicians are - so how likely a Sarah Palin fan is to have any friends who are fans of Obama, and how that varies with location too. One of my favorite results is that Texans are more likely to be fans of the Dallas Cowboys than God."
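The polarization question Warden describes can be phrased as a simple ratio: of the fans of page A, what fraction have at least one friend who is a fan of page B, optionally restricted to one location? A minimal Python sketch, assuming a hypothetical profile layout (a dict keyed by user id with "region", "fan_pages" and "friends" fields); because the crawled Friends lists are partial, friends without a crawled profile are simply skipped, which means the real rate could be higher than this estimate.

```python
def cross_fan_rate(profiles, page_a, page_b, region=None):
    """Fraction of page_a fans (optionally only those in `region`) who have
    at least one crawled friend that is a fan of page_b.

    Returns None when there are no matching page_a fans.
    """
    fans_a = [
        p for p in profiles.values()
        if page_a in p["fan_pages"] and (region is None or p["region"] == region)
    ]
    if not fans_a:
        return None

    def has_friend_fan_of_b(p):
        # Ignore friend ids we have no profile for (partial crawl).
        return any(
            page_b in profiles[f]["fan_pages"]
            for f in p["friends"]
            if f in profiles
        )

    hits = sum(1 for p in fans_a if has_friend_fan_of_b(p))
    return hits / len(fans_a)
```

Calling it once per region, for example cross_fan_rate(profiles, "Sarah Palin", "Barack Obama", region="TX"), gives the location breakdown; lower values suggest fans of the two pages live in more separate social circles.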
Warden says he hasn't talked to anyone from Facebook since he started crawling the site, but he did get an email from someone on the security team asking him to take down instructions he'd posted that exposed a security hole that made harvesting people's email addresses easy. So the company is paying attention. "I'd love to see them put me out of business by putting decent data out there," Warden says. He says his Amazon Web Services bill was over $5,000 last month.
Why is he indexing all this content and why is he going to hand it over to the academic world later this week? "I am fascinated by how we can build tools to understand our world and connect people based on all the data we're just littering the Internet with," Warden says.
"Nobody thinks about how much valuable information they're generating just by friending people and fanning pages. It's like we're constantly voting in a hundred different ways every day. And I'm a starry-eyed believer that we'll be able to change the world for the better using that neglected information. It's like an x-ray for the whole country - we can see all sorts of hidden details of who we're friends with, where we live, what we like."
For a great example of the kind of social impact that data analysis can make, Warden points to some of the fascinating ways that GIS data is illuminating the intersection of race and public services. Data has shed light on social injustices for decades, and measurable information about the interactions of hundreds of millions of people every day on Facebook offers opportunities to discover both good and bad news about the contemporary human condition.
Warden says he's not yet been able to interest any investors in his ideas for businesses based on this data, so his girlfriend Liz Baumann, a former insurance actuary, stepped in to help and is now running much of the crawling. He says he's now focused on "working on ways of presenting all this information in a form that answers questions for people willing to pay." His first experiment along those lines is the very interesting FanPageAnalytics.com.
What does Pete Warden hope for from this week's public release of all this Facebook data? "Hopefully I'll get to see a bunch of interesting [academic research] papers come out of it, worst case. And I'd like to be the guy people turn to when they need stuff like this."
Warden is already well respected among a fringe group of bleeding-edge geeks, and we hope his work on social graph analysis will end up affecting far more people than may ever know his name.
I wrote in another post on another blog that social media might end up being a curse. It might also become something of a blessing, especially for us busy people who work for corporations, put in long hours, have to shop and pay the bills after work (if possible), and so on and so forth. Why? Partly because these social networks usually attract smart people (score: social media 1-0), which means there's a greater chance of finding people with similar ideas and interests than in "real" life, which I call the old way, because that's what it is. Needless to say, I do not excel at the old way, hence why I blog :-D. Another reason is the information. The old way, you need several encounters (dates) to get to know somebody: how they are, how they behave, what they like, etc. Online, you can check out their profile and get a pretty decent idea of whether you want to deal with them from then on. You don't waste time (score: social media 2-0). And one more point, which I consider important: you can reconnect with old friends, meet new ones from all over the world, and manage your activities better than the old way allows.
Why do I use social media? Besides getting to know people, I get info about gigs, plays, movies and showtimes, and I can manage my calendar. I get my daily dose of blogging (wow, really? Yes!), both posting and reading. I get the news, I watch vids and listen to music; basically, I lead my life the way Google would want me to: online :-) For some this is a bad thing, and I know quite a few of those people. But they don't interest me, since we have nothing in common; they are "the others". By the time they realize social media's importance, I'll be a guru. ;-)
So, this is an empathic analysis of why I use social networks. It's a little flawed, since I didn't write anything about their business potential. I'll leave that for another post. For now, I think this suffices.