AOL releases search histories of 500,000 users. Deliberately!

by Bob Crabtree on 7 August 2006, 15:07

Quick Link: HEXUS.net/qagig


Last week, AOL deliberately made available online the private search histories of half a million of its users. Not surprisingly, we think, the move caused uproar and led the company to pull the data over the weekend.

Among the worries that have been expressed are:

* How easy it might be to identify some of the users, even though the information was supposed to be anonymous
* The possibility that Google's recent successful court battle with the US government to keep such information private may be undermined

In addition, there's the strong likelihood that spammers and other bad hats will make illicit use of the information. And that fear remains very real even though AOL removed the data because - surprise, surprise - it's still readily available after a whole bunch of mirror-download sites sprang up!

In a piece headlined AOL Proudly Releases Massive Amounts of Private Data, TechCrunch gives AOL a thoroughly deserved roasting, saying,

The utter stupidity of this is staggering. AOL has released very private data about its users without their permission. While the AOL username has been changed to a random ID number, the ability to analyze all searches by a single user will often lead people to easily determine who the user is, and what they are up to. The data includes personal names, addresses, social security numbers and everything else someone might type into a search box.

The most serious problem is the fact that many people often search on their own name, or those of their friends and family, to see what information is available about them on the net. Combine these ego searches with porn queries and you have a serious embarrassment. Combine them with “buy ecstasy” and you have evidence of a crime. Combine it with an address, social security number, etc., and you have an identity theft waiting to happen. The possibilities are endless.
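
To see why swapping a username for a random ID number offers so little protection, here's a minimal Python sketch of the point TechCrunch is making. It is only an illustration under stated assumptions: the records in SAMPLE_LOG are made up (they are not taken from AOL's data), and the helper names group_queries_by_user and repeated_terms are our own. The idea is simply that once every search carries the same ID, anyone can pull one person's searches together and look for words that keep recurring, such as a name or a home town.

```python
from collections import defaultdict

# Hypothetical, made-up records of (anonymised_id, query).
# These are NOT from the real AOL data set.
SAMPLE_LOG = [
    (1001, "jane example springfield"),
    (1002, "cheap flights"),
    (1001, "plumbers near elm street springfield"),
    (1001, "jane example high school reunion"),
    (1002, "weather tomorrow"),
]


def group_queries_by_user(log):
    """Collect every query issued under the same anonymised ID."""
    histories = defaultdict(list)
    for user_id, query in log:
        histories[user_id].append(query)
    return dict(histories)


def repeated_terms(queries, min_count=2):
    """Return terms that appear in at least min_count of one user's queries.

    Recurring terms are a crude hint at names or places the user keeps
    coming back to, which is what makes an "ego search" so revealing.
    """
    counts = defaultdict(int)
    for query in queries:
        # Count each term once per query, not once per occurrence.
        for term in set(query.lower().split()):
            counts[term] += 1
    return sorted(term for term, n in counts.items() if n >= min_count)


histories = group_queries_by_user(SAMPLE_LOG)
print(repeated_terms(histories[1001]))  # ['example', 'jane', 'springfield']
```

In this toy case, the terms that recur for user 1001 are a first name, a surname and a town, which is more or less the "ego search" problem in miniature. A real analysis would need far more care than counting words, but the grouping step is the part that the random ID does nothing to prevent.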

The Paradigm Shift says that the data contains,

...hundreds of searches from people looking to kill themselves and even more scary are searches from users that seem to be looking to commit murder.

A blog at Caltech reckons it knows what motivated AOL's action,

In their desperation to gain recognition from the research community, AOL decided they would compromise their integrity to provide a data set that might become often-cited in research papers: "Please reference the following publication when using this collection..." is the message before the download.

The first indication of what AOL had done looks to have come on the Geeking with Greg blog, subtitled, Exploring the Future of Personalised Information. But the author, Greg Linden, clearly hadn't considered the implications of AOL's actions. His Friday post, A chance to play with big data, says simply,

...the new AOL Research site has posted a list of APIs and data collections from AOL.

Of most interest to me is data set of "500k User Queries Sampled Over 3 Months" that apparently includes {UserID, Query, QueryTime, ClickedRank, DestinationDomainUrl} for each of 20M queries. Drool, drool!

You know, just the other day, I was watching a Google Tech Talk where a researcher was lamenting the difficulty of getting access to big data. It is exciting to see two of the giants, Google and AOL, making this kind of data available.

Perhaps the most succinct take we've so far read came from our own web master. His email about this matter contains only a single link (to TechCrunch) and this one-liner,

Glad I'm not an AOL customer...

Thoughts? Share them with us in this thread in the HEXUS.community.

HEXUS.links

HEXUS.community - discussion thread about this news brief
TechCrunch - AOL Proudly Releases Massive Amounts of Private Data
Geeking With Greg - A chance to play with big data
The Paradigm Shift - AOL Search Data Shows Users Planning to Commit Murder
Caltech Blog - AOL Releases Search Logs from 500,000 Users
HEXUS.lifestyle.headline - New AOL video portal promises 45-plus channels



HEXUS Forums :: 11 Comments

People use AOL? :p
It won't just be spammers and unsavoury types using this data. There'll be people who'll study it to see just what information can be obtained just from somebody's search history, so as to reveal just how much of a privacy breach giving up such a dataset can be.
I'm sorry but, why did AOL do that?

Blimey! Well, on the one hand it serves the users right for using AOL's browser (I'm assuming that's how the data was obtained in the first place), but hopefully AOL will receive a nice juicy fine to teach them a lesson for being so lame at thinking up their publicity stunts.
Steve
It won't just be spammers and unsavoury types using this data. There'll be people who'll study it to see just what information can be obtained just from somebody's search history, so as to reveal just how much of a privacy breach giving up such a dataset can be.

Steve,

Quite right but, in my view, if the people who have this data really cared about the bad hats, they'd only dole out the info to others they know can be trusted, rather than making it available for anyone to download.

That said, once the cat was out of the bag, all the bad hats who want the information would have been able to get hold of it, for sure, but likely would have had to pay for it, possibly quite large sums, and that just might have put some of them off.
It's profit. It will go to anyone who pays.