Experiment: Deleting a post from the Internet
Once you post something on the Internet, it is hard to get rid of it. As an experiment, I deleted one of my past posts, and I tried to remove all traces of it.
I selected my post about Technical Interview tips, because it is mildly popular, but never did very well. It was on reddit for only a couple of hours. Yet it regularly received a lot of hits from Google looking for interview tips for RIM. In my opinion the writing needed work, so I deleted it. Forever.
First, I removed it from my blog. I have a checkbox that says whether a post is shown or not. Unchecking it removes it from the main page, and whenever people try to see it, they get the main article listing instead.
My post still appeared in Google, and you could read it by clicking on the cached link. To remove it from the Google cache, I had to make the page return a HTTP 404 error. I tried using the .htaccess file:
Unfortunately this had no effect on my web server. Apparently .htaccess doesn't apply to php scripts. I had to physically change my blog software to return a 404 HTTP status if that entry is retrieved:
Clicking delete didn't work as advertised. You can still get to the post, but it is marked as [deleted]. This is a real problem on Reddit's part, because people might post something under the mistaken belief that they can remove it later. The button should be descriptive of what it will actually do. Software shouldn't lie.
The best way to hide your embarrassing past is to bury them with new things. For example, if you search for my name you won't find my Lion King fan-fiction anywhere in the first few pages of results.
RSS Reader caches
That wasn't good enough, because the article was still available in RSS readers. When Google reader retrieves my blog entries, it simply merges the updated ones with its own database. The atom specification does not define any way to delete posts, but it does allow updates. I had to put the post back, but remove its contents. Then, when the RSS reader did the merge, it would update its database to contain the empty post.
Google Cache
redirect gone /~smhanov/blog/?id=43
if ( $_GET['id'] == '43' ) {
header("HTTP/1.0 404 Not Found");
exit;
}
Reddit
Comments about the post appeared on reddit. Since I was the original submitter to reddit, I have the option to delete it:
Conclusion
The main text of the article is nowhere to be found. The problem is any comments or blog reactions will still be there, although they will have broken links. The experiment is a partial success.
I think it's worth remembering that it's very hard, if not impossible, to entirely remove something you've posted to the Internet; think carefully before publishing things you might later regret :)