Experiment: Deleting a post from the Internet

Once you post something on the Internet, it is hard to get rid of it. As an experiment, I deleted one of my past posts, and I tried to remove all traces of it.

I selected my post about Technical Interview tips because it was mildly popular but never did very well. It was on Reddit for only a couple of hours, yet it regularly received a lot of hits from people searching Google for interview tips for RIM. In my opinion the writing needed work, so I deleted it. Forever.

First, I removed it from my blog. I have a checkbox that says whether a post is shown or not. Unchecking it removes it from the main page, and whenever people try to see it, they get the main article listing instead.

RSS Reader caches

That wasn't good enough, because the article was still available in RSS readers. When Google Reader retrieves my blog entries, it simply merges the updated ones into its own database. The Atom specification does not define any way to delete posts, but it does allow updates. So I had to put the post back, but with its contents removed. Then, when the RSS reader did the merge, it would update its database to contain the empty post.
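A minimal sketch of what such a blanked-out entry might look like, written in PHP to match the blog software. The function name blankAtomEntry, the example id and the example URL are all made up for illustration. The one thing that matters is that the id is identical to the one originally published and that the updated timestamp is newer, so a reader treats the entry as a revision of the old post rather than a new one.

```php
<?php
// Hypothetical helper: builds an Atom <entry> whose content is empty.
// $entryId must be the same <id> the post was originally published with.
function blankAtomEntry(string $entryId, string $url, DateTimeInterface $updated): string
{
    // Escape text for safe use in XML element content and attributes.
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    return "<entry>\n"
        . "  <id>" . $esc($entryId) . "</id>\n"
        . "  <title>(removed)</title>\n"
        . "  <link href=\"" . $esc($url) . "\"/>\n"
        . "  <updated>" . $updated->format(DATE_ATOM) . "</updated>\n"
        . "  <content type=\"text\"></content>\n"
        . "</entry>\n";
}
```

Calling blankAtomEntry('urn:example:post-43', 'http://example.com/blog/?id=43', new DateTimeImmutable('now')) returns a fragment that would go inside the feed element in place of the original entry. Whether a given reader actually overwrites its stored copy is up to that reader.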

Google Cache

My post still appeared in Google, and you could read it by clicking on the cached link. To remove it from the Google cache, I had to make the page return an HTTP 404 (Not Found) or 410 (Gone) error. I tried using the .htaccess file:

    Redirect gone /~smhanov/blog/?id=43

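Unfortunately this had no effect on my web server. At first I thought .htaccess doesn't apply to PHP scripts, but the more likely cause is that Redirect matches only the URL path and ignores the query string, so ?id=43 never matched. I had to change my blog software itself to return a 404 HTTP status when that entry is requested: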

    // Deleted post: answer with a 404 before any other output is sent.
    if ( isset($_GET['id']) && $_GET['id'] == '43' ) {
        header("HTTP/1.0 404 Not Found");
        exit;
    }
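If more than one post ever needs to disappear, the check above could be generalized along these lines. This is only a sketch: DELETED_POST_IDS and statusForPost are hypothetical names, not part of my blog software. It answers with 410 (Gone) rather than 404, which is the HTTP status meant for a resource that was removed on purpose and is not coming back.

```php
<?php
// Hypothetical list of post ids that have been deliberately removed.
const DELETED_POST_IDS = ['43'];

// Returns the HTTP status code to send for a requested post id.
function statusForPost($id, array $deletedIds = DELETED_POST_IDS): int
{
    // Only plain string ids can match; a missing or malformed id falls through.
    if (is_string($id) && in_array($id, $deletedIds, true)) {
        return 410; // Gone
    }
    return 200;
}

// In the blog's entry script, before any output is sent:
$status = statusForPost($_GET['id'] ?? null);
if ($status !== 200) {
    http_response_code($status);
    exit;
}
```

Keeping the decision in a small function makes it easy to check on its own, separately from the code that sends the header.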

Reddit

Comments about the post appeared on Reddit. Since I was the original submitter, I have the option to delete it.

Clicking delete didn't work as advertised. You can still get to the post, but it is marked as [deleted]. This is a real problem on Reddit's part, because people might post something under the mistaken belief that they can remove it later. The button should be descriptive of what it will actually do. Software shouldn't lie.

Conclusion

The main text of the article is nowhere to be found. The problem is that any comments or blog reactions will still be there, although they will have broken links. The experiment is a partial success.

The best way to hide your embarrassing past is to bury it under new things. For example, if you search for my name you won't find my Lion King fan-fiction anywhere in the first few pages of results.
