How to remove a web page from Google index and other search engines
Posted on January 1, 2009 at 5:20 am
So you have created a website or a web page and you don’t want anyone else to be able to access it, right? That’s a bit of a problem once Google, Yahoo, MSN, or some other search engine indexes it!
Once a web page or website is indexed, it can be found by anyone on the planet with an Internet connection. If you want to hide a page or website from search engines, you can do it in several ways.
I’ll try to walk you through the easier method first because it requires less technical knowledge. Basically, you can add a line of code to your HTML page or you can set up your web server to protect a file or directory.
Luckily, just about all search engines follow a standard for web robots, called the Robots Exclusion Protocol, when crawling websites. As a website owner, you can use the robots.txt file to tell a search engine what to index and what not to index.
So how does this work? It’s actually super simple! First, you create a text file called robots.txt using Notepad or any text editor. Now let’s say you want to block your entire website from being indexed by the search engines, so you would add these lines to your text file:
User-agent: *
Disallow: /
The User-agent refers to the robot that is crawling your website, e.g. Google, Yahoo, etc. * means all robots. Note that a robot, such as a spam robot, can ignore your file altogether if it chooses to.
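If you want to double-check how a well-behaved robot would read those two lines, here is a small sketch using Python's standard urllib.robotparser module. The function name is_allowed and the example.com URLs are made up for illustration, and real search engines use their own parsers, so treat this as a rough check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The "block everything" rules from the article, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent crawl url.

    is_allowed is a hypothetical helper name, not part of any standard.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # With "Disallow: /" for every robot, no URL on the site is allowed.
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/"))
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/page.html"))
```

Passing an empty string as the rules should allow everything, which is a handy way to see the difference the two lines make.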
Only use a robots.txt file to keep content out of the major search engines, not to hide information. If someone comes to your website, a robots.txt file will not prevent them from accessing a page and viewing it. So make sure you understand what the file does: it keeps your site from showing up in Google search results pages (and in Yahoo and MSN as well).
You can also block directories or individual pages on your site using a robots.txt file instead of blocking the entire website. To block a directory, you could add the following lines:
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~secret/
Note that you only need to add the user-agent line once, unless you want each robot to get a different set of instructions. If you want to block a page, you could use this:
Disallow: /private_file.html
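To see which pages a set of directory and page rules would cover, you could run them through Python's urllib.robotparser again. This is only a sketch: the helper name blocked_urls and the example.com URLs are my own inventions, and I have combined the directory and page rules under a single User-agent line as an assumption about how you might lay out your file.

```python
from urllib.robotparser import RobotFileParser

# Directory and page rules from the article under one User-agent line
# (combining them into one file is an assumption for this example).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~secret/
Disallow: /private_file.html
"""


def blocked_urls(robots_txt, user_agent, urls):
    """Return the URLs from `urls` that the rules tell user_agent not to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    candidates = [
        "http://www.example.com/index.html",
        "http://www.example.com/tmp/draft.html",
        "http://www.example.com/~secret/notes.html",
        "http://www.example.com/private_file.html",
    ]
    for url in blocked_urls(ROBOTS_TXT, "Googlebot", candidates):
        print("blocked:", url)
```

Rules match by path prefix, so everything under a blocked directory is covered, while pages elsewhere on the site stay crawlable.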
Also, check out the Help section at Google to learn more on how to create a robots.txt file. Once you have finished writing up the file, you just need to upload it to the root of your website so that it can be accessed as follows:
http://www.example.com/robots.txt
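Robots look for the file at the root of the host, no matter which page they start from. As a rough illustration, this Python sketch works out where the robots.txt for any page URL should live. The function name robots_txt_url is hypothetical, and the sample URL is just a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the root-level robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.example.com/blog/2009/post.html?page=2"))
```

A robots.txt file placed in a subdirectory, such as /blog/robots.txt, would not be found this way, which is why it has to go in the root.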
The next time the robot visits your site, it will read the information and follow the instructions. If this seems too complicated, you can also keep your website or webpage out of search results using META tags.
The noindex meta standard is also followed by all of the major search engines. To use it, you have to add a line of code to the HEAD section on the webpage. To prevent all robots from indexing a page on your site, add this line to the HEAD section:
<meta name="robots" content="noindex">
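When Google or any other search engine sees that line on the page, it will automatically drop the page from the search results, even if other pages link to it. Note that the robot has to be able to crawl the page to see the tag, so don’t also block that page in robots.txt.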
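If you want to confirm that the tag actually made it into a page, a quick check along these lines might help. It is a minimal sketch using Python's standard html.parser module. The class name RobotsMetaParser and the function has_noindex are made up for this example, and it only looks at the generic "robots" meta tag, not at robot-specific tags or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # The content value may hold several comma-separated directives.
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
    print(has_noindex(page))
```

You could paste the source of your own page into the page variable to check it by hand.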
So those are the two ways you can hide a page from Google and other search engines. If you are not able to get this to work, post a comment and I will try to help you out.
Also, check out my previous post if you are looking for a way to remove your name from search engines like Google when it appears on other people’s websites. Enjoy!
» Filed Under Google Software/Tips

Keep in mind that if you place a line in robots.txt, it might have the opposite of the desired effect because you’ll be announcing that the file exists.
For example, if you put
Disallow: /topsecretfile.html
then (most) search engines will ignore it, but any human who loads your robots.txt will learn of its presence.
This method will work only if your web page is very new and is not already indexed by Google. If it is already indexed, it won’t be deleted unless you go through the Google webmaster tools > link deletion route.
my $0.02
I want to remove a patient’s information from the website, which will otherwise have serious legal implications. Kindly help.
I am seeking help. I was defamed and cyberstalked on the Internet. 1) How can I totally remove a cached page from the search index? 2) Can I remove the cached page without letting the stalker/owner know?
Please help me!!!!
@Comment no 2 (Ankur Jain),
It will work even if the page is already indexed. However, using Google’s webmaster tools will do it faster.
Refer to the last line of Google’s help: http://www.google.com/support/.....swer=93710
Nice article, I will complete my robots.txt now
Please help us, someone made a blog on Google Blogger about me and our company. When I clicked on “report abuse”, Blogger said it is freedom of speech, but it is libel, pure and simple! Please can you help us out? I saw the link about us on another blog and the admin of that blog removed it promptly, but I saved it to my favorites so I have the link. It is not on the Google search engine yet, but I fear it might show up there. How can I get the whole page removed? Please help me!
@Mrs G – Your best bet to get the person to remove content that is harmful to your reputation is to contact the Blogger legal team. Here is the link:
http://www.google.com/support/.....swer=76315
I went to the link you gave me; it says they will not remove libelous or defamatory material without a court order! How do I get a court order if I don’t know for sure who the author is? They have it set up like a profile or blog, I am not sure! Please, is there any way to have this deleted?
A reporter out of state posted a report about me regarding an incident that took place in his district. Some of the information was inaccurate and incomplete; in addition, there was no follow-up done. Currently, this 8-year-old report is still listed #1 in a search for my name. How can I address this? (It is embarrassing.)
Does anyone know of an email contact for escalating a matter of harassment via Google/Blogger? “Reporting” abuse does nothing and there is no follow-up; you just click the box and there is no place to point out how the TOS has been violated. I cannot find a contact email anywhere to escalate a matter.
Any help is appreciated