Why you should stop worrying about avoiding the duplicate content penalty

Posted on September 21, 2007 at 8:47 am


OK, so it seems like everyone starting a blog or "optimizing" their blog is concerned about duplicate content penalties from Google, and people have devised an entire slew of remedies, from adding all kinds of Disallow statements to their robots.txt files to installing SEO-optimized, duplicate-content-curing plugins for WordPress.

And I'm no exception: I've got over 30 lines in my robots.txt file to block Google from my wp- folders, my archive pages, my tag pages, and lots more! I also have an SEO WordPress plugin installed that helps prevent "supplemental results" by adding the NOINDEX meta tag to my category and archive pages. Basically, the only pages that I allow Google to access are the actual permalink URLs for my posts and my static pages.

That's it! Nothing else! If you perform a site:www.online-tech-tips.com search in Google, you'll see it's just my articles and nothing else.

[Image: Google site: search]

Now, when I first implemented this, I thought I was doing something that would help my rankings in Google, since it would keep my pages from getting thrown into the supplemental results. However, over the last few months, I've been asking other bloggers like Lorelle and Amit what kinds of steps they have taken to prevent duplicate content, and I was shocked by the responses.

Here was Lorelle's response to my question:

Do I? Or does WordPress.com? This is a WordPress.com blog. You'll have to talk to them about their robots.txt.

The duplicate content issue is one that bloggers have taken WAY out of control. Duplicate content is natural on blogs. Don't stress over it. The issue is related specifically to evil doers who use duplicate content for their splogs, and stealing content from other blogs or copying content from their splogs across to their other splogs. It's to tackle the evil, not the normal blogger.

For some reason I was thinking that such big bloggers would have been all over these "issues". So I decided to perform a site: search on a couple of big-name blogs like ProBlogger.net, CopyBlogger.com, Lifehacker.com, and SEOMoz.com. Well, it was pretty interesting what I came across. All of these sites get thousands of visitors a day from the search engines, and yet just about everything is indexed by Google, including archive pages, category pages, tag pages, and comments!

So after doing this, I became even more curious as to whether my 30-line robots.txt is really necessary! What kind of robots.txt file are these guys using? Here's what mine looks like as of right now:

User-agent: Googlebot
Disallow: */feed*
Disallow: */rss*
Disallow: */trackback*
Disallow: */wp-admin
Disallow: */wp-content
Disallow: */wp-includes
Disallow: *wp-login.php
Disallow: */20*
Disallow: */comments*
Allow: */category/*/page/*
Disallow: /page*
Disallow: */search*
Disallow: */?s*
Disallow: */?p*
Disallow: */index.php?p*
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.cgi$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /*.php*
Disallow: */trackback*
Disallow: /*?*
Disallow: /z/
Disallow: /wp-*
Disallow: */tag/
Disallow: */stats*
Disallow: */cgi-bin*
Allow: /wp-content/uploads/

User-agent: Googlebot-Image
Allow: /*

Sitemap: http://www.online-tech-tips.com/sitemap.xml
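
As a rough way to see what rules like these actually catch, here is a minimal Python sketch that treats * as "any run of characters" and a trailing $ as "end of the URL", which is how Google documents its wildcard handling. The helper names and the sample paths are made up for illustration. It only tests whether a single rule matches a path; it does not try to reproduce how Googlebot resolves conflicts between Allow and Disallow lines.

```python
import re


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Turn a robots.txt path pattern using * and $ into a compiled regex."""
    anchored = pattern.endswith("$")  # a trailing $ pins the match to the end of the path
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and let every * stand for "any characters".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if the rule pattern matches the given URL path."""
    return rule_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    # A few rules copied from the file above.
    rules = ["*/feed*", "*/20*", "*/tag/", "/*.php$", "/*?*"]
    # Hypothetical paths, only for illustration.
    paths = ["/some-post/feed/", "/2007/09/", "/tag/seo/", "/index.php", "/some-post/"]
    for path in paths:
        hits = [rule for rule in rules if rule_matches(rule, path)]
        print(path, "is matched by:", hits or "nothing")
```

Running a handful of your own URLs through something like this is a quick way to check whether a long rule list is blocking more than you meant it to.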

Now let's take a look at a few from the big bloggers! Here's what the robots.txt file looks like for the following sites:

Problogger.net

User-agent: *
Disallow:
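
If you want to convince yourself that an empty Disallow line really does mean "crawl everything", Python's standard urllib.robotparser module can check it. This is just a sketch: the function name and sample paths are hypothetical, and the standard-library parser does plain prefix matching, so it does not understand the * and $ wildcards used in some of the other files in this post.

```python
from urllib.robotparser import RobotFileParser


def allows_everything(robots_lines, sample_paths, agent="*"):
    """Return True if every sample path may be fetched under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse in-memory lines, no network request
    return all(parser.can_fetch(agent, path) for path in sample_paths)


if __name__ == "__main__":
    # Hypothetical paths of the kind people usually try to block.
    sample_paths = ["/", "/category/blogging/", "/tag/seo/", "/2007/09/"]

    # The two-line file shown above: an empty Disallow blocks nothing.
    open_rules = ["User-agent: *", "Disallow:"]
    print(allows_everything(open_rules, sample_paths))

    # For contrast, a single slash blocks the whole site.
    closed_rules = ["User-agent: *", "Disallow: /"]
    print(allows_everything(closed_rules, sample_paths))
```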

LifeHacker.com

User-Agent: Googlebot
Disallow: /index.xml$
Disallow: /excerpts.xml$
Allow: /sitemap.xml$
Disallow: /*view=rss$
Disallow: /*?view=rss$
Disallow: /*format=rss$
Disallow: /*?format=rss$
Sitemap: http://lifehacker.com/sitemap.xml

CopyBlogger.com

User-agent: *
Disallow: /*/feed/
Disallow: /*/trackback/

TechCrunch.com

User-agent: *
Disallow: /*/feed/
Disallow: /*/trackback/

Mashable.com

User-agent: *
Disallow: /feed
Disallow: /*.xml$
Disallow: /*/feed/
Disallow: /*/trackback/

OK, so as you can see from the above list, EVERYONE's list is a hell of a lot shorter than mine, and my list was created by reading through all kinds of posts talking about how everything must be blocked or disallowed. Well, obviously if the top bloggers are not worrying about duplicate content, then why should I be? Actually, it seems like maybe it's even helping them in some way.

So before you go installing lots of plugins that prevent Google from indexing your site completely, remember two things:

1. It doesn't seem like any of the really popular blogs are doing anything about it, and

2. The supplemental results database no longer exists in Google anyway!

My next step is to remove all of the Disallow statements from my robots.txt file and see what happens! Has anyone else tried this yet?

Also, another observation that may be obvious but warrants a mention: all of these people write GREAT content, and a LOT of it. So you can do all the optimizing you want, but unless you have really good content that people will link to, bookmark, and visit again, it's not really going to matter!

Tell me what you think in the comments! ;)


» Filed Under Blogging


4 Responses to “Why you should stop worrying about avoiding the duplicate content penalty”

  1. Siddharth said:

    One question regarding duplicate content, please?
    I write for some other sites too,
    especially techtoday, which belongs to a really good friend of mine.
    I directly copy and paste from my site to his,
    so will it penalize me or him?
    thx :-)


  2. akishore said:

    Well, it depends. If you write the content on your site and immediately post it on his site, the site that will be penalized will be the one that Google indexes LAST. So if the Google bot indexes your Page1.html first, let's say, and then goes to his site and sees the same content, his site will be penalized. But if it's the other way around, you will be penalized.

    Basically, the content should only be on one person's site because no matter how you do it, only one will be in the main index.


  3. Siddharth said:

    hmm
    I post on his site immediately.
    So what if I change the article a bit and then post it?


  4. akishore said:

    Your changes should be significant; minor changes won't really help. Actually, it would be much smarter to write the article and have it posted on ONE site and then have the other site link back to that article with good keywords in the link. That way both sites will be getting high-quality back links, which is one of the most important factors in Google's ranking algorithm. Don't worry about having the content on both sites.

