The WPKube Guide to Content Scraping in WordPress

Last Updated on: November 29, 2013 by Joe Fylan

Content scraping is essentially the act of copying content from one site and publishing it on another. If you publish content online, there is a good chance that you have been a victim of content scraping at some point.

Content scraping is usually carried out in one of two ways. One popular method is to use a bot that searches the internet for relevant content, copies it, and then publishes it on another website. The other approach is to search for content manually, copy it, and publish it elsewhere.

For the victim, however, the end result is the same: their content ends up published elsewhere without permission, and usually without credit to the original author.

As Google and other search engines reportedly don’t like to list the same piece of content more than once, a scraped copy of your content puts you at risk of not being listed in the search engine results pages, despite the content rightfully belonging to you. Not only does someone else take the credit for your hard work, they also stand a good chance of taking the readers and visitors that would’ve made their way to your site via a Google search.

Why Do People Scrape Content?

At the most superficial level, the main reason for scraping content is to add content to a site with minimal effort. By using an automated content scraping service, unscrupulous webmasters can build out a site with thousands of pages in a very short space of time.

One reason they might do this is to create a site that gets lots of traffic from the search engines. As traffic in most cases equals money, there is a good incentive to attempt this. The traffic can be monetised in several ways: building an email list that is then used to promote products, displaying pay-per-click ads from networks like Google AdSense, or advertising products through an affiliate program such as Amazon Associates.

Another reason people scrape content is to claim credit for other people’s work, an act also known as plagiarism. While scraping for money might take place on a massive scale, with content copied from multiple sites on a daily basis, plagiarism tends to involve a more selective approach.

Individuals and small businesses have been known to scrape content manually, cherry-picking the best articles from a site as they find them, in order to boost their credibility and appear to be an expert on a particular topic. Appropriating other people’s content for a portfolio, which is then used to win clients and work, is a common example. This content could be images, written content or any other type that can be published or distributed online.

How to Check if You are a Victim

Many victims of content scraping are blissfully unaware of the fact. However, by using WordPress, the chances of you discovering it taking place are greatly increased.

By making use of the WordPress pingback and trackback functionality, you will get a notification when someone publishes content that links back to your site. This only happens if the content they scrape contains links to your site, which is another good reason to interlink your content. While interlinking won’t stop scraping from happening, it can be a good way to be notified after the fact.

However, it’s best to ensure your installation of WordPress isn’t set up to publish these trackbacks on your site, as you would be publishing a link to the offending site. To find out how to disable publishing trackbacks and pingbacks on your WordPress site, read our post on How to Deal with Trackbacks and Pingbacks in WordPress.

Another option is to use Google, or another search engine, to search for your content online. By copying and pasting the title of your post, or a whole sentence, into the search engine, surrounded by quotes, such as “WPKube Guide to Content Scraping”, you can view all the indexed pages that contain that exact phrase. As long as the phrase you search for is fairly unique, any results returned are worth investigating to see if your content has been scraped.
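If you have a lot of posts to check, the quoted search described above can be scripted. The sketch below only builds the search URL for an exact phrase; it does not fetch or parse any results. The function name and the optional exclude_site parameter are our own illustration, and the -site: operator used to hide results from your own domain is an addition that the manual method above does not rely on.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(phrase, exclude_site=None):
    """Build a Google search URL for an exact-phrase (quoted) query.

    exclude_site is an optional, illustrative extra: when given, results
    from that domain (normally your own site) are filtered out.
    """
    # Remove any quotes already in the phrase and collapse whitespace,
    # so the whole phrase ends up inside one pair of quotes.
    cleaned = " ".join(phrase.replace('"', " ").split())
    query = f'"{cleaned}"'
    if exclude_site:
        query += f" -site:{exclude_site}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(exact_phrase_search_url("WPKube Guide to Content Scraping"))
# https://www.google.com/search?q=%22WPKube+Guide+to+Content+Scraping%22
```

Open the resulting URL in a browser and review the results by hand, just as you would with a search typed in manually.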

How to Prevent Content Scraping

There isn’t much you can do to prevent content scraping from taking place. There are some anti-content scraping WordPress plugins available, as well as commercial services that you can sign up to, which may help dissuade scrapers from targeting your site. Other plugins work to ensure that, once your content has been scraped, you still get credit for it when it is republished elsewhere. Some plugins to consider include:

  • Anti Feed-Scraper Message: this free plugin adds some text to each post in your RSS feed, which is where a bot is likely to be sourcing your content from, attributing the author and linking back to your site.
  • Copyright Proof: this plugin works with the Digiprove service to ensure that there is a record of your site being the rightful owner of the content you create and publish.
  • WordPress Data Guard: block the IP addresses of those you suspect are stealing your content, preventing them from accessing your site.
  • DMCA Protection Badge: this free plugin allows you to easily insert anti-scraping badges on your site that might help dissuade scrapers from targeting it, although it’s no guarantee.
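To make the feed attribution idea from the first plugin in the list more concrete, here is a minimal sketch of the general technique: appending a credit line with a link to each item before it goes out in the feed. This is not the plugin’s actual code. A real plugin would do this in PHP inside WordPress, and the function name, parameters and footer wording below are all assumptions made for illustration.

```python
from html import escape


def add_feed_attribution(item_html, title, url, author, site_name):
    """Append an attribution footer to the HTML of a single feed item.

    Hypothetical helper for illustration only. If a bot copies the feed
    item wholesale, the credit and the link back are copied along with it.
    """
    footer = (
        f'<p>The post <a href="{escape(url, quote=True)}">{escape(title)}</a> '
        f"by {escape(author)} first appeared on {escape(site_name)}.</p>"
    )
    return item_html + "\n" + footer


print(add_feed_attribution(
    "<p>Content scraping is essentially the act of copying content...</p>",
    title="The WPKube Guide to Content Scraping in WordPress",
    url="https://example.com/content-scraping-guide/",  # placeholder URL
    author="Joe Fylan",
    site_name="WPKube",
))
```

Because the link points back to your site, a scraped copy that keeps the footer should also trigger the pingback notification discussed earlier, provided the scraper’s site sends pingbacks.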

Once scraping has taken place, you do have a couple of limited options. One is to invoke a DMCA takedown. This service works in line with the Digital Millennium Copyright Act and, for a fee, will attempt to get your stolen content taken down. Other than that, your only real option is to get in touch with the website owner or their host and state your case.

Content Curating vs. Content Scraping

Content curating is a popular method of publishing that, if done incorrectly, could see you inadvertently becoming a content scraper. Content curation can be described as the practice of sharing content with others. This can take the form of a Tweet or of creating a top 10 list of must-read articles on your blog.

Some lists of curated content feature an excerpt from the source material along with a link back to the original site. While this is in most cases acceptable, it is essential that you properly attribute the author and the original source. Good content curation sees the curator adding value for the reader in some way, such as by highlighting a key point or giving their take on the topic.

Conclusion

Content scraping will continue to take place for as long as the efforts of those doing it are rewarded. Until Google and the other major search engines become sophisticated enough to determine what the original source of an article was, and not list the unauthorised publisher prominently in their listings, sites with stolen content will continue to thrive.

While there are steps you can take to minimise the chances of it happening to you, and to ensure your stolen content is still attributed to you in some way, at the end of the day the fate of your content is out of your hands.

When it happens to you, the best approach is to remember the saying that imitation is the sincerest form of flattery and then get back to creating the best content you can. By building a community around your site and making a name for yourself in your niche, you can ensure that you benefit from creating great content, even if others try to piggyback on your efforts and dishonestly gain from your hard work.

Images: Lego / Crime Scene

Joe Fylan

Joe Fylan loves using WordPress to create websites and enjoys writing about this topic for a number of blogs. If you'd like to work with Joe, visit his website today.

Full Disclosure This post may contain affiliate links, meaning that if you click on one of the links and purchase an item, we may receive a commission (at no additional cost to you). All opinions are our own and we do not accept payments for positive reviews.
