How to prevent your PDF documents from being indexed by search engines
Imagine you have some valuable content in PDFs posted on your website. Google indexes all of these PDFs, and instead of arriving at your website, many people land directly on the PDFs. When this happens, you often miss the opportunity to convert these users on your website. After reading this article, you will know how to prevent your PDFs from being indexed by search engines. So let's get started.
By using a cloud service like CloudPDF, you can integrate your PDFs directly in your website. The PDFs are rendered on the CloudPDF server. This means search engines will not be able to index the PDF files because they are not being sent to the browser.
Another idea to keep users on your website and increase the probability of conversion is to integrate the PDF file inside a popup. That way, visitors remain on your website while viewing the PDF file.
Change PDF header
This is a more technical solution. Google offers a way of preventing a specific URL from being indexed: the X-Robots-Tag HTTP response header. You need to configure your server to add this header to the response for each PDF file. The response should then include the following header:
X-Robots-Tag: noindex, nofollow
This ensures that the file is not indexed. If you are using an Apache web server with mod_headers enabled, you can add the following to your .htaccess or httpd.conf:
<Files ~ "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>
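Once the server is configured, it is worth confirming that the header actually reaches the client. The following is a minimal Python sketch, not part of any particular tool: the function names `fetch_robots_headers` and `is_noindex` are made up for this illustration, and the parsing deliberately ignores bot-specific prefixes such as `googlebot: noindex`.

```python
import urllib.request


def fetch_robots_headers(url):
    """Send a HEAD request and return every X-Robots-Tag header value."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        # get_all returns None when the header is absent
        return response.headers.get_all("X-Robots-Tag") or []


def is_noindex(header_values):
    """Return True if any X-Robots-Tag value asks engines not to index.

    Assumption: values are plain comma-separated directives. Values with
    a user-agent prefix (e.g. "googlebot: noindex") are not handled here.
    """
    directives = set()
    for value in header_values:
        for part in value.split(","):
            part = part.strip().lower()
            if part:
                directives.add(part)
    # "none" is shorthand for "noindex, nofollow"
    return "noindex" in directives or "none" in directives
```

To check a live file, you could call `is_noindex(fetch_robots_headers(url))` with the URL of one of your own PDFs. The same check can be done by hand by inspecting the response headers in your browser's developer tools.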
Robots.txt: A mistake I made
When you first look into Google's options for preventing indexing, the more obvious solution seems to be the robots.txt file, since it allows you to add rules for search engines. At first I thought this would be sufficient. I added the following robots.txt file to my website and put the PDFs in the /pdfs directory:
User-agent: *
Disallow: /pdfs/
This seemed to work until some people loved my files so much that they started to link to my PDFs from their websites. A robots.txt rule only stops crawling; it does not stop indexing. Since those websites allowed indexing, Google discovered my PDF URLs through their links and started to list them in search results despite my robots.txt.
To prevent this, I suggest using the X-Robots-Tag method instead of robots.txt. This keeps Google from indexing PDFs that are linked on other sites, because the X-Robots-Tag is attached directly to the document instead of the website. Note that the crawler has to fetch the file to see the header, so the PDFs must not also be blocked in robots.txt.
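The difference is easier to see when you look at what a robots.txt rule actually answers. The sketch below uses Python's standard `urllib.robotparser` module to evaluate the rules from this section; the helper name `may_crawl` and the example.com URLs are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from this section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""


def may_crawl(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text allows fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs, used only to exercise the rules.
blocked = may_crawl(ROBOTS_TXT, "https://example.com/pdfs/guide.pdf")
allowed = may_crawl(ROBOTS_TXT, "https://example.com/index.html")
```

Here `blocked` is `False` and `allowed` is `True`. The only question the file answers is whether a crawler may fetch a URL. Nothing in it says whether a URL that was discovered elsewhere may appear in search results, which is why the header-based approach is the more reliable one.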
When visitors arrive at your documents directly from search engines, it can lower your chances of conversion. To minimize traffic that goes straight to your PDF documents and misses the rest of your website, we found two solutions for displaying PDFs on your site while protecting them from being indexed.
When you use our CloudPDF viewer, the way we store your files makes it simple to prevent them from being indexed. As we covered, this is because they are rendered in the cloud and not sent to the browser.
We also saw the option of using the X-Robots-Tag to accomplish this goal, and the subtle but important difference between this and the robots.txt file.
We hope this is helpful to you as you customize your PDF management. Have you found another solution for preventing search engines from indexing your documents? If so, we would love to hear about it.