We run our own web crawler (known as AhrefsBot), which visits millions of websites, retrieves information, and stores it in our records. This is how Ahrefs builds its huge link index.
AhrefsBot strictly respects both the disallow and allow rules in robots.txt. As such, you can control the behaviour of AhrefsBot by modifying your site's robots.txt file.
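To illustrate how allow and disallow rules combine, here is a minimal sketch that checks a robots.txt file against the AhrefsBot user agent using Python's standard `urllib.robotparser` module. The rules, the `example.com` URLs, and the helper name `ahrefsbot_allowed` are assumptions made up for this example. Python's parser applies the first rule that matches a path, which may not be exactly how AhrefsBot resolves overlapping rules, so treat the result as a rough preview rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: AhrefsBot may crawl /blog/ but nothing else,
# while every other crawler is unrestricted.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Allow: /blog/
Disallow: /

User-agent: *
Disallow:
"""


def ahrefsbot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given rules let the AhrefsBot user agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("AhrefsBot", url)


if __name__ == "__main__":
    for url in ("https://example.com/blog/post", "https://example.com/pricing"):
        print(url, ahrefsbot_allowed(ROBOTS_TXT, url))
```

A site using rules like these would be only partially crawled: pages under /blog/ remain available to AhrefsBot, while the rest of the site is blocked.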
As far as we know, sites like Quora, LinkedIn and Slideshare have either:
- prevented us from crawling, or
- only allowed a partial crawl of their site.
This is the main reason why backlinks from these sites (dofollow/nofollow) are not shown in Ahrefs' backlinks report.
As for PDF files, AhrefsBot does not crawl them for links, metadata, and so on.