PDF Indexer

PDF Indexer - Joomla PDF and DOC indexer

Indexes your PDF and MS Word documents and makes their content searchable through Joomla's search functions, including the Joomla Smart Search tool.

Version: 4.0
Last Updated: Jan 02 2024
Compatible: Joomla 3.9.0+, Joomla 4, Joomla 5

DOCUMENTS INDEXER v4.0 KEY FEATURES

1. Indexes pdf, doc, docx and xlsx documents and saves their content to the database for searching. These documents can be uploaded to different directories.

2. Indexes all documents stored in a folder and all its sub-folders with one click. This saves you time in case you have many documents stored in different folders.

3. Searches indexed documents via the standard Joomla search.

4. Removes the indexed content of files that have been removed.

5. Integrates with the popular download extension EDOCMAN:

  • Allows indexing PDF documents stored by this extension.
  • Allows searching for documents based on indexed content via the standard search function of this component.

6. Offers an option to index the content of pdf files without popen.

7. The extension can also be used to index Word documents. However, because of limitations in the Word Indexer library that I am using, there are still problems with special characters in the content of Word documents. I will try to find a better library and add it in a future version of the extension to improve the Word indexing feature.

8. Integrates with the Joomla Update System

9. Compatible with Joomla 3.x, Joomla 4.x and Joomla 5.x

10. Works well on the PHP 7.x and 8.x series

11. Provides an option to index new documents automatically

1. Issue pdftotext: Permission denied

If you index documents and see the error

/components/com_docindexer/lib/binaries/linux/pdftotext: Permission denied

in the indexed content of pdf files, change the permissions of the file

/components/com_docindexer/lib/binaries/linux/pdftotext

to 777 and the issue will be resolved.
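
If you prefer to script the permission change, here is a minimal sketch in Python. It is only an illustration and not part of the extension. The function name make_executable and the Joomla root /var/www/html are hypothetical. The text above recommends 777. The sketch defaults to the narrower 755, which still gives every user execute permission on the binary. Pass 0o777 if you want to follow the text literally.

```python
import os
import stat

# Path quoted in the error message, relative to the Joomla root folder.
PDFTOTEXT_RELATIVE_PATH = "components/com_docindexer/lib/binaries/linux/pdftotext"


def make_executable(path: str, mode: int = 0o755) -> int:
    """Set the permission bits of `path` and return the resulting mode.

    The default 0o755 is an assumption of this sketch; the source text uses 777.
    """
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)


# Example call (the Joomla root below is hypothetical, adjust it to your server):
# make_executable(os.path.join("/var/www/html", PDFTOTEXT_RELATIVE_PATH))
```

Run it as a user who owns the file (or as root), then index the documents again to check that the error is gone.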

2. How to set up a cron task to index the content of new documents automatically

- Publish the plugin: Documents Indexer - Cron task

This plugin has various important parameters:

  • Integrate with Edocman documents: Select Yes if you want to index documents in the Edocman documents folder.
  • Number Documents will be indexed: Documents Indexer can't index all documents in a single run, so enter the number of documents the component should index each time it runs. The default is 5.

- Set up a cron task on your server that calls a direct link on your site.

By default, the Documents Indexer extension uses a system plugin to trigger indexing. That means someone has to access the site (search engine bots also count) to trigger the process. This is sometimes unreliable, and it can cause the same documents to be indexed multiple times if your site has very high traffic. To address that limitation, you can set up a cron job from your hosting account to trigger document indexing instead. Please see the detailed instructions below:

Set up a cron job that makes a request to the URL below using cURL (use cURL so that the variable can be passed in the GET request; see https://stackoverflow.com/questions/11375260/cron-command-to-run-url-address-every-5-minutes for detailed instructions):

  domain.com/index.php?trigger_code=SECRETCODE

  • Replace domain.com/ with the URL of your site
  • Replace SECRETCODE with the secret string that you entered in the Trigger Code parameter of the plugin

This way, indexing only runs when a request is made to that URL (which should be kept secret, since no real users will access it). That makes it more reliable than relying on a system plugin.
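
The text above recommends cURL. If cURL is not available on your host, the same GET request can be sent from a small script. Below is a minimal sketch that uses Python's standard urllib instead of cURL. The script name trigger_indexer.py, the function names, the https:// scheme and the 5 minute schedule are assumptions made for illustration. Only the index.php?trigger_code=... URL format comes from the instructions above.

```python
import sys
import urllib.parse
import urllib.request

# Hypothetical crontab entry that runs this script every 5 minutes:
# */5 * * * * python3 /path/to/trigger_indexer.py https://domain.com SECRETCODE


def build_trigger_url(site_url: str, trigger_code: str) -> str:
    """Return <site_url>/index.php?trigger_code=<trigger_code>."""
    base = site_url.rstrip("/")
    # urlencode escapes characters that are not safe inside a query string.
    query = urllib.parse.urlencode({"trigger_code": trigger_code})
    return f"{base}/index.php?{query}"


def trigger_indexing(site_url: str, trigger_code: str, timeout: float = 60.0) -> int:
    """Send one GET request to the trigger URL and return the HTTP status code."""
    url = build_trigger_url(site_url, trigger_code)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


# Only send a request when both arguments are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    print(trigger_indexing(sys.argv[1], sys.argv[2]))
```

Each run sends exactly one request, so the number of documents indexed per run is still controlled by the plugin parameter described earlier.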