Hi, I'm doing a small project in C++ on Linux. I need to search 10 or more PDF files and find specific data. How can I do this? To make the question clearer, here is an example: suppose I have ten textbooks, all about C++, and I need information about the topic "array". How can I search the PDFs and find that data? Thanks in advance. |
The Lucene engine supports PDF searching. Some FLOSS projects use the Lucene Engine:
It uses the Solr subproject for PDF searching. Probably, this question will help you :) Good luck! |
Not the cleanest method, but you can use pdftotext:
This converts the PDF to a text stream on stdout, and you can then use whatever text manipulation commands you'd like from there. To convert the PDF to a text file instead, replace - with a filename:
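Since the question is about a C++ project, here is a rough sketch of how the stdout approach could be driven from C++ rather than from shell tools. It assumes pdftotext is installed and on the PATH, and that passing `-` as the output file sends the text to stdout, as described above. The names `Match`, `find_matches`, `shell_quote` and `search_pdf` are made up for this example and are not part of any library.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdio>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// One hit: 1-based line number in the extracted text, plus the line itself.
// (Hypothetical type, only for this sketch.)
struct Match {
    std::size_t line_no;
    std::string text;
};

static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Case-insensitive substring search over a text stream, line by line.
std::vector<Match> find_matches(std::istream& in, const std::string& keyword) {
    std::vector<Match> hits;
    const std::string needle = to_lower(keyword);
    std::string line;
    std::size_t n = 0;
    while (std::getline(in, line)) {
        ++n;
        if (to_lower(line).find(needle) != std::string::npos) {
            hits.push_back({n, line});
        }
    }
    return hits;
}

// Wrap a path in single quotes so spaces and quotes survive the shell.
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Run `pdftotext <pdf> -` through popen (POSIX) and search its stdout.
// Returns an empty list if the command cannot be started.
std::vector<Match> search_pdf(const std::string& pdf_path, const std::string& keyword) {
    const std::string cmd = "pdftotext " + shell_quote(pdf_path) + " -";
    FILE* pipe = popen(cmd.c_str(), "r");
    if (!pipe) return {};
    std::stringstream text;
    char buf[4096];
    std::size_t got = 0;
    while ((got = fread(buf, 1, sizeof buf, pipe)) > 0) {
        text.write(buf, static_cast<std::streamsize>(got));
    }
    pclose(pipe);
    return find_matches(text, keyword);
}
```

To cover all ten books, you would call `search_pdf(path, "array")` once per file and print each `line_no` and `text`. Keep in mind the line numbers refer to the extracted plain text, not to pages or chapters in the PDF.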
--jeremy
Does pdftotext keep or insert tags, so that chapter/section/topic entries are easy to find?
(07 Jun '10, 18:09)
pmarini
No, pdftotext simply converts a PDF to plain text. If you'd like something more in-depth, you may want to set up Lucene, htdig or some other indexing method that supports PDF.
(08 Jun '10, 01:06)
jeremy ♦♦
|
Please accept an answer so the question/answer can be finished. Or provide more details so we can help.