HOW CAN WE SEARCH PDF ?

Question

Hi, I'm doing a small project in C++ on Linux. I need to search ten or more PDF files and find the required data. How can I do that? I'll make my question clearer with the following example:

Suppose I have ten textbooks, all about C++, and I need information on the topic of arrays. How can I search the PDFs and find that data?

Thanks in advance.

Answer 1

4

Not the cleanest method, but you can use:

pdftotext file.pdf -

This converts the PDF to a text stream on stdout, and you can use whatever text manipulation commands you'd like from there. To convert the PDF to a text file instead, replace - with a filename:

pdftotext file.pdf file.txt
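Since the question asks about C++, here is a minimal sketch of how the same idea might be driven from a program: run pdftotext on each file, capture its stdout, and report the lines that contain a keyword. It assumes a POSIX system with pdftotext on the PATH. The names Match, find_matches, pdf_to_text and search_pdfs are hypothetical helpers made up for this illustration, and the page numbers rely on pdftotext writing a form feed character between pages, so treat them as approximate.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdio>     // popen/pclose are POSIX, declared here on Linux
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// One hit: the (approximate) page it was found on and the matching line.
struct Match {
    std::size_t page;
    std::string text;
};

// ASCII-only lowercase copy, used for case-insensitive matching.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Scan extracted text line by line and collect lines containing `keyword`.
// Each form feed ('\f') is counted as a page break and removed from the line.
std::vector<Match> find_matches(const std::string& text,
                                const std::string& keyword) {
    std::vector<Match> out;
    if (keyword.empty()) return out;  // an empty keyword would match every line
    const std::string needle = to_lower(keyword);
    std::istringstream in(text);
    std::string line;
    std::size_t page = 1;
    while (std::getline(in, line)) {
        page += static_cast<std::size_t>(
            std::count(line.begin(), line.end(), '\f'));
        line.erase(std::remove(line.begin(), line.end(), '\f'), line.end());
        if (to_lower(line).find(needle) != std::string::npos)
            out.push_back({page, line});
    }
    return out;
}

// Run `pdftotext <path> -` through the shell and return everything it prints.
// The path is wrapped in single quotes so spaces and most special characters
// survive; a path beginning with '-' is not handled by this sketch.
std::string pdf_to_text(const std::string& path) {
    std::string quoted = "'";
    for (char c : path) {
        if (c == '\'') quoted += "'\\''";
        else quoted += c;
    }
    quoted += "'";
    const std::string cmd = "pdftotext " + quoted + " -";

    FILE* pipe = popen(cmd.c_str(), "r");
    if (!pipe) return {};
    std::string result;
    char buf[4096];
    std::size_t got = 0;
    while ((got = std::fread(buf, 1, sizeof buf, pipe)) > 0)
        result.append(buf, got);
    pclose(pipe);
    return result;
}

// Print "file:page: line" for every hit in every file.
void search_pdfs(const std::vector<std::string>& files,
                 const std::string& keyword) {
    for (const auto& file : files)
        for (const auto& m : find_matches(pdf_to_text(file), keyword))
            std::cout << file << ":" << m.page << ": " << m.text << "\n";
}
```

Calling search_pdfs({"book1.pdf", "book2.pdf"}, "array") from main (the file names here are placeholders) would print one line per hit. Re-running pdftotext on every search is simple but slow for large collections, which is where a real index becomes worthwhile.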

--jeremy


answered 07 Jun '10, 15:54

jeremy ♦♦

Does pdftotext keep or insert tags that make it easy to find chapter, section, or topic entries?

(07 Jun '10, 18:09) pmarini

No, pdftotext simply converts a PDF to plain text. If you'd like something more in-depth, you may want to set up Lucene, htdig, or some other indexing method that supports PDF.

(08 Jun '10, 01:06) jeremy ♦♦

Answer 2

1

The Lucene engine supports PDF searching. Some FLOSS projects that use the Lucene engine:

OpenCms
regain

It uses the Solr subproject for PDF searching.

This question will probably help you :)
"How can I search PDF?" on Stack Overflow

Good luck!


answered 07 Jun '10, 16:36

guerda
