Just stumbled over this wonderful tool:
* Generates a searchable PDF/A file from a regular PDF
* Places OCR text accurately below the image to ease copy / paste
* Keeps the exact resolution of the original embedded images
* When possible, inserts OCR information as a "lossless" operation without rendering vector information
* Keeps file size about the same
* If requested, deskews and/or cleans the image before performing OCR
* Validates input and output files
* Provides debug mode to enable easy verification of the OCR results
* Processes pages in parallel when more than one CPU core is available
* Uses Tesseract OCR engine
* Supports more than 100 languages recognized by Tesseract
* Battle-tested on thousands of PDFs, with a test suite and continuous integration
There is an official package in Debian for those using Linux.
So far I have used it to postprocess both a Spanish-language and an English-language PDF of my own making, and I am very happy with the results.
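For anyone who wants to try the same thing, here is a rough sketch of how such a run could be scripted. It assumes the tool in question is OCRmyPDF (the feature list above matches its README) and that the `ocrmypdf` command is installed along with the Tesseract language packs for `spa` and `eng`. The helper names `build_ocr_command` and `run_ocr` and the file names are my own placeholders, not part of the tool.

```python
import shutil
import subprocess


def build_ocr_command(input_pdf, output_pdf, languages=("eng",),
                      deskew=False, clean=False, jobs=None):
    """Assemble an argument list for the ocrmypdf command-line tool.

    Assumption: the tool described in the post is OCRmyPDF. This helper
    is a hypothetical convenience wrapper, not part of the tool itself.
    """
    # Ask for PDF/A output; Tesseract language codes are joined with "+".
    cmd = ["ocrmypdf", "--output-type", "pdfa", "-l", "+".join(languages)]
    if deskew:
        cmd.append("--deskew")  # straighten crooked scans before OCR
    if clean:
        cmd.append("--clean")   # clean pages before OCR (needs unpaper installed)
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]  # cap the number of parallel workers
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocr(input_pdf, output_pdf, **options):
    """Run the assembled command; raises if the tool is missing or fails."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    subprocess.run(build_ocr_command(input_pdf, output_pdf, **options),
                   check=True)


if __name__ == "__main__":
    # Placeholder file names; this only prints the command it would run.
    print(" ".join(build_ocr_command("scan.pdf", "scan_ocr.pdf",
                                     languages=("spa", "eng"), deskew=True)))
```

Calling `run_ocr("scan.pdf", "scan_ocr.pdf", languages=("spa", "eng"), deskew=True)` would then do the actual conversion. I kept building the command separate from running it so the options can be checked without having the tool installed.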