October 15th, 2007, 08:57 AM
Coding PDF recovery software
A few days ago I downloaded a book on cryptanalysis in PDF format; it was a large file (40 MB).
After downloading it I found that I couldn't read it because it was corrupt. So I downloaded a PDF recovery program and was finally able to read the book.
A few days later I started wondering how that PDF recovery program worked.
So I disassembled it to see the logic behind it and the functions it imports, but I am still confused.
Can anyone point me in the right direction?
Also, does anyone know a way to correct a corrupt PDF file "by hand" (opening it in Notepad or a hex editor and editing it to correct its structure so that a PDF reader can read it)?
October 15th, 2007, 01:58 PM
I don't know the inner workings of PDF files but I would imagine that the software works in three basic ways:
1. Validate/repair the header record.
2. Validate/repair the end of file record.
3. Extract and recreate pages.
Provided you know what 1 & 2 should look like it would be theoretically possible to manually repair them. If the body of the document is corrupt I wouldn't bother.
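For checks 1 and 2, here is a minimal Python sketch of what "validate" could look like. It relies on the published layout of a PDF: the file starts with a `%PDF-x.y` comment line and ends with a `startxref` keyword, a byte offset, and `%%EOF`. The function name `check_pdf_framing` and the 1024-byte search windows are my own assumptions for illustration, not how any particular recovery tool does it.

```python
import re


def check_pdf_framing(data: bytes) -> dict:
    """Report on the header and end-of-file markers of a PDF byte string."""
    report = {}

    # 1. Header: a comment line such as %PDF-1.4, normally at offset 0.
    #    A nonzero offset means junk bytes were prepended to the file.
    header = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    report["header_offset"] = header.start() if header else None
    report["version"] = header.group(0)[5:].decode("ascii") if header else None

    # 2. End of file: look for %%EOF and the last startxref in the tail.
    tail = data[-1024:]
    report["has_eof"] = b"%%EOF" in tail
    offsets = re.findall(rb"startxref\s+(\d+)", tail)
    xref_offset = int(offsets[-1]) if offsets else None
    report["startxref"] = xref_offset

    # The offset should land on the 'xref' keyword of a classic table.
    # Files using cross-reference streams (PDF 1.5+) point at an object
    # instead, so False here is a hint to look closer, not proof of damage.
    report["xref_target_ok"] = (
        xref_offset is not None
        and data[xref_offset:xref_offset + 4] == b"xref"
    )
    return report
```

Read the file with `open(path, "rb").read()` and pass the bytes in. A truncated download typically shows up as `has_eof` being False and `startxref` being None, which is the case where a manual repair has to rebuild the end of the file.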
October 16th, 2007, 01:14 PM
I have no experience with PDF recovery software.
When a downloaded file is corrupt, it usually means there was an error during the download itself, or (on sites that support resuming) the rebuild after downloading failed to stitch the partial pieces back together. In cases like these, I simply restart the download and let it run until the file is complete.
Attempting to reconstruct a PDF file by extracting its contents into a text editor and then recreating it with a PDF writer (I use CutePDF; and when the document production is mine, there is always a watermark) is tedious... the integrity of the original file (especially when there are graphics and multiple columns) is compromised. Personally, that's a no-no, since I need to pinpoint the exact page (plus the URL when necessary) when citing the document as a reference.
[Aside 1: I literally had to ask the National Academy of Sciences on how to cite their books that I download in PDF format using the MLA format given the length of the list of editors, the committees and all.]
[Aside 2: I have a healthy fear of PDF attachments in unsolicited emails.]
But when I browse e-books that are displayed in HTML format, I print them to PDF to make sure I capture everything. The disadvantage of this approach is that the PDF writer's font configuration overrides my personal preference.
Si vis pacem, para bellum!
October 16th, 2007, 02:06 PM
I don't think it would be all that easy to write a program that does it, since the data is encoded so "proprietarily". I'm not sure where you would even get the information on how to parse it...
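On where the parsing information comes from: Adobe publishes the PDF Reference, and the object boundaries in a file are plain ASCII markers (`N G obj` ... `endobj`) even when the stream contents between them are compressed. A rough Python sketch, assuming that scanning for those markers is a reasonable first step towards rebuilding a damaged cross-reference table; `find_objects` is a hypothetical helper name, and binary stream data can produce false matches that a real tool would have to filter out.

```python
import re

# "<object number> <generation> obj", not preceded by another digit.
OBJ_MARKER = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")


def find_objects(data: bytes) -> dict:
    """Map (object number, generation) to the byte offset of its marker."""
    found = {}
    for match in OBJ_MARKER.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        # A later definition of the same object replaces an earlier one,
        # which is how incrementally updated files are meant to be read.
        found[key] = match.start()
    return found
```

The resulting offsets are what a cross-reference table stores, so this is roughly the raw material a repair tool needs before it can "extract and recreate pages".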