-
October 15th, 2007, 08:57 AM
#1
Coding PDF recovery software
<history lesson>
A few days ago I downloaded a book on cryptanalysis in PDF format. It was huge (40 MB).
After downloading it I found that I couldn't read it because the file was corrupt. So I downloaded some PDF recovery software and was finally able to read the book.
</history lesson>
A few days later I started wondering: how does that PDF recovery software actually work?
So I disassembled it to see the logic behind it and the functions it imports, but I am still confused.
Can anyone point me in the right direction?
Also, does anyone know a way to repair a corrupt PDF file "by hand", i.e. opening it in Notepad or a hex editor and editing it to correct its structure so that a PDF reader can open it?
-
October 15th, 2007, 01:58 PM
#2
I don't know the inner workings of PDF files, but I would imagine the software works in three basic steps:
1. Validate/repair the header record.
2. Validate/repair the end of file record.
3. Extract and recreate pages.
Provided you know what 1 and 2 should look like, it would theoretically be possible to repair them manually. If the body of the document is corrupt, I wouldn't bother.
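Steps 1 and 2 can at least be checked mechanically. Below is a minimal Python sketch, assuming the standard PDF layout: a `%PDF-x.y` header near the start of the file and, near the end, the keyword `startxref`, the byte offset of the cross-reference section, and the `%%EOF` marker. The function name `check_pdf_structure` and the 1024-byte search windows are my own choices for illustration, not taken from any particular recovery tool.

```python
import re


def check_pdf_structure(data: bytes) -> dict:
    """Rough check of the header and end-of-file records of a PDF.

    Sketch only; assumes the usual layout (header at the start,
    'startxref <offset> %%EOF' at the end). The 1024-byte windows
    are an arbitrary choice, not a rule taken from a specific tool.
    """
    report = {}

    # Header record: "%PDF-" plus a version such as "1.4". Searching the
    # first 1024 bytes instead of requiring offset 0 tolerates junk that
    # may have been prepended to the file.
    header = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    report["header_version"] = header.group(1).decode("ascii") if header else None

    # End-of-file record: look only at the tail of the file.
    tail = data[-1024:]
    report["has_eof"] = b"%%EOF" in tail
    offsets = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    # Incrementally updated files carry several trailers; the last one wins.
    off = int(offsets[-1]) if offsets else None
    report["startxref"] = off

    # The offset should land on a classic "xref" table or on an "N G obj"
    # header (a cross-reference stream in newer files). Anything else
    # suggests the cross-reference data has to be rebuilt.
    report["xref_at_offset"] = off is not None and (
        data.startswith(b"xref", off)
        or re.match(rb"\d+\s+\d+\s+obj", data[off:off + 32]) is not None
    )
    return report
```

To try it, read the file in binary mode and pass the bytes in, e.g. `check_pdf_structure(open("book.pdf", "rb").read())` (the file name is hypothetical). A truncated download would typically show up as `has_eof: False` and `startxref: None`, since truncation cuts off the tail of the file.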
-
October 16th, 2007, 01:14 PM
#3
I have no idea about PDF recovery software.
When a downloaded file is corrupted, that means an error occurred during the download itself: on sites that support resuming, the file is rebuilt from its downloaded pieces afterwards, and that rebuild can fail to stitch the pieces together correctly. In cases like these, I simply restart the download and let it run its course until the file is complete.
Attempting to reconstruct a PDF file by extracting its contents into a text editor and then recreating it with a PDF writer (I use CutePDF, and when I produce the document myself there is always a watermark) is tedious, and the integrity of the original file is compromised, especially when it contains graphics or multiple columns. Personally, that's a no-no, since I need to pinpoint the exact page (plus the URL when necessary) when citing the document as a reference.
[Aside 1: I literally had to ask the National Academy of Sciences how to cite, in MLA format, the books of theirs that I download as PDFs, given the length of the lists of editors, committees and so on.]
[Aside 2: I have a healthy fear of PDF attachments in unsolicited emails.]
But when I browse e-books displayed in HTML format, I print them to PDF to make sure I capture everything. The disadvantage of this approach is that the PDF writer's font configuration overrides my personal preferences.
Si vis pacem, para bellum!
-
October 16th, 2007, 02:06 PM
#4
I don't think it would really be all that easy to write a program that does it, since the data is encoded so "proprietarily". I'm not even sure where you would get the information on how to parse it...
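For what it's worth, Adobe does publish the format (the PDF Reference), and every object in the body is delimited by plain-text `N G obj ... endobj` markers, even when the object's stream data is compressed. That makes one common repair step, rebuilding a damaged cross-reference table by scanning for objects, fairly mechanical; qpdf, for example, reports that it is reconstructing the cross-reference table when it opens a damaged file. I can't say that the tool from post #1 works this way. The sketch below uses a hypothetical `rebuild_xref` helper and assumes every object header starts at the beginning of a line.

```python
import re

# "N G obj" at the start of a line marks an indirect object in the PDF body.
# Assumption: headers begin a line; binary stream data could, rarely,
# produce a false match.
OBJ_RE = re.compile(rb"(?m)^(\d+)\s+(\d+)\s+obj\b")


def rebuild_xref(data: bytes) -> bytes:
    """Rebuild a classic cross-reference table by scanning for object headers.

    Sketch only: objects packed inside compressed object streams are not found.
    """
    offsets = {}
    for m in OBJ_RE.finditer(data):
        num, gen = int(m.group(1)), int(m.group(2))
        # A later definition of the same object overrides an earlier one,
        # mirroring how incremental updates work.
        offsets[num] = (m.start(), gen)

    size = max(offsets, default=0) + 1
    lines = [b"xref", b"0 %d" % size]
    for num in range(size):
        if num in offsets:
            pos, gen = offsets[num]
            # Each entry is 20 bytes: 10-digit offset, 5-digit generation,
            # 'n', then a two-byte end of line (here a space plus '\n').
            lines.append(b"%010d %05d n " % (pos, gen))
        else:
            # Object 0 and any gaps are marked free (no proper free list).
            lines.append(b"0000000000 65535 f ")
    return b"\n".join(lines) + b"\n"
```

A complete fix would also need a `trailer` dictionary (with `/Size` and `/Root`) and a `startxref` line pointing at the new table; the sketch only produces the table itself.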