Thread: coding a pdf recovery software

  1. #1
    Banned shakuni's Avatar
    Join Date
    Aug 2007
    Posts
    24

    coding a pdf recovery software

    <history lesson>
    A few days ago I downloaded a book on cryptanalysis in PDF format; it was huge (40 MB).
    After downloading it I found that I couldn't read it because it was corrupt. So I downloaded a PDF recovery program and was finally able to read the book.
    </history lesson>

    A few days later I started to wonder how that PDF recovery software worked.
    So I disassembled it to see the logic behind it and the functions it imports, but I am still confused by it.
    Can anyone point me in the right direction?

    Also, does anyone know a way to correct a corrupt PDF file "by hand" (opening it in Notepad, a hex editor, etc. and editing it to correct its structure so that it can be read by a PDF reader)?

  2. #2
    Senior Member nihil's Avatar
    Join Date
    Jul 2003
    Location
    United Kingdom: Bridlington
    Posts
    17,188
    I don't know the inner workings of PDF files but I would imagine that the software works in three basic ways:

    1. Validate/repair the header record.
    2. Validate/repair the end of file record.
    3. Extract and recreate pages.

    Provided you know what 1 & 2 should look like, it would be theoretically possible to repair them manually. If the body of the document is corrupt, I wouldn't bother.
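
    As a rough illustration of checks 1 and 2: a PDF normally starts with a "%PDF-x.y" header and ends with "startxref", a byte offset, and "%%EOF". The sketch below only reports whether those markers are present; it repairs nothing, and it is not taken from any particular recovery tool. The function name inspect_pdf and the 1024-byte search windows are assumptions made for the example.

```python
import re


def inspect_pdf(data: bytes) -> dict:
    """Report whether the header and end-of-file markers look intact."""
    # The "%PDF-x.y" header should sit near the very start of the file.
    # (The 1024-byte window is an arbitrary choice for this sketch.)
    header = re.search(rb"%PDF-(\d\.\d)", data[:1024])

    # The file should end with "startxref", a byte offset, then "%%EOF".
    tail = data[-1024:]
    offsets = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)

    return {
        "header_ok": header is not None,
        "version": header.group(1).decode("ascii") if header else None,
        "eof_ok": b"%%EOF" in tail,
        # Use the last match, since an updated file can carry several trailers.
        "startxref": int(offsets[-1]) if offsets else None,
    }
```

    To try it on a real file, read it in binary mode, e.g. inspect_pdf(open("book.pdf", "rb").read()), where "book.pdf" is a placeholder name. A truncated download would typically show eof_ok as False and startxref as None.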

  3. #3
    I have no idea about a PDF Recovery Software.

    When a downloaded file is corrupt, that means there was an error in the download itself, or the rebuild of the file after downloading (on sites that support resume) failed to stitch the partial pieces back together. In cases like these, I simply restart the download and let it run its course until the file is complete.
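
    If the site happens to publish a checksum for the file (an assumption; many sites do not), one way to tell whether a fresh download is complete is to hash the local copy and compare. A minimal sketch, with sha256_of_file as a made-up helper name:

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read 1 MiB at a time so a large file is never held in memory whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

    If the result does not match the published value, the download is damaged and restarting it is the simplest fix.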

    Attempting to reconstruct a PDF file by extracting its contents into a text editor and then recreating it with a PDF writer (I use CutePDF; and when the document production is mine, there is always a watermark) is tedious... the integrity of the original file (especially when there are graphics and multiple columns) is compromised. Personally, that's a no-no, since I need to pinpoint the exact page (plus the URL when necessary) when citing the document as a reference.

    [Aside 1: I literally had to ask the National Academy of Sciences on how to cite their books that I download in PDF format using the MLA format given the length of the list of editors, the committees and all.]

    [Aside 2: I have a healthy fear of PDF attachments in unsolicited emails.]

    But when I browse e-books that are displayed in HTML format, I print them to PDF to make sure I extract everything. The disadvantage of this approach is that the PDF writer's font configuration prevails over my personal preference.
    Si vis pacem, para bellum!

  4. #4
    Senior Member
    Join Date
    Oct 2003
    Location
    MA
    Posts
    1,052
    I don't think it would really be all that easy to write a program that does it, since the data is encoded so "proprietarily". I'm not sure where you would even get the information on how to parse it...
