Thread: **URGENT** Download remote files in VB.Net using web request

  1. #1
    Junior Member
    Join Date
    Oct 2002
    Posts
    3

    Hi,

    I am building a VB.NET application in which I need to automatically read the contents of a file that is linked from a page on a third-party web site. The link URL for the file looks like this:

    http://<DOMAIN>/ <...some path...> / 64ae64aabd27233f85256d3b0076549b/ a00421d13d0b0e1885256f04005c7511/ $FILE/ THE%20FILE.doc

    When I paste the URL directly into the browser, it shows a file download box (open/save), but I need to do the download in the backend, using code like this:



    Dim wResp As HttpWebResponse
    Dim sr As StreamReader
    Dim txt As String
    Dim streamWriter As System.IO.StreamWriter
    Dim sw As StreamWriter
    Dim wReq As HttpWebRequest
    Dim exFolder As String


    ' strURL is the file URL mentioned above

    wReq = CType(WebRequest.Create(New Uri(strURL)), HttpWebRequest)

    With wReq

    .Proxy = WebRequest.DefaultWebProxy
    .Proxy.Credentials = System.Net.CredentialCache.DefaultCredentials
    .Credentials = New NetworkCredential(user, passwd, domain)
    .CookieContainer = cookieJar
    .UserAgent = "BGClient"
    '.UseBinary = True
    .KeepAlive = True
    .Headers.Set("Pragma", "no-cache")
    '.Timeout = 1000000000
    '.Method = System.Net.WebRequestMethods.Ftp.DownloadFile '"POST"
    .ContentType = "application/msword"
    '.ReadWriteTimeout = 1000000000
    .PreAuthenticate = True

    End With

    wResp = wReq.GetResponse

    sr = New StreamReader(wResp.GetResponseStream)
    txt = sr.ReadToEnd.Trim
    sr.Close()
    wResp.Close()
    wReq = Nothing



    If Not txt Is Nothing And txt.Length > 0 Then

    streamWriter = New StreamWriter(dFolder & strTitle & "." & fType, False)
    streamWriter.WriteLine(txt)
    streamWriter.Close()
    streamWriter = Nothing
    fCount = fCount + 1

    End If

    I have tried using WebClient and also FtpWebRequest, but the site does not accept FTP requests.
    Please help. Thanks in advance!!
    ****DOOD*****

  2. #2
    Senior Member
    Join Date
    Mar 2004
    Posts
    557
    Hi

    How about
    Code:
    Dim sysWebClient As System.Net.WebClient = New System.Net.WebClient
    sysWebClient.DownloadFile("http://www.google.com/images/logo.gif", _
       System.Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments)  & _
       "\google_logo.gif")
    In addition, you can configure Proxy-Settings as well as User-Credentials with the WebClient-class.

    Cheers
    Last edited by sec_ware; November 5th, 2006 at 11:51 AM.
    If the only tool you have is a hammer, you tend to see every problem as a nail.
    (Abraham Maslow, Psychologist, 1908-70)

  3. #3
    Junior Member
    Join Date
    Oct 2002
    Posts
    3
    Hi,

    Thanks for the reply.

    I had tried using WebClient, but the site throws up a login page, so I am using the web request object to pass POST parameters and get past the login page.

    I have now managed to open the file and get it as an IO stream, but there is a new problem: the file is a Word doc, and the saved file appears corrupted. I guess the problem is with the encoding, and I am currently researching it.

    Any help will be appreciated
    ****DOOD*****

  4. #4
    Senior Member
    Join Date
    Mar 2004
    Posts
    557
    Hi

    I see - I misunderstood you. I thought you were talking about a login that could be handled using
    Code:
    sysWebClient.Credentials = New System.Net.NetworkCredential("username", "password")
    Since the login requires a POST, you first need the GET, which will provide
    you some kind of session handler (e.g. cookie). Then, you can do the POST.
    The tricky part is knowing what to POST - as you know.
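
    The GET-then-POST sequence described above could look roughly like the sketch below. It is only an illustration under assumptions: the login URL and the form data are placeholders that have to come from the captured traffic, and LogIn is a made-up helper name. The point is that both requests share one CookieContainer, so the session cookie handed out by the GET is sent back with the POST.

    ```vbnet
    Imports System.IO
    Imports System.Net
    Imports System.Text

    Module LoginSketch

        ' Hypothetical helper: GETs the login page to pick up the session cookie,
        ' then POSTs the login form. formData must already be URL-encoded, e.g.
        ' "username=...&password=..." (the field names are placeholders).
        Function LogIn(ByVal loginUrl As String, ByVal formData As String) As CookieContainer
            Dim cookies As New CookieContainer()

            ' Step 1: GET the login page; cookies set by the server land in "cookies".
            Dim getRequest As HttpWebRequest = CType(WebRequest.Create(loginUrl), HttpWebRequest)
            getRequest.CookieContainer = cookies
            getRequest.GetResponse().Close()

            ' Step 2: POST the form fields, sending the same cookies back.
            Dim body As Byte() = Encoding.ASCII.GetBytes(formData)
            Dim postRequest As HttpWebRequest = CType(WebRequest.Create(loginUrl), HttpWebRequest)
            postRequest.Method = "POST"
            postRequest.ContentType = "application/x-www-form-urlencoded"
            postRequest.ContentLength = body.Length
            postRequest.CookieContainer = cookies
            Using requestStream As Stream = postRequest.GetRequestStream()
                requestStream.Write(body, 0, body.Length)
            End Using
            postRequest.GetResponse().Close()

            ' Reuse this container on every later request to stay logged in.
            Return cookies
        End Function

    End Module
    ```

    The returned container plays the same role as the cookieJar variable in the first post: assign it to .CookieContainer on each later request.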

    I recommend burpproxy[1] (or another middleman) to capture and analyse the
    (manually triggered) traffic between your client (don't forget to set the
    Proxy) and the webserver. Using this information will allow you to finetune
    your program.
    For inspiration, have a look at this thread[2].

    Cheers

    [1] http://freshmeat.net/projects/burpproxy/
    [2] http://www.thescripts.com/forum/thread367596.html
    Last edited by sec_ware; November 5th, 2006 at 06:09 PM.

  5. #5
    Junior Member
    Join Date
    Oct 2002
    Posts
    3
    Hi,

    Thanks for the reply and suggestions, but I have already managed the POST data and have access to the site's resources (which are mainly docs). My problem now is downloading an MS Word doc whose link is embedded in one of the pages. My application works like a web crawler that downloads all data from the specified links.

    The Word documents that get downloaded are corrupted and do not open with the correct encoding. I have the following code for the download:

    Dim wReq as HttpWebRequest
    Dim wResp as HttpWebResponse
    Dim str as String
    Dim sr as StreamReader
    Dim sw as StreamWriter

    wReq = CType(WebRequest.Create(New Uri(strURL)),HttpWebRequest)
    wReq.Method = "POST"
    wReq.KeepAlive = True
    wReq.Credentials = New NetworkCredential(user, passwd, domain)
    wReq.ContentType = "application/msword"


    wResp = wReq.GetResponse

    sr = New StreamReader(wResp.GetResponseStream, System.Text.Encoding.ASCII)
    str = sr.ReadToEnd.Trim
    sr.Close()
    wResp.Close()
    wReq = Nothing



    If Not str Is Nothing And str.Length > 0 Then

    sw = New StreamWriter(dFolder & filename & "." & "doc", False)
    sw.WriteLine(str)
    sw.Close()
    sw = Nothing

    End If

    I have tried a content type of "application/doc" and encodings such as GetEncoding(1251), but when I open the downloaded doc in Word, it still cannot make sense of the encoding.

    Please help!!
    Thanks in advance
    ****DOOD*****

  6. #6
    Senior Member nihil's Avatar
    Join Date
    Jul 2003
    Location
    United Kingdom: Bridlington
    Posts
    17,188
    Hello,

    1. I know absolutely nothing about this sort of thing other than that it can be done.

    2. Between you and sec_ware (good help there mate!) you seem to have gotten it pretty much solved, in that you are getting the downloads to work?

    I just had a passing thought based on my previous experiences with .doc (and other file extension) compatibility?

    Could it be that the file you are getting is a version that is incompatible with your reader, so the problem is at your end?

    If you copy one of your downloaded documents, rename the copy as a .txt file and open it in Notepad, you should see, at the head and/or foot of the file, some of the metadata that will tell you what created the file.
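
    The same check can be done in code instead of Notepad. The sketch below is only an illustration and assumes the documents are in the Word 97-2003 binary format, which is stored in an OLE2 compound file starting with the bytes D0 CF 11 E0 A1 B1 1A E1. LooksLikeWordDoc is a made-up helper name.

    ```vbnet
    Imports System.IO

    Module HeaderCheckSketch

        ' Signature of an OLE2 compound file, the container used by Word 97-2003 .doc files.
        Private ReadOnly DocSignature As Byte() = {&HD0, &HCF, &H11, &HE0, &HA1, &HB1, &H1A, &HE1}

        ' Hypothetical helper: True if the file starts with the OLE2 signature.
        Function LooksLikeWordDoc(ByVal filePath As String) As Boolean
            Dim header(DocSignature.Length - 1) As Byte
            Dim bytesRead As Integer
            Using input As New FileStream(filePath, FileMode.Open, FileAccess.Read)
                bytesRead = input.Read(header, 0, header.Length)
            End Using

            ' A file shorter than the signature cannot be a valid .doc.
            If bytesRead < header.Length Then Return False

            For i As Integer = 0 To DocSignature.Length - 1
                If header(i) <> DocSignature(i) Then Return False
            Next
            Return True
        End Function

    End Module
    ```

    If this returns False for a downloaded file but True for the same document saved through the browser, the bytes were changed on the way to disk rather than by the server.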

    Also, if you can post a small (1 MB or less) example, I can try a variety of "Word Compatible" tools that I have, to try to open it. I know, I know, this approach is as crude as hell, but supposedly compatible tools frequently give you quite informative error messages when they cannot properly convert a Word document..............I guess you won't hire me to fix your washing machine now?

    The only other question that came to mind was "are they encrypted in some way?"

    Cheers.................this is probably a complete red herring, but if it just happens to be true, you would mess with your downloading forever, and it still would not work?

    Good Luck!
    Last edited by nihil; November 6th, 2006 at 07:57 AM.

  7. #7
    Senior Member
    Join Date
    Mar 2004
    Posts
    557
    Hi

    Nihil has implied a good suggestion: analyse your problem - e.g. do you have the same problem with all kinds of "downloads"?

    You won't.

    The problem is caused by the fact that a doc-file is binary, not text, hence
    treating the "download" as a text stream will fail (StreamReader is a TextReader). Decoding the bytes with an encoding such as ASCII replaces every byte it cannot map, and Trim and WriteLine change the content further. StreamReader can detect UTF-8 and UTF-16 from a byte order mark and otherwise uses the encoding you pass in (UTF-8 by default), but no choice of text encoding will fix a binary file.

    Read the response stream as raw bytes instead, e.g. with a BinaryReader[1] wrapped around the Stream. I had a quick look at the web and have found a nice sample chapter[2] and this implementation of a web-crawler[3] (in C#, but it's all the same).
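
    As a rough illustration of the binary approach, the sketch below copies the response body to disk byte for byte, with no StreamReader, Trim or WriteLine involved. It is a minimal sketch, not a drop-in fix: DownloadBinary is a made-up helper name, and it assumes the session cookie from the login is already in the CookieContainer passed in.

    ```vbnet
    Imports System.IO
    Imports System.Net

    Module BinaryDownloadSketch

        ' Hypothetical helper: saves the response body unchanged to targetPath.
        Sub DownloadBinary(ByVal url As String, ByVal targetPath As String, ByVal cookies As CookieContainer)
            Dim request As HttpWebRequest = CType(WebRequest.Create(New Uri(url)), HttpWebRequest)
            request.CookieContainer = cookies   ' assumed to hold the session cookie from the login

            Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
                Using source As Stream = response.GetResponseStream()
                    Using target As New FileStream(targetPath, FileMode.Create, FileAccess.Write)
                        ' Copy in 8 KB chunks; Read returns 0 at the end of the stream.
                        Dim buffer(8191) As Byte
                        Dim bytesRead As Integer = source.Read(buffer, 0, buffer.Length)
                        While bytesRead > 0
                            target.Write(buffer, 0, bytesRead)
                            bytesRead = source.Read(buffer, 0, buffer.Length)
                        End While
                    End Using
                End Using
            End Using
        End Sub

    End Module
    ```

    With the variable names from the earlier posts, a call might look like DownloadBinary(strURL, dFolder & filename & ".doc", cookieJar).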

    Cheers

    [1] http://msdn2.microsoft.com/en-us/lib...er(VS.80).aspx
    [2] http://www.microsoft.com/mspress/boo...chap/6436.aspx
    [3] http://www.codeproject.com/cs/internet/Crawler.asp
