-
November 5th, 2006, 09:10 AM
#1
Junior Member
**URGENT** Download remote files in VB.Net using web request
Hi,
I am building a VB.NET application in which I need to automatically read the contents of a file that is linked from a page on a third-party web site. The link URL for the file looks like this:
http://<DOMAIN>/ <...some path...> / 64ae64aabd27233f85256d3b0076549b/ a00421d13d0b0e1885256f04005c7511/ $FILE/ THE%20FILE.doc
When I paste the URL directly into the browser, it shows a file download box (open/save), but I need to do the download in the backend, using code like this:
Dim wResp As HttpWebResponse
Dim sr As StreamReader
Dim txt As String
Dim streamWriter As System.IO.StreamWriter
Dim sw As StreamWriter
Dim wReq As HttpWebRequest
Dim exFolder As String
' strURL is the file URL mentioned above
wReq = CType(WebRequest.Create(New Uri(strURL)), HttpWebRequest)
With wReq
.Proxy = WebRequest.DefaultWebProxy
.Proxy.Credentials = System.Net.CredentialCache.DefaultCredentials
.Credentials = New NetworkCredential(user, passwd, domain)
.CookieContainer = cookieJar
.UserAgent = "BGClient"
'.UseBinary = True
.KeepAlive = True
.Headers.Set("Pragma", "no-cache")
'.Timeout = 1000000000
'.Method = System.Net.WebRequestMethods.Ftp.DownloadFile '"POST"
.ContentType = "application/msword"
'.ReadWriteTimeout = 1000000000
.PreAuthenticate = True
End With
wResp = wReq.GetResponse
sr = New StreamReader(wResp.GetResponseStream)
txt = sr.ReadToEnd.Trim
sr.Close()
wResp.Close()
wReq = Nothing
If Not txt Is Nothing And txt.Length > 0 Then
streamWriter = New StreamWriter(dFolder & strTitle & "." & fType, False)
streamWriter.WriteLine(txt)
streamWriter.Close()
streamWriter = Nothing
fCount = fCount + 1
End If
I have tried using WebClient and also FtpWebRequest, but the site does not accept FTP requests.
Please help. Thanks in advance!!
****DOOD*****
-
November 5th, 2006, 11:48 AM
#2
Hi
How about
Code:
Dim sysWebClient As System.Net.WebClient = New System.Net.WebClient
sysWebClient.DownloadFile("http://www.google.com/images/logo.gif", _
System.Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments) & _
"\google_logo.gif")
In addition, you can configure Proxy-Settings as well as User-Credentials with the WebClient-class.
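To illustrate the proxy and credential settings, here is a minimal sketch, assuming the system default proxy is the right one and that the server uses HTTP authentication rather than a form login. The user name, password and target file name are placeholders.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module WebClientDownloadSketch
    Sub Main()
        ' Placeholder values; replace with your own.
        Dim fileUrl As String = "http://www.google.com/images/logo.gif"
        Dim target As String = Path.Combine( _
            Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments), _
            "google_logo.gif")

        Using client As New WebClient()
            ' Go through the system default proxy as the logged-on user.
            client.Proxy = WebRequest.DefaultWebProxy
            If client.Proxy IsNot Nothing Then
                client.Proxy.Credentials = CredentialCache.DefaultCredentials
            End If

            ' Credentials for the web server itself (HTTP authentication only).
            client.Credentials = New NetworkCredential("username", "password")

            ' DownloadFile writes the raw bytes to disk, so binary files survive.
            client.DownloadFile(fileUrl, target)
        End Using
    End Sub
End Module
```

Note that these credentials only help when the server asks for HTTP authentication; a login form on a web page is a different mechanism.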
Cheers
Last edited by sec_ware; November 5th, 2006 at 11:51 AM.
If the only tool you have is a hammer, you tend to see every problem as a nail.
(Abraham Maslow, Psychologist, 1908-70)
-
November 5th, 2006, 12:19 PM
#3
Junior Member
Hi,
Thanks for the reply.
I had tried using WebClient, but the site throws up a login page, so I am using the web request object to pass POST parameters and get past the login page.
Currently I have been successful in opening the file and getting it as an IO stream, but now there's a new problem: the file is a Word doc, and the saved file appears corrupted. I guess the problem is with the encoding, and I am currently researching it.
Any help will be appreciated
****DOOD*****
-
November 5th, 2006, 06:05 PM
#4
Hi
I see - I misunderstood you. I thought you were talking about a login that could be handled using
Code:
sysWebClient.Credentials = New System.Net.NetworkCredential("username", "password")
Since the login requires a POST, you first need the GET, which will provide
you with some kind of session handle (e.g. a cookie). Then, you can do the POST.
The tricky part is knowing what to POST, as you know.
I recommend burpproxy[1] (or another middleman) to capture and analyse the
(manually triggered) traffic between your client (don't forget to set the
Proxy) and the webserver. Using this information will allow you to fine-tune
your program.
For inspiration, have a look at this thread[2].
Cheers
[1] http://freshmeat.net/projects/burpproxy/
[2] http://www.thescripts.com/forum/thread367596.html
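The GET-then-POST sequence could look roughly like the sketch below. It assumes the site tracks the session with cookies and accepts a URL-encoded form; the login URL and the form field names are hypothetical and have to be taken from the captured traffic.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module LoginSketch
    ' Does the GET, then the POST, and returns the cookie container
    ' that holds whatever session cookies the server handed out.
    Function Login(ByVal loginUrl As String, ByVal postData As String) As CookieContainer
        Dim cookieJar As New CookieContainer()

        ' Step 1: GET the login page so the server can set its session cookie.
        Dim getReq As HttpWebRequest = CType(WebRequest.Create(loginUrl), HttpWebRequest)
        getReq.CookieContainer = cookieJar
        Using getResp As HttpWebResponse = CType(getReq.GetResponse(), HttpWebResponse)
            ' Nothing to read here; the cookies are stored in cookieJar.
        End Using

        ' Step 2: POST the form fields, sending the same cookies back.
        Dim body As Byte() = Encoding.ASCII.GetBytes(postData)
        Dim postReq As HttpWebRequest = CType(WebRequest.Create(loginUrl), HttpWebRequest)
        postReq.Method = "POST"
        postReq.ContentType = "application/x-www-form-urlencoded"
        postReq.ContentLength = body.Length
        postReq.CookieContainer = cookieJar
        Using reqStream As Stream = postReq.GetRequestStream()
            reqStream.Write(body, 0, body.Length)
        End Using
        Using postResp As HttpWebResponse = CType(postReq.GetResponse(), HttpWebResponse)
            ' A real program would check the status or body for a failed login.
        End Using

        Return cookieJar
    End Function

    Sub Main()
        ' Hypothetical URL and field names, for illustration only.
        Dim jar As CookieContainer = Login("http://example.com/login", _
            "username=someone&password=secret")
        Console.WriteLine("Cookies stored: " & jar.Count)
    End Sub
End Module
```

The returned container is then assigned to the CookieContainer property of every later request, so the server sees them as part of the same session.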
Last edited by sec_ware; November 5th, 2006 at 06:09 PM.
-
November 6th, 2006, 03:59 AM
#5
Junior Member
Hi,
Thanks for the reply and suggestions, but I have already been able to manage the POST data and have access to the site's resources (which are mainly docs). But now my problem is with downloading an MS Word doc whose link is embedded in one of the pages. My application works like a web crawler which goes and downloads all data from the specified links.
Now the Word documents which are getting downloaded are corrupted and do not open in the correct encoding. I have the following code for the download:
Dim wReq as HttpWebRequest
Dim wResp as HttpWebResponse
Dim str as String
Dim sr as StreamReader
Dim sw as StreamWriter
wReq = CType(WebRequest.Create(New Uri(strURL)),HttpWebRequest)
wReq.Method = "POST"
wReq.KeepAlive = True
wReq.Credentials = New NetworkCredential(user, passwd, domain)
wReq.ContentType = "application/msword"
wResp = wReq.GetResponse
sr = New StreamReader(wResp.GetResponseStream, System.Text.Encoding.ASCII)
str = sr.ReadToEnd.Trim
sr.Close()
wResp.Close()
wReq = Nothing
If Not str Is Nothing And str.Length > 0 Then
sw = New StreamWriter(dFolder & filename & "." & "doc", False)
sw.WriteLine(str)
sw.Close()
sw = Nothing
End If
I have tried content types such as "application/doc" and encodings such as GetEncoding(1251), but when I open the downloaded doc in Word, it still cannot make sense of the encoding.
Please help!!
Thanks in advance
****DOOD*****
-
November 6th, 2006, 07:50 AM
#6
Hello,
1. I know absolutely nothing about this sort of thing other than that it can be done.
2. Between you and sec_ware (good help there mate!) you seem to have gotten it pretty much solved, in that you are getting the downloads to work?
I just had a passing thought based on my previous experiences with .doc (and other file extension) compatibility.
Could it be that the file you are getting is a version that is incompatible with your reader, so the problem is at your end?
If you copy one of your downloaded documents, rename it as a .txt file and open it in Notepad, you should see, at the head and/or foot of the file, some of the metadata that will tell you what created the file.
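The same check can be done in code. Word 97-2003 .doc files are OLE compound files, which start with the eight bytes D0 CF 11 E0 A1 B1 1A E1, so a quick look at the header shows whether a download survived intact. This is only a rough sketch; the module and function names are made up for the example.

```vbnet
Imports System
Imports System.IO

Module DocHeaderCheck
    ' Signature of the OLE compound file format used by Word 97-2003 .doc files.
    Private ReadOnly OleSignature As Byte() = _
        {&HD0, &HCF, &H11, &HE0, &HA1, &HB1, &H1A, &HE1}

    ' Returns True when the file starts with the OLE compound file signature.
    Function LooksLikeDoc(ByVal fileName As String) As Boolean
        Dim header(7) As Byte
        Using fs As New FileStream(fileName, FileMode.Open, FileAccess.Read)
            ' A file shorter than the signature cannot be a valid .doc.
            If fs.Read(header, 0, header.Length) < header.Length Then Return False
        End Using
        For i As Integer = 0 To OleSignature.Length - 1
            If header(i) <> OleSignature(i) Then Return False
        Next
        Return True
    End Function

    Sub Main(ByVal args As String())
        ' Pass the path of a downloaded file on the command line.
        If args.Length > 0 Then
            Console.WriteLine(LooksLikeDoc(args(0)))
        End If
    End Sub
End Module
```

If the check fails for a file that opens fine when saved from the browser, the damage happened during the download rather than in Word.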
Also, if you can post a small (1 MB or less) example, I can try a variety of "Word Compatible" tools that I have, to try to open it. I know, I know, this approach is as crude as hell, but supposedly compatible tools frequently give you quite informative error messages when they cannot properly convert a Word document. I guess you won't hire me to fix your washing machine now?
The only other question that came to mind was "are they encrypted in some way?"
Cheers. This is probably a complete red herring, but if it just happens to be true, you would mess with your downloading forever, and it still would not work.
Good Luck!
Last edited by nihil; November 6th, 2006 at 07:57 AM.
-
November 6th, 2006, 01:26 PM
#7
Hi
Nihil has implied a good suggestion: analyse your problem - e.g. do you have the same problem with all kinds of "downloads"?
You won't.
The problem is caused by the fact that a doc file is binary, not ASCII, hence
treating the "download" as a text stream will fail (StreamReader implements TextReader). StreamReader always decodes the bytes into characters (UTF-8 by default, with detection of UTF-8/UTF-16 byte order marks),
i.e. no choice of encoding will preserve the file's bytes.
Use a BinaryReader on the response stream[1] instead. I had a quick look at the web and have found a nice sample chapter[2] and this implementation of a web-crawler[3] (in C#, but it's all the same).
Cheers
[1] http://msdn2.microsoft.com/en-us/lib...er(VS.80).aspx
[2] http://www.microsoft.com/mspress/boo...chap/6436.aspx
[3] http://www.codeproject.com/cs/internet/Crawler.asp
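A minimal sketch of the binary approach, assuming the login cookies are already in a CookieContainer and that the file is fetched with a plain GET. The URL, target path and chunk size are placeholders.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module BinaryDownloadSketch
    ' Copies the response body to disk byte for byte, with no text decoding.
    Sub DownloadBinary(ByVal strURL As String, ByVal targetPath As String, _
                       ByVal cookieJar As CookieContainer)
        Dim wReq As HttpWebRequest = CType(WebRequest.Create(New Uri(strURL)), HttpWebRequest)
        wReq.CookieContainer = cookieJar

        Using wResp As HttpWebResponse = CType(wReq.GetResponse(), HttpWebResponse)
            Using reader As New BinaryReader(wResp.GetResponseStream())
                Using fileOut As New FileStream(targetPath, FileMode.Create, FileAccess.Write)
                    ' ReadBytes returns an empty array once the stream is exhausted.
                    Dim chunk As Byte() = reader.ReadBytes(8192)
                    While chunk.Length > 0
                        fileOut.Write(chunk, 0, chunk.Length)
                        chunk = reader.ReadBytes(8192)
                    End While
                End Using
            End Using
        End Using
    End Sub

    Sub Main()
        ' Hypothetical URL and path, for illustration only.
        DownloadBinary("http://example.com/docs/sample.doc", "sample.doc", New CookieContainer())
    End Sub
End Module
```

Compared with the code posted earlier in the thread, there is no StreamReader, no Trim and no WriteLine, since each of those alters the bytes. Setting ContentType on the request is also unnecessary for a GET, because that header describes the request body, not the response.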