Heya guys, after looking around the internet I found something called an HTML grabber.
How do they work?
They open a socket to the web server and send a GET request; the HTTP server then sends back the document data (in this case, HTML). It's a very simple protocol.
Here is an RFC (RFC 2660, which describes Secure HTTP, an extension of HTTP; plain HTTP/1.0 is RFC 1945): ftp://ftp.rfc-editor.org/in-notes/rfc2660.txt
Quote:
An appropriate HTTP request to dereference this URL would be:
============================================================
GET /secret HTTP/1.0
Security-Scheme: S-HTTP/1.4
User-Agent: Web-O-Vision 1.2beta
Accept: *.*
Key-Assign: Inband,1,reply,des-ecb;7878787878787878
============================================================
The nice thing about it is that the GET request is sent in plain text,
so you can connect to a web server with telnet and type the request in manually.
http://www.dgate.org/~brg/bvtelnet80/
Quote:
To retrieve Web documents using HTTP, the client (you)
must issue a GET request. The syntax of a GET request is as follows:
GET document-name HTTP-version
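In case it helps, here's a small Python sketch (my own illustration, not from the page quoted above) of the bytes a client actually sends for that syntax. The function name build_get_request is made up for the example, and the Host and Accept headers are additions: HTTP/1.1 requires Host, and many HTTP/1.0 servers use it to pick the right site. Note the CRLF line endings and the blank line that ends the request.

```python
def build_get_request(host: str, document: str = "/", version: str = "HTTP/1.0") -> bytes:
    """Build a minimal HTTP GET request: request line, headers, blank line.

    build_get_request is a hypothetical helper name used only for illustration.
    """
    lines = [
        f"GET {document} {version}",  # GET document-name HTTP-version
        f"Host: {host}",              # required in HTTP/1.1, commonly used in 1.0
        "Accept: */*",
        "",                           # empty line terminates the header section
        "",
    ]
    # HTTP uses CRLF ("\r\n") line endings, not bare "\n"
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_get_request("www.google.com").decode("ascii"))
```

Typing exactly those lines into a telnet session (and pressing Enter twice at the end) is the manual version of the same thing.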
You can send a GET request by hand with telnet or netcat:
Code:
[rcgreen@acer rcgreen]$ nc google.com 80
GET / HTTP/1.1
HTTP/1.1 200 OK
Content-Length: 2690
Server: GWS/2.1
Date: Mon, 15 Sep 2003 20:04:46 GMT
Content-Type: text/html
Cache-control: private
Set-Cookie: PREF=ID=5bd6a80e6a3b9ea3:TM=1063656286:LM=1063656286:S=x2BNEz_-ZUATmbRG; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
<html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><title>Google</title><style><!--
body,td,a,p,.h{font-family:arial,sans-serif;}
.h{font-size: 20px;}
.q{text-decoration:none; color:#0000cc;}
//-->
</style>
<script>
<!--
function sf(){document.f.q.focus();}
// -->
</script>
</head><body bgcolor=#ffffff text...etc. etc. etc.
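Reading a response like that back is mostly string splitting: the status line, then one header per line, then a blank line, then the HTML. A rough Python sketch, assuming the whole response is already in memory; parse_response is a hypothetical helper name, it doesn't handle chunked transfer encoding, and repeated headers such as Set-Cookie simply overwrite each other.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (status, reason, headers, body).

    parse_response is a hypothetical helper; it keeps only the last value
    of any repeated header and does not decode chunked bodies.
    """
    # The header section ends at the first blank line (CRLF CRLF)
    head, _, body = raw.partition(b"\r\n\r\n")
    # Header bytes are essentially ASCII; ISO-8859-1 decodes any byte safely
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line, e.g. "HTTP/1.1 200 OK" (reason phrase may be missing)
    _version, status, reason = (lines[0].split(" ", 2) + [""])[:3]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so values like "20:04:46" stay intact
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body
```

Fed the Google reply above, this would give status 200, a headers dict with keys like "content-type" and "date", and the HTML as the body.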
Here is some simple VB source code. All you need is:
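1 RichTextBox or regular TextBox (Text1)
1 Winsock control (sckTCP)
1 CommonDialog (cdialog), which the code below doesn't actually use
2 TextBoxes (Text2 for the document path, e.g. /, and Text3 for the host name)
1 CommandButton (Command1)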
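Code:
Option Explicit

' Connect to the host in Text3 and request the document path in Text2
Private Sub Command1_Click()
    Text1.SetFocus
    Text1.Text = ""
    sckTCP.Close
    sckTCP.RemoteHost = Text3.Text
    sckTCP.RemotePort = 80
    sckTCP.Connect
    ' Wait for the connection, but give up if the socket reports an error
    Do
        If sckTCP.State = sckConnected Then Exit Do
        If sckTCP.State = sckError Then
            MsgBox "Could not connect to " & Text3.Text
            Exit Sub
        End If
        DoEvents
    Loop
    ' HTTP/1.0 request; the Host header lets virtual-hosted servers pick the right site
    sckTCP.SendData "GET " & Text2.Text & " HTTP/1.0" & vbCrLf & _
                    "Host: " & Text3.Text & vbCrLf & _
                    "Accept: */*" & vbCrLf & vbCrLf
End Sub

Private Sub Form_QueryUnload(Cancel As Integer, UnloadMode As Integer)
    sckTCP.Close
End Sub

' Append each chunk of the response to the end of Text1
Private Sub sckTCP_DataArrival(ByVal bytesTotal As Long)
    Dim temp As String
    sckTCP.GetData temp, vbString
    Text1.SelStart = Len(Text1.Text)
    Text1.SelText = temp
End Sub

' An HTTP/1.0 server closes the connection once the response is complete
Private Sub sckTCP_Close()
    sckTCP.Close
End Sub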
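For comparison, here's roughly the same thing in Python using a plain socket instead of the Winsock control. It's a sketch, not a drop-in replacement: grab_html is a made-up name, and it assumes an HTTP/1.0 server that closes the connection when the response is done, which is why it simply reads until recv() returns nothing.

```python
import socket


def grab_html(host: str, document: str = "/", port: int = 80, timeout: float = 10.0) -> bytes:
    """Fetch a document with a raw HTTP/1.0 GET (grab_html is a hypothetical name).

    Returns the raw response: status line, headers, blank line, then the body.
    """
    request = (
        f"GET {document} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Accept: */*\r\n"
        "\r\n"
    ).encode("ascii")
    chunks = []
    # create_connection resolves the host and raises OSError if it can't connect
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        # Without keep-alive, an HTTP/1.0 server closes the connection after
        # the response, so read until recv() returns b"" (end of stream)
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Usage would be something like grab_html("www.google.com"), which returns the headers and HTML together as bytes; the socket version blocks instead of using DoEvents, so it suits scripts better than GUI code.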
Here is a nice lil' Perl example...
usage: "perl hclient.pl http://www.antionline.com"
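Code:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
# usage: perl hclient.pl http://www.antionline.com
my $url = shift @ARGV or die "usage: perl hclient.pl URL\n";
my $http_client = LWP::UserAgent->new;
# A trailing space makes LWP append its default agent string
$http_client->agent("Perl/5.8 ");
# Grabbing a page only needs a plain GET request
my $data_request  = HTTP::Request->new(GET => $url);
my $data_response = $http_client->request($data_request);
if ($data_response->is_success) {
    print $data_response->content;
} else {
    # status_line is e.g. "404 Not Found"
    print "\n[NO DATA] ", $data_response->status_line, "\n";
}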
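If you'd rather do it in Python, the standard library's urllib.request plays roughly the role LWP::UserAgent plays in Perl. A minimal sketch; make_request, grab and the agent string Python-grabber/0.1 are all hypothetical names chosen for the example.

```python
import sys
import urllib.request


def make_request(url: str, agent: str = "Python-grabber/0.1") -> urllib.request.Request:
    """Build a GET request with a custom User-Agent (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": agent}, method="GET")


def grab(url: str) -> str:
    """Fetch url and return the body as text, or "[NO DATA]" on failure."""
    try:
        with urllib.request.urlopen(make_request(url), timeout=10) as resp:
            # Use the charset from Content-Type if the server sent one
            charset = resp.headers.get_content_charset() or "iso-8859-1"
            return resp.read().decode(charset, errors="replace")
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return "[NO DATA]"


if __name__ == "__main__" and len(sys.argv) > 1:
    print(grab(sys.argv[1]))
```

Saved as, say, grab.py, it would run as python grab.py http://www.antionline.com. Unlike the raw-socket approach, urlopen follows redirects and strips the headers for you.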
Don't forget nice little tools like:
wget http://www.gnu.org/software/wget/wget.html
curl http://curl.haxx.se/
curlssl (I think it is covered by curl now).
They add some nice functionality that, while possible to do manually, is much easier to do via wget or curl (especially when interacting with forms).
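As an example of the form case: submitting a typical HTML form means POSTing the fields URL-encoded in the request body. With curl that's curl -d "name=value&name2=value2" URL. Here's a Python sketch that builds the same kind of request; make_form_post, the example URL and the field names are hypothetical.

```python
import urllib.parse
import urllib.request


def make_form_post(url: str, fields: dict) -> urllib.request.Request:
    """Build a POST that submits fields the way an HTML form would (hypothetical helper)."""
    # urlencode produces "name=value&name2=value2", escaping spaces as "+"
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen() would actually send it; building the request separately makes it easy to inspect the encoded body first.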
/nebulus