-
September 15th, 2003, 07:27 AM
#1
Junior Member
HTML grabber
Heya guys, after looking around the internet I found something called an HTML grabber.
How do they work?
the tallest blade of grass is first to be cut by the lawnmower
-
September 15th, 2003, 03:23 PM
#2
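They open a socket to the web server and send a GET request; the HTTP server then sends back the document data (in this case, HTML). It's a very simple protocol.
Here is an RFC with an example request (note that RFC 2660 describes S-HTTP, a secure variant; plain HTTP/1.1 is specified in RFC 2616): ftp://ftp.rfc-editor.org/in-notes/rfc2660.txt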
An appropriate HTTP request to dereference this URL would be:
============================================================
GET /secret HTTP/1.0
Security-Scheme: S-HTTP/1.4
User-Agent: Web-O-Vision 1.2beta
Accept: *.*
Key-Assign: Inband,1,reply,des-ecb;7878787878787878
============================================================
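For comparison, here is a minimal Python sketch of the same idea using plain HTTP/1.0 instead of the S-HTTP headers shown above: open a TCP socket to port 80, send a GET request, and read whatever the server sends back. The names `build_get_request` and `fetch` and the host `example.com` are made up for illustration, so treat this as a sketch rather than how any particular grabber is written.

```python
import socket

def build_get_request(path, host):
    """Return the raw bytes of a minimal HTTP/1.0 GET request."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",   # optional in HTTP/1.0, but many servers expect it
        "Accept: */*",
        "",                # the empty line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch(host, path="/", port=80, timeout=10):
    """Send the request over a plain TCP socket and return the raw response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(path, host))
        # HTTP/1.0 servers close the connection when done, so read to EOF.
        with sock.makefile("rb") as stream:
            return stream.read()

# Hypothetical usage (needs network access):
#   raw = fetch("example.com", "/")
#   print(raw[:200])
```

The request is just text terminated by an empty line, which is why the same thing can be typed by hand into telnet or netcat.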
yeah, I'm gonna need that by friday...
-
September 15th, 2003, 09:06 PM
#3
The nice thing about it is that the GET request is sent in plain text,
so you can connect to a web server with telnet and type the request manually.
To retrieve Web documents using HTTP, the client (you)
must issue a GET request. The syntax of a GET request is as follows:
GET document-name HTTP-version
http://www.dgate.org/~brg/bvtelnet80/
You can do it with telnet or netcat:
Code:
[rcgreen@acer rcgreen]$ nc google.com 80
GET / HTTP/1.1

HTTP/1.1 200 OK
Content-Length: 2690
Server: GWS/2.1
Date: Mon, 15 Sep 2003 20:04:46 GMT
Content-Type: text/html
Cache-control: private
Set-Cookie: PREF=ID=5bd6a80e6a3b9ea3:TM=1063656286:LM=1063656286:S=x2BNEz_-ZUATmbRG; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
<html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><title>Google</title><style><!--
body,td,a,p,.h{font-family:arial,sans-serif;}
.h{font-size: 20px;}
.q{text-decoration:none; color:#0000cc;}
//-->
</style>
<script>
<!--
function sf(){document.f.q.focus();}
// -->
</script>
</head><body bgcolor=#ffffff text...etc. etc. etc.
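The response in that transcript has three parts: a status line, a block of headers, and then the HTML body after an empty line. A rough Python sketch of pulling those apart is below. `parse_response` is a hypothetical helper, the sample bytes are invented for the demo, and the parser is simplified (a repeated header such as Set-Cookie would overwrite the earlier value).

```python
def parse_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    # The first empty line separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like: HTTP/1.1 200 OK
    parts = lines[0].split(" ", 2)
    status_code = int(parts[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body

# Invented sample data shaped like the transcript above.
sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Length: 13\r\n"
          b"Content-Type: text/html\r\n"
          b"\r\n"
          b"<html></html>")
status, headers, body = parse_response(sample)
print(status, headers["content-type"], len(body))
```

A grabber that only wants the page itself would keep `body` and throw the rest away.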
I came in to the world with nothing. I still have most of it.
-
September 16th, 2003, 02:06 AM
#4
Here is some simple VB source code. All you need is:
1 RichTextBox or regular TextBox (Text1)
1 Winsock (sckTCP)
1 CommonDialog (cdialog)
2 Textboxes (Text2, Text3)
1 CommandButton (Command1)
Code:
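Private Sub Command1_Click()
    ' Text3 holds the host name, Text2 the document path to request
    Text1.SetFocus
    Text1.Text = ""
    sckTCP.Close
    sckTCP.RemoteHost = Text3.Text
    sckTCP.RemotePort = 80
    sckTCP.Connect
    ' Wait until the socket is connected; give up if it reports an error
    Do
        If sckTCP.State = sckConnected Then Exit Do
        If sckTCP.State = sckError Then Exit Sub
        DoEvents
    Loop
    ' Send the request; the final empty line ends the header block
    sckTCP.SendData "GET " & Text2.Text & " HTTP/1.0" & vbCrLf & "Accept: */*" & vbCrLf & vbCrLf
End Sub

Private Sub Form_QueryUnload(Cancel As Integer, UnloadMode As Integer)
    sckTCP.Close
End Sub

Private Sub sckTCP_DataArrival(ByVal bytesTotal As Long)
    Dim temp As String
    ' Append each chunk of the response to the end of the text box
    sckTCP.GetData temp, vbString
    Text1.SelStart = Len(Text1.Text)
    Text1.SelText = temp
End Sub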
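The DataArrival handler fires once per chunk, so the page builds up piece by piece until the server closes the connection. Here is a rough Python sketch of that accumulate-until-closed pattern. `read_all` and `demo` are hypothetical names, and `socket.socketpair()` stands in for a real web server so the example runs without a network.

```python
import socket

def read_all(sock, bufsize=1024):
    """Collect data chunk by chunk until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(bufsize)   # roughly one DataArrival event
        if not data:                # an empty read means the peer is done
            break
        chunks.append(data)
    return b"".join(chunks)

def demo():
    # A connected pair of sockets plays the roles of server and client.
    server, client = socket.socketpair()
    with server, client:
        server.sendall(b"HTTP/1.0 200 OK\r\n\r\n<html>")
        server.sendall(b"hello</html>")
        server.shutdown(socket.SHUT_WR)   # signal the end of the response
        return read_all(client)
```

However the data happens to be split across reads, the joined result is the full response in order.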
AntiOnline Quick Forum Version 2b Click Here
10010101000000110010001100111
-
September 16th, 2003, 02:49 PM
#5
Here is a nice lil' Perl example...
usage: "perl hclient.pl http://www.antionline.com"
Code:
use LWP::UserAgent;

# Create the client and set the User-Agent header it sends
my $http_client = LWP::UserAgent->new;
$http_client->agent("Perl/5.8 ");

# Build a POST request with sample form data for the URL given on the command line
my $data_request = HTTP::Request->new(POST => $ARGV[0]);
$data_request->content_type('application/x-www-form-urlencoded');
$data_request->content('match=www&errors=0');

# Send the request and print the body on success
my $data_response = $http_client->request($data_request);
if ($data_response->is_success) {
    print $data_response->content;
} else {
    print "\n[NO DATA]\n";
}
yeah, I'm gonna need that by friday...
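For anyone without Perl handy, roughly the same request can be sketched with Python's standard library. The helper names `build_form_post` and `send` and the agent string are invented for this example; the form fields are the ones from the Perl script. Building the request does not touch the network, only `send` does.

```python
import urllib.parse
import urllib.request

def build_form_post(url, fields):
    """Build (but do not send) a POST request carrying URL-encoded form fields."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    req = urllib.request.Request(url, data=data, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    req.add_header("User-Agent", "Python-sketch/0.1")  # made-up agent string
    return req

def send(req, timeout=10):
    """Send the request and return the body bytes, or None on failure."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except OSError:   # URLError and HTTPError are both subclasses of OSError
        return None

req = build_form_post("http://www.antionline.com",
                      {"match": "www", "errors": "0"})
# Hypothetical usage (needs network access):
#   page = send(req)
#   print(page if page is not None else "[NO DATA]")
```

To just grab a page rather than submit a form, leave out `data` and `method` so the request goes out as a plain GET.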
-
September 16th, 2003, 03:01 PM
#6
Don't forget nice little tools like:
wget http://www.gnu.org/software/wget/wget.html
curl http://curl.haxx.se/
curlssl (I think it is covered by curl now).
They add some nice functionality that, while possible to do manually, is much easier via wget or curl (especially when interacting with forms).
/nebulus
There is only one constant, one universal, it is the only real truth: causality. Action. Reaction. Cause and effect...There is no escape from it, we are forever slaves to it. Our only hope, our only peace is to understand it, to understand the 'why'. 'Why' is what separates us from them, you from me. 'Why' is the only real social power, without it you are powerless.
(Merovingian - Matrix Reloaded)