Thread: HTML grabber

    HTML grabber

    Heya guys, after looking around the internet I found something called an HTML grabber.
    How do they work?
    They open a socket to the web server and send a GET request. The HTTP server then sends back the document data (in this case, HTML). It's a very simple protocol.
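
    To make that concrete, here is a minimal Python sketch of the same idea: open a TCP socket, send a plain-text GET, and read until the server closes the connection. The function names (build_request, grab) and the example.com host are placeholders I picked for illustration, not part of any particular grabber tool.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",  # optional in HTTP/1.0, but virtual-hosted servers rely on it
        "Accept: */*",
        "",               # the empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def grab(host: str, path: str = "/", port: int = 80, timeout: float = 10.0) -> bytes:
    """Open a TCP socket, send the GET request, and read until the server closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


# Usage (needs network access; example.com is a placeholder host):
#   raw = grab("example.com", "/")
#   print(raw.decode("iso-8859-1"))
```

    The result contains the status line and headers as well as the HTML, since nothing here strips them. HTTP/1.0 is used so the server closes the connection when it is done, which keeps the read loop simple.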

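    Here is an RFC with an example request: ftp://ftp.rfc-editor.org/in-notes/rfc2660.txt
    (Note that RFC 2660 describes S-HTTP, a security extension. Plain HTTP/1.0 is RFC 1945 and HTTP/1.1 is RFC 2616.)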

    An appropriate HTTP request to dereference this URL would be:

    GET /secret HTTP/1.0
    Security-Scheme: S-HTTP/1.4
    User-Agent: Web-O-Vision 1.2beta
    Accept: *.*
    Key-Assign: Inband,1,reply,des-ecb;7878787878787878

    The nice thing about it is that the GET request is sent in plain text,
    so you can connect to a web server with telnet and type it in manually.

    To retrieve Web documents using HTTP, the client (you)
    must issue a GET request. The syntax of a GET request is as follows:
    GET document-name HTTP-version

    You can do it with telnet or netcat:

    [rcgreen@acer rcgreen]$ nc google.com 80
    GET / HTTP/1.1
    HTTP/1.1 200 OK
    Content-Length: 2690
    Server: GWS/2.1
    Date: Mon, 15 Sep 2003 20:04:46 GMT
    Content-Type: text/html
    Cache-control: private
    Set-Cookie: PREF=ID=5bd6a80e6a3b9ea3:TM=1063656286:LM=1063656286:S=x2BNEz_-ZUATmbRG; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
    <html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><title>Google</title><style><!--
    .h{font-size: 20px;}
    .q{text-decoration:none; color:#0000cc;}
    function sf(){document.f.q.focus();}
    // -->
    </head><body bgcolor=#ffffff text...etc. etc. etc.
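
    The reply in that session has three parts: a status line, a block of headers, and then the HTML itself. On the wire the headers and the body are separated by an empty line. Below is a rough Python sketch of splitting a raw reply into those parts. The split_response name and the shortened sample bytes are my own, modeled on the transcript above.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # The first empty line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Shortened sample modeled on the netcat session above.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Server: GWS/2.1\r\n"
    b"\r\n"
    b"<html><head><title>Google</title></head></html>"
)
status, headers, body = split_response(sample)
print(status)
print(headers["content-type"])
print(body.decode("iso-8859-1"))
```

    This keeps only the last value when a header repeats (Set-Cookie often does), which is fine for a grabber that only wants the HTML.
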
    Here is some simple VB source code. All you need is:
    1 RichTextBox or regular TextBox (Text1)
    1 Winsock (sckTCP)
    1 CommonDialog (cdialog)
    2 TextBoxes (Text2 for the document path, Text3 for the host name)
    1 CommandButton (Command1)

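    Private Sub Command1_Click()
        Text1.Text = ""
        ' Drop any previous connection before starting a new one
        If sckTCP.State <> sckClosed Then sckTCP.Close
        sckTCP.RemoteHost = Text3.Text
        sckTCP.RemotePort = 80
        sckTCP.Connect
    End Sub

    Private Sub sckTCP_Connect()
        ' Send the request only once the socket is connected (State = 7, sckConnected)
        sckTCP.SendData "GET " & Text2.Text & " HTTP/1.0" & vbCrLf & "Accept: */*" & vbCrLf & vbCrLf
    End Sub

    Private Sub Form_QueryUnload(Cancel As Integer, UnloadMode As Integer)
        If sckTCP.State <> sckClosed Then sckTCP.Close
    End Sub

    Private Sub sckTCP_DataArrival(ByVal bytesTotal As Long)
        Dim temp As String
        sckTCP.GetData temp, vbString
        ' Move the caret to the end, then append the new chunk
        Text1.SelStart = Len(Text1.Text)
        Text1.SelText = temp
    End Sub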
    Here is a nice lil' Perl example...
    usage: "perl hclient.pl http://www.antionline.com"
      use strict;
      use warnings;
      use LWP::UserAgent;

      my $http_client = LWP::UserAgent->new;
      $http_client->agent("Perl/5.8 ");

      # Fetch the URL given on the command line with a plain GET
      my $data_request  = HTTP::Request->new(GET => $ARGV[0]);
      my $data_response = $http_client->request($data_request);

      if ($data_response->is_success) {
          print $data_response->content;
      } else {
          print "\n[NO DATA]\n";
      }
    Don't forget nice little tools like:

    wget http://www.gnu.org/software/wget/wget.html

    curl http://curl.haxx.se/

    curlssl (I think it is covered by curl now).

    They add some nice functionality that, while possible to do manually, is much easier via wget or curl (especially when interacting with forms).
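
    For a feel of what those tools handle for you with forms, here is a small Python sketch using the standard urllib module. The URL and the field names are hypothetical, and the helper names build_form_request and fetch are my own. It only builds the request; actually sending it needs network access.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_form_request(url: str, fields: dict) -> Request:
    """Build a POST request carrying URL-encoded form fields."""
    data = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch(request: Request, timeout: float = 10.0) -> bytes:
    """Send the request and return the response body (requires network access)."""
    with urlopen(request, timeout=timeout) as response:
        return response.read()


# Hypothetical form with two fields; nothing is sent until fetch(req) is called.
req = build_form_request("http://example.com/search", {"match": "www", "errors": "0"})
print(req.get_method())
print(req.data)
```

    Doing the same by hand over a raw socket means encoding the fields, setting Content-Type and Content-Length yourself, and coping with redirects, which is the work the library or wget/curl takes off your hands.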

