
Thread: HTML grabber

  1. #1
    Junior Member
    Join Date
    Sep 2003

    HTML grabber

    Hey guys, after looking around the internet I found something called an HTML grabber.
    How do they work?
    the tallest blade of grass is first to be cut by the lawnmower

  2. #2
    Senior Member tampabay420's Avatar
    Join Date
    Aug 2002
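    They open a socket to the web server and send a GET request. The HTTP server then sends back the document data (in this case, HTML). It's a very simple protocol.

    Here is an RFC with an example: ftp://ftp.rfc-editor.org/in-notes/rfc2660.txt (this one actually describes S-HTTP, a security extension to HTTP; plain HTTP/1.0 is specified in RFC 1945 and HTTP/1.1 in RFC 2616).

    The RFC gives this example request (the Security-Scheme and Key-Assign headers are S-HTTP additions, not part of plain HTTP):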

    GET /secret HTTP/1.0
    Security-Scheme: S-HTTP/1.4
    User-Agent: Web-O-Vision 1.2beta
    Accept: *.*
    Key-Assign: Inband,1,reply,des-ecb;7878787878787878
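
    The steps described above (open a socket, send a GET, read back the document) can be sketched in Python. This is only a minimal sketch, not how any particular grabber is written: the names `build_get_request` and `fetch` are made up for illustration, and it assumes plain HTTP on port 80 and HTTP/1.0, so the server closes the connection once the document has been sent.

```python
import socket


def build_get_request(host, path="/"):
    """Return the raw bytes of a minimal HTTP/1.0 GET request."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Accept: */*",
        "",  # the two empty strings produce the blank line
        "",  # that terminates the request headers
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path="/", port=80, timeout=10):
    """Open a TCP connection, send the GET, and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

    Calling `fetch("www.antionline.com")` would return the status line, the headers and the HTML as one byte string.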

    yeah, I'm gonna need that by friday...

  3. #3
    AO Curmudgeon rcgreen's Avatar
    Join Date
    Nov 2001
    The nice thing about it is that the GET request is sent in plain text,
    so you can connect to a web server with telnet and do it manually.

    To retrieve Web documents using HTTP, the client (you)
    must issue a GET request. The syntax of a GET request is as follows:
    GET document-name HTTP-version
    followed by a blank line to end the request. HTTP/1.1 also expects a
    Host: header, though the server in the session below answered without one.

    You can do it with telnet or netcat:

    [rcgreen@acer rcgreen]$ nc google.com 80
    GET / HTTP/1.1
    HTTP/1.1 200 OK
    Content-Length: 2690
    Server: GWS/2.1
    Date: Mon, 15 Sep 2003 20:04:46 GMT
    Content-Type: text/html
    Cache-control: private
    Set-Cookie: PREF=ID=5bd6a80e6a3b9ea3:TM=1063656286:LM=1063656286:S=x2BNEz_-ZUATmbRG; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
    <html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><title>Google</title><style><!--
    .h{font-size: 20px;}
    .q{text-decoration:none; color:#0000cc;}
    function sf(){document.f.q.focus();}
    // -->
    </head><body bgcolor=#ffffff text...etc. etc. etc.
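
    A response like the one above has three parts: a status line, the headers, and (after a blank line) the body. As a rough sketch of how a grabber might split them apart, here is a small Python function. The name `parse_response` is hypothetical, and it assumes the whole response is already in memory as bytes; it does not handle chunked encoding or repeated headers.

```python
def parse_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    # Headers end at the first blank line; everything after it is the body.
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")

    # The status line looks like: HTTP/1.1 200 OK
    parts = lines[0].split(" ", 2)
    status_code = int(parts[1])

    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        # Later duplicates (e.g. several Set-Cookie lines) overwrite earlier ones.
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 2690\r\n"
    b"Server: GWS/2.1\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html><head>"
)
status, headers, body = parse_response(sample)
print(status, headers["content-type"], body)
```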
    I came in to the world with nothing. I still have most of it.

  4. #4
    Senior Member
    Join Date
    Sep 2003
    Here is some simple VB source code. All you need is:
    1 RichTextBox or regular TextBox (Text1)
    1 Winsock (sckTCP)
    1 CommonDialog (cdialog)
    2 TextBoxes (Text2, Text3)
    1 CommandButton (Command1)

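    Private Sub Command1_Click()
        ' Text3 holds the host name, Text2 the document path (e.g. /index.html)
        Text1.Text = ""
        If sckTCP.State <> sckClosed Then sckTCP.Close
        sckTCP.RemoteHost = Text3.Text
        sckTCP.RemotePort = 80
        sckTCP.Connect
    End Sub

    Private Sub sckTCP_Connect()
        ' Send the request only once the connection is established
        sckTCP.SendData "GET " & Text2.Text & " HTTP/1.0" & vbCrLf & _
                        "Host: " & Text3.Text & vbCrLf & _
                        "Accept: */*" & vbCrLf & vbCrLf
    End Sub

    Private Sub sckTCP_DataArrival(ByVal bytesTotal As Long)
        ' Append each chunk of the response to the end of the text box
        Dim temp As String
        sckTCP.GetData temp, vbString
        Text1.SelStart = Len(Text1.Text)
        Text1.SelText = temp
    End Sub

    Private Sub Form_QueryUnload(Cancel As Integer, UnloadMode As Integer)
        sckTCP.Close
    End Sub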

  5. #5
    Senior Member tampabay420's Avatar
    Join Date
    Aug 2002
    Here is a nice lil' Perl example...
    usage: "perl hclient.pl http://www.antionline.com"
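      use LWP::UserAgent;
      use HTTP::Request;

      # Create the client and set its User-Agent string
      my $http_client = LWP::UserAgent->new;
      $http_client->agent("Perl/5.8 ");

      # POST a small form body to the URL given on the command line
      # (for a plain page grab, use GET => $ARGV[0] and drop the two content lines)
      my $data_request = HTTP::Request->new(POST => $ARGV[0]);
      $data_request->content_type('application/x-www-form-urlencoded');
      $data_request->content('match=www&errors=0');

      my $data_response = $http_client->request($data_request);

      if ($data_response->is_success) {
          print $data_response->content;
      } else {
          print "\n[NO DATA]\n";
      }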
    yeah, I'm gonna need that by friday...

  6. #6
    Jaded Network Admin nebulus200's Avatar
    Join Date
    Jun 2002
    Don't forget nice little tools like:

    wget http://www.gnu.org/software/wget/wget.html

    curl http://curl.haxx.se/

    curlssl (I think it is covered by curl now).

    They add some nice functionality that, while possible to do manually, is much easier via wget or curl (especially when interacting with forms).
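
    To give an idea of the form handling these tools take care of, here is a rough Python sketch using only the standard library. It is not how wget or curl work internally; `build_form_request` and `submit` are made-up names, and the URL and field names in the usage note are placeholders (the fields are borrowed from the Perl example earlier in the thread).

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_form_request(url, fields):
    """Build a POST request carrying URL-encoded form fields."""
    body = urlencode(fields).encode("ascii")  # e.g. b"match=www&errors=0"
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def submit(url, fields, timeout=10):
    """Send the form and return the response body as bytes."""
    with urlopen(build_form_request(url, fields), timeout=timeout) as resp:
        return resp.read()
```

    Something like `submit("http://www.example.com/search", {"match": "www", "errors": 0})` would send the form. The encoding, headers and connection handling are the parts you would otherwise type by hand over telnet.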

    There is only one constant, one universal, it is the only real truth: causality. Action. Reaction. Cause and effect...There is no escape from it, we are forever slaves to it. Our only hope, our only peace is to understand it, to understand the 'why'. 'Why' is what separates us from them, you from me. 'Why' is the only real social power, without it you are powerless.

    (Merovingian - Matrix Reloaded)
