HTML grabber

Thread: HTML grabber

  1. #1
    Junior Member
    Join Date
    Sep 2003
    Posts
    9

    HTML grabber

    heya guys, after looking around the internet I found something called an HTML grabber.
    How do they work?
    the tallest blade of grass is first to be cut by the lawnmower

  2. #2
    Senior Member tampabay420's Avatar
    Join Date
    Aug 2002
    Posts
    953
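    They open a socket to the web server and send a GET request; the HTTP server then sends back the document data (in this case, HTML). It's a very simple protocol.

    Here is an RFC with an example request: ftp://ftp.rfc-editor.org/in-notes/rfc2660.txt
    (Note that RFC 2660 describes Secure HTTP, S-HTTP, which is an extension of HTTP. Plain HTTP/1.0 is RFC 1945 and HTTP/1.1 is RFC 2616.)

    An appropriate HTTP request to dereference a URL would be the following (Security-Scheme and Key-Assign are S-HTTP headers; a plain HTTP request does not need them):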

    ============================================================
    GET /secret HTTP/1.0
    Security-Scheme: S-HTTP/1.4
    User-Agent: Web-O-Vision 1.2beta
    Accept: *.*
    Key-Assign: Inband,1,reply,des-ecb;7878787878787878

    ============================================================
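
For anyone who wants to try the "open a socket, send GET, read the reply" idea directly, here is a minimal Python sketch. It is not taken from any particular grabber tool; the function names `build_get_request` and `grab` are made up for illustration, and it assumes a plain HTTP/1.0 server on port 80 that closes the connection when it has sent the document.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",   # optional in HTTP/1.0, but name-based virtual hosts need it
        "Accept: */*",
        "",                # the empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def grab(host: str, path: str = "/", port: int = 80, timeout: float = 10.0) -> bytes:
    """Send the request over a TCP socket and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:   # an HTTP/1.0 server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling something like `grab("www.example.com")` (a placeholder host) should return the status line, the headers, and the HTML in one byte string.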
    yeah, I'm gonna need that by friday...

  3. #3
    AO Curmudgeon rcgreen's Avatar
    Join Date
    Nov 2001
    Posts
    2,716
    The nice thing about it is that the GET request is sent in plain text,
    so you can connect to a web server with telnet and type it in manually.

    To retrieve Web documents using HTTP, the client (you)
    must issue a GET request. The syntax of a GET request is as follows:
    GET document-name HTTP-version
    http://www.dgate.org/~brg/bvtelnet80/

    You can do it with telnet or netcat:

    Code:
    [rcgreen@acer rcgreen]$ nc google.com 80
    GET / HTTP/1.1
    
    HTTP/1.1 200 OK
    Content-Length: 2690
    Server: GWS/2.1
    Date: Mon, 15 Sep 2003 20:04:46 GMT
    Content-Type: text/html
    Cache-control: private
    Set-Cookie: PREF=ID=5bd6a80e6a3b9ea3:TM=1063656286:LM=1063656286:S=x2BNEz_-ZUATmbRG; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
    
    <html><head><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><title>Google</title><style><!--
    body,td,a,p,.h{font-family:arial,sans-serif;}
    .h{font-size: 20px;}
    .q{text-decoration:none; color:#0000cc;}
    //-->
    </style>
    <script>
    <!--
    function sf(){document.f.q.focus();}
    // -->
    </script>
    </head><body bgcolor=#ffffff text...etc. etc. etc.
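
The reply in a session like the one above has three parts: a status line, the headers, and (after the first blank line) the HTML itself. A grabber that only wants the HTML has to split these apart. Below is a rough Python sketch of that step; `split_response` is a hypothetical helper name and the sample bytes are invented for the demo. Also worth knowing: HTTP/1.1 (RFC 2616) expects a `Host:` header in the request, so many servers will reject a bare `GET / HTTP/1.1`.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Simplification: a repeated header (e.g. Set-Cookie) keeps only the last value.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Invented sample response, shaped like the netcat output above.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 13\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html></html>"
)
status, headers, body = split_response(sample)
print(status)                    # HTTP/1.1 200 OK
print(headers["content-type"])   # text/html
print(len(body))                 # 13
```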
    I came in to the world with nothing. I still have most of it.

  4. #4
    Senior Member
    Join Date
    Sep 2003
    Posts
    279
    Here is some simple VB source code. All you need is:
    1 RichTextBox or regular TextBox (Text1) to show the result
    1 Winsock control (sckTCP)
    1 CommonDialog (cdialog)
    2 TextBoxes (Text2 for the document path, Text3 for the host name)
    1 CommandButton (Command1)

    Code:
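    Private Sub Command1_Click()
        ' Text2 = document path (e.g. /index.html), Text3 = host name
        Text1.SetFocus
        Text1.Text = ""
        sckTCP.Close
        sckTCP.RemoteHost = Text3.Text
        sckTCP.RemotePort = 80
        sckTCP.Connect
        ' Wait for the connection; give up if the socket reports an error
        Do
            If sckTCP.State = sckConnected Then Exit Do
            If sckTCP.State = sckError Then Exit Sub
            DoEvents
        Loop
        ' The blank line (two vbCrLf in a row) ends the request headers
        sckTCP.SendData "GET " & Text2.Text & " HTTP/1.0" & vbCrLf & _
                        "Host: " & Text3.Text & vbCrLf & _
                        "Accept: */*" & vbCrLf & vbCrLf
    End Sub

    Private Sub Form_QueryUnload(Cancel As Integer, UnloadMode As Integer)
        sckTCP.Close
    End Sub

    Private Sub sckTCP_DataArrival(ByVal bytesTotal As Long)
        Dim temp As String
        ' Append each chunk of the response to the end of Text1
        sckTCP.GetData temp, vbString
        Text1.SelStart = Len(Text1.Text)
        Text1.SelText = temp
    End Sub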

  5. #5
    Senior Member tampabay420's Avatar
    Join Date
    Aug 2002
    Posts
    953
    Here is a nice lil' Perl example...
    usage: "perl hclient.pl http://www.antionline.com"
    Code:
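    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request;

    my $http_client = LWP::UserAgent->new;
    $http_client->agent("Perl/5.8 ");

    # POST a small url-encoded form to the URL given on the command line.
    # For a plain page grab, use HTTP::Request->new(GET => $ARGV[0])
    # and drop the two content lines.
    my $data_request = HTTP::Request->new(POST => $ARGV[0]);
    $data_request->content_type('application/x-www-form-urlencoded');
    $data_request->content('match=www&errors=0');

    my $data_response = $http_client->request($data_request);

    # Print the body on success, a placeholder otherwise
    if ($data_response->is_success) {
        print $data_response->content;
    } else {
        print "\n[NO DATA]\n";
    }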
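
For comparison, roughly the same thing (POST a url-encoded form, print the body or a placeholder) can be sketched with Python's standard `urllib`. This is only an approximation of the Perl script above, not a port of LWP; `build_form_post` and `fetch` are hypothetical names and the User-Agent string is made up.

```python
import urllib.parse
import urllib.request


def build_form_post(url: str, fields: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying url-encoded form fields."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,  # supplying data makes urllib issue a POST
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "User-Agent": "python-sketch/0.1",  # made-up agent string
        },
    )


def fetch(req: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request; return the body text, or a placeholder on failure."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("iso-8859-1")
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError
        return "\n[NO DATA]\n"
```

Usage would be along the lines of `print(fetch(build_form_post("http://www.antionline.com", {"match": "www", "errors": "0"})))`.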

  6. #6
    Jaded Network Admin nebulus200's Avatar
    Join Date
    Jun 2002
    Posts
    1,356
    Don't forget nice little tools like:

    wget: http://www.gnu.org/software/wget/wget.html

    curl: http://curl.haxx.se/

    curlssl (I think it is covered by curl now).

    They add some nice functionality that, while possible to do manually, is much easier to do via wget or curl (especially when interacting with forms).
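
If you want to drive these tools from a script, one option is to build the command line and hand it to `subprocess`. The sketch below is only an illustration: `curl_args` and `wget_args` are hypothetical helpers, and it assumes curl or wget is installed and on the PATH. The flags used are `-s`, `-o` and `-d` for curl and `-q`, `-O` and `--post-data` for wget.

```python
import subprocess


def curl_args(url, outfile=None, form_data=None):
    """Build a curl command line: silent, optional output file and form POST."""
    args = ["curl", "-s"]
    if outfile:
        args += ["-o", outfile]      # write the document to a file
    if form_data:
        args += ["-d", form_data]    # send url-encoded form data as a POST
    return args + [url]


def wget_args(url, outfile=None, form_data=None):
    """Build the equivalent wget command line."""
    args = ["wget", "-q"]
    if outfile:
        args += ["-O", outfile]
    if form_data:
        args.append("--post-data=" + form_data)
    return args + [url]


def run(args):
    """Run the tool; raises CalledProcessError if it exits with an error."""
    subprocess.run(args, check=True)
```

For example, `run(curl_args("http://www.antionline.com", outfile="page.html"))` would save the page to a file named `page.html` (a placeholder name).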

    /nebulus
    There is only one constant, one universal, it is the only real truth: causality. Action. Reaction. Cause and effect...There is no escape from it, we are forever slaves to it. Our only hope, our only peace is to understand it, to understand the 'why'. 'Why' is what separates us from them, you from me. 'Why' is the only real social power, without it you are powerless.

    (Merovingian - Matrix Reloaded)
