-
May 3rd, 2004, 11:31 AM
#1
AO RSS Feed.
Hey Hey,
I've had a bit of a bout of insomnia tonight, so I decided to create a shell script.
Code:
#!/bin/bash
nc www.antionline.com 80 < get.txt | grep -B 5 "END TEMPLATE: P_activetopicbits -->" > ao.html
html2text -style pretty -nobs -o ao.txt ao.html
cat ao.html | grep showthread | cut -d " " -f 4 | cut -d "\"" -f 2 | cut -d "<" -f 3 | grep http > links.txt
cat ao.html | grep "Today at" | cut -d ">" -f 4 | cut -d "<" -f 1 > title.txt
cat links.txt | cut -d "&" -f 1 > link1.txt
cat links.txt | cut -d "&" -f 2 > link2.txt
counter=1
echo "<?xml version=\"1.0\" ?>" > ao.xml
echo "<rss version=\"2.0\">" >> ao.xml
echo " <channel>" >> ao.xml
echo " <title>AO RSS Feed</title>" >> ao.xml
echo " <link>http://www.antionline.com</link>" >> ao.xml
echo " <description>Computer Security</description>" >> ao.xml
until [ $counter -gt 20 ]; do
title=`head -n $counter title.txt | tail -n 1`
link1=`head -n $counter link1.txt | tail -n 1`
link2=`head -n $counter link2.txt | tail -n 1`
echo " <item>" >> ao.xml
echo " <title>$title</title>" >> ao.xml
echo " <link>$link1&amp;$link2</link>" >> ao.xml
echo " </item>" >> ao.xml
counter=`expr $counter + 1`
done
echo " </channel>" >> ao.xml
echo "</rss>" >> ao.xml
As you can see, it's a fairly simple shell script (you can download it from http://www.seeminglyrandom.info/ao/script.sh if you wish). It generates a simple RSS feed from the links on the main page (the first 20), showing the title of each thread and a link to it. It's nothing fancy, but it took a long time since I've never touched XML before and had to fix parse errors. Anyway, it seems to be working, so feel free to use it as you wish. For those of you not interested in generating your own, I have a cron job on my website running it every 15 minutes. If my host doesn't complain, I plan on changing it to every 5 minutes. You can access the XML feed at http://www.seeminglyrandom.info/ao/ao.xml.
It's not much but hopefully you'll enjoy it.
Peace,
HT
PS... is this an OK forum to post this in? I figure it's security related (it's AO Headlines and I wanted everyone to know it existed).....
-
May 3rd, 2004, 11:51 AM
#2
great stuff..
will look nice next to the lwn.net rss
saves me the time of coming to AO with nothing interesting there 
<edit type=add>
don't forget you need the get.txt file http://www.seeminglyrandom.info/ao/get.txt
People might need html2text
The problem is there are a lot of versions of html2text http://www.google.com/search?q=html2text
Just guessing that this http://userpage.fu-berlin.de/~mbayer....html#download is the version I need..
And it seems to work well !!
</edit>
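In case the get.txt link ever goes away: the file is just the raw HTTP request that nc pipes to the server. What follows is only a guess at a minimal equivalent, not a copy of HT's actual file; the HTTP/1.0 request line and the Host header are assumptions.

```shell
#!/bin/bash
# Sketch only: writes a minimal HTTP request for nc to send.
# The real get.txt may contain different or additional headers.
make_request() {
    # HTTP wants CRLF line endings and a blank line to end the headers.
    printf 'GET / HTTP/1.0\r\nHost: %s\r\n\r\n' "$1"
}

make_request www.antionline.com > get.txt
```

The script would then use it exactly as before: nc www.antionline.com 80 < get.txt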
ASCII stupid question, get a stupid ANSI.
When in Russia, pet a PETSCII.
Get your ass over to SLAYRadio the best station for C64 Remixes !
-
May 3rd, 2004, 10:22 PM
#3
Hey Hey,
Thanks for the comments JinX, I completely forgot that they'd need get.txt. Anyway, my feed is a little slow currently: I'm having crontab issues with my host. I think I'm going to move it over to another location, and I'll post an updated link when I get a chance. For now I'm updating the feed manually whenever I remember.
Peace,
HT
-
May 4th, 2004, 12:42 AM
#4
Hey Hey,
Well, it's a roundabout method of doing this, but I've got 2 shells: one that doesn't have webspace and one that "apparently" doesn't have a functioning crontab. So I've come to a solution. The one with the crontab and no webspace runs lynx every 5 minutes and requests a page off my server. I'm using -dump and sending the output to /dev/null. The page it's requesting contains a single SSI statement to exec the shell script that generates the feed. The result is http://www.seeminglyrandom.info/ao/ao.xml
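A rough sketch of the two pieces, for anyone wanting to copy the trick. The trigger URL and the script path are made-up placeholders, and the exec directive assumes Apache-style SSI with exec enabled on the web host. The snippet only prints the two fragments; it installs nothing.

```shell
#!/bin/bash
# Sketch only: prints the two config fragments; nothing is installed.
# TRIGGER_URL and SCRIPT_PATH are hypothetical placeholders.
TRIGGER_URL="http://example.com/ao/trigger.shtml"
SCRIPT_PATH="/path/to/script.sh"

# 1) crontab line for the shell that has cron but no webspace:
#    fetch the trigger page every 5 minutes and throw the output away.
cron_line() {
    printf '*/5 * * * * lynx -dump %s > /dev/null 2>&1\n' "$TRIGGER_URL"
}

# 2) contents of the trigger page on the web host: a single SSI
#    directive that runs the feed script whenever the page is requested.
ssi_page() {
    printf '<!--#exec cmd="%s" -->\n' "$SCRIPT_PATH"
}

cron_line
ssi_page
```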
Enjoy,... and as always Peace,
HT
-
May 11th, 2004, 12:10 PM
#5
BUGFIX
damn.. it seems the new banner (or something else they (jupmedia) changed) killed your rss thingy.. I'm getting all blanks too !!
If I find a fix, I'll post it here . .
<edit type="add">
Well that was easy.. The entire layout of the AO html is changed..
the line could/should be
nc www.antionline.com 80 < get.txt | grep -A 122 "Active In AntiOnline's Forums" > ao.html
(this greps you all the active topics..)
also, I've noticed that although you are using html2text, nothing is done with the outcome.. (ao.txt)
so let's lose that line (and the dependency on html2text)
the weird thing I get now is 19 lines and the last is the same as number 18, well the title is, the link isn't..
Found out the cause..
you are grepping for "Today at" and the later posts (numbers 19 and further) are posted "Yesterday at" 
That fixed it !
my updated version..
Code:
#!/bin/bash
nc www.antionline.com 80 < get.txt | grep -A 122 "Active In AntiOnline's Forums" > ao.html
cat ao.html | grep showthread | cut -d " " -f 4 | cut -d "\"" -f 2 | cut -d "<" -f 3 | grep http > links.txt
cat ao.html | grep "Today at" | cut -d ">" -f 4 | cut -d "<" -f 1 > title.txt
cat ao.html | grep "Yesterday at" | cut -d ">" -f 4 | cut -d "<" -f 1 >> title.txt
cat links.txt | cut -d "&" -f 1 > link1.txt
cat links.txt | cut -d "&" -f 2 > link2.txt
counter=1
echo "<?xml version=\"1.0\" ?>" > ao.xml
echo "<rss version=\"2.0\">" >> ao.xml
echo " <channel>" >> ao.xml
echo " <title>AO RSS Feed</title>" >> ao.xml
echo " <link>http://www.antionline.com</link>" >> ao.xml
echo " <description>Computer Security</description>" >> ao.xml
until [ $counter -eq 21 ]; do
title=`head -n $counter title.txt | tail -n 1`
link1=`head -n $counter link1.txt | tail -n 1`
link2=`head -n $counter link2.txt | tail -n 1`
echo " <item>" >> ao.xml
echo " <title>$title</title>" >> ao.xml
echo " <link>$link1&amp;$link2</link>" >> ao.xml
echo " </item>" >> ao.xml
counter=`expr $counter + 1`
done
echo " </channel>" >> ao.xml
echo "</rss>" >> ao.xml
you can also download the latest rss feed and all the scripts..
http://etv.cx/~the_jinx/ao.xml
http://etv.cx/~the_jinx/ao_rss
http://etv.cx/~the_jinx/get.txt
</edit>
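One caveat with the two-grep fix: all the "Today at" titles are written first and the "Yesterday at" titles are appended after them, so titles and links only stay paired as long as the page lists every "Today" thread before every "Yesterday" thread. Matching both labels in a single pass keeps page order. A sketch on invented sample lines (this HTML is made up for the demo, it is not the real AO markup):

```shell
#!/bin/bash
# Sketch only: match both date labels in one pass so the output
# keeps the order of the input lines.
extract_titles() {
    grep -E '(Today|Yesterday) at' | cut -d '>' -f 4 | cut -d '<' -f 1
}

# Invented sample input; the real page layout will differ.
sample='<tr><td><a href="x">First thread</a> Today at 10:00</td></tr>
<tr><td><a href="y">Second thread</a> Yesterday at 23:10</td></tr>
<tr><td><a href="z">Third thread</a> Today at 09:00</td></tr>'

printf '%s\n' "$sample" | extract_titles
```

The same grep -E pattern should work for time.txt as well, as long as the field numbers passed to cut match the layout.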
-
May 11th, 2004, 08:14 PM
#6
Greetings All:
Unfortunately, with scripts like these you're always opening yourself up to things breaking every time the webmaster adds a new banner (which happens pretty often on this website), moves things around, etc.
You also have to worry about injection of code from people making creative topics, etc. etc. (which probably wouldn't be a big worry considering that it's AO that you're pulling from, but regardless).
Although you did a good job with this HTRegz, perhaps someone should request that JupiterMedia start an official thread dump.
I used to have one, so I'm sure the code is probably still integrated into the site. All they'd have to do is add it as a cron job, and there you have it....
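To make the injection point concrete: a thread title containing <, > or & ends up as raw markup in the feed unless it is escaped first. A minimal sketch of escaping a field before writing it out; xml_escape is a hypothetical helper name, not something in the posted script.

```shell
#!/bin/bash
# Sketch only: escape the XML special characters in a string.
# xml_escape is a hypothetical helper, not part of the posted script.
xml_escape() {
    # Escape & first so the entities added afterwards are not re-escaped.
    printf '%s\n' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Invented example title containing markup.
title='Tips & <b>tricks</b>'
echo " <title>$(xml_escape "$title")</title>"
```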
-
May 12th, 2004, 08:51 AM
#7
It could even be made with ssi (php)
I mean.. the queries that make up the frontpage could also very easily make an rss (without much overhead)
PS I made an official request here
-
May 13th, 2004, 09:49 AM
#8
Major Update
Another major upgrade of the AO rss thingy..
the new version has added checking (better ?) for & problems ( makes it &amp; )
Also the new version has added description of
Topic starter
In what forum
Number of replies
Poster of last reply
Time of last reply
Hope you'll like..
and I'll keep it updated
Code:
nc www.antionline.com 80 < get.txt | grep -A 122 "Active In AntiOnline's Forums" > ao.html
cat ao.html | grep showthread | cut -d " " -f 4 | cut -d "\"" -f 2 | cut -d "<" -f 3 | grep http > links.txt
cat ao.html | grep "Today at" | cut -d ">" -f 4 | cut -d "<" -f 1 > title.txt
cat ao.html | grep "Yesterday at" | cut -d ">" -f 4 | cut -d "<" -f 1 >> title.txt
cat ao.html | grep "Today at" | cut -d ">" -f 6 | cut -d "<" -f 1 > time.txt
cat ao.html | grep "Yesterday at" | cut -d ">" -f 6 | cut -d "<" -f 1 >> time.txt
cat ao.html | grep "Topic Started By: " | cut -d ">" -f2 | cut -d "<" -f1 > starter.txt
cat ao.html | grep "Topic Started By: " | cut -d ">" -f6 | cut -d "<" -f1 > lastreply.txt
cat ao.html | grep "Thread Is In: " | cut -d ">" -f2 | cut -d "<" -f1 > cat.txt
cat ao.html | grep "Thread Is In: " | cut -d ">" -f4 | cut -d "<" -f1 > rep.txt
counter=1
echo "<?xml version=\"1.0\" ?>" > ao.xml
echo "<rss version=\"2.0\">" >> ao.xml
echo " <channel>" >> ao.xml
echo " <title>Antionline RSS Feed</title>" >> ao.xml
echo " <link>http://www.antionline.com</link>" >> ao.xml
echo " <description>Maximum Security for a Connected World</description>" >> ao.xml
until [ $counter -eq 21 ]; do
title=`head -n $counter title.txt | tail -n 1`
title=${title//&/&amp;}
links=`head -n $counter links.txt | tail -n 1`
links=${links//&/&amp;}
desc="Topic Started By: "`head -n $counter starter.txt | tail -n 1`", "
desc=$desc" In "`head -n $counter cat.txt | tail -n 1`", "
desc=$desc`head -n $counter rep.txt | tail -n 1`" Replies, "
desc=$desc"Last Reply By: "`head -n $counter lastreply.txt | tail -n 1`
desc=$desc","`head -n $counter time.txt | tail -n 1`
desc=${desc//&/&amp;}
echo " <item>" >> ao.xml
echo " <title>$title</title>" >> ao.xml
echo " <link>$links</link>" >> ao.xml
echo " <description>$desc</description>" >> ao.xml
echo " </item>" >> ao.xml
counter=`expr $counter + 1`
done
echo " </channel>" >> ao.xml
echo "</rss>" >> ao.xml
latest version: http://www.etv.cx/~the_jinx/ao_rss
the rss feed: http://www.etv.cx/~the_jinx/ao.xml
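A possible tidy-up, offered as a sketch rather than a replacement: the loop above re-reads every file with head and tail on each pass and always runs 20 times. Assuming the same one-entry-per-line files, paste can join them so each row is read once and the loop stops when the files run out. The sample data is invented for the demo, and only two of the files are shown.

```shell
#!/bin/bash
# Sketch only: build <item> blocks from parallel files in one pass.
# Assumes no field contains a tab, since paste joins columns with tabs.
TAB=$(printf '\t')

escape_amp() {
    # Same idea as the ${var//&/&amp;} lines in the posted script.
    sed -e 's/&/\&amp;/g'
}

emit_items() {
    # $1 = titles file, $2 = links file (one entry per line, same order)
    paste "$1" "$2" | escape_amp | while IFS="$TAB" read -r title link; do
        echo " <item>"
        echo " <title>$title</title>"
        echo " <link>$link</link>"
        echo " </item>"
    done
}

# Invented sample data for the demo.
workdir=$(mktemp -d)
printf 'First thread\nQ & A thread\n' > "$workdir/title.txt"
printf 'http://example.com/showthread.php?s=&threadid=1\nhttp://example.com/showthread.php?s=&threadid=2\n' > "$workdir/links.txt"

emit_items "$workdir/title.txt" "$workdir/links.txt"
```

Extra columns such as starter.txt or rep.txt could be added to the paste call and the read list in the same way.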