I need a file with 200+ known spam emails to seed the Bayesian filter/learning option with enough data for it to kick in.
Does anyone have a file of spam to lend me?
I know, I could set up and collect, but I'm busy (and lazy).
Sorry, it would take me a few days to collect that many. I have 49 in my junk folder at the moment.
Yeah, that's my backup plan. It will take a few days here too.
CSR, I get around 3,000 per day. Not sure if I can dump the report into CSV, but I'll let you know if I can.
Thanks dino. That would be awesome. It wouldn't need to be CSV; plain old text is fine. In fact, SpamAssassin can read it if it's in maildir or mbox format.
If you are able to get a file for me, PM me and I will give you an FTP account to send it to.
Thanks again. CSR.
P.S. Drinks on me at the bar on the other side.
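For anyone who only has spam saved as individual plain-text messages, here is a rough sketch of packing them into a single mbox file with Python's standard mailbox module. The function name pack_into_mbox and the directory layout (one raw message per file) are assumptions for illustration, not something from this thread.

```python
import mailbox
from pathlib import Path


def pack_into_mbox(message_dir, mbox_path):
    """Append every regular file in message_dir to an mbox file.

    Assumes each file holds exactly one raw email (headers, blank line, body).
    Returns the number of messages added.
    """
    box = mailbox.mbox(str(mbox_path))  # created if it does not exist yet
    count = 0
    try:
        box.lock()
        for path in sorted(Path(message_dir).iterdir()):
            if path.is_file():
                # add() accepts raw bytes and writes the "From " separator line itself
                box.add(path.read_bytes())
                count += 1
    finally:
        box.close()  # flushes pending writes and releases the lock
    return count
```

The resulting file could then be handed to sa-learn in one go with something like `sa-learn --spam --mbox spam.mbox`.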
Sorry mate. I can send you many charts and graphs, but the only place that lists the actual emails is a control window where you can "Release" or "Mark as not spam". There is no way to copy all the records or export them.
I'm talking about the MXLogic SpamSoap admin console, if anyone knows a way to download the quarantine list.
Greetings.
http://www.stopforumspam.com/downloads
Download the CSV; maybe that will get you started. :)
Also
http://www.stopforumspam.com/spamdomainsandips
Thanks, but I need the actual spam, not a list of addresses.
I am trying to implement a Bayesian filter that will "learn" what is spam vs. ham. The code needs a seed file of known spam in order to get started.
No worries. I have been capturing spam on my hotmail account. Should have enough in a few days.
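To illustrate why a learning filter needs a seed corpus before it can "kick in", here is a toy naive Bayes sketch. It is not SpamAssassin's actual algorithm; the class name TinyBayes, the tokenizer, and the min_per_class threshold are all made up for the example. The threshold loosely mirrors the idea that the filter stays silent until it has seen enough spam and ham (as far as I know, SpamAssassin wants 200 of each by default).

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy spam/ham classifier that refuses to score until it is seeded."""

    def __init__(self, min_per_class=200):
        self.min_per_class = min_per_class  # hypothetical activation threshold
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # One vote per distinct token per message
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def learn(self, text, label):
        self.messages[label] += 1
        self.tokens[label].update(self.tokenize(text))

    def ready(self):
        return min(self.messages.values()) >= self.min_per_class

    def spam_probability(self, text):
        if not self.ready():
            return None  # not enough training data yet
        log_odds = math.log(self.messages["spam"] / self.messages["ham"])
        for tok in self.tokenize(text):
            # Add-one smoothing so unseen tokens never give a zero probability
            p_spam = (self.tokens["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
        return 1 / (1 + math.exp(-log_odds))
```

Until both classes reach the threshold, spam_probability() returns None, which is the "not seeded yet" state a file of known spam is meant to get past.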
Hi CSR,
I'm afraid my cats eat all the spam around here but this link might help:
http://untroubled.org/spam/
Over 10,000 for May 2009 already!
:)
I don't know what a "Lorien" file is supposed to be, but you can view them as text.
Great find. Thanks Old Man. ;)
P.S. sa-learn wasn't able to consume the folder as-is. I needed to write a script to loop through the folder list and process the files explicitly, one at a time. I suspect Perl had problems with the large number of files. It was easier to write a new script than to debug Perl. (Lazy.)
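For anyone hitting the same wall, a loop along these lines might do the job. This is only a sketch, not the script I actually used: the function name learn_spam_one_by_one and the injectable run parameter are made up for illustration, and it assumes every regular file under the folder is a single spam message. The sa-learn options used (--spam, --no-sync, --sync) are the standard ones.

```python
import subprocess
from pathlib import Path


def learn_spam_one_by_one(folder, run=subprocess.run):
    """Feed each regular file under folder to sa-learn as spam, one call per file.

    run is injectable so the loop can be dry-run without sa-learn installed.
    Returns (learned, failed) lists of paths.
    """
    learned, failed = [], []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue  # skip the directories themselves
        # --no-sync defers the database sync so each call stays cheap
        result = run(["sa-learn", "--spam", "--no-sync", str(path)])
        (learned if result.returncode == 0 else failed).append(path)
    if learned:
        run(["sa-learn", "--sync"])  # a single sync once everything is learned
    return learned, failed
```

Calling learn_spam_one_by_one("spam-archive") walks the whole tree; "spam-archive" is a placeholder for wherever the downloaded archive was unpacked.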