Linux SORT problem


April 27th, 2006, 10:02 PM
aura2


I am using Slackware 10 and want to get more familiar with piping and sorting. I have a document named output.txt, and I want to display its 20 most frequent words along with how many times each one occurs.
I know I have to use sort and wc, but I don't know how to pipe them together.
Any suggestions?
April 27th, 2006, 10:13 PM
aura2

Ah, never mind, I posted too soon :)
I just used:
sort output.txt | uniq -c | sort -nr | head -n 20
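
For anyone wondering why sort has to come before uniq -c: uniq only merges lines that are next to each other, so unsorted input leaves duplicates uncounted. A minimal sketch, using a made-up three-word sample rather than the real output.txt:

```shell
# Made-up sample: "pear" appears twice, but not on adjacent lines.
words='pear
apple
pear'

# Without sort, uniq sees three separate runs of lines.
unsorted=$(printf '%s\n' "$words" | uniq -c | wc -l)

# With sort, the two "pear" lines become adjacent and merge into one group.
sorted=$(printf '%s\n' "$words" | sort | uniq -c | wc -l)

echo "groups without sort: $unsorted, with sort: $sorted"
```

The second sort -nr in the pipeline then orders those groups by their count, largest first.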
April 27th, 2006, 10:27 PM
preacherman481

Are you sure that gives you what you want? It doesn't seem to be working for me. The pipeline counts identical lines, not words, so on a regular document, where nearly every line is unique, you just get 20 lines with a count of 1 each. The only way it would work the way you were asking is if the document were just a list of words, one per line, like this:

apple
pear
bear
car
house
etc
etc
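
For a regular document, one way around this is to split the text into one word per line before counting. A rough sketch, assuming GNU tr as shipped on Linux; the function name top_words is made up for illustration, and it treats any run of non-letters as a word separator, which may not suit every text:

```shell
# Hypothetical helper: print the N most frequent words in a file (default 20).
# Usage: top_words output.txt 20
top_words() {
    tr -cs '[:alpha:]' '\n' < "$1" |   # turn every run of non-letters into one newline
        tr '[:upper:]' '[:lower:]' |   # fold case so "The" and "the" count together
        grep -v '^$' |                 # drop the empty line left by leading punctuation
        sort |                         # bring identical words next to each other
        uniq -c |                      # count each run of identical words
        sort -nr |                     # highest count first
        head -n "${2:-20}"             # keep the top N
}
```

Once the file is one word per line, the rest is the same sort, uniq -c, sort -nr, head chain as before.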
April 28th, 2006, 12:33 AM
preacherman481

See this page. You will have to adapt the example they use to your own needs, but it should help you get what you want. Look at the exercise "Frequency analysis of Text" and its solution on the same page.
April 28th, 2006, 02:40 AM
aura2

Yeah, it works just fine for me; the file I had was already broken up into one word per line.
