-
April 27th, 2006, 10:02 PM
#1
Banned
Linux SORT problem
I am using Slackware 10 and want to get more familiar with piping and sorting. I have a document named output.txt, and I want to display the 20 most frequently occurring words along with how many times each one is used in the document.
I know I have to use sort and wc, but I don't know how to pipe them together.
Any suggestions?
-
April 27th, 2006, 10:13 PM
#2
Banned
Ah, nvm, posted too soon.
I just used:
cat output.txt | sort | uniq -c | sort -nr | head -20
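For anyone reading along, here is the same pipeline broken out stage by stage, as a rough sketch. The top_lines function name and its optional count argument are made up for illustration and are not part of any tool; it reads the file directly instead of going through cat, and uses head -n, which is the POSIX spelling of head -20.

```shell
# Hypothetical helper: print the N most frequent lines of a file (default 20).
top_lines() {
    file=$1
    count=${2:-20}
    sort "$file" |        # put identical lines next to each other
        uniq -c |         # collapse each run of identical lines, prefixed by its count
        sort -nr |        # order numerically by that count, largest first
        head -n "$count"  # keep only the first N lines
}
```

Calling it as top_lines output.txt should behave like the one-liner above. Note that uniq only merges adjacent duplicates, which is why the first sort is needed.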
-
April 27th, 2006, 10:27 PM
#3
Are you sure that gives you what you want? It doesn't seem to be working for me. As far as I can tell, it counts identical lines rather than words, so on a regular document, where almost every line is unique, you just get 20 lines that each have a count of 1. The only way it would work the way you were asking is if the document were just a list of words, one under the other, like this:
apple
pear
bear
car
house
etc
etc
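If the file is ordinary prose instead, one way around this is to split it into one word per line before counting. A rough sketch, assuming GNU tr, that a "word" is any run of letters (so "don't" becomes "don" and "t"), and that case should be ignored; the top_words name is just for illustration:

```shell
# Hypothetical helper: print the N most frequent words of a file (default 20).
top_words() {
    file=$1
    count=${2:-20}
    tr -cs '[:alpha:]' '\n' < "$file" |   # turn every run of non-letters into one newline
        tr '[:upper:]' '[:lower:]' |      # fold case so "The" and "the" count together
        grep -v '^$' |                    # drop the empty line a leading non-letter can leave
        sort | uniq -c | sort -nr |       # count identical words, most frequent first
        head -n "$count"
}
```

Something like top_words output.txt 20 would then print each count followed by its word, most frequent first.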
For the wages of sin is death, but the free gift of God is eternal life in Christ Jesus our Lord.
(Romans 6:23, WEB)
-
April 28th, 2006, 12:33 AM
#4
See this page. You will have to modify the example they use to fit your own needs, but it should help you do what you want. See the exercise "Frequency analysis of Text" and its solution on the same page.
-
April 28th, 2006, 02:40 AM
#5
Banned
Yeah, it works just fine for me; the file I had was already broken up into one word per line.