Say you have a document, Document.odt, and you wonder whether you use some words too often. Find out with:
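unzip -p Document.odt content.xml | sed 's/<[^>]*>/ /g' |   # extract the text, strip XML tags
sed 's/[^[:alpha:]]/ /g' | grep -Eo '[^ ]{3,}' |            # keep only words of 3+ letters (umlauts included in a UTF-8 locale)
sort | uniq -c |                                            # count each distinct word
grep -vwFf ~/words.txt | grep -v '^ *1 ' | sort -n          # drop common words and words used once, sort by count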
Here words.txt is a list of common words in your language, one per line, which we don’t want to see. You can get such a list at http://wortschatz.uni-leipzig.de/html/wliste.html or from sites like http://de.wikipedia.org/wiki/Liste_der_h%C3%A4ufigsten_W%C3%B6rter_der_deutschen_Sprache
You get something like
2 Beitrag
2 Effizienz
2 Hauptteil
2 Technik
3 Autor
4 Collectors
5 Garbage
7 Daten
which is really cool.
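If you run this often, the counting part can be wrapped in a small shell function. The following is only a sketch: the name frequent_words, its two arguments and the default threshold of 2 are made up for illustration, and it assumes GNU grep, sed and awk. Unlike the one-liner, it removes the common words before counting and only when the whole word matches, so a short word in the list cannot knock out a longer word that merely contains it.

```shell
# frequent_words STOPWORDS [MIN]   (hypothetical helper, not part of any tool)
# Reads plain text on stdin and prints "count word" for every word of three
# or more letters that is not listed in STOPWORDS (one word per line) and
# occurs at least MIN times (default: 2).
frequent_words() {
    stopwords=$1
    min=${2:-2}
    sed 's/[^[:alpha:]]/ /g' |        # turn everything but letters into spaces
    grep -Eo '[^ ]{3,}' |             # one word of 3+ letters per line
    grep -vxFf "$stopwords" |         # drop lines that exactly equal a stop word
    sort | uniq -c |                  # count identical words
    awk -v min="$min" '$1 >= min' |   # keep words seen at least MIN times
    sort -n                           # least frequent first, most frequent last
}

# Possible usage with the document from above:
#   unzip -p Document.odt content.xml | sed 's/<[^>]*>/ /g' | frequent_words ~/words.txt
```

The optional second argument raises the threshold, for example `frequent_words ~/words.txt 5` to list only words used five times or more.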