Frequency Word List is driving me mad

Why is nothing ever easy?
I managed to get Frequency List Builder up and running in no time. It can work pretty fast and gives me decent output.

What do I find? The input has errors. Of course, that is bound to happen. So what do I do? Well, I change my code to work around the common ones: ‘il is actually ‘ll, mr. is the same as mr, and so on.
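Roughly, the workaround looks like the sketch below. It is Python and purely illustrative, not the actual Frequency List Builder code: the names `normalize` and `word_frequencies` are made up for the example, and the token pattern assumes plain a-z English words.

```python
import re
from collections import Counter

# Assumed token shape: letters, an optional 'suffix, an optional trailing period.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?\.?")


def straighten_quotes(text):
    # Treat curly single quotes the same as a plain apostrophe.
    return text.replace("\u2018", "'").replace("\u2019", "'")


def normalize(token):
    # Hypothetical fix-ups for the two input errors mentioned above.
    token = straighten_quotes(token.lower())
    if token.endswith("'il"):
        token = token[:-3] + "'ll"   # he'il -> he'll
    return token.rstrip(".")         # mr. -> mr


def word_frequencies(text):
    cleaned = straighten_quotes(text.lower())
    return Counter(normalize(t) for t in TOKEN_RE.findall(cleaned))


counts = word_frequencies("Mr. Smith said he'il go. Mr Smith agreed.")
print(counts.most_common(3))
```

Note that stripping the trailing period also hits the last word of every sentence, which is fine for counting, and that the a-z pattern simply drops anything with accents or non-Latin letters.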

So what happens next? I test it and it's all beautiful. Except I have no idea about the structure of words in non-English languages. Damn!!!!!

The XML wiki dumps are just a pile of XML/HTML mess. I’ll probably have to write more code to strip the unwanted data before I start looking for real data.
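A first pass at that stripping step might look like the sketch below. It is Python again, and it rests on assumptions: that the text pulled out of the dump contains HTML tags (possibly entity-escaped), `[[wiki links]]`, and `{{templates}}`. The function name `strip_markup` is invented for the example, and nested templates or tables would need more work than this.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")               # <b>, </ref>, <br/> ...
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")   # {{...}}, innermost level only
# [[target|label]] or [[target]]; the visible text is captured in group 1.
LINK_RE = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")


def strip_markup(raw):
    text = html.unescape(raw)            # &lt;b&gt; becomes <b>
    text = TEMPLATE_RE.sub(" ", text)    # drop templates entirely
    text = LINK_RE.sub(r"\1", text)      # keep only the link's visible text
    text = TAG_RE.sub(" ", text)         # drop tags, keep what is between them
    return re.sub(r"\s+", " ", text).strip()


sample = "The [[cat|cats]] sat on &lt;b&gt;the&lt;/b&gt; [[mat]]{{citation needed}}"
print(strip_markup(sample))
```

The cleaned string can then go into the word counter in place of the raw dump text.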

