Thursday, March 19, 2009

Unjumble Algoritm

Algorithm to unjumble a given jumbled word to obtain the valid possible words:
Here I will explain the algorithm to unjumble a jumbled word and form the possible meaningfull words.
This can be done in a bruteforce way. Where in we form different combinations out the letters present in the given
word and try to find the each combination the dictionary for its validity.
This seems to be easy. But let us consider this method from the performance point of view.
Now let us assume that we will be given with a jumbled word of 10 letters. The possible combination
will be of the order 10! (10 factorial) = 3,628,800 words. This implies that we have to search for all these words in the
dictionary to check whether the word formed is valid or not. So the amount of time required to achive the find all the
possible valid words is too high and not a practical solution as the time complexity for this method is O(n!) X
O(Searching in dictionary).
Algorithm to unjumble a given jumbled word to obtain the valid possible words:
To solve this problem we can think of the following algorithm. The time complexity is far far less when
compared to the previous algorithm.
To achive this we have to follow two step methods.
1) To organize the data (The words present in the dictionary in hash-table format). And save this for
the further use.
2) To create the hash-key from the jumbled word given. And use this hash-key to get all the possible
values from the previously saved data.
Let us focus on each these steps in depth to uderstant the algorithm.
1) To organize the data (The words present in the dictionary in hash-table format). And save this for the
further use.
The major concern in this step is to find a hash function that takes individual word of from the
dictionary and gives us the hash-key for it.
Now consider a valid word from British English "inert". If we rearrange the letters of this word in the ascending order, we will get
"einrt".
Simillarly, consider the other valid words in British English "inter", "nitre" and "tiren". If we rearrange the letters of these words in
the ascending order, we will get "einrt" in all the cases.
So, we can use method as our hashing fuction.
In simple words, our hashing function (hash_function), takes a word and produces the hash-key by
rearranging the letters of the given word.
Using this hasing fuction we have observed, for different valid words, we get the same hash-key. So we can use this property to
organize our data (list of all the valid word) with the hash-key asscociated with them.
A snap shot of the organized data will look like this:
...
...
...< hash_key ="aet"> 
   < actual_values  = “ate”, “eat”, “tea”>
...
...
...
...< hash_key = "einrt">
   < actual_values  = "inert", "inter", "nitre", "tiren">
...
...
...
The organizing of the data will be a one time job and this can be used at any point of time in the future to
unjumble the words from the given jumbled words as explained in the next section.
2) To create the hash-key from the jumbled word given. And use this hash-key to get all the possible
values from the previously saved data.
In the previous section we have came across the hash function that generates the hash-key for the word
given.
Now let us consider the same hash function. And let us give it a jumbled word “terni” / “ertin” /
“”erint” etc. The hash-key generated by our hash function after reaaranging of the letters is “einrt”.
Now if look for this hash_key in the saved data, we observe that it has four values attached it and they
are “intert”, “inter”, “nitre” and “tiren”. And these are the possible valid words. And this is what we
needed. So we have seen how we can get all the possible valid words from a given jumbled word in a
single search!!!
So we can summarize these steps as possible.
a) Get the jumbled word.
b) Get the hash_key by reaaranging the letters of the given jumbled word.
c) Use this hash_key and fetch all the associated values from the organized stored hash
table.
Conclusion:
The algorithm is simple and the time complexity is far far better compared to bruted force method very
much practical and easily implementable. The same technique might be applied in other scenarios too.

Download the unjumble application from : http://code.google.com/p/unjumble/downloads/list

1 comment: