Oops! Was Re: [DGD] LPC algorithm question

Noah Gibbs angelbob at monkeyspeak.com
Sat Feb 23 23:00:20 CET 2002


  I'm wrong -- you can hash just fine on an LWO.  I was just not
initializing my hash table properly!
  Mea culpa.  I always find it right after hitting "Send" :-)

-- 
angelbob at monkeyspeak.com
See my page of DGD documentation at
"http://www.angelbob.com/projects/DGD_Page.html"
If you post to the DGD list, you may see yourself there!

On Sat, 23 Feb 2002, Noah Gibbs wrote:

>   I'm writing a function, filter, which looks like this:
> 
> private mixed* filter(mixed* entries);
> 
>   Entries is an array, where each element looks something like this:
> 
> ["linkdead", <phrase>, (stuff)]
> 
>   The first element, a string, is a keyword for a help entry.  The second
> element, "<phrase>" is an LWO containing, among other things, an array of
> string sorted by locale (English, Spanish, etc).  There is a one-to-many
> mapping between keywords and help entries, so for instance a particular
> help entry might be listed by the keywords "policy", "policies",
> "rule" and "rules".  The <phrase> structure will be the same for all four 
> entries for this help but the leading keyword string, the first element of
> an entry, will be different.  In other words:
> 
>   ["policy", <phrase>, ...]
>   ["policies", <phrase>, ...]
>   ["rule", <phrase>, ...]
>   ["rules", <phrase>, ...]
> 
> Note <phrase> points to the same LWO in all four cases -- all of this is
> stored in the same heavyweight object (/usr/common/sys/helpd), so that
> should work fine.
> 
> I want to take a list of entries like this and eliminate all duplicate
> <phrase> structures so that if the Soundex algorithm cheerfully came back
> with both "policy" and "policies" as a match for "help pollissee" only one
> of the two would be shown since they have the same text as their
> description.
> 
> My first thought was to keep a hash table just within the function and
> hash each entry into the table using the phrase structure as a key, then
> just return the map_values array from the table.  This doesn't work since
> you can't use an LWO as an index into a hash table.  Hashing on the
> object_name of an LWO doesn't help.  In C I'd have a pointer...
> 
> I could just grab the English string out of each one in turn and use
> *that* as the hash key, though that will take more CPU since the help
> descriptions can be a page or more in length.  It would also be more
> complex and error-prone since some entries may conceivably only exist in
> non-English languages so I'd need a number of lines of code to fish
> through the locales and come up with the right one to hash on in each
> case.
> 
> Is there some easy way to do this without checking each one and
> eliminating the duplicates with a linear search?  I'd like to avoid that
> since it would be O(n^2) time and I'm trying to add some more permissive
> Soundex matching which would return a much larger number of entries for 
> sorting...
> 
> -- 
> angelbob at monkeyspeak.com
> See my page of DGD documentation at
> "http://www.angelbob.com/projects/DGD_Page.html"
> If you post to the DGD list, you may see yourself there!
> 
> _________________________________________________________________
> List config page:  http://list.imaginary.com/mailman/listinfo/dgd
> 

_________________________________________________________________
List config page:  http://list.imaginary.com/mailman/listinfo/dgd



More information about the DGD mailing list