2012年3月21日星期三

Full Text on Company Names

I am using full text on a colum of company names.
My question is what is the best way to full text search names when a
lot of them have apostraphes such as Dave's Hardware or Tony's
Plumbing?
Should I create a column that doesn't have apostraphe's using
replace() and make that my full text index or do I just change my
containstable(..) to be like containstable(table,col, ' "daves*",
"dave''s*" ')
In order to find Dave's Hardware when somebody search for daves?
Thanks in advance!
-Andy
Ohhh yea,
Sorry
Microsoft SQL Server 2005 - 9.00.2047.00 (Intel X86) Apr 14 2006
01:12:25 Copyright (c) 1988-2005 Microsoft Corporation Standard
Edition on Windows NT 5.2 (Build 3790: Service Pack 1)
|||This should work as long as the s is not in your noise word list. So a
search on Dave's hardware will match with dave's hardware. To search on both
versions of the word you would have to expand your search to both versions -
Dave's Hardware and Daves Hardware. You could use the thesaurus option for
this but you would have to know all versions of the word in advance, ie Dave
and Dave's, Mike and Mike's, Harold and Harold's.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"AndyNY" <AndyWorral@.gmail.com> wrote in message
news:1170713365.985430.214560@.l53g2000cwa.googlegr oups.com...
> Ohhh yea,
> Sorry
> Microsoft SQL Server 2005 - 9.00.2047.00 (Intel X86) Apr 14 2006
> 01:12:25 Copyright (c) 1988-2005 Microsoft Corporation Standard
> Edition on Windows NT 5.2 (Build 3790: Service Pack 1)
>
|||Thanks Hilary,
Another quick question for you. I could swear I saw somewhere how to
set the default rank algorithm for containstable to not be jaccard.
Does this exist? My ranking on my returned results are not even close
to the weightings I set on my words. I think they are compounding on
multiple matching root words.
Any ideas?
If there isn't a way to change algorithm I will go into what I am
trying to do more elaboratly.
Thanks Again!
|||you can do this in other MS search products, but not SQL FTS. Contains uses
a Kyle Peltonen special which is a tf-idf, isabout uses jaccard.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"AndyNY" <AndyWorral@.gmail.com> wrote in message
news:1170806211.551764.179030@.k78g2000cwa.googlegr oups.com...
> Thanks Hilary,
> Another quick question for you. I could swear I saw somewhere how to
> set the default rank algorithm for containstable to not be jaccard.
> Does this exist? My ranking on my returned results are not even close
> to the weightings I set on my words. I think they are compounding on
> multiple matching root words.
> Any ideas?
> If there isn't a way to change algorithm I will go into what I am
> trying to do more elaboratly.
> Thanks Again!
>
|||Hillary,
if my "isabout" looks like this for a search for ricottas
isabout("riccotti's*" weight (0.500), "riccottis*" weight (0.500),
"ricetta's*" weight (0.750), "ricettas*" weight (0.750), "rickett's*"
weight (0.250), "ricketts*" weight (0.250), "ricota's*" weight
(0.750), "ricotas*" weight (0.750), "ricotta's*" weight (1.000),
"ricottas*" weight (1.000), "rigetta's*" weight (0.500), "rigettas*"
weight (0.500))
(this is formed by pulling words that are near spellings using double
metaphone and edit distance from a word dictionary made from all
possible words in names with any words that ends in 's' being
dupilicated to its apostrophe form as well)
when I run it my result set is
Name Rank
Ricotas Pizza 77
Ricotta's Pizza 43
Ricotta's Pizza Pasta Subs 43
Ricotta's Pizza Pasta Subs 43
Shouldn't Ricottas's be highest as that has a rank of 1.00 where the
others have a max rank of .75? Could you explain breifly how jaccard
ranks these?
Thanks!
-Andy
|||I think I might have solved the problem by just creating a view that
removed all 's before indexing.
Thanks for all your help!

没有评论:

发表评论