Sunday, February 19, 2012

Is FTS accent sensitive?

Hi,
I'm planning to store some Romanian documents in a SQL image column. I understand
that I have to use the neutral language.
What I need to know is whether FTS is accent sensitive: some people write
"acasă" (meaning "home"), others simply "acasa".
Does the query "acasa" return both "acasa" and "acasă", or only "acasa"?
Sorry for my bad English.
Thank you.
Best regards,
Emil Mustea
|||It is not accent sensitive, or accent aware. Searches on "acasă" will not
match on rows containing "acasa".
|||> It is not accent sensitive, or accent aware. Searches on "acasă" will not
> match on rows containing "acasa".
So it's true: searches on "acasa" will return both "acasa" and
"acasă". Correct?
"Hilary Cotter" <hilaryk@.att.net> wrote in message
news:uZbXCiYKEHA.3316@.tk2msftngp13.phx.gbl...
|||Emil,
The answer to your question for SQL Server 2000 (RTM to current SP3a) is no.
I've tested this with both an accent sensitive database and an accent
insensitive database collation using the same data and same server
configuration:
select TextCol, VarcharCol, CharCol from FTSAccent
/*
TextCol              VarcharCol       CharCol
-------------------  ---------------  -------
Kahlúa HalfPipe jam  café             cafe
Kahlua HalfPipe jam  cafe             cafe
Halfpipe Jam?        classic-recipes  CD-R
Halfpipe Jam         classic recipes  CD R
*/
-- Accent testing for "accent insensitive" results
select TextCol from FTSAccent where contains(TextCol,'Kahlua') -- non-accented word
-- Expected Returns: 2 rows - "Kahlua HalfPipe jam" and "Kahlúa HalfPipe jam",
--                   as this database is accent-insensitive
-- Actual Results  : 1 row - "Kahlua HalfPipe jam"
select TextCol from FTSAccent where contains(TextCol,'Kahlúa') -- accented word
-- Expected Returns: 2 rows - "Kahlua HalfPipe jam" and "Kahlúa HalfPipe jam",
--                   as this database is accent-insensitive
-- Actual Results  : 1 row - "Kahlúa HalfPipe jam"
-- Accent specific testing for "accent sensitive" results...
select TextCol from FTSAccent where contains(TextCol,'Kahlua') -- non-accented word
-- Expected Returns: 1 row - "Kahlua HalfPipe jam" and NOT "Kahlúa HalfPipe jam",
--                   as this database is accent-sensitive
-- Actual Results  : 1 row - "Kahlua HalfPipe jam"
select TextCol from FTSAccent where contains(TextCol,'Kahlúa') -- accented word
-- Expected Returns: 1 row - "Kahlúa HalfPipe jam" and NOT "Kahlua HalfPipe jam",
--                   as this database is accent-sensitive
-- Actual Results  : 1 row - "Kahlua HalfPipe jam"
As you can see, the actual results differ from the expected results with
both accent-sensitive and accent-insensitive database collations. For the
accent-insensitive database collation, only the row matching the search word
as typed (accented or non-accented) was returned, but not both. This may or
may not be fixed in a future service pack for SQL Server 2000.
Additionally, SQL Server 2005 (codename Yukon) will support accent-sensitive
or accent-insensitive full-text search via new T-SQL:
CREATE FULLTEXT CATALOG fulltext_catalog_identifier ON FILEGROUP
filegroup_identifier
IN PATH <root path> WITH ACCENT SENSITIVE | INSENSITIVE AS DEFAULT
Regards,
John
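For anyone on a released build: the catalog option shipped in SQL Server 2005 as ACCENT_SENSITIVITY rather than the pre-release wording quoted in John's post. A minimal sketch, assuming a hypothetical catalog name (RomanianDocs) and a database where full-text search is installed:

```sql
-- RomanianDocs is a hypothetical catalog name, used for illustration only.
-- ACCENT_SENSITIVITY = OFF requests accent-insensitive indexing for the
-- full-text indexes placed in this catalog.
CREATE FULLTEXT CATALOG RomanianDocs
    WITH ACCENT_SENSITIVITY = OFF
    AS DEFAULT;

-- Returns 1 if the catalog is accent sensitive, 0 if it is accent insensitive.
SELECT FULLTEXTCATALOGPROPERTY('RomanianDocs', 'AccentSensitivity') AS AccentSensitivity;
```

For an existing catalog, the setting can be changed with ALTER FULLTEXT CATALOG ... REBUILD WITH ACCENT_SENSITIVITY = OFF, which repopulates the catalog.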
|||Thank you for your detailed reply.
In Romanian we have two letters with an accent at the bottom of the letter:
"ş" (like "sh" in English) and "ţ" (it sounds like the "zz" in "pizza").
Will these letters be considered accented?
|||You're welcome, Emil,
Would you be able to provide the ASCII codes for these letters from the
Romanian code page?
For example, for the English accented letter "à": SELECT ascii('à') -- returns: 224
Thanks,
John
|||ş = 186
ţ = 254
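The values 186 and 254 fit the Windows-1250 (Central European) code page, where byte 186 is ş and byte 254 is ţ. Single-byte codes depend on the code page, so a code-page-independent check can use the Unicode functions instead. A small sketch, assuming the letters are the cedilla forms U+015F and U+0163:

```sql
-- NCHAR() builds the letters from their Unicode code points, so the script
-- does not depend on the encoding of the script file or the client.
SELECT NCHAR(351) AS s_cedilla,   -- U+015F
       NCHAR(355) AS t_cedilla;   -- U+0163

-- UNICODE() goes the other way: it returns the code point of the first
-- character of its argument.
SELECT UNICODE(NCHAR(351)) AS s_code,   -- 351
       UNICODE(NCHAR(355)) AS t_code;   -- 355
```

Keeping the text in nvarchar, nchar, or ntext columns and writing literals as N'...' sidesteps the code page question.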
|||Emil,
Yes, I believe that ASCII code values above 127 are "extended characters"
and are treated as accented characters.
The best way to confirm this is to put Romanian words that contain these
accented letters in a SQL table defined with one of the following datatypes:
nvarchar, nchar, or ntext. Then create an FT index on this column using the
Neutral "Language for Word Breaker" and confirm whether Romanian words that
contain these accented letters are returned together with the similar
non-accented words using CONTAINS or FREETEXT. "The proof is always in
the pudding" - to paraphrase an old English saying .. <G>
Regards,
John
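The test John describes might look like the following on SQL Server 2000. This is only a sketch: the table, catalog, and sample words are hypothetical, and it assumes the full-text (Microsoft Search) component is installed and that the script runs in the target database.

```sql
-- Hypothetical test table; nvarchar keeps the Romanian letters intact.
CREATE TABLE dbo.RoWords (
    Id   INT NOT NULL CONSTRAINT PK_RoWords PRIMARY KEY,
    Word NVARCHAR(100) NOT NULL
);

-- Accented and non-accented spellings of the same words.
INSERT INTO dbo.RoWords (Id, Word) VALUES (1, N'acasă');
INSERT INTO dbo.RoWords (Id, Word) VALUES (2, N'acasa');
INSERT INTO dbo.RoWords (Id, Word) VALUES (3, N'ţară');
INSERT INTO dbo.RoWords (Id, Word) VALUES (4, N'tara');

-- Enable full-text search and build an index on the Word column.
EXEC sp_fulltext_database 'enable';
EXEC sp_fulltext_catalog 'RoCatalog', 'create';
EXEC sp_fulltext_table 'dbo.RoWords', 'create', 'RoCatalog', 'PK_RoWords';
EXEC sp_fulltext_column 'dbo.RoWords', 'Word', 'add', 0;  -- 0 = Neutral word breaker
EXEC sp_fulltext_table 'dbo.RoWords', 'activate';
EXEC sp_fulltext_catalog 'RoCatalog', 'start_full';

-- Population is asynchronous; 0 means the catalog is idle again.
SELECT FULLTEXTCATALOGPROPERTY('RoCatalog', 'PopulateStatus') AS PopulateStatus;

-- Run these once population has finished and compare the rows returned.
SELECT Id, Word FROM dbo.RoWords WHERE CONTAINS(Word, N'acasa');
SELECT Id, Word FROM dbo.RoWords WHERE CONTAINS(Word, N'acasă');
SELECT Id, Word FROM dbo.RoWords WHERE CONTAINS(Word, N'tara');
SELECT Id, Word FROM dbo.RoWords WHERE CONTAINS(Word, N'ţară');
```

If each query returns only the spelling that was typed, the index is treating the accented and non-accented forms as different words, which is the behaviour reported earlier in the thread for SQL Server 2000.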
