2012年3月27日星期二

full text search within single xml document

I have a full text indexed xml column in sql server 2005 containing the
following xml snippet:
<plays>
<folder id="681">
<page id="3155">
<submitdate>12-10-02</submitdate>
<pagetext>Horatio says 'tis but our fantasy,</pagetext>
</page>
<page id="9267">
<submitdate>09-04-04</submitdate>
<pagetext>And will not let belief take hold of him</pagetext>
</page>
</folder>
<folder id="902">
<page id="1853">
<submitdate>22-11-05</submitdate>
<pagetext>Touching this dreaded sight, twice seen of us:</pagetext>
</page>
<page id="8423">
<submitdate>31-05-02</submitdate>
<pagetext>Therefore I have entreated him along</pagetext>
</page>
</folder>
</plays>
I'd like to be able to search this document for keywords, such as
"belief" to return the value '@.id=9267' i.e. return the id value of the
containing node ('page') where the string "belief" was found.
Likewise, if the search was for "him" then '@.id=9267' and '@.id=8423'
would be returned.
Could anyone give me pointers as to how to implement this in XQuery /
Full-text?
Many thanks, DavidHere is a possible implementation:
begin
declare @.x xml
set @.x='
<plays>
<folder id="681">
<page id="3155">
<submitdate>12-10-02</submitdate>
<pagetext>Horatio says ''tis but our fantasy,</pagetext>
</page>
<page id="9267">
<submitdate>09-04-04</submitdate>
<pagetext>And will not let belief take hold of him</pagetext>
</page>
</folder>
<folder id="902">
<page id="1853">
<submitdate>22-11-05</submitdate>
<pagetext>Touching this dreaded sight, twice seen of us:</pagetext>
</page>
<page id="8423">
<submitdate>31-05-02</submitdate>
<pagetext>Therefore I have entreated him along</pagetext>
</page>
</folder>
</plays>
'
declare @.x2 xml
set @.x2 = (SELECT T.c.query('.') AS risultato FROM
@.x.nodes('/plays/folder/page[@.id]') T(c) where T.c.value('.','varchar(50)')
like '%Hor%')
select @.x2.value('(/page/@.id)[1]','int')
end
OK?
"firechaser@.talk21.com" wrote:

> I have a full text indexed xml column in sql server 2005 containing the
> following xml snippet:
> <plays>
> <folder id="681">
> <page id="3155">
> <submitdate>12-10-02</submitdate>
> <pagetext>Horatio says 'tis but our fantasy,</pagetext>
> </page>
> <page id="9267">
> <submitdate>09-04-04</submitdate>
> <pagetext>And will not let belief take hold of him</pagetext>
> </page>
> </folder>
> <folder id="902">
> <page id="1853">
> <submitdate>22-11-05</submitdate>
> <pagetext>Touching this dreaded sight, twice seen of us:</pagetext>
> </page>
> <page id="8423">
> <submitdate>31-05-02</submitdate>
> <pagetext>Therefore I have entreated him along</pagetext>
> </page>
> </folder>
> </plays>
>
> I'd like to be able to search this document for keywords, such as
> "belief" to return the value '@.id=9267' i.e. return the id value of the
> containing node ('page') where the string "belief" was found.
> Likewise, if the search was for "him" then '@.id=9267' and '@.id=8423'
> would be returned.
> Could anyone give me pointers as to how to implement this in XQuery /
> Full-text?
> Many thanks, David
>|||Thanks very much for your time Luca.
Unfortunately I was hoping to use the full-text capabilities in SQL
Server 2005 to produce *multiple* search results from any single xml
snippet stored in a single (and known in advance) row.
All the FTS examples I've seen are of the type
select xmlcol from t1
where contains(xmlcol, '"dog cat"')
which return the whole xml record. Is there a way to accomplish this on
a single record as per xml snippet in my original post which exists as
an xml document in a single row in the table?
I will know *in advance* which record in the table needs to be
queried, so I need to be able to query the actual content of the xml
cell and return 0,1 or more matching results based upon that xml
snippet and *not* based upon matches spanning several rows.
Looking at the above snippet, I need to be able to search for the word
"him" which would give me '@.id=9267' and '@.id=8423' returned.
I'm not looking to search across rows - that appears straightforward -
I'm trying to perform searches *within* the xml cell that exits in a
single (and at the moment, unique) record in my table.
I'd be very grateful for any help in trying to explain this, as I'm at
a loss at the moment.
Thanks again, David|||try this
create database FullTextXML
use FullTextXML
GO
drop table XMLFULLText
drop XML SCHEMA COLLECTION PlaysSchema
CREATE XML SCHEMA COLLECTION PlaysSchema AS '
<xsd:schema targetNamespace="http://www.plays.com/plays"
xmlns ="http://www.plays.com/plays"
elementFormDefault="qualified"
attributeFormDefault="unqualified"
xmlns:xsd="http://www.w3.org/2001/XMLSchema" >
<xsd:element name="plays"><xsd:complexType><xsd:complexContent>
<xsd:restriction base="xsd:anyType"><xsd:choice
maxOccurs="unbounded"><xsd:element name="folder">
<xsd:complexType><xsd:complexContent><xsd:restriction
base="xsd:anyType"><xsd:sequence>
<xsd:element name="page" minOccurs="0"
maxOccurs="unbounded"><xsd:complexType>
<xsd:complexContent><xsd:restriction base="xsd:anyType"><xsd:sequence>
<xsd:element name="submitdate" type="xsd:string" minOccurs="0" />
<xsd:element name="pagetext" type="xsd:string" minOccurs="0" />
</xsd:sequence><xsd:attribute name="id" type="xsd:string"
/></xsd:restriction></xsd:complexContent>
</xsd:complexType></xsd:element></xsd:sequence><xsd:attribute name="id"
type="xsd:string" />
</xsd:restriction></xsd:complexContent></xsd:complexType></xsd:element></xsd
:choice></xsd:restriction>
</xsd:complexContent></xsd:complexType></xsd:element></xsd:schema>'
GO
Create Table XMLFULLText (pk int not null identity constraint XMLFULLTEXTPK
primary key, XMLDOC XML(PlaysSchema))
GO
declare @.XMLDOC XML
set @.XMLDOC ='<plays xmlns="http://www.plays.com/plays"><folder id="681">
<page id="3155">
<submitdate>12-10-02</submitdate>
<pagetext>Horatio says ''tis but our
fantasy,</pagetext>
</page>
<page id="9267">
<submitdate>09-04-04</submitdate>
<pagetext>And will not let belief take hold of
him</pagetext>
</page>
</folder>
<folder id="902">
<page id="1853">
<submitdate>22-11-05</submitdate>
<pagetext>Touching this dreaded sight, twice seen of
us:</pagetext>
</page>
<page id="8423">
<submitdate>31-05-02</submitdate>
<pagetext>Therefore I have entreated him
along</pagetext>
</page>
</folder></plays>'
insert into XMLFULLText (XMLDOC) values(@.XMLDOC)
GO
sp_fulltext_database 'enable'
GO
create fulltext catalog test as default
GO
create fulltext index on XMLFULLText (XMLDOC) key index XMLFULLTextPK
GO
select * from XMLFULLText where contains(*,'belief')
GO
--1 row returned. OK, lets make this a little more interesting
GO
declare @.XMLDOC XML
set @.XMLDOC ='<plays xmlns="http://www.plays.com/plays"><folder id="681">
<page id="3155">
<submitdate>12-10-02</submitdate>
<pagetext>Horatio says ''tis but our
fantasy,</pagetext>
</page>
<page id="9267">
<submitdate>09-04-04</submitdate>
<pagetext>And will not let beliefs take hold of
him</pagetext>
</page>
</folder>
<folder id="902">
<page id="1853">
<submitdate>22-11-05</submitdate>
<pagetext>Touching this dreaded sight, twice seen of
us:</pagetext>
</page>
<page id="8423">
<submitdate>31-05-02</submitdate>
<pagetext>Therefore I have entreated him
along</pagetext>
</page>
</folder>
</plays>'
insert into XMLFULLText (XMLDOC) values(@.XMLDOC)
GO
select * from XMLFULLText where contains(*,'belief')
GO
--1 row returned.
select * from XMLFULLText where FREETEXT(*,'belief')
GO
--2 row returned. OK, lets restrict this to contents coming from an element
declare @.XMLDOC XML
set @.XMLDOC ='<plays xmlns="http://www.plays.com/plays"><folder id="681">
<page id="3155">
<submitdate>belief</submitdate>
<pagetext>Horatio says ''tis but our
fantasy,</pagetext>
</page>
<page id="9267">
<submitdate>09-04-04</submitdate>
<pagetext>And will not let take hold of
him</pagetext>
</page>
</folder>
<folder id="902">
<page id="1853">
<submitdate>22-11-05</submitdate>
<pagetext>Touching this dreaded sight, twice seen of
us:</pagetext>
</page>
<page id="8423">
<submitdate>31-05-02</submitdate>
<pagetext>Therefore I have entreated him
along</pagetext>
</page>
</folder>
</plays>'
insert into XMLFULLText (XMLDOC) values(@.XMLDOC)
GO
select * from XMLFULLText where contains(*,'belief')
GO
--2 rows returned, one in page text and one in submit date
select * from XMLFULLText where FREETEXT(*,'belief')
GO
--2 rows returned. OK, lets restrict this to contents coming from an element
select * from XMLFulltext where contains(*,'belief')
WITH XMLNAMESPACES ('http://www.plays.com/plays' AS pd)
SELECT pk, XMLDOC
FROM XMLFulltext
where contains(XMLDOC, 'belief')
and
xmldoc.exist('/plays/folder/page/pagetext/text()[contains(.,"belief")]')=1
--notice we don't get a hit from 3 where belief is in the submitdate column
WITH XMLNAMESPACES ('http://www.plays.com/plays' AS pd)
SELECT pk, XMLDOC.query('
-- <pd:plays>
--{/pd:folder/pd:page/pd:pagetext}
-- </pd:plays>
--') AS Result
FROM XMLFulltext
where contains(XMLDOC, 'belief')
and
xmldoc.exist('/pd:plays/pd:folder/pd:page/pd:pagetext[1]/text()[contains(.,"
belief")]')=1
--notice we don't get a hit from 3 where belief is in the submitdate column
WITH XMLNAMESPACES ('http://www.plays.com/plays' AS pd)
WITH XMLNAMESPACES
('http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductMode
lDescription'
AS pd)
SELECT CatalogDescription.query('
<pd:Product ProductModelID="{
(/pd:ProductDescription/@.ProductModelID)[1] }">
<Picture>
{ /pd:ProductDescription/pd:Picture/pd:Angle }
{ /pd:ProductDescription/pd:Picture/pd:Size }
</Picture>
</pd:Product>
') as Result
FROM Production.ProductModel
WHERE CatalogDescription.exist('/pd:ProductDescription/pd:Picture') = 1
AND
CatalogDescription.value('(/pd:ProductDescription/pd:Picture/pd:Angle)[1]',
'varchar(20)') = 'front'
AND
CatalogDescription.value('(/pd:ProductDescription/pd:Picture/pd:Size)[1]',
'varchar(20)') = 'small'
WITH XMLNAMESPACES ('http://www.plays.com/plays' AS pd)
SELECT pk,*
FROM XMLFulltext
where contains(XMLDOC, 'belief')
and
xmldoc.exist('/pd:plays[1]/pd:folder[1]/pd:page[1]/pd:submitdate[contains(.,
"belief")]')=1
WITH XMLNAMESPACES ('http://www.plays.com/plays' AS pd)
SELECT pk,*
FROM XMLFulltext
where contains(XMLDOC, 'belief')
and
xmldoc.exist('/pd:plays[1]/pd:folder[1]/pd:page[2]/pd:pagetext[contains(.,"b
elief")]')=1
WITH XMLNAMESPACES ('http://www.plays.com/plays' AS pd)
SELECT pk,*
FROM XMLFulltext
where contains(XMLDOC, 'belief')
and
xmldoc.exist('/pd:plays[1]/pd:folder[1]/pd:page/pd:pagetext[contains(.,"beli
ef")]')=1
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
<firechaser@.talk21.com> wrote in message
news:1139433546.490894.304730@.g47g2000cwa.googlegroups.com...
> Thanks very much for your time Luca.
> Unfortunately I was hoping to use the full-text capabilities in SQL
> Server 2005 to produce *multiple* search results from any single xml
> snippet stored in a single (and known in advance) row.
> All the FTS examples I've seen are of the type
> select xmlcol from t1
> where contains(xmlcol, '"dog cat"')
> which return the whole xml record. Is there a way to accomplish this on
> a single record as per xml snippet in my original post which exists as
> an xml document in a single row in the table?
> I will know *in advance* which record in the table needs to be
> queried, so I need to be able to query the actual content of the xml
> cell and return 0,1 or more matching results based upon that xml
> snippet and *not* based upon matches spanning several rows.
> Looking at the above snippet, I need to be able to search for the word
> "him" which would give me '@.id=9267' and '@.id=8423' returned.
> I'm not looking to search across rows - that appears straightforward -
> I'm trying to perform searches *within* the xml cell that exits in a
> single (and at the moment, unique) record in my table.
> I'd be very grateful for any help in trying to explain this, as I'm at
> a loss at the moment.
> Thanks again, David
>|||Even with Hilary's workaround there is no good full-text search support at
the sub XML datatype level yet. Mainly because the XQuery language is just
now working on a full-text extension to XQuery.
Can I please ask you to file a feature request at
http://lab.msdn.microsoft.com/productfeedback/
We are certainly looking into adding such functionality and actually
customer use cases are important for us to motivate the investment to our
purse holders.
As a short-term work around, I recommend to split the sections into one per
row (using the nodes() method) and then FT-index that column.
Best regards
Michael
<firechaser@.talk21.com> wrote in message
news:1139433546.490894.304730@.g47g2000cwa.googlegroups.com...
> Thanks very much for your time Luca.
> Unfortunately I was hoping to use the full-text capabilities in SQL
> Server 2005 to produce *multiple* search results from any single xml
> snippet stored in a single (and known in advance) row.
> All the FTS examples I've seen are of the type
> select xmlcol from t1
> where contains(xmlcol, '"dog cat"')
> which return the whole xml record. Is there a way to accomplish this on
> a single record as per xml snippet in my original post which exists as
> an xml document in a single row in the table?
> I will know *in advance* which record in the table needs to be
> queried, so I need to be able to query the actual content of the xml
> cell and return 0,1 or more matching results based upon that xml
> snippet and *not* based upon matches spanning several rows.
> Looking at the above snippet, I need to be able to search for the word
> "him" which would give me '@.id=9267' and '@.id=8423' returned.
> I'm not looking to search across rows - that appears straightforward -
> I'm trying to perform searches *within* the xml cell that exits in a
> single (and at the moment, unique) record in my table.
> I'd be very grateful for any help in trying to explain this, as I'm at
> a loss at the moment.
> Thanks again, David
>

没有评论:

发表评论