I was under the impression that among the improvements in FTS under SQL 2000
was faster population of an index.
I started a full text index on a table Monday evening at around 6:00PM.
30 hours later it's at an item count 391678. (actually that's about where
it was 12 hours ago as I recall.)
In any case, this is as slow, if not SLOWER than the SQL 7.0 box we were
running this on.
And this box has better CPUs and presumably better disks (not sure about the
latter.)
There's nothing in the logs that appear to show anything stuck or hung.
We're looking to move our application to SQL 2000, but if it takes as long
to build indices on SQL 2000 as it does on SQL 7.0, we'll have to look at a
3rd party package.
At this it really looks like nothing's happening.
This copy of the DB was restored from our production server. Would that
make a difference?
--
Greg D. Moore
President Green Mountain Software
Personal: http://stratton.greenms.com
Greg,
Yes, that is true. Although it was not as significant as SQL 2000 to SQL
2005 will be. Primarily, the improvements from SQL Sever 7.0 FTS to SQL
Server 2000 FTS were made in the areas of additional reliability and added
features.
For a 400K item count, it should of completed well before 12 hours. Could
you check your server's Application event log for any "Microsoft Search" and
especially MssCi (review the details for stack traces) source events -
errors, warnings AND informational and post anything that seems out of the
ordinary. If nothing is obvious, then it is possible that the true log entry
may of rolled off your server's application event log, if so, then clear the
log and stop the FT Population if it is in progress, and re-run a Full
Population.
Yes, depending upon how the DB was restored (from a backup or via Copy DB
Wizard, DTS, etc.) as well as any differences in hardware configurations
(especially a difference in the drive letters on where the FT Catalogs
resided) can affect and cause problems.
Regards,
John
"Greg D. Moore (Strider)" <mooregr_deleteth1s@.greenms.com> wrote in message
news:VXgkd.95690$l07.9241@.twister.nyroc.rr.com...
> I was under the impression that among the improvements in FTS under SQL
2000
> was faster population of an index.
> I started a full text index on a table Monday evening at around 6:00PM.
> 30 hours later it's at an item count 391678. (actually that's about where
> it was 12 hours ago as I recall.)
> In any case, this is as slow, if not SLOWER than the SQL 7.0 box we were
> running this on.
> And this box has better CPUs and presumably better disks (not sure about
the
> latter.)
> There's nothing in the logs that appear to show anything stuck or hung.
> We're looking to move our application to SQL 2000, but if it takes as long
> to build indices on SQL 2000 as it does on SQL 7.0, we'll have to look at
a
> 3rd party package.
> At this it really looks like nothing's happening.
> This copy of the DB was restored from our production server. Would that
> make a difference?
>
> --
> --
> Greg D. Moore
> President Green Mountain Software
> Personal: http://stratton.greenms.com
>
|||"John Kane" <jt-kane@.comcast.net> wrote in message
news:uMTtEHuxEHA.1204@.TK2MSFTNGP10.phx.gbl...
> Greg,
> Yes, that is true. Although it was not as significant as SQL 2000 to SQL
> 2005 will be. Primarily, the improvements from SQL Sever 7.0 FTS to SQL
> Server 2000 FTS were made in the areas of additional reliability and added
> features.
Well, I can't wait until SQL 2005 unfortunately. :-)
> For a 400K item count, it should of completed well before 12 hours.
That's what I thought.
>Could
> you check your server's Application event log for any "Microsoft Search"
and
> especially MssCi (review the details for stack traces) source events -
> errors, warnings AND informational and post anything that seems out of the
> ordinary. If nothing is obvious, then it is possible that the true log
entry
> may of rolled off your server's application event log, if so, then clear
the
> log and stop the FT Population if it is in progress, and re-run a Full
> Population.
Nope, as I stated, there's nothing in there out of the ordinary. Only real
event was a master merge right after midnight last night.
> Yes, depending upon how the DB was restored (from a backup or via Copy DB
> Wizard, DTS, etc.) as well as any differences in hardware configurations
> (especially a difference in the drive letters on where the FT Catalogs
> resided) can affect and cause problems.
>
It was a restore from a backup. Does an update stats change things as far
as FT Index is concerned?
In any case, since my first post I gave up, cancelled the obviously failling
population and moved the index to a different drive on the server.
In 20 minutes it's already done 113K rows. So, this is FAR better than what
I was seeing with my first attempt. At this point if it continues like
this, I think performance will be adequate for our needs.
I'll keep you posted on this build.
> Regards,
> John
>
>
> "Greg D. Moore (Strider)" <mooregr_deleteth1s@.greenms.com> wrote in
message[vbcol=seagreen]
> news:VXgkd.95690$l07.9241@.twister.nyroc.rr.com...
> 2000
where[vbcol=seagreen]
> the
long[vbcol=seagreen]
at
> a
>
|||Greg,
Yea, we all have been waiting for Yukon, but alas it's been delayed now for
years...
Q. Does an update stats change things as far as FT Index is concerned?
A. No. update stats are db only and do not affect the FT Index as most
likely the FT Indexing process was "paused", and the posted app. log message
rolled off the event log.
Regards,
John
"Greg D. Moore (Strider)" <mooregr_deleteth1s@.greenms.com> wrote in message
news:Mshkd.1$qv5.0@.twister.nyroc.rr.com...[vbcol=seagreen]
> "John Kane" <jt-kane@.comcast.net> wrote in message
> news:uMTtEHuxEHA.1204@.TK2MSFTNGP10.phx.gbl...
added[vbcol=seagreen]
> Well, I can't wait until SQL 2005 unfortunately. :-)
>
> That's what I thought.
> and
the
> entry
> the
> Nope, as I stated, there's nothing in there out of the ordinary. Only
real[vbcol=seagreen]
> event was a master merge right after midnight last night.
>
DB
> It was a restore from a backup. Does an update stats change things as far
> as FT Index is concerned?
> In any case, since my first post I gave up, cancelled the obviously
failling
> population and moved the index to a different drive on the server.
> In 20 minutes it's already done 113K rows. So, this is FAR better than
what[vbcol=seagreen]
> I was seeing with my first attempt. At this point if it continues like
> this, I think performance will be adequate for our needs.
> I'll keep you posted on this build.
>
> message
SQL[vbcol=seagreen]
6:00PM.[vbcol=seagreen]
> where
were[vbcol=seagreen]
about[vbcol=seagreen]
hung.[vbcol=seagreen]
> long
> at
that
>
|||"John Kane" <jt-kane@.comcast.net> wrote in message
news:ue7E0muxEHA.2568@.TK2MSFTNGP10.phx.gbl...
> Greg,
> Yea, we all have been waiting for Yukon, but alas it's been delayed now
for
> years...
> Q. Does an update stats change things as far as FT Index is concerned?
> A. No. update stats are db only and do not affect the FT Index as most
> likely the FT Indexing process was "paused", and the posted app. log
message
> rolled off the event log.
John, I'll repeat myself again. NOTHING has rolled off the application log.
In any case, I'm not suspecting a DB issue since it started great (doing on
average 6000 rows/minute). In the first 1/2 hour it did over 180K rows.
That was at 12:20 AM.
Since then it appears stuck at 371952 and the CPU usage has dropped to
basically zero. (when it ran for the 1/2 hour I observed it, CPU usage was
about 100% across all 4 CPUs).
The PopulateStatus is still 1 and the size of the errorlog appears to be 103
bytes.
I'm starting to suspect it's the DB, not the FT, but just want to make sure
to see if there's anything else I should be looking at.
|||Greg,
Ok, is it possible that you could save the current Application event log as
a .evt file as well as all non-zero length *.gthr files located under
\FTDATA\SQLServer\GatherLogs and zip these files together and post it here?
If there has been no events roll-off, then an event should be there, and as
Albert Einstein (said?) "God [or the devil] is in the details" as it does
appear that this process is stuck (or paused) for some reason at 371,952,
which is very close to the number (391,678) that you originally posted.
The CPU usage is usually low, except when there are Shadow Merges and at the
end of the Full Population during the Master Merge. A PopulateStatus of 1
means that the "Full population in progress", so hopefully a MSSearch or
MssCi source event will be present to help explain this!
Thanks,
John
"Greg D. Moore (Strider)" <mooregr_deleteth1s@.greenms.com> wrote in message
news:dXmkd.223$qv5.36@.twister.nyroc.rr.com...
> "John Kane" <jt-kane@.comcast.net> wrote in message
> news:ue7E0muxEHA.2568@.TK2MSFTNGP10.phx.gbl...
> for
> message
> John, I'll repeat myself again. NOTHING has rolled off the application
log.
> In any case, I'm not suspecting a DB issue since it started great (doing
on
> average 6000 rows/minute). In the first 1/2 hour it did over 180K rows.
> That was at 12:20 AM.
> Since then it appears stuck at 371952 and the CPU usage has dropped to
> basically zero. (when it ran for the 1/2 hour I observed it, CPU usage
was
> about 100% across all 4 CPUs).
> The PopulateStatus is still 1 and the size of the errorlog appears to be
103
> bytes.
> I'm starting to suspect it's the DB, not the FT, but just want to make
sure
> to see if there's anything else I should be looking at.
>
|||"John Kane" <jt-kane@.comcast.net> wrote in message
news:eznFI00xEHA.1192@.tk2msftngp13.phx.gbl...
> Greg,
> Ok, is it possible that you could save the current Application event log
as
> a .evt file as well as all non-zero length *.gthr files located under
> \FTDATA\SQLServer\GatherLogs and zip these files together and post it
here?
> If there has been no events roll-off, then an event should be there, and
as
> Albert Einstein (said?) "God [or the devil] is in the details" as it does
> appear that this process is stuck (or paused) for some reason at 371,952,
> which is very close to the number (391,678) that you originally posted.
Trust me John, there was NOTHING in the Event log. :-) I can read an event
log.
I also checked the Gatherlogs and nothing was out of the ordinary.
HOWEVER... when I manually stop the population and checked the event log
THEN I got an error.
"The crawl seed <MSSQL75://SQLServer/04659998> in project <SQLServer
SQL0001000007> cannot be accessed. Error: 80040d19 - The filtering process
was stopped because its memory quota was exceeded. To increase the memory
quota of the filtering process, increase the value for the following
registry key: HKEY_LOCAL_MACHINE\Software\Microsoft\Search\1.0\G athering
Manager\FilterProcessMemoryQuota. .
There's not much in the KB about this, but
http://support.microsoft.com/default...b;en-us;308771 appeared to
be close.
So I upped it from 1900000 to 2900000. So far things are working better.
(I did have to rebuild the catalog, but that may be a small price to pay on
a test box. :-)
It's up to 712417 records indexed and still going.
Yeah, I was struck by the total number in the earlier two runs being fairly
close. Not sure if it's coincidence or not.
If this population is still working as of tomorrow I'll call this one
closed.
> The CPU usage is usually low, except when there are Shadow Merges and at
the
> end of the Full Population during the Master Merge. A PopulateStatus of 1
> means that the "Full population in progress", so hopefully a MSSearch or
> MssCi source event will be present to help explain this!
Hopefully, but like I say, there was nothing until I manually stopped it.
Anyway, thanks for the help so far.
> Thanks,
> John
>
>
> "Greg D. Moore (Strider)" <mooregr_deleteth1s@.greenms.com> wrote in
message[vbcol=seagreen]
> news:dXmkd.223$qv5.36@.twister.nyroc.rr.com...
now
> log.
> on
> was
> 103
> sure
>
|||Greg,
Ok, while you may be able to read (all events), many who post here do not
seem to be able to do so... ;-)
Stopping the Full Population and then reviewing the event log was a very
good idea! Are you FT Indexing large (MS Office, Adobe PDF or other file
types?) files stored in an IMAGE column? The KB article doesn't specifically
mention Error: 80040d19, but the recently enhanced (SP3) message text does
make reference to the FilterProcessMemoryQuota registry key, so a connection
between the error and fix could be made, so I will doubt you no moore :-)
I'm not sure how many rows you have left to complete past the 712,417 mark,
but monitor it closely and if it appears to hang again, stop it and check
the event log, if you get the same message increase the
FilterProcessMemoryQuota value. What type of files are you FT Indexing and
what is their avg. and max sizes?
Good Catch!
John
"Greg D. Moore (Strider)" <mooregr_deleteth1s@.greenms.com> wrote in message
news:ZQBkd.468$qv5.49@.twister.nyroc.rr.com...[vbcol=seagreen]
> "John Kane" <jt-kane@.comcast.net> wrote in message
> news:eznFI00xEHA.1192@.tk2msftngp13.phx.gbl...
> as
> here?
> as
does[vbcol=seagreen]
371,952,
> Trust me John, there was NOTHING in the Event log. :-) I can read an
event
> log.
> I also checked the Gatherlogs and nothing was out of the ordinary.
> HOWEVER... when I manually stop the population and checked the event log
> THEN I got an error.
> "The crawl seed <MSSQL75://SQLServer/04659998> in project <SQLServer
> SQL0001000007> cannot be accessed. Error: 80040d19 - The filtering process
> was stopped because its memory quota was exceeded. To increase the memory
> quota of the filtering process, increase the value for the following
> registry key: HKEY_LOCAL_MACHINE\Software\Microsoft\Search\1.0\G athering
> Manager\FilterProcessMemoryQuota. .
> There's not much in the KB about this, but
> http://support.microsoft.com/default...b;en-us;308771 appeared to
> be close.
> So I upped it from 1900000 to 2900000. So far things are working better.
> (I did have to rebuild the catalog, but that may be a small price to pay
on
> a test box. :-)
> It's up to 712417 records indexed and still going.
> Yeah, I was struck by the total number in the earlier two runs being
fairly[vbcol=seagreen]
> close. Not sure if it's coincidence or not.
> If this population is still working as of tomorrow I'll call this one
> closed.
>
> the
1[vbcol=seagreen]
> Hopefully, but like I say, there was nothing until I manually stopped it.
> Anyway, thanks for the help so far.
>
> message
> now
concerned?[vbcol=seagreen]
most[vbcol=seagreen]
application[vbcol=seagreen]
(doing[vbcol=seagreen]
rows.[vbcol=seagreen]
usage[vbcol=seagreen]
be
>
|||"John Kane" <jt-kane@.comcast.net> wrote in message
news:eXHkxK7xEHA.3808@.TK2MSFTNGP15.phx.gbl...
> Greg,
> Ok, while you may be able to read (all events), many who post here do not
> seem to be able to do so... ;-)
> Stopping the Full Population and then reviewing the event log was a very
> good idea! Are you FT Indexing large (MS Office, Adobe PDF or other file
> types?) files stored in an IMAGE column? The KB article doesn't
specifically
> mention Error: 80040d19, but the recently enhanced (SP3) message text does
> make reference to the FilterProcessMemoryQuota registry key, so a
connection
> between the error and fix could be made, so I will doubt you no moore :-)
I have about 8 million rows in this table.
Interestingly enough, last night it appeared to stop at that 712,417 mark
right after I posted.
In fact it was still there when I woke up this morning.
And then I checked a few hours later, it was up over 1 million records and
climbing. But it appears to have done NOTHING for about 8-10 hours. Very
strange.
It's now up to 2,419,059 and still climbing.
> I'm not sure how many rows you have left to complete past the 712,417
mark,
> but monitor it closely and if it appears to hang again, stop it and check
> the event log, if you get the same message increase the
> FilterProcessMemoryQuota value. What type of files are you FT Indexing and
> what is their avg. and max sizes?
>
Not really files, but data uploaded from documents such as Word files,
Quark, Wordperfect, text files, etc.
Not sure avg size, but probably several meg on average. Total table size
is about 44 Gig. The text column is by far the biggest part of the index
(we also have some varchars in there), so I'd guess about 4-5 meg each.
> Good Catch!
> John
>
|||Greg,
Good, so at least you're above the 2.4 million row mark now, with much to
still be processed. Now, that I know you have an 8 million row table, I can
give you a ball park timeframe for when this might take to complete, but
keep in mind that it is a SWAG. As 1 million rows on a poorly configured
server (database & FT Catalog sharing the same disk drive, with 1 CPU and
1GB of RAM) can take up to 14 hours to complete, so expect a similarly
configured server to finish FT Indexing of a table with 8 million rows in
approx. 4 to 5 days. Of course, your mileage may vary and your server's
actual configuration (software & hardware) may be different...
Also, you say that you are not FT Indexing files (i.e.. binary files stored
in an IMAGE datatype), but are in fact storing the text data uploaded from
documents such as Word files, Quark, Wordperfect, text files, etc., so this
textual data is stored in a column defined with the TEXT (or NTEXT)
datatype. Correct? If so, you can get the max and avg metadata via the
following SQL query :
select avg(datalength(Text_column_name)) "Avg. Text Length" from
8_million_row_table_name
select max(datalength(Text_column_name)) "Max. Text Length" from
8_million_row_table_name
While the Text datatype certainly supports well above 4-5 Mb of text in each
row, I'm fairly certain that this is well above what is typical for most FT
Indexing applications and my lie at the heart of the problem. While the BOL
title "Filtering Supported File Types" is referencing "documents" stored in
an IMAGE datatype, when it is stated that "For full-text indexing, a
document must be less than 16 megabytes (MB) in size and must not contain
more than 256 kilobytes (KB) of filtered text." This statement *may* also
apply to plain text stored in a TEXT or NTEXT datatype and this is why
getting both the avg. and especially max values would be most helpful in
understanding this issue.
FYI, I suspect that you've read the SQL 2000 BOL title "Full-text Search
Recommendations" as well as the SQL 2000 FTS deployment white paper at:
http://support.microsoft.com/default...b;en-us;323739 both speak to
software & hardware configuration issues when used with SQL Server 2000 FT
Indexing of tables over 1 million rows.
Thanks,
John
"Greg D. Moore (Strider)" <mooregr_deleteth1s@.greenms.com> wrote in message
news:DPWkd.3206$qv5.2464@.twister.nyroc.rr.com...[vbcol=seagreen]
> "John Kane" <jt-kane@.comcast.net> wrote in message
> news:eXHkxK7xEHA.3808@.TK2MSFTNGP15.phx.gbl...
not[vbcol=seagreen]
> specifically
does[vbcol=seagreen]
> connection
:-)[vbcol=seagreen]
> I have about 8 million rows in this table.
> Interestingly enough, last night it appeared to stop at that 712,417 mark
> right after I posted.
> In fact it was still there when I woke up this morning.
> And then I checked a few hours later, it was up over 1 million records and
> climbing. But it appears to have done NOTHING for about 8-10 hours. Very
> strange.
> It's now up to 2,419,059 and still climbing.
> mark,
check[vbcol=seagreen]
and
> Not really files, but data uploaded from documents such as Word files,
> Quark, Wordperfect, text files, etc.
> Not sure avg size, but probably several meg on average. Total table size
> is about 44 Gig. The text column is by far the biggest part of the index
> (we also have some varchars in there), so I'd guess about 4-5 meg each.
>
>
2012年2月19日星期日
FTS Index Build on SQL 2000 and performance questions
标签:
2000was,
among,
build,
database,
fts,
impression,
improvements,
index,
microsoft,
mysql,
oracle,
performance,
population,
server,
sql,
table,
text
订阅:
博文评论 (Atom)
没有评论:
发表评论