Friday, March 9, 2012

Full Text Filter (pdf)

After searching the internet for a way to add an IFilter for pdf files, I
found instructions to do this. I downloaded the Adobe filter (version 6), and
ran the installation.
Then, as per the instructions, I ran
sp_fulltext_service 'load_os_resources', 1
and
sp_fulltext_service 'verify_signature', 0.
I see the pdf filter in sys.fulltext_document_types, however when I try to
search in a pdf doc in the database, I don't get any results. I have
re-imported the pdf file into the database, but this made no difference.
I also made sure the pdf document was not a scanned image, but is actual
text (I did some editing to make sure).
Any ideas what I'm missing here?
Thanks for any help,
Tomt
Hi Tomt,
I understand that although you installed the Adobe filter, you still cannot
run full-text queries against the column that stores your PDF document.
If I have misunderstood, please let me know.
I ran a test on my side and everything worked fine. I recommend that you
check whether your steps match the following:
1. Store your PDF file in a varbinary(max) column and add a column that
holds the file extension.
For example:
CREATE TABLE [dbo].[Dictionary](
[ID] [int] IDENTITY(1,1) NOT NULL CONSTRAINT [PK_Dictionary] PRIMARY KEY,
[Doc] [varbinary](max) NULL,
[FileExtension] [nchar](10) NULL
)
2. Download and install the PDF Filter
(http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611)
3. Execute the following to enable:
exec sp_fulltext_service 'load_os_resources',1
exec sp_fulltext_service 'verify_signature', 0
4. Create and populate your full-text index. For example:
-- Skip this statement if the FullText filegroup already exists.
ALTER DATABASE TestDB ADD FILEGROUP [FullText]
Go
ALTER DATABASE TestDB ADD FILE ( NAME = N'TestDBFT_data',
FILENAME = N'F:\TempData\TestDBFT_data.ndf', SIZE = 2048KB, FILEGROWTH = 1024KB )
TO FILEGROUP [FullText]
Go
CREATE FULLTEXT CATALOG TestDBCatalog ON FILEGROUP FullText
IN PATH 'F:\TempData' AS DEFAULT;
Go
CREATE FULLTEXT INDEX ON Dictionary ([Doc] TYPE COLUMN FileExtension)
KEY INDEX PK_Dictionary ON TestDBCatalog WITH CHANGE_TRACKING AUTO;
Go
After the above steps, you may need to wait a while for SQL Server to finish
the full population.
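To confirm that the filter is registered and to see whether the population has finished, queries along these lines may help. This is only a sketch: it assumes the catalog name TestDBCatalog from the example above and that the filter is listed under the extension '.pdf'. A PopulateStatus of 0 means the catalog is idle.

```sql
-- Check that a filter is registered for the .pdf extension.
-- An empty result suggests the filter was not loaded.
SELECT document_type, class_id, path, version, manufacturer
FROM sys.fulltext_document_types
WHERE document_type = N'.pdf';

-- PopulateStatus: 0 = idle; a nonzero value means a population
-- is in progress or the catalog is in some other state.
-- ItemCount: number of items currently indexed in the catalog.
SELECT FULLTEXTCATALOGPROPERTY(N'TestDBCatalog', 'PopulateStatus') AS PopulateStatus,
       FULLTEXTCATALOGPROPERTY(N'TestDBCatalog', 'ItemCount') AS ItemCount;
```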
One thing you need to pay attention to is that you should have a file
extension column that holds the document's extension, "pdf".
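As an illustration, a PDF could be loaded and then searched roughly as follows. This is a sketch under assumptions: the file path and the search term are hypothetical, and the extension is stored with a leading dot ('.pdf') to match how sys.fulltext_document_types lists extensions.

```sql
-- Hypothetical path. OPENROWSET(BULK ..., SINGLE_BLOB) returns the whole
-- file as a single varbinary(max) value in a column named BulkColumn.
INSERT INTO dbo.Dictionary (Doc, FileExtension)
SELECT f.BulkColumn, N'.pdf'
FROM OPENROWSET(BULK N'C:\Temp\sample.pdf', SINGLE_BLOB) AS f;

-- Once the population has caught up, search the document text.
-- "example" is a placeholder search term.
SELECT ID, FileExtension
FROM dbo.Dictionary
WHERE CONTAINS(Doc, N'"example"');
```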
Hope this helps. If you have any other questions or concerns, please feel
free to let me know.
Best regards,
Charles Wang
Microsoft Online Community Support
================================================== ===
When responding to posts, please "Reply to Group" via
your newsreader so that others may learn and benefit
from this issue.
================================================== ====
This posting is provided "AS IS" with no warranties, and confers no rights.
================================================== ====
Hi Charles, thanks for your response.
I had already gone through all the steps (and had things working for Word
docs, for example). I added the pdf filter after everything else was set up,
then imported a pdf document with the pdf extension in a column FileExtension,
and the actual file in a column Document.
The table works fine for Word docs, but not for the pdf. Should I just start
all over again? Would it matter that the Adobe filter was added after the
table was created and the catalog set up?

Hi Tomt,
There are some differences between storing a PDF and a Word document.
SQL Server 2005 has built-in full-text indexing support for Word documents.
You can even store a .doc document directly in an nvarchar(max) field
without specifying a file extension for it.
Consider restarting SQL Server to see if that helps. If not, drop the
existing full-text index and catalog and then re-create them from
scratch.
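A drop and re-create could look roughly like this. It is a sketch that reuses the names from the earlier example (dbo.Dictionary, TestDBCatalog, PK_Dictionary) and assumes the table's primary key index really is named PK_Dictionary. The filegroup and path options are left out for brevity.

```sql
-- Remove the existing full-text index first, then the catalog.
DROP FULLTEXT INDEX ON dbo.Dictionary;
GO
DROP FULLTEXT CATALOG TestDBCatalog;
GO

-- Re-create both; the new index starts a full population automatically.
CREATE FULLTEXT CATALOG TestDBCatalog AS DEFAULT;
GO
CREATE FULLTEXT INDEX ON dbo.Dictionary ([Doc] TYPE COLUMN FileExtension)
    KEY INDEX PK_Dictionary ON TestDBCatalog WITH CHANGE_TRACKING AUTO;
GO
```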
If this issue persists, could you please let me know the version of your
SQL Server 2005 and how you imported your .pdf file into the database
table? My test was on SQL Server 2005 SP2 (9.0.3054).
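One way to report the version is the query below. It is a minimal sketch that uses SERVERPROPERTY; the column aliases are only illustrative.

```sql
-- Returns the build number (for example 9.00.3054.00), the service pack
-- level, and the edition of the connected instance.
SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,
       SERVERPROPERTY('ProductLevel')   AS ProductLevel,
       SERVERPROPERTY('Edition')        AS Edition;
```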
I look forward to your response. Have a nice day!
Best regards,
Charles Wang
Charles, I removed the index and catalog, re-created them, and restarted
the services. Everything seems to be working properly now: I can search in
PDFs.
Thanks very much for your assistance with this.

