2012年2月24日星期五

Full Cached Lookup with Parameters

Parameterized queries are only allowed on partial or none cache style lookup transforms, not 'full' ones. Is there some "trick" to parameterizing a full cache lookup, or should the join simply be done at the source, obviating the need for a full cache lookup at all (other suggestion certainly welcome)

More particularly, I'd like to use the lookup transform in a surrogate key pipeline. However, the dimension is large (900 million rows), so its would be useful to restrict the lookup transform's cache by a join to the source.

For example:

Source query is: select a,b,c from t where z=@.filter (20,000 rows)

Lookup transform query: select surrogate_key,business_key from dimension (900 M rows, not tenable)

Ideal Lookup transform query:

select distinct surrogate_key

,business_key

from dimension d inner join

t on d.business_key = t.c

where t.z = @.filter

Where do the parameters come from? Obviously, you can't parametrize based on incoming row data, because in full cache mode the lookup data is cached before the incoming data is processed.

So your choice is to have some fixed query, or query based on package variables. If this is a fixed query, just create a stored procedure and use it as source. Unfortunately, Lookup Transform does not provide a way to pass variables to the query, but you can probably workaround it by having an auxilarly SQL table where you can store the values of the variables, and then use it in the stored procedure you call from Lookup.|||

You are correct, the filter is based on a package variable. I suppose an auxiiliary SQL table for variable storage is the way to go. I was hoping to get away without creating more tables, but its not really much of a price to pay.

Thanks.

|||Hi,

I have a workaround in my blog post here. Simon Sabin has another one in the comments.

Regards,

dong|||

I use a different approach than the view or file because many instances of the package will be running simultaneously, only differing with regard to the parameters.

The workaround involves using the "Application Name = ;' parameter of the connection string, which maps to the program_name attributes of master..sysprocesses in SQL Server.

Public Sub Main()

Dim connString As OleDbConnectionStringBuilder = New OleDbConnectionStringBuilder()

If Not Dts.Connections.Contains(connectionName) Then

Dts.Events.FireError(0, String.Empty, String.Format("Connection {0} not found", connectionName), String.Empty, 0)

End If

If Dts.Connections.Contains(connectionName) Then

connString.ConnectionString = Dts.Connections(connectionName).ConnectionString

If connString.ContainsKey(ApplicationName) Then

connString.Remove(ApplicationName)

End If

connString.Add(ApplicationName, "SSIS:" + Dts.Variables("User::TheParameter").Value.ToString())

Dts.Connections(connectionName).ConnectionString = connString.ToString()

Dts.Events.FireInformation(0, String.Empty, connectionName + " connection string set to " + connString.ToString(), String.Empty, 0, True)

End If

Dts.TaskResult = Dts.Results.Success

End Sub

|||A bit confused here. I guess the above code is trying to create OLE-DB connections with different Application Names. But where is the logic to limit the query result to be only a sub-set of the lookup table?|||Yes, the script does change the application name in the connection string for the OLEDB connections. The point of the script is the make the application name available in the program_name column of master..sysprocesses. master.dbo.sysprocesses.program_name is then extracted with a table valued function (see getID() below), which is joined to the lookup table, mimicing a parameter.

-- fully cached lookup query (parameterized via table valued function)
select *
from lookup t
inner join dbo.getID() l
where t.theID = l.theID

-- table valued function
CREATE FUNCTION [dbo].[getID] ()
RETURNS @.T TABLE
(theID INT NOT NULL PRIMARY KEY)
AS
BEGIN
INSERT @.T (theID)
SELECT top 1 theID = CAST(REPLACE(program_name,'SSIS:','') as int)
FROM master..sysprocesses
WHERE spid = @.@.SPID
AND ISNUMERIC(REPLACE(program_name,'SSIS:','')) = 1
RETURN
END

没有评论:

发表评论