The merged property values definition view
The merged property values definition view is used to calculate the property values of merged i2 Analyze records. You can modify the view to define which property values records receive.
The merged property values definition view is used to calculate a single value for each property type for each merged record.
After you inspect the merge contributors view and identify the rules that you want to define, you must modify the SQL statement to create a view that matches your requirements. For more information about the merge contributors view, see The merge contributors view.
After you create a merged property values definition view, you must maintain the view through the lifetime of your deployment. For more information, see How to maintain your merged property values definition view.Requirements of the view
The merged property values definition view is
over the merge contributors view. To produce a
single value for each property type, the view must
contain one row for each item_id
and populate each column with a value. You can
include NULL values for property types that have
no value.
- The view must have columns for the item
identifier and every property type column in the
merge contributors view. The column names in both
views must be the same.
The merged property values definition view must not contain the
origin_id_type
,origin_id_keys
,source_last_updated
,correlation_id_type
,correlation_id_keys
, oringestion_source_name
columns. - Each item identifier must be unique.Important: You cannot ingest data into the Information Store if the view contains multiple rows with the same item identifier.
Default behavior
SELECT item_id, p_unique_reference, p_title,
... (all property type columns) ...
p2_issued_date_and_time, p3_issued_date_and_time
FROM
(
SELECT MCV.*,
ROW_NUMBER ( ) OVER (PARTITION BY MCV.item_id
ORDER BY source_last_updated DESC NULLS LAST,
origin_id_keys DESC,
origin_id_type DESC)
AS partition_row_number
FROM IS_Public.E_Person_MCV AS MCV
INNER JOIN IS_Public.E_Person_STP AS STP
ON MCV.item_id = STP.item_id
) AS P
WHERE partition_row_number = 1
- To ensure that the view contains the correct
columns, the first
SELECT
statement returns theitem_id
and all property type columns. For example,p_title
. - To split the rows into groups that contain
only the data that contributed to a single merged
record, the
OVER
andPARTITION BY
clauses are used on theitem_id
column. - To define the value for each property, the
ROW_NUMBER
function andORDER BY
clauses are used. For the default behavior, theORDER BY
clause is used on thesource_last_updated
,origin_id_keys
, andorigin_id_type
columns. Each column is in descending order. By usingNULLS LAST
, you ensure that if thesource_last_updated
column does not contain a value, it is listed last.The
ROW_NUMBER
function returns the number of each row in the partition. After the rows are ordered, the first row is number1
. The initialSELECT
statement uses aWHERE
clause to select the values for each row with a value of1
forpartition_row_number
. Because only one row has a value of1
, the view contains only one row for each item identifier. - When you enable merged property views by using
the
enableMergedPropertyViews
toolkit task, a table alias for an internal staging table is also created. For example,IS_Public.E_Person_STP
is the table alias for the staging table that is created during a Person item type ingestion.During the ingestion process, the internal staging table contains the item identifiers of the records to be inserted or updated as part of the current ingestion in theitem_id
column. When the property values for records are calculated by using the_MPVDV
view, it uses the information in the_STP
alias to restrict the number of rows that are processed. The following statement from the previous example ensures that the_MPVDV
view uses the alias for a specific item type:INNER JOIN IS_Public.E_Person_STP AS STP ON MCV.item_id = STP.item_id
Examples
- Ingestion Source Name precedence
- In this example, the data is ordered so that a
specified ingestion source takes precedence. For
example, the DVLA ingestion
source takes precedence over the
PNC
one.
If an ingestion source name does not match DVLA or PNC, it is ordered last. If multiple contributing pieces of data have the same ingestion source name, the behavior is non-deterministic. In a production system, you must include another clause that provides deterministic ordering in all scenarios.SELECT item_id, p_unique_reference, p_title, ... (all property type columns) ... p2_issued_date_and_time, p3_issued_date_and_time FROM ( SELECT MCV.*, ROW_NUMBER ( ) OVER (PARTITION BY MCV.item_id ORDER BY CASE ingestion_source_name WHEN 'DVLA' THEN 1 WHEN 'PNC' THEN 2 ELSE 3 END ) AS partition_row_number FROM IS_Public.E_Person_MCV AS MCV INNER JOIN IS_Public.E_Person_STP AS STP ON MCV.item_id = STP.item_id ) AS P WHERE partition_row_number = 1
To order the ingestion sources, the
ORDER BY CASE
clause is used. The rows are ordered by the value that they are assigned in theCASE
expression. For more information about theORDER BY CASE
statement:- For PostgreSQL, see Sorting Rows (ORDER BY) and Simple CASE.
- For SQL Server, see Using CASE in an ORDER BY clause.
- For Db2, see ORDER BY clause and CASE expression.
The
ROW_NUMBER
function and theOVER
andPARTITION BY
clauses are used in the same way as the view for the default behavior. - Source last updated and non-NULL
- In this example, the data is ordered by the
source last updated time. However if there is a
NULL value for a particular property type, the
value is taken from the data that has the next
most recent time. This process is completed until
a value for the property is found, or no data is
left for that record. The definitions are slightly
different depending on the database management
system that hosts the Information Store database.
- Db2
-
SELECT DISTINCT item_id, FIRST_VALUE(p_first_given_name, 'IGNORE NULLS') OVER ( PARTITION BY MCV.item_id ORDER BY source_last_updated DESC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS p_first_given_name, ... (FIRST_VALUE function for all property types) ... FIRST_VALUE(CAST(p_additional_informatio AS VARCHAR(1000)), 'IGNORE NULLS') OVER ( PARTITION BY MCV.item_id ORDER BY source_last_updated DESC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS p_additional_informatio FROM IS_Public.E_Person_MCV INNER JOIN IS_Public.E_Person_STP AS STP ON MCV.item_id = STP.item_id
- PostgreSQL and SQL Server
-
SELECT DISTINCT P.item_id, P1.p_first_given_name, P2.p_additional_informatio ... (all property type columns) ... FROM IS_Public.E_Person_MCV AS P LEFT OUTER JOIN ( SELECT DISTINCT MCV.item_id, FIRST_VALUE(p_first_given_name) OVER ( PARTITION BY MCV.item_id ORDER BY CASE source_last_updated WHEN NULL THEN CAST('0001-01-01' AS date) ELSE source_last_updated END DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS p_first_given_name FROM IS_Public.E_Person_MCV AS MCV INNER JOIN IS_Public.E_Person_STP AS STP ON MCV.item_id = STP.item_id WHERE p_first_given_name IS NOT NULL ) AS P1 ON P1.item_id = P.item_id ... (LEFT OUTER JOIN and FIRST_VALUE function for all property types) ... LEFT OUTER JOIN ( SELECT DISTINCT MCV.item_id, FIRST_VALUE(CAST(p_additional_informatio AS VARCHAR(1000))) OVER ( PARTITION BY MCV.item_id ORDER BY CASE source_last_updated WHEN NULL THEN CAST('0001-01-01' AS date) ELSE source_last_updated END DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS p_additional_informatio FROM IS_Public.E_Person_MCV AS MCV INNER JOIN IS_Public.E_Person_STP AS STP ON MCV.item_id = STP.item_id WHERE p_additional_informatio IS NOT NULL ) AS P2 ON P2.item_id = P.item_id
FIRST_VALUE
function to identify which data has the first non-NULL value for a property type. TheFIRST_VALUE
function returns the first value in an ordered set of values. You must specify a scalar expression, andPARTITION BY
,ORDER BY
, andRANGE
clauses. In the example, the scalar expressions are the property type columns. ThePARTITION BY
andORDER BY
clauses are similar to the other examples, where they act on the item identifier and the source last updated time.In Db2, the
FIRST_VALUE
function allows you to ignore null values by specifying'IGNORE NULLS'
in the expression. This functionality is not available in PostgreSQL or SQL Server. To achieve the same result, you must perform a series ofLEFT OUTER JOIN
s for eachFIRST_VALUE
function. In theORDER BY CASE
clause for eachFIRST_VALUE
function, when the value for the column in theFIRST_VALUE
function isNULL
you can set the source last updated time to0001-01-01
. Setting the last updated time to this value and ordering the partition this way ensures that the row with a value for the property and the most recent source last updated time is returned.For more information about the
FIRST_VALUE
function:- For PostgreSQL, see Window Functions.
- For SQL Server, see FIRST_VALUE (Transact-SQL).
- For Db2, see OLAP specification - FIRST_VALUE.
To use theFIRST_VALUE
function, you must define a function for each property type separately. In the previous examples, there are two examples of theFIRST_VALUE
function for two property types:- The first acts on the
p_first_given_name
column. - The second acts on the
p_additional_informatio
column. The same process is completed, however in the merge contributors view thep_additional_informatio
column is of data type CLOB(32M). This data type is not supported by theFIRST_VALUE
function. To use this column, you must cast it into a supported data type. For example, VARCHAR(1000). You must ensure that if a column is cast, that you do not lose any data.