The merged property values definition view
The merged property values definition view is used to calculate the property values of merged i2 Analyze records. You can modify the view to define which property values records receive.
The merged property values definition view is used to calculate a single value for each property type for each merged record.
After you inspect the merge contributors view and identify the rules that you want to define, you must modify the SQL statement to create a view that matches your requirements. For more information about the merge contributors view, see The merge contributors view.
After you create a merged property values definition view, you must maintain the view through the lifetime of your deployment. For more information, see How to maintain your merged property values definition view.Requirements of the view
The merged property values definition view is
over the merge contributors view. To produce a
single value for each property type, the view must
contain one row for each item_id
and populate each column with a value. You can
include NULL values for property types that have
no value.
- The view must have columns for the item
identifier and every property type column in the
merge contributors view. The column names in both
views must be the same.The merged property values definition view must not contain the origin_id_type,origin_id_keys,source_last_updated,correlation_id_type,correlation_id_keys, oringestion_source_namecolumns.
- Each item identifier must be unique.Important: You cannot ingest data into the Information Store if the view contains multiple rows with the same item identifier.
Default behavior
SELECT item_id, p_unique_reference, p_title,
     ... (all property type columns) ...
     p2_issued_date_and_time, p3_issued_date_and_time
FROM
   (
       SELECT MCV.*,
            ROW_NUMBER ( ) OVER (PARTITION BY MCV.item_id
                                 ORDER BY  source_last_updated DESC NULLS LAST,
                                           origin_id_keys DESC,
                                           origin_id_type DESC)
                           AS partition_row_number
       FROM IS_Public.E_Person_MCV AS MCV
       INNER JOIN IS_Public.E_Person_STP AS STP
               ON MCV.item_id = STP.item_id
   ) AS P
WHERE partition_row_number = 1- To ensure that the view contains the correct
columns, the first SELECTstatement returns theitem_idand all property type columns. For example,p_title.
- To split the rows into groups that contain
only the data that contributed to a single merged
record, the OVERandPARTITION BYclauses are used on theitem_idcolumn.
- To define the value for each property, the
ROW_NUMBERfunction andORDER BYclauses are used. For the default behavior, theORDER BYclause is used on thesource_last_updated,origin_id_keys, andorigin_id_typecolumns. Each column is in descending order. By usingNULLS LAST, you ensure that if thesource_last_updatedcolumn does not contain a value, it is listed last.The ROW_NUMBERfunction returns the number of each row in the partition. After the rows are ordered, the first row is number1. The initialSELECTstatement uses aWHEREclause to select the values for each row with a value of1forpartition_row_number. Because only one row has a value of1, the view contains only one row for each item identifier.
- When you enable merged property views by using
the enableMergedPropertyViewstoolkit task, a table alias for an internal staging table is also created. For example,IS_Public.E_Person_STPis the table alias for the staging table that is created during a Person item type ingestion.During the ingestion process, the internal staging table contains the item identifiers of the records to be inserted or updated as part of the current ingestion in theitem_idcolumn. When the property values for records are calculated by using the_MPVDVview, it uses the information in the_STPalias to restrict the number of rows that are processed. The following statement from the previous example ensures that the_MPVDVview uses the alias for a specific item type:INNER JOIN IS_Public.E_Person_STP AS STP ON MCV.item_id = STP.item_id
Examples
- Ingestion Source Name precedence
- In this example, the data is ordered so that a
specified ingestion source takes precedence. For
example, the DVLA ingestion
source takes precedence over the
PNC
one.
 If an ingestion source name does not match DVLA or PNC, it is ordered last. If multiple contributing pieces of data have the same ingestion source name, the behavior is non-deterministic. In a production system, you must include another clause that provides deterministic ordering in all scenarios.SELECT item_id, p_unique_reference, p_title, ... (all property type columns) ... p2_issued_date_and_time, p3_issued_date_and_time FROM ( SELECT MCV.*, ROW_NUMBER ( ) OVER (PARTITION BY MCV.item_id ORDER BY CASE ingestion_source_name WHEN 'DVLA' THEN 1 WHEN 'PNC' THEN 2 ELSE 3 END ) AS partition_row_number FROM IS_Public.E_Person_MCV AS MCV INNER JOIN IS_Public.E_Person_STP AS STP ON MCV.item_id = STP.item_id ) AS P WHERE partition_row_number = 1To order the ingestion sources, the ORDER BY CASEclause is used. The rows are ordered by the value that they are assigned in theCASEexpression. For more information about theORDER BY CASEstatement:- For PostgreSQL, see Sorting Rows (ORDER BY) and Simple CASE.
- For SQL Server, see Using CASE in an ORDER BY clause.
- For Db2, see ORDER BY clause and CASE expression.
 The ROW_NUMBERfunction and theOVERandPARTITION BYclauses are used in the same way as the view for the default behavior.
- Source last updated and non-NULL
- In this example, the data is ordered by the
source last updated time. However if there is a
NULL value for a particular property type, the
value is taken from the data that has the next
most recent time. This process is completed until
a value for the property is found, or no data is
left for that record. The definitions are slightly
different depending on the database management
system that hosts the Information Store database.- Db2
- 
SELECT DISTINCT item_id, FIRST_VALUE(p_first_given_name, 'IGNORE NULLS') OVER ( PARTITION BY MCV.item_id ORDER BY source_last_updated DESC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS p_first_given_name, ... (FIRST_VALUE function for all property types) ... FIRST_VALUE(CAST(p_additional_informatio AS VARCHAR(1000)), 'IGNORE NULLS') OVER ( PARTITION BY MCV.item_id ORDER BY source_last_updated DESC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS p_additional_informatio FROM IS_Public.E_Person_MCV INNER JOIN IS_Public.E_Person_STP AS STP ON MCV.item_id = STP.item_id
- PostgreSQL and SQL Server
- 
SELECT DISTINCT P.item_id, P1.p_first_given_name, P2.p_additional_informatio ... (all property type columns) ... FROM IS_Public.E_Person_MCV AS P LEFT OUTER JOIN ( SELECT DISTINCT MCV.item_id, FIRST_VALUE(p_first_given_name) OVER ( PARTITION BY MCV.item_id ORDER BY CASE source_last_updated WHEN NULL THEN CAST('0001-01-01' AS date) ELSE source_last_updated END DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS p_first_given_name FROM IS_Public.E_Person_MCV AS MCV INNER JOIN IS_Public.E_Person_STP AS STP ON MCV.item_id = STP.item_id WHERE p_first_given_name IS NOT NULL ) AS P1 ON P1.item_id = P.item_id ... (LEFT OUTER JOIN and FIRST_VALUE function for all property types) ... LEFT OUTER JOIN ( SELECT DISTINCT MCV.item_id, FIRST_VALUE(CAST(p_additional_informatio AS VARCHAR(1000))) OVER ( PARTITION BY MCV.item_id ORDER BY CASE source_last_updated WHEN NULL THEN CAST('0001-01-01' AS date) ELSE source_last_updated END DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS p_additional_informatio FROM IS_Public.E_Person_MCV AS MCV INNER JOIN IS_Public.E_Person_STP AS STP ON MCV.item_id = STP.item_id WHERE p_additional_informatio IS NOT NULL ) AS P2 ON P2.item_id = P.item_id
 FIRST_VALUEfunction to identify which data has the first non-NULL value for a property type. TheFIRST_VALUEfunction returns the first value in an ordered set of values. You must specify a scalar expression, andPARTITION BY,ORDER BY, andRANGEclauses. In the example, the scalar expressions are the property type columns. ThePARTITION BYandORDER BYclauses are similar to the other examples, where they act on the item identifier and the source last updated time.In Db2, the FIRST_VALUEfunction allows you to ignore null values by specifying'IGNORE NULLS'in the expression. This functionality is not available in PostgreSQL or SQL Server. To achieve the same result, you must perform a series ofLEFT OUTER JOINs for eachFIRST_VALUEfunction. In theORDER BY CASEclause for eachFIRST_VALUEfunction, when the value for the column in theFIRST_VALUEfunction isNULLyou can set the source last updated time to0001-01-01. Setting the last updated time to this value and ordering the partition this way ensures that the row with a value for the property and the most recent source last updated time is returned.For more information about the FIRST_VALUEfunction:- For PostgreSQL, see Window Functions.
- For SQL Server, see FIRST_VALUE (Transact-SQL).
- For Db2, see OLAP specification - FIRST_VALUE.
 To use theFIRST_VALUEfunction, you must define a function for each property type separately. In the previous examples, there are two examples of theFIRST_VALUEfunction for two property types:- The first acts on the
p_first_given_namecolumn.
- The second acts on the
p_additional_informatiocolumn. The same process is completed, however in the merge contributors view thep_additional_informatiocolumn is of data type CLOB(32M). This data type is not supported by theFIRST_VALUEfunction. To use this column, you must cast it into a supported data type. For example, VARCHAR(1000). You must ensure that if a column is cast, that you do not lose any data.
 
