Netherlands: Software

Introductie van Micorosoft SQL Server 2016

Issue link: http://hub-nl.insight.com/i/692679

Contents of this Issue

Navigation

Page 204 of 212

192 C H A P T E R 9 | Introducing Azure SQL Data Warehouse Dynamically filter source data Before the Data Flow Task, add one Execute SQL Task to retrieve the minimum key value from the source table and store it in a variable. In a separate Execute SQL Task, get the maximum key value from the target table and store it in another variable. If there is no maximum key value in the target table because the table is empty, store the value of the minimum key instead. Use that latter variable in a WHERE clause appended to the SELECT statement in the data flow's data source object to filter the data and ensure that you load only new data into the SQL Data Warehouse. Be sure to include an ORDER BY clause so that the data is loaded in key order. Add retry logic Set up one variable in a loop as a counter for the current try and another variable in which you define a constant as a maximum number of retries. Use these variables in a For Loop Container. Place your Data Flow Task (and another task that you want to retry in case of failure) inside the container. Create a failure precedence constraint after the Data Flow Task that connects to a Script Task that forces the package execution to sleep briefly. When the sleep duration ends, the loop starts again if the maximum number of retries has not yet been reached. You must also add a success precedence constraint after the Data Flow task that connects to an Expression Task that updates the current try counter to a value that forces a break out of the loop. Statistics for SQL Data Warehouse Like SQL Server, SQL Data Warehouse uses statistics to compare the costs of performing a query in different ways so that it can generate an optimal query plan. Statistics on a single column describe the range and frequency of values in a single column. This information is useful to the query optimizer for estimating how many rows the query will return. Multicolumn statistics provide information about a list of columns, including the single-column statistics for the first column in the list and a density vector for cross-column correlations. This type of statistics helps the query optimizer improve composite joins and grouping operations. Although SQL Data Warehouse relies heavily on statistics, it does not automatically manage statistics for newly inserted or updated data. To achieve optimal query performance, create statistics on each column of the table after your initial load and update them after each substantial change to the data. Note You can use the DBCC SHOW_STATISTICS function in SQL Data Warehouse to understand how up to date the statistics in your warehouse are. To learn more, see "DBCC SHOW_STATISTICS (Transact-SQL)" at https://msdn.microsoft.com/library/ms174384.aspx. Creating statistics As you develop your SQL Data Warehouse, be sure to create single-column statistics on each column in your table and multicolumn statistics on the columns that are regularly used in join operations and GROUP BY clauses. If you later find that the process to update the single-column statistics takes too long, consider dropping statistics on columns that are less frequently queried or updating the statistics on those columns less often. Maintaining statistics You should have a plan to update statistics on a regular basis in your data warehouse. When the distribution of the data changes, the statistics might no longer be valid and could result in query performance that is less than optimal. Unfortunately, you cannot simply measure how up to date statistics are based on their age. A column in which the number of distinct values increases dramatically after each data load, such as a date column, requires a more frequent update of statistics as compared to a column with a relatively small and static number of distinct values, such as a column for gender, which might never require an update. Consequently, the methodology you use to update

Articles in this issue

Links on this page

Archives of this issue

view archives of Netherlands: Software - Introductie van Micorosoft SQL Server 2016