Example 5-30: Creating a database-scoped credential for Kerberos-enabled Hadoop
-- Create a master key if one does not already exist
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>';
-- Create a database-scoped credential
CREATE DATABASE SCOPED CREDENTIAL HadoopCredential
WITH IDENTITY = '<Kerberos user name>', SECRET = '<Kerberos password>';
External data source
An external data source tells PolyBase where to find your data. If you are using blob storage, you
specify the name of the blob container and the storage account in the LOCATION argument of the
CREATE EXTERNAL DATA SOURCE statement, as shown in Example 5-31. The CREDENTIAL argument
uses the database-scoped credential that you created for accessing blob storage.
Example 5-31: Creating an external data source for blob storage
CREATE EXTERNAL DATA SOURCE AzureBlobStorage WITH (
TYPE = HADOOP,
LOCATION = 'wasbs://<blob_container_name>@<storage_account_name>.blob.core.windows.net/',
CREDENTIAL = AzureBlobStorageCredential
);
To create an external data source for Hadoop storage, you specify the Uniform Resource Identifier
(URI) for Hadoop in the LOCATION argument of the CREATE EXTERNAL DATA SOURCE statement, as
shown in Example 5-32. For Hadoop storage, you can optionally include the
RESOURCE_MANAGER_LOCATION argument and provide the address and port of the Hadoop resource
manager. If you enable pushdown computation for PolyBase, SQL Server's query optimizer can push
work to Hadoop for preprocessing to reduce the volume of data transferred between SQL Server and
Hadoop. If you omit this argument, pushdown computation is disabled.
Example 5-32: Creating an external data source for Hadoop storage
CREATE EXTERNAL DATA SOURCE HadoopStorage WITH (
TYPE = HADOOP,
LOCATION ='hdfs://10.10.10.10:8020',
RESOURCE_MANAGER_LOCATION = '10.10.10.10:8032',
CREDENTIAL = HadoopCredential
);
External file format
You must create an external file format to describe the general structure of your data by using the
CREATE EXTERNAL FILE FORMAT statement with the options applicable to your file type, as shown in
Example 5-33. You define this structure by using the following arguments:
FORMAT_TYPE: You can use one of the following options: PARQUET, ORC, RCFILE, or
DELIMITEDTEXT. If you use RCFILE, you must also specify one of the following Hive Serializer and
Deserializer (SerDe) methods in the SERDE_METHOD argument: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
or org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe. A minimal sketch of an RCFILE definition
follows this list.
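For instance, an RCFILE file format definition might look like the following sketch, in which the name RcFileFormat is an arbitrary placeholder and either of the SerDe classes listed above can be supplied:

-- RcFileFormat is a placeholder name; SERDE_METHOD is required for RCFILE
CREATE EXTERNAL FILE FORMAT RcFileFormat WITH (
FORMAT_TYPE = RCFILE,
SERDE_METHOD = 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
);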