89 C H A P T E R 5 | Broader data access
PolyBase services, but a server reboot is necessary before you can create PolyBase data objects, as we
describe later in this chapter.
If you are using Hadoop, you can optionally perform these additional steps:
1. Enable pushdown computations to your Hadoop cluster by copying the value of the
yarn.application.classpath configuration key from the yarn-site.xml file in the Hadoop
configuration directory and pasting it into the yarn.application.classpath property in the
yarn-site.xml file in the \Program Files\Microsoft SQL Server\\Binn\Polybase\Hadoop\conf directory.
2. Configure Kerberos authentication to your Hadoop cluster by copying configuration key values
from Hadoop configuration files into the value properties of the corresponding files in the
\Program Files\Microsoft SQL Server\\Binn\Polybase\Hadoop\conf
directory, as shown in the following table:
| Configuration file | Configuration key | Description |
|---|---|---|
| core-site.xml | polybase.kerberos.kdchost | KDC host name |
| core-site.xml | polybase.kerberos.realm | Kerberos realm |
| core-site.xml | hadoop.security.authentication | Authentication configuration, such as kerberos |
| hdfs-site.xml | dfs.namenode.kerberos.principal | Name node security principal, such as hdfs/_HOST@YOUR-REALM.COM |
| mapred-site.xml | mapreduce.jobhistory.address | Job history server IPC host:port, such as 0.0.0.0:10020 |
| mapred-site.xml | mapreduce.jobhistory.principal | Kerberos principal name for the job history server, such as mapred/_HOST@YOUR-REALM.COM |
| yarn-site.xml | yarn.resourcemanager.principal | Yarn resource manager principal name, such as yarn/_HOST@YOUR-REALM.COM |
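Both optional steps come down to copying property values between Hadoop-style XML configuration files. The following Python sketch shows one way to script that. It assumes the files use the standard Hadoop layout of `<property>` elements, each with `<name>` and `<value>` children, inside a `<configuration>` root. The helper names (`get_property`, `set_property`, `sync_config`, `PUSHDOWN_KEYS`, `KERBEROS_KEYS`) are hypothetical, and you supply both directory paths yourself. Keys that are absent or empty in the Hadoop source file are reported rather than copied, so you can fill them in by hand.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical key maps: which keys to copy, grouped by configuration file.
PUSHDOWN_KEYS = {"yarn-site.xml": ["yarn.application.classpath"]}
KERBEROS_KEYS = {
    "core-site.xml": ["polybase.kerberos.kdchost", "polybase.kerberos.realm",
                      "hadoop.security.authentication"],
    "hdfs-site.xml": ["dfs.namenode.kerberos.principal"],
    "mapred-site.xml": ["mapreduce.jobhistory.address", "mapreduce.jobhistory.principal"],
    "yarn-site.xml": ["yarn.resourcemanager.principal"],
}


def get_property(path, name):
    """Return the non-empty <value> text of the named <property>, or None."""
    root = ET.parse(path).getroot()
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = (prop.findtext("value") or "").strip()
            return value or None
    return None


def set_property(path, name, value):
    """Set the <value> of the named <property>, adding the property if it is absent."""
    tree = ET.parse(path)
    root = tree.getroot()
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_elem = prop.find("value")
            if value_elem is None:
                value_elem = ET.SubElement(prop, "value")
            value_elem.text = value
            break
    else:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ElementTree rewrites the whole file and drops any XML comments.
    tree.write(path, encoding="utf-8", xml_declaration=True)


def sync_config(hadoop_conf_dir, polybase_conf_dir, key_map):
    """Copy each listed key from the Hadoop file to the matching PolyBase file.

    Returns {filename: [keys that were missing or empty in the Hadoop file]}.
    """
    missing = {}
    for filename, names in key_map.items():
        src = os.path.join(hadoop_conf_dir, filename)
        dst = os.path.join(polybase_conf_dir, filename)
        if not os.path.exists(src):
            missing[filename] = list(names)
            continue
        not_found = []
        for name in names:
            value = get_property(src, name)
            if value is None:
                not_found.append(name)
            else:
                set_property(dst, name, value)
        if not_found:
            missing[filename] = not_found
    return missing
```

For example, `sync_config(hadoop_dir, polybase_dir, PUSHDOWN_KEYS)` covers step 1 and `sync_config(hadoop_dir, polybase_dir, KERBEROS_KEYS)` covers step 2. Because ElementTree rewrites each file and discards comments, back up the PolyBase conf directory before running a script like this.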
Scaling out with PolyBase
Because data sets can become quite large in Hadoop or blob storage, you can create a PolyBase
scale-out group, as shown in Figure 5-9, to improve performance. A PolyBase scale-out group has one
head node and one or more compute nodes. The head node consists of the SQL Server database
engine, the PolyBase engine service, and the PolyBase data movement service, whereas each compute
node consists of a database engine and a data movement service. The head node receives PolyBase
queries and distributes the work involving external tables to the data movement service on the
available compute nodes. It then receives the results from each compute node, finalizes them in the
database engine, and returns them to the requesting client. On both the head node and the compute
nodes, the data movement service transfers data between the external data sources and SQL Server,
and between the SQL Server instances on the head and compute nodes.
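The division of labor described above is a scatter-gather pattern. The Python sketch below is a conceptual model only, not PolyBase's actual protocol or internals: a hypothetical `head_node_query` splits rows from an external source across a number of simulated compute nodes (threads), each hypothetical `compute_node_partial` call returns a partial aggregate, and the head node combines those partials into the final result. The choice of an average as the query is an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_partial(rows):
    """Simulated compute node: read its share of external rows and pre-aggregate."""
    # Return (count, sum) rather than an average so the head node can combine exactly.
    return len(rows), sum(rows)


def head_node_query(external_rows, compute_nodes):
    """Simulated head node: distribute work, gather partials, finalize an AVG."""
    if compute_nodes < 1:
        raise ValueError("a scale-out group needs at least one compute node")
    # Round-robin split of the external data, one slice per compute node.
    slices = [external_rows[i::compute_nodes] for i in range(compute_nodes)]
    with ThreadPoolExecutor(max_workers=compute_nodes) as pool:
        partials = list(pool.map(compute_node_partial, slices))
    # Finalize on the head node by combining the partial aggregates.
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count if total_count else None
```

For example, `head_node_query(list(range(1, 101)), 3)` returns `50.5`. The compute nodes return counts and sums instead of averages because an average of per-node averages is wrong when the slices differ in size, which is one reason the head node, rather than the compute nodes, finalizes the result.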