amazon-redshiftHow can I use Amazon Redshift and Amazon EMR together?

Amazon Redshift and Amazon EMR can be used together to perform large-scale data analysis. The data stored in Redshift can be accessed and processed using Amazon EMR, which provides a managed Hadoop framework.

For example, the following code can be used to run a Hive query on data stored in Amazon Redshift:

aws emr create-cluster --name "My Cluster" --release-label emr-5.20.0 \
--applications Name=Hive --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \
--log-uri s3://my-log-bucket/logs/ --configurations file://config.json \
--steps Type=Hive,Name="My Hive Step",ActionOnFailure=CONTINUE,Args=[-f,script.q]

The --configurations parameter can be used to specify the connection details for Amazon Redshift. The --steps parameter can be used to specify the Hive query to be executed.

The following components are used in this example:

aws emr create-cluster: Creates an Amazon EMR cluster with the specified configuration.
--name "My Cluster": Specifies the name of the cluster.
--release-label emr-5.20.0: Specifies the version of Amazon EMR to use.
--applications Name=Hive: Specifies that Hive should be installed on the cluster.
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole: Specifies the IAM role to use for the cluster.
--log-uri s3://my-log-bucket/logs/: Specifies the S3 bucket to use for logging.
--configurations file://config.json: Specifies the connection details for Amazon Redshift.
--steps Type=Hive,Name="My Hive Step",ActionOnFailure=CONTINUE,Args=[-f,script.q]: Specifies the Hive query to be executed.

Further information about using Amazon Redshift and Amazon EMR together can be found in the Amazon EMR documentation.

Edit this code on GitHub

More of Amazon Redshift

See more codes...