I like AWS Data Pipeline and love starting EMR clusters with it. Unfortunately, it is currently not possible to use the EMR 4.0.0 release when starting an EMR cluster from a pipeline. The EmrCluster options only offer the amiVersion setting, which is a problem if you want a Hadoop version newer than 2.4.0: even the highest amiVersion, 3.9.0, does not support one.
This is the most up-to-date configuration you can get:
"hadoopVersion": "2.4.0", "amiVersion": "3.9.0"
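To show where these two settings live, here is a minimal sketch in Python that builds an EmrCluster object for a pipeline definition and prints it as JSON. The `id` and `name` values and the helper name `emr_cluster_object` are placeholders I made up for illustration. A real pipeline would need more objects and fields (schedule, instance types, activities) than shown here.

```python
import json


def emr_cluster_object(ami_version="3.9.0", hadoop_version="2.4.0"):
    """Return a dict describing an EmrCluster pipeline object.

    Only the two version fields come from the post; the id and name
    are hypothetical placeholders.
    """
    return {
        "id": "MyEmrCluster",      # hypothetical identifier
        "name": "MyEmrCluster",    # hypothetical display name
        "type": "EmrCluster",
        "amiVersion": ami_version,
        "hadoopVersion": hadoop_version,
    }


# A pipeline definition file is a JSON document with a list of objects.
definition = {"objects": [emr_cluster_object()]}
print(json.dumps(definition, indent=2))
```

The printed JSON is only the cluster fragment of a definition file, so treat it as a starting point rather than a complete pipeline.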
What does Amazon say about this? I only found this comment in the forum:
currently emr-4.0.0 is not supported on Datapipeline, we are working on it but at the moment I cannot provide and ETA on this.
The post EMR 4.0.0 in AWS Data Pipeline appeared first on Happy Coding Journal.