For test purposes only!
All-in-one HDFS container with:
- HDFS namenode
- HDFS secondary namenode
- HDFS datanode
Images:
- `mtsrus/hadoop:hadoop2.7.3-hdfs`
- `mtsrus/hadoop:hadoop2-hdfs` - same as above
- `mtsrus/hadoop:hadoop3.3.6-hdfs`
- `mtsrus/hadoop:hadoop3-hdfs` - same as above
Minimal resources the container can start with:
- 200m CPU
- 700 MB RAM
- 1 GB storage
See docker-compose.yml.
NOTE: the Hadoop 2 image uses the same port numbers as Hadoop 3:
- `9820:9820` - HDFS IPC
- `9870:9870` - WebHDFS
You can mount custom config files to the /var/hadoop/conf directory inside the container to override the default Hadoop configuration.
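For example, a custom `core-site.xml` can be mounted via docker-compose (a sketch; the service name and local path are illustrative):

```yaml
services:
  hdfs:
    image: mtsrus/hadoop:hadoop3.3.6-hdfs
    hostname: hdfs
    ports:
      - "9820:9820"   # HDFS IPC
      - "9870:9870"   # WebHDFS
    volumes:
      # local ./conf/core-site.xml replaces the default one inside the container
      - ./conf/core-site.xml:/var/hadoop/conf/core-site.xml
```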
The following substitutions are replaced with proper values:
- `{{hostname}}` - current hostname
- `WAIT_TIMEOUT_SECONDS=120` - timeout in seconds to wait after starting each service before checking that it is alive
- `export HADOOP_HEAPSIZE=512` - max JVM memory in megabytes, applied to all Hadoop components (unless overridden)
If the container fails with an OutOfMemoryError, increase this value, e.g. to 1024 or 2048.
- `export HADOOP_NAMENODE_OPTS=-Xmx2048m` - max JVM memory for the Namenode
- `export HADOOP_SECONDARYNAMENODE_OPTS=-Xmx2048m` - max JVM memory for the Secondary Namenode
- `export HADOOP_DATANODE_OPTS=-Xmx1024m` - max JVM memory for the Datanode
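A compose fragment combining these variables could look like this (a sketch; the values are illustrative, not recommendations):

```yaml
services:
  hdfs:
    image: mtsrus/hadoop:hadoop3.3.6-hdfs
    environment:
      # default heap for all Hadoop components, in megabytes
      HADOOP_HEAPSIZE: 1024
      # per-component overrides take precedence over HADOOP_HEAPSIZE
      HADOOP_NAMENODE_OPTS: -Xmx2048m
      HADOOP_DATANODE_OPTS: -Xmx1024m
```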
All-in-one Yarn container with:
- HDFS namenode
- HDFS secondary namenode
- HDFS datanode
- Yarn ResourceManager
- Yarn NodeManager
- MapReduce JobHistory server (if `WITH_JOBHISTORY_SERVER=true`)
Images:
- `mtsrus/hadoop:hadoop2.7.3-yarn`
- `mtsrus/hadoop:hadoop2-yarn` - same as above
- `mtsrus/hadoop:hadoop3.3.6-yarn`
- `mtsrus/hadoop:hadoop3-yarn` - same as above
Minimal resources the container can start with:
- 400m CPU
- 1 GB RAM
- 1 GB storage
See docker-compose.yml.
NOTE: the Hadoop 2 image uses the same port numbers as Hadoop 3:
- `9820:9820` - HDFS IPC
- `9870:9870` - HDFS WebHDFS
- `8042:8042` - NodeManager UI
- `8088:8088` - Yarn UI
If `WITH_JOBHISTORY_SERVER=true`:
- `10020:10020` - MapReduce JobHistory server
- `19888:19888` - MapReduce JobHistory server UI
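Enabling the JobHistory server and publishing its ports might look like this (a sketch; the service name is illustrative):

```yaml
services:
  yarn:
    image: mtsrus/hadoop:hadoop3.3.6-yarn
    environment:
      WITH_JOBHISTORY_SERVER: "true"
    ports:
      - "8088:8088"     # Yarn UI
      - "10020:10020"   # MapReduce JobHistory server
      - "19888:19888"   # MapReduce JobHistory server UI
```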
- /var/hadoop/conf/core-site.xml
- /var/hadoop/conf/hdfs-site.xml
- /var/hadoop/conf/yarn-site.xml
- /var/hadoop/conf/capacity-scheduler.xml
- /var/hadoop/conf/mapred-site.xml
You can mount custom config files to the /var/hadoop/conf directory inside the container to override the default Hadoop configuration.
The following substitutions are replaced with proper values:
- `{{hostname}}` - current hostname
- `WAIT_TIMEOUT_SECONDS=120` - timeout in seconds to wait after starting each service before checking that it is alive
- `WITH_JOBHISTORY_SERVER=false` - set to `true` to start the MapReduce JobHistory server
See the HDFS image documentation.
- `export YARN_RESOURCEMANAGER_OPTS=-Xmx1024m` - max JVM memory for the Yarn ResourceManager
- `export YARN_NODEMANAGER_OPTS=-Xmx1024m` - max JVM memory for the NodeManager
- `export HADOOP_JOB_HISTORYSERVER_OPTS=-Xmx1024m` - max JVM memory for the MapReduce JobHistory server
All-in-one Hive container with:
- HDFS namenode
- HDFS secondary namenode
- HDFS datanode
- Yarn ResourceManager
- Yarn NodeManager
- MapReduce JobHistory server
- Hive server
- Hive Metastore server
Images:
- `mtsrus/hadoop:hadoop2.7.3-hive2.3.10`
- `mtsrus/hadoop:hadoop2-hive` - same as above
- `mtsrus/hadoop:hadoop3.3.6-hive3.1.3`
- `mtsrus/hadoop:hadoop3-hive` - same as above
Minimal resources the container can start with:
- 500m CPU
- 2 GB RAM
- 1 GB storage
- a running RDBMS (e.g. Postgres) instance to back the Metastore
See docker-compose.yml.
NOTE: the Hadoop 2 image uses the same port numbers as Hadoop 3:
- `9820:9820` - HDFS IPC
- `9870:9870` - HDFS WebHDFS
If `WITH_HIVE_SERVER=true`:
- `8042:8042` - NodeManager UI
- `8088:8088` - Yarn UI
- `19888:19888` - MapReduce JobHistory server UI
- `10000:10000` - Hive server
- `10002:10002` - Hive Admin UI
If `WITH_HIVE_METASTORE_SERVER=true`:
- `9083:9083` - Hive Metastore server
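For example, a Metastore-only container, with the Hive server disabled, might be declared like this (a sketch; the service name is illustrative):

```yaml
services:
  hive-metastore:
    image: mtsrus/hadoop:hadoop3.3.6-hive3.1.3
    environment:
      WITH_HIVE_SERVER: "false"
      WITH_HIVE_METASTORE_SERVER: "true"
    ports:
      - "9083:9083"   # Hive Metastore server
```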
You can mount custom config files to the /var/hive/conf directory inside the container to override the default Hive configuration.
HDFS and Yarn configs can still be mounted to the /var/hadoop/conf directory.
The following substitutions are replaced with proper values:
- `{{hostname}}` - current hostname
- `{{HIVE_METASTORE_DB_URL}}` - `HIVE_METASTORE_DB_URL` env variable (default `jdbc:postgresql://postgres:5432/metastore`)
- `{{HIVE_METASTORE_DB_DRIVER}}` - `HIVE_METASTORE_DB_DRIVER` env variable (default `org.postgresql.Driver`)
- `{{HIVE_METASTORE_DB_USER}}` - `HIVE_METASTORE_DB_USER` env variable (default `hive`)
- `{{HIVE_METASTORE_DB_PASSWORD}}` - `HIVE_METASTORE_DB_PASSWORD` env variable (default `hive`)
Hive stores metadata in `{{HIVE_METASTORE_DB_URL}}` using the driver from `{{HIVE_METASTORE_DB_DRIVER}}`. By default, Postgres is used.
You can change URL components by setting environment variables mentioned above, or replace the entire URL by updating the /var/hive/conf/hive-site.xml file.
You can also use any other supported RDBMS, like MySQL, by changing the connection URL and embedding/mounting the JDBC driver at the /opt/hive/lib/drivername.jar path inside the container. The Postgres JDBC driver is already embedded in the image.
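Switching to MySQL might look like this (a sketch; the driver jar name, database host, and credentials are assumptions):

```yaml
services:
  hive:
    image: mtsrus/hadoop:hadoop3.3.6-hive3.1.3
    environment:
      # point the Metastore at a MySQL instance instead of the default Postgres
      HIVE_METASTORE_DB_URL: jdbc:mysql://mysql:3306/metastore
      HIVE_METASTORE_DB_DRIVER: com.mysql.cj.jdbc.Driver
      HIVE_METASTORE_DB_USER: hive
      HIVE_METASTORE_DB_PASSWORD: hive
    volumes:
      # MySQL Connector/J downloaded locally; the jar name is illustrative
      - ./mysql-connector-j.jar:/opt/hive/lib/mysql-connector-j.jar
```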
- `WAIT_TIMEOUT_SECONDS=120` - timeout in seconds to wait after starting each service before checking that it is alive
- `WITH_HIVE_SERVER=true` - set to `false` to disable the Hive server
- `WITH_HIVE_METASTORE_SERVER=true` - set to `false` to disable the Hive Metastore server
See the HDFS image documentation.
See the Yarn image documentation.
- `export HIVE_SERVER2_HEAPSIZE=256` - max JVM memory in megabytes for the Hive server
- `export HIVE_METASTORE_HEAPSIZE=256` - max JVM memory in megabytes for the Hive Metastore server
See https://www.alibabacloud.com/help/en/emr/emr-on-ecs/user-guide/modify-the-memory-parameters-of-hive