This project contains a raster rendering process built on GeoTrellis. The input data is a Kafka stream; the output is a set of rasters and some JSON metadata.
The project contains two subprojects: `streaming` (the actual streaming application) and `producer` (a subproject used for test purposes that generates test Kafka messages).
To run this app in any Spark mode, be sure that you have a properly installed Spark client on your machine. To bring up a Kafka instance it is enough to run the `make kafka` command (more detailed information is provided in the Kafka in Docker section below). Be sure that all necessary changes were introduced into the `application.conf` file.

A Makefile is provided to simplify launching and integration testing of the application:
| Command | Description |
|---|---|
| `local-spark-demo` | Run the Spark streaming assembly on a local Spark server |
| `local-spark-shell` | Run a Spark shell locally with the fat jar included |
| `build` | Build a fat jar to run on Spark |
| `clean` | Clean up targets |
| `kafka` | Run a dockerized Kafka; see the Kafka in Docker section below |
| `kafka-send-messages` | Produce demo Kafka messages |
| `sbt-spark-demo` | Run the Spark streaming application from the SBT shell |
Application settings are provided via a configuration file in the resources folder of the `streaming` subproject:
```conf
ingest.stream {
  # Kafka settings
  kafka {
    threads           = 10
    topic             = "geotrellis-streaming"
    otopic            = "geotrellis-streaming-output" # output topic
    application-id    = "geotrellis-streaming"
    bootstrap-servers = "localhost:9092"
  }
  # Spark Streaming settings
  spark {
    batch-duration    = 10 // in seconds
    partitions        = 10
    auto-offset-reset = "latest"
    auto-commit       = true
    publish-to-kafka  = true
    group-id          = "spark-streaming-data"
    checkpoint-dir    = ""
  }
}
```
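For orientation, here is a hypothetical sketch of how the Kafka and Spark settings above might be wired into a Kafka direct stream using the standard spark-streaming-kafka-0-10 API. This is not the project's actual code: `StreamingSketch` is illustrative, and the hard-coded values simply mirror the config keys shown above.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("geotrellis-streaming").setMaster("local[*]")
    // batch-duration = 10 seconds, as in the config above
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // Values mirror the ingest.stream.kafka / ingest.stream.spark keys above
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "spark-streaming-data",
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (true: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("geotrellis-streaming"), kafkaParams)
    )

    // Placeholder action: the real app parses messages and renders rasters here
    stream.map(_.value).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```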
The same configuration file also carries the GeoTrellis GDAL VLM settings:

```conf
vlm {
  geotiff.s3 {
    allow-global-read: false
    region: "us-west-2"
  }
  gdal.options {
    GDAL_DISABLE_READDIR_ON_OPEN     = "YES"
    CPL_VSIL_CURL_ALLOWED_EXTENSIONS = ".tif"
  }
  # if true, uses GDALRasterSources; if false, GeoTiffRasterSources
  source.gdal.enabled = true
}
```
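The `source.gdal.enabled` flag switches between the two RasterSource backends. Below is a minimal sketch of such a toggle, assuming the geotrellis-contrib VLM API; package paths differ across GeoTrellis versions, and `SourceFactory` is illustrative rather than the project's actual code.

```scala
import com.typesafe.config.ConfigFactory
import geotrellis.contrib.vlm.RasterSource
import geotrellis.contrib.vlm.gdal.GDALRasterSource
import geotrellis.contrib.vlm.geotiff.GeoTiffRasterSource

// Illustrative factory: picks the RasterSource backend according to application.conf
object SourceFactory {
  private lazy val gdalEnabled: Boolean =
    ConfigFactory.load().getBoolean("vlm.source.gdal.enabled")

  def apply(uri: String): RasterSource =
    if (gdalEnabled) GDALRasterSource(uri)
    else GeoTiffRasterSource(uri)
}
```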
Producer application settings are provided via a configuration file in the resources folder of the `producer` subproject:
```conf
lc8 {
  scenes = [
    {
      name        = "LC08_L1TP_139044_20170304_20170316_01_T1" # name of the LC8 scene
      band        = "1"          # band number
      count       = 2            # number of generated polygons
      crs         = "EPSG:4326"  # desired generated CRS
      output-path = "../data/img" # the output path where the result output should be placed after processing
    },
    {
      name        = "LC08_L1TP_139045_20170304_20170316_01_T1"
      band        = "2"
      count       = 2
      crs         = "EPSG:4326"
      output-path = "../data/img"
    },
    {
      name        = "LC08_L1TP_139046_20170304_20170316_01_T1"
      band        = "2"
      count       = 2
      crs         = "EPSG:4326"
      output-path = "../data/img"
    }
  ]
}
```
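For illustration, the scene list could be read with Typesafe Config roughly as follows; the `Scene` case class is hypothetical, and the actual producer may model these entries differently.

```scala
import com.typesafe.config.ConfigFactory
import scala.collection.JavaConverters._

// Hypothetical model of one lc8.scenes entry
case class Scene(name: String, band: String, count: Int, crs: String, outputPath: String)

val scenes: List[Scene] =
  ConfigFactory.load().getConfigList("lc8.scenes").asScala.toList.map { c =>
    Scene(
      name       = c.getString("name"),
      band       = c.getString("band"),
      count      = c.getInt("count"),
      crs        = c.getString("crs"),
      outputPath = c.getString("output-path")
    )
  }
```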
### SBT shell demo

1. For instance, we already have Kafka running locally on port 9092.
   1. If not, it is possible to launch Kafka in Docker (see the Kafka in Docker section below).
2. Open the two projects `producer` and `streaming` in two separate terminal windows:
   - run the streaming application (`project streaming`, then `run`)
   - run the producer application (`project producer`, then `run --generate-and-send`)

To summarise:

Terminal №1:

```bash
$ make kafka
```

Terminal №2:

```bash
$ cd app; ./sbt
$ project streaming
$ run
```

or

```bash
$ make sbt-spark-demo
```

Terminal №3:

```bash
$ ./sbt
$ project producer
$ run --generate-and-send
```

or

```bash
$ make kafka-send-messages
```

Extra summary:

```bash
# terminal 1
make kafka
# terminal 2
make sbt-spark-demo
# terminal 3
make kafka-send-messages
```
### Local Spark demo

1. For instance, we already have Kafka running locally on port 9092. If not, it is possible to launch Kafka in Docker (see the Kafka in Docker section below).
2. Build a fat assembly jar: `make build`
3. Launch a Spark app: `make local-spark-demo`
4. Post a test Kafka message: `make kafka-send-messages`

To summarise:

```bash
$ make build && make local-spark-demo
$ make kafka-send-messages
```

### Kafka in Docker

1. Add the following alias to the `/etc/hosts` file: `127.0.0.1 localhost kafka` (a definitely working variant for a macOS setup: `127.0.0.1 localhost.localdomain localhost kafka`). The alias lets the host resolve the `kafka` hostname that the dockerized broker typically advertises.
2. Run `make kafka`