Use case description:
I have some interesting events coming in from my users, which I'll store in a user_event topic. One of the fields will be ip_address.
I also have a 3rd party service I can use that lets me send an IP address and get back a LatLng as the best guess for where in the world that IP lives. The service is slow and unreliable, so I'd like to store the result in a second topic, ip_location.
ip_location starts out empty, but I'm hoping that over time it will turn into a reasonably comprehensive cache of all my users' locations.
So, I'd like to build something that watches user_event, checks whether we already have that IP in ip_location, and if not, calls the 3rd party service. If it fails for any reason, I can expect it to retry later.
Discussion on solutions:
- so perhaps you do a stream-table join between user_event and ip_location; if you get a null back for the latlng, you somehow trigger the lookup against the service and write the result back to the table (and thus the topic)
- Unless the app blocks on the lookup (which is presumably undesirable), this suggests a separate app that works through a cache of unresolved user events and follows up once the IP location is available.
- A stream-table join is definitely a great choice in this scenario. I also think there's a way to use the Kafka Streams Processor API for the lookup and caching: do the lookup, and if the IP isn't found, forward a marker record to a sink processor that writes the missing lat-lng record to a topic; an external consumer can then perform the lookup and write the result back to the topic backing the KTable.
- Aren't external calls from a Streams application considered an anti-pattern?
- For external service calls this is a good way to go: https://github.com/confluentinc/parallel-consumer
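The join-plus-marker idea above can be sketched in plain Python, with dicts and lists standing in for the topics and the KTable materialization. None of this is real Kafka API; `process_user_event`, `missing_ip`, and `enriched` are illustrative names for the pattern only.

```python
# Sketch of the stream-table join + marker-record pattern from the discussion.
# ip_location plays the role of the KTable's local state store; missing_ip is
# the "marker" topic an external worker would consume.

ip_location = {}          # KTable materialization: ip -> (lat, lng)
missing_ip = []           # marker topic: IPs that still need a lookup
enriched = []             # downstream sink of successful join results

def process_user_event(event):
    """Stream-table join: enrich the event if the IP is cached,
    otherwise forward a marker record for the external worker."""
    ip = event["ip_address"]
    latlng = ip_location.get(ip)          # the "table" side of the join
    if latlng is None:
        missing_ip.append(ip)             # cache miss: emit marker, drop through
    else:
        enriched.append({**event, "latlng": latlng})

process_user_event({"user": "alice", "ip_address": "203.0.113.9"})
# cache miss -> marker written, nothing enriched yet

ip_location["203.0.113.9"] = (51.5, -0.1)   # external worker filled the cache
process_user_event({"user": "alice", "ip_address": "203.0.113.9"})
# cache hit -> enriched this time
```

Note that the first event is not enriched; it only leaves a marker behind, which matches the "retry later" expectation in the use case (the location becomes available for subsequent events from that IP).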
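The external-consumer half could look something like the following sketch, again in plain Python: drain the marker topic, call the flaky 3rd-party service with retries, and write successes back to the cache (i.e. produce to the topic backing the KTable). `flaky_geo_service` is a hypothetical stand-in for the real API; a production worker would be a Kafka consumer, possibly the parallel-consumer linked above.

```python
# Sketch of the external lookup worker with retries.
# flaky_geo_service is a stand-in for the slow/unreliable 3rd-party API;
# here it fails deterministically twice, then succeeds.

calls = {"count": 0}

def flaky_geo_service(ip):
    calls["count"] += 1
    if calls["count"] < 3:                # first two calls fail
        raise ConnectionError("service unavailable")
    return (37.77, -122.42)               # best-guess LatLng for the IP

def resolve(ip, cache, max_attempts=5):
    """Retry the lookup; on success, write the LatLng into the cache
    (in real terms: produce to the ip_location topic)."""
    for _ in range(max_attempts):
        try:
            cache[ip] = flaky_geo_service(ip)
            return True
        except ConnectionError:
            continue                      # real code: backoff, or re-queue the marker
    return False                          # give up for now; marker stays for later

ip_location = {}
ok = resolve("198.51.100.7", ip_location)
```

Giving up after `max_attempts` rather than looping forever keeps one dead IP from blocking the worker; the unresolved marker simply gets picked up again on a later pass, which is the retry behaviour the use case asks for.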