
Incorrect replication of Kafka messages occurring in the same millisecond #20

@avelanarius


Overview

When Kafka messages are sent in quick succession, some of them can end up with the same timestamp (millisecond precision). If multiple messages concerning the same row occur within a single millisecond, the Connector incorrectly applies only the first message of that millisecond to the database, because it relies on the timestamp to determine ordering.

The environment I was testing the Connector in involved only a single partition. Kafka guarantees that the order of messages within a partition is preserved (by partition offset). The same scenario can also occur on a multi-partition topic, because Kafka producers typically send messages with the same key to the same partition, so messages concerning the same row end up in the same topic partition.

As you will see in "Further investigation", the Connector already receives the messages in the correct order (in my testing) but is unable to apply them correctly. The partition offset is also available for determining the correct order.
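One possible direction (a sketch only, not the Connector's current behavior): since the partition offset is strictly increasing within a partition, it could be used to break ties between records that share a millisecond. The helper class and method names below are hypothetical:

import org.apache.kafka.connect.sink.SinkRecord;

final class WriteTimestamps {
    // Hypothetical helper, not part of the Connector: derive a microsecond
    // write timestamp that stays increasing for records sharing the same
    // millisecond within one partition. Assumes the record has a timestamp.
    static long microsFor(SinkRecord record) {
        long micros = record.timestamp() * 1000L; // Kafka timestamps are epoch millis
        // Offsets strictly increase within a partition, so their low digits
        // can break ties between same-millisecond records. (Caveat: this
        // wraps if more than 1000 records land in the same millisecond.)
        return micros + (record.kafkaOffset() % 1000L);
    }
}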

Reproduction

  1. Start up Confluent and the ScyllaDB Sink Connector. Set up the Connector with topic t.
  2. Download the input file: input. It consists of 10 operation triplets, each of which adds a row with v = 0, deletes the row, and adds it again with v = 1. The final table should therefore contain 10 rows with v = 1:
{"pk":{"int":1},"ck":{"int":1}}${"ks.t.value_schema":{"pk":{"int":1},"ck":{"int":1},"v":{"int":0}}}
{"pk":{"int":1},"ck":{"int":1}}$null
{"pk":{"int":1},"ck":{"int":1}}${"ks.t.value_schema":{"pk":{"int":1},"ck":{"int":1},"v":{"int":1}}}
{"pk":{"int":2},"ck":{"int":2}}${"ks.t.value_schema":{"pk":{"int":2},"ck":{"int":2},"v":{"int":0}}}
{"pk":{"int":2},"ck":{"int":2}}$null
{"pk":{"int":2},"ck":{"int":2}}${"ks.t.value_schema":{"pk":{"int":2},"ck":{"int":2},"v":{"int":1}}}
[... MORE ...]
  3. Use the kafka-avro-console-producer provided by Confluent to write the messages from input:
bin/kafka-avro-console-producer --broker-list localhost:9092 --topic t --property parse.key=true \
--property key.schema='{"fields":[{"name":"pk","type":["null","int"]},{"name":"ck","type":["null","int"]}],"name":"key_schema","namespace":"ks.t","type":"record"}'  \
--property "key.separator=$" --property value.schema='["null",{"fields":[{"name":"pk","type":["null","int"]},{"name":"ck","type":["null","int"]},{"name":"v","type":["null","int"]}],"name":"value_schema","namespace":"ks.t","type":"record"}]' \
--timeout 100 --request-required-acks 0 < input
  4. Select the rows in the destination table:
SELECT * FROM test.t;

Got:

[screenshot: SELECT output with some rows missing]

Expected: 10 rows with v = 1 and pk, ck from 1 to 10.

Further investigation

Using kafka-avro-console-consumer, I verified that the messages were sent in the correct order (only the value part is shown):

[screenshot: kafka-avro-console-consumer output, messages in the correct order]

After adding additional log statements to the Connector, I found that (surprisingly?) it also received the messages in the correct order:

[screenshot: Connector log output, messages received in the correct order]

Root cause

    boundStatement.setConsistencyLevel(topicConfigs.getConsistencyLevel());
    boundStatement.setDefaultTimestamp(topicConfigs.getTimeStamp());
} else {
    boundStatement.setConsistencyLevel(this.scyllaDbSinkConnectorConfig.consistencyLevel);
    boundStatement.setDefaultTimestamp(record.timestamp());
}

The Connector calls setDefaultTimestamp() with the timestamp of the Kafka message. This value is translated into CQL USING TIMESTAMP when the query is executed, and it prevents subsequent queries with a lesser or equal timestamp from taking effect.

In the reproduction example, the row with pk = 1, ck = 1 is missing. This is caused by the DELETE and the following INSERT carrying the same USING TIMESTAMP; at equal timestamps the delete's tombstone wins, so the INSERT is ignored:

[screenshot: DELETE and INSERT executed with the same USING TIMESTAMP]
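To illustrate the tie-break, here is a minimal sketch against a local node, using the DataStax Java driver 3.x API that the Connector uses (keyspace and table names taken from the reproduction). Both statements carry the same default timestamp, so the DELETE's tombstone wins and the INSERT has no effect:

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class SameTimestampDemo {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            PreparedStatement del = session.prepare("DELETE FROM test.t WHERE pk = ? AND ck = ?");
            PreparedStatement ins = session.prepare("INSERT INTO test.t (pk, ck, v) VALUES (?, ?, ?)");

            long sameMicros = System.currentTimeMillis() * 1000L;

            BoundStatement delete = del.bind(1, 1);
            delete.setDefaultTimestamp(sameMicros); // becomes CQL USING TIMESTAMP
            session.execute(delete);

            BoundStatement insert = ins.bind(1, 1, 1);
            insert.setDefaultTimestamp(sameMicros); // same timestamp: tombstone wins
            session.execute(insert);

            // SELECT * FROM test.t WHERE pk = 1 AND ck = 1 now returns no row.
        }
    }
}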

There is another, smaller issue in those lines of ScyllaDbSinkTaskHelper.java: setDefaultTimestamp() expects an epoch timestamp in microseconds, but the millisecond timestamp from Kafka is assigned, so the WRITETIME in the database is off by a factor of 1000:

[screenshot: WRITETIME values off by a factor of 1000]
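Assuming the ordering problem is handled separately, the unit mismatch alone would be a one-line change (a sketch, not a submitted fix):

// record.timestamp() is epoch milliseconds; setDefaultTimestamp() expects
// epoch microseconds, so scale by 1000.
boundStatement.setDefaultTimestamp(record.timestamp() * 1000L);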
