`last` function when the timestamp is the same

marcogroot · April 18, 2024, 6:08am

Hey all,

Say I have a dataset of 100,000,000 rows, many of which share the same timestamp. If I call the last function on them, say last(id, timestamp), is the value it returns deterministic when several rows tie on the latest timestamp? Alternatively, can we provide a third column to break these ties?

Thanks!

jonatasdp · April 18, 2024, 12:32pm

Hi @marcogroot, my understanding is that if you create an index, the result will automatically be deterministic, since last will use the index to find the last row.

I’m not sure I understand the third-column idea. Do you already have the data? Would the third column be something like a time frame that bounds where the last value is taken from?

marcogroot · April 18, 2024, 3:29pm

I have a service that consumes data with a timestamp column and a sequence_number column, which indicates the order in which the data was created (a lower number means it was created first).

My question is this: suppose my service consumes a record and stores it in the database, and then consumes another record that has the SAME timestamp but a lower sequence_number (because it was created first in an external service). Can I make the last function give priority to the record with the lower sequence_number?

For example:

last(id, timestamp, sequence_number) // sort with sequence_number if timestamp is the same
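
The tie-breaking rule asked for here can be pinned down with a small sketch. It models only the desired semantics in plain Python and says nothing about how last is implemented; the Record type and the last_with_tiebreak helper are hypothetical names introduced for illustration.

```python
from typing import Iterable, NamedTuple


class Record(NamedTuple):
    # Hypothetical row shape mirroring the columns named in the question.
    id: int
    timestamp: int
    sequence_number: int


def last_with_tiebreak(records: Iterable[Record]) -> int:
    """Return the id of the row with the latest timestamp.

    When several rows share that timestamp, the row with the lowest
    sequence_number wins (negating it makes max() prefer smaller values).
    """
    return max(records, key=lambda r: (r.timestamp, -r.sequence_number)).id


records = [
    Record(id=1, timestamp=100, sequence_number=5),
    Record(id=2, timestamp=200, sequence_number=7),
    Record(id=3, timestamp=200, sequence_number=3),  # same timestamp, created first
]
print(last_with_tiebreak(records))  # prints 3
```

Rows 2 and 3 tie on the latest timestamp, and row 3 is chosen because its sequence_number is lower.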

jonatasdp · April 29, 2024, 1:21pm

Hey @marcogroot, I guess you can enforce it by adding the proper order by:

last(id, timestamp)
...
from ...
order by timestamp, sequence_number

Would it work?
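
One caveat worth checking: in standard SQL, a query-level ORDER BY sorts the result rows and does not control which input row an aggregate picks. A way to make the tie-break explicit without relying on last is to sort by both columns and keep the first row. The sketch below assumes a hypothetical readings table and uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; the SELECT itself is plain SQL that should carry over to PostgreSQL, but it has not been verified against TimescaleDB here.

```python
import sqlite3

# In-memory database as a stand-in; "readings" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (id INTEGER, timestamp TEXT, sequence_number INTEGER)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, "2024-04-18 06:00:00", 1),
        (2, "2024-04-18 06:05:00", 9),
        # Arrives later, but shares the timestamp and was created first upstream.
        (3, "2024-04-18 06:05:00", 4),
    ],
)

# Latest timestamp first; among ties, the lowest sequence_number first.
row = conn.execute(
    """
    SELECT id
    FROM readings
    ORDER BY timestamp DESC, sequence_number ASC
    LIMIT 1
    """
).fetchone()
last_id = row[0]
print(last_id)
```

Because the sort key covers both columns, the result no longer depends on insertion order, as long as (timestamp, sequence_number) pairs are unique.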