Hey all,
Say i have a dataset of 100,000,000 rows, many of which share the same timestamp. If i call the last
function on them, say last(id, timestamp), is there deterministic behaviour to determine what the last value produced will be? Or alternatively, can we provide a third column to break these ties.
Thanks!
Hi @marcogroot , my understanding is that if you create an index, it will automatically be deterministic as it will use the index to find the last.
I’m not sure if I understand the third column idea. Do you already have the data? Would it be like a time frame which should wrap the last from?
I have a service which is consuming data that has a timestamp column, and a sequence_number
column which indicates the order which the data was created (lower number means created first).
My question is If my service consumes a record and stores it in the database, if i then consume another record that has the SAME timestamp but a lower sequence_number (since it was created first in an external service), can i make it so that the lower sequence_number item is chosen with higher priority when using the last function?
For example
last(id, timestamp, sequence_number) // sort with sequence_number if timestamp is the same
Hey @marcogroot , I guess you can impose it by adding the proper order by:
last(id, timestamp)
...
from ...
order by timestamp, sequence_number.
Would it work?