Hi @pgloader, I think the multi-node design is not a distributed system but distributes the workload in multiple machines. It means all access should be done from the access node.
If you want to use it the other way around, you can use the distributed_exec as a way to spread your data into all the nodes.
In theory, a PostgreSQL instance can serve as both an access node and a data node at the same time in different databases. However, we do not recommend mixed setups like this, because it can be complicated, and server instances are often provisioned differently depending on their role.
I just want to be able to login to the PostgreSQL cluster to find out if this is an access node of a multi-node TimescaleDB deployment or it’s just a single node deployment.
I also want to be able able to tell from data nodes the host name or IP address of the access node
SELECT true as is_access_node
FROM _timescaledb_catalog.metadata
WHERE key = 'dist_uuid' and value IN
(SELECT value FROM _timescaledb_catalog.metadata WHERE key = 'uuid');
The logic here is:
If the dist_uuid key exists, it is a multi-node cluster and you are connected to a node.
If the dist_uuid key exists and it maches uuid this is an access node.
I guess we should come up to a simple API function to do it.
timescaledb_information.data_nodes shows only data nodes if you query if from access node
I don’t think we store that information currently on data nodes, nodes only aware about unique id of the distributed database (access node). That makes sense in case if you have HA for access node and it can failover to replica which would require to update that data on the data nodes too. Also a replica can act as an access node for queries. Nodes do not communicate with the access node on their own.
gsf=# select * from _timescaledb_catalog.metadata;
key | value | include_in_telemetry
-------------------±-------------------------------------±---------------------
uuid | 17449bca-65bf-4139-b3f0-c626a377bea7 | t
install_timestamp | 2022-03-20 22:50:05.964062-04 | t
(2 rows)
This is from a data node of a multi-node deployment
polarmetrics=# select * from _timescaledb_catalog.metadata;
key | value | include_in_telemetry
-------------------±-------------------------------------±---------------------
uuid | 9fa1a125-c0d0-46f4-aebf-27a8d57d7274 | t
install_timestamp | 2022-05-24 21:06:46.37609-04 | t
dist_uuid | c1ee6651-454f-4f82-83ca-5586217ec0da | t
(3 rows)
The value of uuid and dist_uuid are different
This is from the access node
polarmetrics=# select * from _timescaledb_catalog.metadata;
key | value | include_in_telemetry
-------------------±-------------------------------------±---------------------
uuid | c1ee6651-454f-4f82-83ca-5586217ec0da | t
install_timestamp | 2022-05-24 21:06:41.513906-04 | t
dist_uuid | c1ee6651-454f-4f82-83ca-5586217ec0da | t
(3 rows)
The query will only return true on this access node.
Can we say if uuid is different from dist_uuid then this is a data node. If there is no dist_uuid, then this is a single node deployment ?