
Commit 97dd089

Add contrib/pglogical_output, a logical decoding plugin
1 parent 49b4950 commit 97dd089


43 files changed: +6138, -0 lines changed

contrib/Makefile

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ SUBDIRS = \
 		pg_stat_statements \
 		pg_trgm \
 		pgcrypto \
+		pglogical_output \
 		pgrowlocks \
 		pgstattuple \
 		postgres_fdw \

contrib/pglogical_output/.gitignore

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pglogical_output.so
+results/
+regression.diffs
+tmp_install/
+tmp_check/
+log/

contrib/pglogical_output/Makefile

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+MODULE_big = pglogical_output
+PGFILEDESC = "pglogical_output - logical replication output plugin"
+
+OBJS = pglogical_output.o pglogical_proto.o pglogical_config.o pglogical_hooks.o
+
+REGRESS = params basic hooks
+
+
+ifdef USE_PGXS
+
+# For regression checks
+# http://www.postgresql.org/message-id/CAB7nPqTsR5o3g-fBi6jbsVdhfPiLFWQ_0cGU5=94Rv_8W3qvFA@mail.gmail.com
+# this makes "make check" give a useful error
+abs_top_builddir = .
+NO_TEMP_INSTALL = yes
+# Usual recipe
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+
+# These don't do anything yet, since temp install is disabled
+EXTRA_INSTALL += ./examples/hooks
+REGRESS_OPTS += --temp-config=regression.conf
+
+plhooks:
+	make -C examples/hooks USE_PGXS=1 clean install
+
+installcheck: plhooks
+
+else
+
+subdir = contrib/pglogical_output
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+
+# 'make installcheck' disabled when building in-tree because these tests
+# require "wal_level=logical", which typical installcheck users do not have
+# (e.g. buildfarm clients).
+installcheck:
+	;
+
+EXTRA_INSTALL += $(subdir)/examples/hooks
+EXTRA_REGRESS_OPTS += --temp-config=./regression.conf
+
+endif
+
+install: all
+	$(MKDIR_P) '$(DESTDIR)$(includedir)'/pglogical_output
+	$(INSTALL_DATA) pglogical_output/compat.h '$(DESTDIR)$(includedir)'/pglogical_output
+	$(INSTALL_DATA) pglogical_output/hooks.h '$(DESTDIR)$(includedir)'/pglogical_output

contrib/pglogical_output/README.md

Lines changed: 535 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
# Design decisions

Explanations of why things are done the way they are.

## Why does pglogical_output exist when there's wal2json etc?

`pglogical_output` does plenty more than convert logical decoding change
messages to a wire format and send them to the client.

It handles format negotiation, sender-side filtering using pluggable hooks
(and the associated plugin handling), etc. The protocol itself is also
important, and incorporates elements like binary datum transfer that can't be
easily or efficiently achieved with json.

## Custom binary protocol

Why do we have a custom binary protocol inside the walsender / copy-both
protocol, rather than using a json message representation?

Speed and compactness. It's expensive to create json, with lots of
allocations. It's expensive to decode it too. You can't represent raw binary
in json, so you must encode it, which adds considerable overhead for some data
types. Using the obvious, easy-to-decode json representations would also make
it difficult to do later enhancements planned for the protocol and decoder,
like caching row metadata.

The protocol implementation is fairly well encapsulated, so in future it
should be possible to emit json instead for clients that request it. Right now
that's not the priority, as tools like wal2json already exist for that.

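As a rough illustration of the compactness argument (and not the plugin's
actual wire format), the sketch below contrasts a length-prefixed raw binary
field with the same bytes hex-escaped into a json string. The framing, tag
byte and helper names are invented for the example; only the PostgreSQL
`StringInfo` calls are real.

```c
/*
 * Illustrative only: not pglogical_output's real wire format.
 * A raw binary field costs 5 + len bytes; hex-in-json costs 2 + 2 * len
 * bytes, plus the CPU spent escaping and later un-escaping it.
 */
#include "postgres.h"
#include "lib/stringinfo.h"

/* append a 4-byte big-endian length prefix */
static void
append_uint32_be(StringInfo out, uint32 v)
{
	appendStringInfoChar(out, (char) ((v >> 24) & 0xFF));
	appendStringInfoChar(out, (char) ((v >> 16) & 0xFF));
	appendStringInfoChar(out, (char) ((v >> 8) & 0xFF));
	appendStringInfoChar(out, (char) (v & 0xFF));
}

/* binary framing: one tag byte, a length, then the raw datum bytes */
static void
write_field_binary(StringInfo out, const char *data, uint32 len)
{
	appendStringInfoChar(out, 'b');
	append_uint32_be(out, len);
	appendBinaryStringInfo(out, data, len);
}

/* json-ish framing: the same bytes hex-escaped into a quoted string */
static void
write_field_json_hex(StringInfo out, const char *data, uint32 len)
{
	uint32		i;

	appendStringInfoChar(out, '"');
	for (i = 0; i < len; i++)
		appendStringInfo(out, "%02x", (unsigned char) data[i]);
	appendStringInfoChar(out, '"');
}
```
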
## Column metadata

The output plugin sends metadata for columns - at minimum, the column names -
before each row. It will soon be changed to send that metadata only before
rows from a new or different table, so that streams of inserts from COPY etc.
don't repeat the metadata each time. That's just a pending feature.

The reason metadata must be sent is that the upstream and downstream tables'
attnos don't necessarily correspond. The column names might, and their
ordering might even be the same, but any column drop or column type change
will result in a dropped column on one side. So at the user level the tables
look the same, but their attnos don't match, and if we rely on attnos for
replication we'll get the wrong data in the wrong columns. Not pretty.

That could be avoided by requiring that the downstream table be strictly
maintained by DDL replication, but:

* We don't want to require DDL replication
* That won't work with multiple upstreams feeding into a table
* The initial table creation still won't be correct if the table has dropped
  columns, unless we (ab)use `pg_dump`'s `--binary-upgrade` support to emit
  tables with dropped columns, which we don't want to do.

So despite the bandwidth cost, we need to send metadata.

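For example, an apply side that receives column names can map upstream columns
onto its local attnos by name instead of assuming the attnos line up. The
helper below is a sketch, not part of pglogical_output, and uses the
`TupleDescAttr()` accessor from newer PostgreSQL versions:

```c
/*
 * Sketch of name-based column mapping on the apply side.  Not part of
 * pglogical_output; shown only to illustrate why column names are sent.
 */
#include "postgres.h"
#include "access/tupdesc.h"
#include "catalog/pg_attribute.h"
#include "utils/rel.h"

/*
 * For each upstream column name, find the matching local attnum, or -1 if
 * the column doesn't exist locally.  Dropped columns are skipped, which is
 * exactly the case where upstream and downstream attnos diverge.
 */
static int *
build_attno_map(Relation localrel, char **upstream_colnames, int upstream_natts)
{
	TupleDesc	desc = RelationGetDescr(localrel);
	int		   *map = palloc(sizeof(int) * upstream_natts);
	int			i;

	for (i = 0; i < upstream_natts; i++)
	{
		int			j;

		map[i] = -1;
		for (j = 0; j < desc->natts; j++)
		{
			Form_pg_attribute att = TupleDescAttr(desc, j);

			if (att->attisdropped)
				continue;
			if (strcmp(NameStr(att->attname), upstream_colnames[i]) == 0)
			{
				map[i] = att->attnum;
				break;
			}
		}
	}
	return map;
}
```
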
In future a client-negotiated cache is planned, so that clients can announce
to the output plugin that they can cache metadata across change series, and
metadata need only be sent when it is invalidated by relation changes or when
a new relation is first seen.

Support for type metadata is pencilled in to the protocol so that clients that
don't have table definitions at all - like queueing engines - can decode the
data. That'll also permit type-validation sanity checking on the apply side
with logical replication.

## Hook entry point as a SQL function

The hooks entry point is a SQL function that populates a passed `internal`
struct with hook function pointers.

The reason for this is that hooks are specified by a remote peer over the
network. We can't just let the peer say "dlsym() this arbitrary function name
and call it with these arguments" for fairly obvious security reasons. At bare
minimum, all replication using hooks would have to be superuser-only if we did
that.

The SQL entry point is only called once per decoding session and the rest of
the calls are plain C function pointers.

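The general shape of such an entry point is sketched below. The struct layout,
field names and hook signature are hypothetical, made up for this
illustration; the real definitions live in `pglogical_output/hooks.h`.

```c
/*
 * Hypothetical hook setup entry point.  Struct layout and names are invented
 * for this sketch; see pglogical_output/hooks.h for the real API.
 */
#include "postgres.h"
#include "fmgr.h"
#include "utils/rel.h"

PG_MODULE_MAGIC;

typedef struct MyOutputHooks
{
	/* plain C function pointers used for the rest of the decoding session */
	bool		(*row_filter) (void *private_data, Relation rel);
	void	   *private_data;
} MyOutputHooks;

static bool
my_row_filter(void *private_data, Relation rel)
{
	/* e.g. only replicate tables that belong to some replication set */
	return true;
}

PG_FUNCTION_INFO_V1(my_hooks_setup);

/*
 * Named by the remote peer in its startup parameters and called once per
 * decoding session with an 'internal' argument; after that the output
 * plugin calls the registered function pointers directly.
 */
Datum
my_hooks_setup(PG_FUNCTION_ARGS)
{
	MyOutputHooks *hooks = (MyOutputHooks *) PG_GETARG_POINTER(0);

	hooks->row_filter = my_row_filter;
	hooks->private_data = NULL;

	PG_RETURN_VOID();
}
```
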
## The startup reply message

The protocol design choices available to `pg_logical` are constrained by being
contained in the copy-both protocol within the fe/be protocol, running as a
logical decoding plugin. The plugin has no direct access to the network socket
and can't send or receive messages whenever it wants, only under the control
of the walsender and logical decoding framework.

The only opportunity for the client to send data directly to the logical
decoding plugin is in the `START_REPLICATION` parameters, and the plugin can't
send anything to the client before that point.

This means there's no opportunity for a multi-step negotiation between client
and server. We have to do all the negotiation we're going to do in a single
exchange of messages - the setup parameters and then the replication start
message. All the client can do if it doesn't like the offer the server makes
is disconnect and try again with different parameters.

That's what the startup message is for. It reports the plugin's capabilities
and tells the client which requested options were honoured. This gives the
client a chance to decide if it's happy with the output plugin's decision or
if it wants to reconnect and try again with different options. It is, in
effect, iterative negotiation by reconnection.

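On the server side this reduces to one pass over the `START_REPLICATION`
options, followed by one startup reply describing what was actually enabled.
The sketch below uses the real logical decoding startup callback signature,
but the option name and state fields are examples rather than
pglogical_output's actual parameters:

```c
/*
 * Sketch of single-exchange negotiation in an output plugin's startup
 * callback.  Option and field names are illustrative only.
 */
#include "postgres.h"
#include "commands/defrem.h"
#include "nodes/parsenodes.h"
#include "nodes/pg_list.h"
#include "replication/logical.h"
#include "replication/output_plugin.h"

typedef struct MyDecodingState
{
	bool		client_wants_binary;
	bool		binary_enabled;	/* reported back in the startup reply */
} MyDecodingState;

static void
my_decode_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
				  bool is_init)
{
	MyDecodingState *state = palloc0(sizeof(MyDecodingState));
	ListCell   *lc;

	foreach(lc, ctx->output_plugin_options)
	{
		DefElem    *elem = (DefElem *) lfirst(lc);

		if (strcmp(elem->defname, "example.want_binary") == 0)
			state->client_wants_binary =
				(strcmp(defGetString(elem), "1") == 0);
		/* unrecognised options are deliberately ignored */
	}

	/* decide what we can honour; this sketch simply accepts the request */
	state->binary_enabled = state->client_wants_binary;

	opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
	ctx->output_plugin_private = state;

	/*
	 * The first message sent on the wire would then be a startup reply
	 * listing e.g. binary_enabled, so the client can reconnect with
	 * different options if it is unhappy with the result.
	 */
}
```
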
## Unrecognised parameters MUST be ignored by client and server

To ensure upward and downward compatibility, the output plugin must ignore
parameters set by the client that it doesn't recognise, and the client must
ignore parameters it doesn't recognise in the server's startup reply message.

This ensures that older clients can talk to newer servers and vice versa.

For this to work, the server must never enable new functionality such as
protocol message types, row formats, etc. without the client explicitly
stating, via a startup parameter, that it understands the new functionality.
Everything must be negotiated.

Similarly, a newer client talking to an older server may ask the server to
enable functionality, but it can't assume the server will actually honour
that request. It must check the server's startup reply message to see whether
the server confirmed that it enabled the requested functionality. It might
choose to disconnect and report an error to the user if the server didn't do
what it asked. This can be important, e.g. when a security-significant hook
is specified.

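A conforming client does the mirror image when it reads the startup reply:
skip keys it doesn't recognise and verify the ones it cares about. The sketch
below treats the reply as a flat key/value list; the key name and the delivery
mechanism are simplified assumptions, not the actual protocol encoding.

```c
/*
 * Client-side sketch: treat the startup reply as key/value parameters,
 * ignore unknown keys, and verify that a security-relevant request (here a
 * hook) was actually honoured.  Names are illustrative only.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct startup_param
{
	const char *key;
	const char *value;
};

static bool
check_startup_reply(const struct startup_param *params, int nparams)
{
	bool		hook_enabled = false;
	int			i;

	for (i = 0; i < nparams; i++)
	{
		if (strcmp(params[i].key, "hooks.row_filter_enabled") == 0)
			hook_enabled = (strcmp(params[i].value, "t") == 0);
		/* any other key, known or not, is simply ignored here */
	}

	if (!hook_enabled)
	{
		/* the server didn't honour the hook: disconnect and report it */
		fprintf(stderr, "server did not enable the requested row filter hook\n");
		return false;
	}
	return true;
}
```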
