[repl] Plugin Group_replication Reported: '[gcs] Read Failed'

In the last post I shared a simple set of steps to configure a Group Replication setup using SQL commands, plus a few settings in the configuration file. Indeed, it can be simple.  But there are times when there are more requirements and the configuration needs more attention.

Maybe the OS environment we use for MySQL setups has never affected us before when building a Group like this.

Or the Group Replication plugin simply introduces new things we never needed before, such as an extra network port in play or new security requirements we're not used to.  So let's look at a couple of these common things we might not expect to encounter, so that we can remedy them more quickly.

Security in Group Replication

There are a few areas of added security that are offered and possibly desired in Group Replication.  One of them is SSL support in the Group Communication System (GCS) as well as in the Recovery process of bringing members back online into the Group.  You can read about the SSL features and involved configuration here.  It's not something that I'll blog about right now.  Other areas include the following…

OS Configuration Settings

The MySQL install package for your Operating System distribution may include various security settings that integrate MySQL properly, some of which "may" include SELinux. Others that need to be handled would be firewalls like iptables.  This is of course assuming that SELinux and/or the firewall are enabled. For both of these there are references in the MySQL Group Replication documentation for changes that "may" need to be addressed in your environment.  Specifically, these topics are covered in the Group Replication FAQ section as we'll see next.

MySQL GCS and firewalls

The iptables firewall on Linux will help to illustrate how Group Replication is configured. There is a new network port that is required for Group Replication to work.  There is no explicit port that must be used, with one core exception….it can't be the same port that the MySQL database instance uses (typically 3306 unless otherwise changed for various reasons).  The port that is chosen should also not conflict with other services defined on your OS (more for clarity when observing services running through netstat or the like) or other services in use with other systems in your company infrastructure.  You don't want conflicts to arise, so choose wisely.
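The port rule above is easy to check mechanically before touching any server. Here is a minimal Python sketch, assuming addresses are written in the HOST:PORT form used by group_replication_local_address; the helper names parse_address and gcs_port_problems are my own for illustration and are not part of MySQL.

```python
def parse_address(address):
    """Split a 'HOST:PORT' string into (host, port)."""
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected HOST:PORT, got {address!r}")
    return host, int(port)


def gcs_port_problems(local_address, mysql_port=3306):
    """Return a list of problems found with the chosen GCS port.

    mysql_port is an assumption: pass the port your instance really uses.
    """
    _, port = parse_address(local_address)
    problems = []
    if not 1 <= port <= 65535:
        problems.append(f"port {port} is outside the valid TCP range")
    if port == mysql_port:
        problems.append(f"GCS port {port} is the same as the MySQL port")
    return problems
```

Calling gcs_port_problems("HOST1:16601") returns an empty list, while gcs_port_problems("HOST1:3306") reports the clash with the default MySQL port. It does not detect clashes with other services on the host; netstat or similar is still the tool for that.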

The above aside, the firewall on your server likely needs to have this port configured, and instructions for doing so can be found in this FAQ entry for iptables.

So if your configuration files look like the following, then these details may be needed in your firewall.

group_replication_local_address = 'HOST1:16601'
group_replication_group_seeds = 'HOST2:16601,HOST3:16601'

Then your associated iptables or perhaps firewalld entries 'might' look like this:

# Handling iptables for Enterprise Linux 6, CentOS6/etc (according to the MySQL Docs FAQ document)
# look at the current rules in place to see if port 16601 is listed already
iptables -L

# if our port is not listed, then add a rule to accept network traffic on port 16601 and save
iptables -A INPUT -p tcp --dport 16601 -j ACCEPT
service iptables save

# Handling firewalld for Enterprise Linux 7, CentOS7/etc
# look at the current rules and/or ports in place
firewall-cmd --list-ports
firewall-cmd --permanent --list-ports

# if our port isn't there then add it permanently to the suitable zone
firewall-cmd --zone=public --permanent --add-port=16601/tcp
firewall-cmd --zone=public --permanent --list-ports

# permanent rules only reach the running firewall after a reload
firewall-cmd --reload
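Once the rule is in place, it can help to confirm from another member that the port is actually reachable. The following is a minimal Python sketch using only the standard library; the function names are hypothetical, and the host names and port are the example values from the configuration above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def reachability_report(hosts, port, timeout=3.0):
    """Map each host name to True/False for TCP reachability on the given port."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

For example, reachability_report(["HOST1", "HOST2", "HOST3"], 16601) run from any member gives a quick picture of which peers accept connections. Keep in mind that a successful TCP connect only shows that the firewall lets the traffic through and that something is listening; GCS can still reject the peer afterwards, as the whitelist section further down shows.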

MySQL GCS and SELinux

For SELinux with Group Replication, you can also refer to the MySQL Group Replication FAQ document.  The SELinux notes in the FAQ show the following in the scripted section below.  This setup is needed so that SELinux will allow MySQL to receive traffic across the port you've defined for GCS.

# SELinux notes as from the MySQL Group Replication FAQ page
# verify the status of selinux
sestatus -v

# check which ports MySQL is allowed to use
semanage port -l | grep mysqld

# add the port used by your configuration file if it's not already found
semanage port -a -t mysqld_port_t -p tcp 16601

# verify the change, output should include both 16601 and 3306 (if using the default)
semanage port -l | grep mysqld

# Of course there are more sophisticated ways of handling SELinux,
# this is a minimal highlight only
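If you check this on several servers, reading the semanage output by eye gets tedious. Below is a small Python sketch that takes the text printed by semanage port -l and reports whether a port is covered. It assumes each line has the form "type protocol comma-separated ports", with ranges written as low-high; the function names are my own, and the parsing should be adjusted if your output differs.

```python
def ports_allowed(semanage_output, selinux_type="mysqld_port_t", proto="tcp"):
    """Return a list of (low, high) port ranges listed for the given SELinux type."""
    ranges = []
    for line in semanage_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or fields[0] != selinux_type or fields[1] != proto:
            continue
        for item in fields[2].split(","):
            item = item.strip()
            if not item:
                continue
            low, _, high = item.partition("-")
            # A single port is stored as a range of one.
            ranges.append((int(low), int(high or low)))
    return ranges


def port_is_allowed(port, semanage_output):
    """True if port falls inside any range listed for mysqld_port_t/tcp."""
    return any(low <= port <= high for low, high in ports_allowed(semanage_output))
```

Feeding it the captured output of semanage port -l and asking port_is_allowed(16601, output) answers the same question as the grep above, just in a form that is easy to loop over many hosts.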

So by now you've possibly made some changes as above. Did you find things that needed to be configured?  Are things working now?

No changes made as they weren't relevant for a fix? (perhaps your firewall is off and SELinux is permissive)

…..if this is the case, then continue to the next section of this post!

IP Address Whitelisting

The other area of security that the MySQL team embarked on in Group Replication was the setup of Network IP Whitelist support.  This is a really interesting feature for a few reasons.

  1. It only pertains to the GCS implementation, which is responsible for all of the Group's membership awareness and transaction certification consensus, and is by and large the intelligence in the Group that makes this new capability and offering from the MySQL team work.
  2. Otherwise, it does not affect the MySQL database instances at all.
  3. It involves all currently active and future members by defining their network locations as (implicitly or explicitly) safe.
  4. By default, it's kind of like an auto-configured firewall for GCS, unless you define it otherwise.

So you're saying to yourself now…this looks neat, how could anything go wrong!  **famous last words**

Let's explore!

Diagnosing & Solving GCS Communication Problems

So you've built your MySQL instances, you have members spanning your cloud platform, and then you notice that the Group membership is failing and some nodes are offline from the Group.  You think to yourself…this is odd, I did the exact same setup in our lab environment and everything worked great, what is the difference now?  Plus I know (from above) that SELinux and our firewall are configured properly.

The likely and first suspect to look at, the network!

Where to look first though….the MySQL Error Log.  **huh! you say.**

GCS Logging in the MySQL Error Log

Yes…the error log is useful for troubleshooting all sorts of things.  I used to have bash alias setups in the past so that I could type 2 characters and immediately be able to view the precious Error Log….the source for confirmations of server startup progress, successes, failures, etc.

Specifically though, the GCS team logs all sorts of useful information into it. They've done so particularly when it involves the local Group member trying to initialize or join a Group Replication cluster.  So let's look at an example where a 2nd member fails to join, and how to diagnose it in the error log.

Log Observations for a Bootstrap Member

What is a bootstrap member again?  It is the first member to initialize a Group Replication cluster.  All subsequent members will join the cluster that this member began.  The bootstrap setup can be reviewed again here; the URL takes you to the right section of my previous blog post.

What do we look for in the error log for a new "bootstrap member" initializing a group?  Let's look and I'll annotate below:

# First set of log entries outline the Group's configurations, defined or defaulted
# The 1st log entry shows the auto-configured whitelist settings
# The second log entry is also very valuable, showing the IP address for the current host's hostname
2017-03-14T17:27:21.487697Z 7 [Note] Plugin group_replication reported: '[GCS] Added automatically IP ranges 127.0.0.1/8,192.168.56.127/32 to the whitelist'
2017-03-14T17:27:21.489565Z 7 [Note] Plugin group_replication reported: '[GCS] Translated 'HOST1' to 192.168.56.127'
2017-03-14T17:27:21.489682Z 7 [Note] Plugin group_replication reported: '[GCS] SSL was not enabled'
2017-03-14T17:27:21.489697Z 7 [Note] Plugin group_replication reported: 'Initialized group communication with configuration: group_replication_group_name: "1a1c5221-fd26-11e6-8e12-1246aeecf2d5"; group_replication_local_address: "HOST1:16601"; group_replication_group_seeds: "HOST2:16601,HOST3:16601"; group_replication_bootstrap_group: true; group_replication_poll_spin_loops: 0; group_replication_compression_threshold: 1000000; group_replication_ip_whitelist: "AUTOMATIC"'
...
...
# Last 3 rows here confirm this is the bootstrap member for the Group, all is well
2017-03-14T17:27:22.543425Z 0 [Note] Plugin group_replication reported: 'Starting group replication recovery with view_id 14895124425433056:1'
2017-03-14T17:27:22.543713Z 15 [Note] Plugin group_replication reported: 'Only one server alive. Declaring this server as online within the replication group'
2017-03-14T17:27:22.555434Z 0 [Note] Plugin group_replication reported: 'This server was declared online within the replication group'

So we have a successfully initialized bootstrap member to start our Group Replication clustered setup.

Log Observations for a Failing Member Trying to Join

Next, we configure the subsequent member as I've explained before here, and it should join the Group that we've established above.  Let's review its log entries:

# second member trying to join the group but failing !!!!

# lines directly below, same as for the bootstrap member, outline the configurations
2017-03-14T17:37:25.890691Z 6 [Note] Plugin group_replication reported: '[GCS] Added automatically IP ranges 127.0.0.1/8,192.168.56.128/32 to the whitelist'
2017-03-14T17:37:25.892653Z 6 [Note] Plugin group_replication reported: '[GCS] Translated 'HOST2' to 192.168.56.128'
2017-03-14T17:37:25.892770Z 6 [Note] Plugin group_replication reported: '[GCS] SSL was not enabled'
2017-03-14T17:37:25.892784Z 6 [Note] Plugin group_replication reported: 'Initialized group communication with configuration: group_replication_group_name: "1a1c5221-fd26-11e6-8e12-1246aeecf2d5"; group_replication_local_address: "HOST2:16601"; group_replication_group_seeds: "HOST1:16601,HOST3:16601"; group_replication_bootstrap_group: false; group_replication_poll_spin_loops: 0; group_replication_compression_threshold: 1000000; group_replication_ip_whitelist: "AUTOMATIC"'
...
# In the line directly below we can see it is reaching out to HOST1 on port 16601
# The 2nd line notes that it times out, and the 3rd states this member is NOT ready to join
2017-03-14T17:37:25.943647Z 0 [Note] Plugin group_replication reported: 'client connected to HOST1 16601 fd 130'
2017-03-14T17:37:55.944076Z 0 [ERROR] Plugin group_replication reported: '[GCS] Timeout while waiting for the group communication engine to be ready!'
2017-03-14T17:37:55.944116Z 0 [ERROR] Plugin group_replication reported: '[GCS] The group communication engine is not ready for the member to join. Local port: 16601'

With the errors noted above….let's confirm if the Bootstrap member participated

# There is one line entry added to the error log since we last looked
# It identifies the IP attempting to connect, and states the reason for rejection
# It wasn't in the IP whitelist!
2017-03-14T17:37:25.935650Z 0 [Warning] Plugin group_replication reported: '[GCS] Connection attempt from IP address 192.168.56.128 refused. Address is not in the IP whitelist.'

Yes, it did!  The Bootstrap member rejected the second member that tried to join because it wasn't included in the auto-configured IP whitelist.   But why not?

Looking at the automatically created IP whitelist entries (192.168.56.127/32 on the bootstrap member, 192.168.56.128/32 on the joining member), they may not seem curious right away.  Looking a bit closer at the netmask though, the 32 indicates that each IP is in a subnet of its own.  So here we can conclude that the automatic IP whitelist that GCS generates accounts for the configured netmask of the host's IP and includes that IP range in the whitelist (along with the very liberal localhost IP range).  Had the network on all the group replication servers used a /24 netmask (192.168.56.0/24), which allows the full range of IPs in the last octet, then no problems would have been noticed, as the automatic IP whitelist would have been sufficient.
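The reasoning above can be reproduced offline. Here is a minimal Python sketch that pulls the refused IP out of the bootstrap member's warning and tests it against a whitelist string using the standard ipaddress module. It models whitelist matching as plain CIDR containment, which is my assumption and not a description of how GCS implements it; the function names and the regular expression are mine, written against the log lines shown above.

```python
import ipaddress
import re

# Matches the GCS warning shown in the bootstrap member's error log above.
REFUSED = re.compile(
    r"^(?P<ts>\S+)\s+\d+\s+\[Warning\].*"
    r"Connection attempt from IP address (?P<ip>[0-9a-fA-F.:]+) refused"
)


def refused_ips(log_lines):
    """Return (timestamp, ip) pairs for every whitelist refusal in the log."""
    found = []
    for line in log_lines:
        match = REFUSED.search(line)
        if match:
            found.append((match.group("ts"), match.group("ip")))
    return found


def is_whitelisted(ip, whitelist):
    """True if ip falls inside any comma-separated entry of whitelist.

    Entries may be plain IPs or CIDR ranges; strict=False accepts entries
    such as 127.0.0.1/8 whose host bits are set.
    """
    address = ipaddress.ip_address(ip)
    for entry in whitelist.split(","):
        network = ipaddress.ip_network(entry.strip(), strict=False)
        if address in network:
            return True
    return False
```

With the bootstrap member's automatic value, is_whitelisted("192.168.56.128", "127.0.0.1/8,192.168.56.127/32") is False, which matches the refusal in the log. Swap the second entry for 192.168.56.0/24 and the same call returns True.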

Options for the Fix

There are a few ways that this can be addressed.

  1. Adjust the netmask on all the servers to include a suitable range of IPs, which includes all Group Replication related servers.  If this is acceptable, then you'll need to stop Group Replication on the initial member (HOST1 in this example), and re-bootstrap it so that the automatic IP whitelist picks up the new netmask configured on its server.  Other members should join properly after that, but their own netmask setups should also have been adjusted.
  2. Perhaps the restrictive netmask was intentional, in which case you can purposely construct the configuration for the IP whitelist instead, using the explicit IP address of each member; netmasks do not need to be included.  See the IP whitelist documentation for more information.
    group_replication_ip_whitelist="192.168.56.127,192.168.56.128,192.168.56.129,127.0.0.1/8"

    Don't forget, as per option 1, once you've added the above entry to all your Group member configuration files, it still needs to be made active. There are two ways to do this:

    a) Restart the instance once the configuration file is set up.  Since the group in our case never made it past the first member, we'll need to bootstrap that member again.  Once the initial bootstrap member is ready, the other servers need to restart before joining.

    b) Assuming the config file has also been adjusted to include the config entry above, the change can be made dynamically on the server.  However, the Group Replication "service" needs to be recycled.

    # Here are the needed commands to dynamically adjust the IP whitelist
    mysql> STOP GROUP_REPLICATION;
    mysql> SET GLOBAL group_replication_ip_whitelist="192.168.56.127,192.168.56.128,192.168.56.129,127.0.0.1/8";
    mysql> START GROUP_REPLICATION;

    # On the bootstrap member, run the following instead, with the bootstrap commands before and after the start
    mysql> STOP GROUP_REPLICATION;
    mysql> SET GLOBAL group_replication_ip_whitelist="192.168.56.127,192.168.56.128,192.168.56.129,127.0.0.1/8";
    mysql> SET GLOBAL group_replication_bootstrap_group=ON;
    mysql> START GROUP_REPLICATION;
    mysql> SET GLOBAL group_replication_bootstrap_group=OFF;

Offline Mode Usage with Group Replication

One more item to add: Anytime you plan to execute a restart of the Group Replication service, ALWAYS dynamically enable offline_mode=ON before stopping the service. Once the group replication service is running again, you can set offline_mode=OFF.

# Here are the needed commands to dynamically adjust the IP whitelist
mysql> SET GLOBAL OFFLINE_MODE=ON;
mysql> STOP GROUP_REPLICATION;
mysql> SET GLOBAL group_replication_ip_whitelist="192.168.56.127,192.168.56.128,192.168.56.129,127.0.0.1/8";
mysql> START GROUP_REPLICATION;
mysql> SET GLOBAL OFFLINE_MODE=OFF;
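Since the same sequence has to be repeated on every member, with the bootstrap flag only on the first one, it can be handy to generate it instead of retyping it. This is a small Python sketch that only builds the statement list shown above; it does not connect to MySQL, and the function name and its parameters are hypothetical.

```python
def whitelist_change_statements(member_ips, bootstrap=False, use_offline_mode=True):
    """Build the ordered SQL statements for changing the GCS IP whitelist.

    member_ips: explicit member addresses; the localhost range is appended.
    bootstrap: True only for the member that (re)bootstraps the group.
    """
    whitelist = ",".join(list(member_ips) + ["127.0.0.1/8"])
    statements = []
    if use_offline_mode:
        statements.append("SET GLOBAL offline_mode=ON")
    statements.append("STOP GROUP_REPLICATION")
    statements.append(f'SET GLOBAL group_replication_ip_whitelist="{whitelist}"')
    if bootstrap:
        statements.append("SET GLOBAL group_replication_bootstrap_group=ON")
    statements.append("START GROUP_REPLICATION")
    if bootstrap:
        # Switch the flag off again so a later restart cannot start a second group.
        statements.append("SET GLOBAL group_replication_bootstrap_group=OFF")
    if use_offline_mode:
        statements.append("SET GLOBAL offline_mode=OFF")
    return statements
```

Printing the result for ["192.168.56.127", "192.168.56.128", "192.168.56.129"] with bootstrap=True gives the HOST1 sequence; the other members use the default bootstrap=False.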

Reasons and understanding for the above will be in my next blog post.

Conclusion

Hopefully this walk through some key messages in the error log will help you get past possible complications you might meet.  There are a variety of things that might hold users up from getting it going, but the items noted in this blog post are the ones I consider the likely candidates, based on my experience and engagements with the companies that I've been working with so far.

Would love to hear your feedback and experience with Group Replication, and I look forward to supporting the wider MySQL Community and the commercial crowds alike!

Source: https://thesubtlepath.com/mysql/group-replication-gcs-troubleshooting/
