Troubleshooting Provider Appliance

Process of registration, invitation and password reset

Many people complain that they do not receive the portal e-mails

There are hundreds of possible reasons, but in more than 90% of such cases the message is simply stuck in the recipient's spam/junk folder. If many users are complaining, test the following:

  • Go to mail-tester.com
  • Send a regify invitation to the address shown there (or register as a new user with it).
  • Back on the mail-tester page, click the blue button "Then check your score" to get a result for the received email (make sure the page still shows the same email address).

The tool rates from 1 to 10 points. With a regify provider, 9 is the maximum you can get (DKIM is not supported). If your score is below 9, try to fix the issues the tool points out. It explains them very well.

Please also note the tips here: Regify_provider_appliance_tech#PostFix_E-Mail_Service

If all looks good, there is nothing more you can do. Please note, however, that a few missing e-mails out of several hundred or even thousands are normal. Even a perfectly configured e-mail system cannot prevent your e-mails from being sorted out as spam. Spam filters are mainly based on training. Therefore, whether and why regify messages are classified as spam is very specific to the recipient.

What exactly happens during registration and invitation?

The following chart shows the different processes in a simplified manner:

Processes drawing

Appliance diagnostics

If you want regify support to help you with a specific appliance issue (e.g. IP addresses, SSL certificates, e-mail), open the appliance menu via SSH and go to Appliance -> Other Settings -> Support Diagnostics. Enter an e-mail address as the destination. This sends the appliance configuration to the given address. Passwords and sensitive information are not part of this report.

Common messages in appliance error log

Clearing Related

CRIT: Failed URL: https://dc1.regify....  [Error executing curl call (0)....]
CRIT: Locked connection-id 1. Check /opt/provider/REGIFY_INCLUDE/../ClearingLock.dat [....]
CRIT: clearing connection 1 still not available [....]

This happens if a clearing connection was not functional. Most providers today use two connections (1 and 2). In the above case, connection 1 failed to connect to the clearing. The connection was locked and is no longer used. A further check then found that it was still not functional.

If both connections fail, the regify provider enters maintenance mode. In our experience, this happens every now and then, because Internet connections are not as reliable as you might think. If it happens more often than once a month, and especially if maintenance mode lasts longer than 30 minutes, you should examine your Internet connection or ask your Internet provider about the reason.

Every five minutes, the provider appliance tries to re-use locked clearing connections. If they are working, they are unlocked automatically. As soon as at least one clearing connection is working, maintenance mode is removed automatically.

CRIT: No more clearing connections available. Set provider to maintain-mode... [....]

This means that no functional clearing connection is available any more. In this case, the regify provider appliance automatically enters maintenance mode.

While the regify provider is in maintenance mode, end users see a message saying that the regify provider is currently under maintenance. During maintenance mode, users can no longer open or send any regify messages.

CRIT: clearing connection 1 seems now available and working

This message indicates that the provider was in maintenance mode and that this has now ended. As soon as the provider appliance finds that at least one clearing connection is working again, maintenance mode is removed automatically and this message is triggered. Users can use the system again.

CRIT: removed maintenance-state of provider (optional)
CRIT: removed clearing lock file (optional)
CRIT: all clearing connections are reactivated now

This means that all defined clearing connections were found to be working (usually after one or more connections had been locked).
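To judge how often clearing problems occur, the messages above can be counted in a saved copy of the error log. The following is a minimal sketch, not part of the appliance: the function name `summarise_clearing_log` is hypothetical, and the log file location is not stated here, so the path must be passed in as an argument.

```shell
#!/bin/sh
# Sketch: count clearing-related CRIT entries in an appliance error log.
# Assumption: the caller passes the path of the log file; this page does
# not state where the log lives, so no path is hard-coded.
summarise_clearing_log() {
    logfile="$1"
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function usable under "set -e".
    failed=$(grep -c 'CRIT: Failed URL:' "$logfile" || true)
    locked=$(grep -c 'CRIT: Locked connection-id' "$logfile" || true)
    maint=$(grep -c 'CRIT: No more clearing connections available' "$logfile" || true)
    reactivated=$(grep -c 'CRIT: all clearing connections are reactivated now' "$logfile" || true)
    printf 'failed_calls=%s locked=%s maintenance=%s reactivated=%s\n' \
        "$failed" "$locked" "$maint" "$reactivated"
}
```

A `maintenance` count that rises more than about once a month matches the situation described above in which the Internet connection should be examined.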

Security Related

CRIT: Wrong session-host for url /downloads.php?lg=EN / /downloads.php called by xxx

We currently do not know the origin of this problem. It looks like someone is trying to connect to your provider with an old cookie (maybe because of a wrong system time). As the session has already ended on the provider, this entry is created. You can ignore it. If you know the origin of this, please contact us. We are interested in investigating this issue.

PHP Warning:  session_start(): The session id is too long or contains illegal characters, valid characters are a-z, A-Z, 0-9...
PHP Warning:  Unknown: The session id is too long or contains illegal characters, valid characters are a-z, A-Z, 0-9...
PHP Warning:  Unknown: Failed to write session data (files). Please verify that the current setting of session.save_path is correct
PHP Warning:  Unknown: Input variables exceeded 1000.

All these entries come from intrusion attempts. Mostly, automated hacking tools are testing whether systems are vulnerable to such attacks. Currently, the regify provider is not affected, but the attempts still trigger these entries. You can ignore and delete them.
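When reviewing a long log, it can help to hide these known, ignorable entries so that the remaining lines stand out. The following is a minimal sketch under that assumption; `drop_known_scanner_noise` is a hypothetical helper name, and the match strings are taken from the entries listed above.

```shell
#!/bin/sh
# Sketch: read log lines on stdin and print only those that are NOT one of
# the known scanner-related PHP warnings listed above.
drop_known_scanner_noise() {
    # -F: treat patterns as fixed strings; -v: invert the match;
    # several -e patterns are combined with OR.
    # "|| true" because grep exits non-zero when nothing is left to print.
    grep -v -F \
        -e 'The session id is too long or contains illegal characters' \
        -e 'Failed to write session data (files)' \
        -e 'Input variables exceeded 1000' || true
}
```

It is meant to be used as a filter, for example by piping a saved copy of the error log through `drop_known_scanner_noise`.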

Other log entries

PHP Warning:  htmlentities(): Invalid multibyte sequence...
PHP Warning:  htmlspecialchars(): Invalid multibyte sequence...

This happens more often since provider V3.4.0. We are investigating the problem.

Handle known appliance problems

The web server does not start

The web server fails to start and shows an error message like this or similar:

(98)Address already in use: make_sock: could not bind to address 0.0.0.0:80

This means that some httpd processes are hanging. Use the following to identify them:

[root@clearing ~]# ps -ax | grep httpd
22680 ?        S      0:06 /usr/sbin/httpd -k start
22681 ?        S      0:03 /usr/sbin/httpd -k start
22682 ?        S      0:05 /usr/sbin/httpd -k start

In this case, three processes are hanging. To get rid of them, kill the one with the lowest process ID first (22680):

[root@clearing ~]# kill 22680

Check whether the others are gone as well. If not, repeat this with the remaining process IDs. Once all httpd entries are gone, start the web server using

[root@clearing ~]# apachectl start
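Picking the lowest httpd process ID from the `ps` listing can also be sketched as a small shell function. This is only an illustration: `lowest_httpd_pid` is a hypothetical helper name, and it assumes the five-column `ps -ax` layout shown above (PID, TTY, STAT, TIME, COMMAND).

```shell
#!/bin/sh
# Sketch: read "ps -ax" style output on stdin and print the lowest PID whose
# command (5th column) ends in "httpd". The "grep httpd" helper process is
# skipped automatically, because its command column is "grep".
lowest_httpd_pid() {
    awk '$5 ~ /httpd$/ { print $1 }' | sort -n | head -n 1
}

# Possible usage on the appliance (not executed here):
#   pid=$(ps -ax | lowest_httpd_pid)
#   [ -n "$pid" ] && kill "$pid"
```

If the function prints nothing, no httpd process is left and the web server can be started.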

The websock server does not start

Sometimes, after an upgrade, the websocket server is not restarted. In this case, regibox managers show connectivity issues and the built-in regify monitoring shows ERROR: Websock server for regibox not running.

In order to fix this, log in as root (or drop to shell and use su root). Now, execute the following:

svc -u /service/lwsws/
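The `svc` command suggests that the service is supervised by daemontools. If that assumption holds, `svstat /service/lwsws/` prints a status line that can be checked afterwards. The following sketch only interprets such a line; `svstat_reports_up` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed if a daemontools "svstat" status line read from stdin
# reports the service as up, e.g. "/service/lwsws/: up (pid 4321) 12 seconds".
# Assumption: the appliance runs the service under daemontools.
svstat_reports_up() {
    # In a basic regular expression, parentheses are literal characters.
    grep -q ': up (pid [0-9][0-9]*)'
}

# Possible usage on the appliance (not executed here):
#   svstat /service/lwsws/ | svstat_reports_up && echo "websock server is up"
```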

How to fix broken cross-master replication

  1. Log in to both systems using SSH (PuTTY) with user root and the password you set during installation
  2. Make a copy of your databases on both machines using mysqldump
    mysqldump --databases regify > /tmp/mysql_backup.sql
  3. Enter the provider configuration on both systems by typing providerConfig
  4. On both systems, navigate to "Database..." -> "View Database Status"
  5. Determine which system has the most recent data
    1. One of the two systems shows "Master Status" and "Slave Status". The other one shows only "Master Status".
    2. If it is not as described above, this tutorial will not help you! Please contact regify support!
    3. The one with both "Master Status" and "Slave Status" is the one with the most recent data. We now call it Origin. The other one is now called Behind.
  6. Navigate to "Database..." -> "Configure Replication..."
  7. Choose "no replication" on both systems ("master" and "slave")
  8. Choose "Cross Master" on the Origin system
  9. Fill in the data of the master dialogue
    1. Enter Unique Server ID (now 1 for the Origin)
    2. Enter correct number of servers (2 in most cases)
    3. Enter IP address/subnet of the Behind system
    4. Enter a username like 'providerRep' and a password (note the password!)
    5. Choose 'Next'
  10. In the slave dialogue
    1. Enter the IP of the Behind system (already filled in)
    2. Enter the same credentials as in the dialogue before
    3. Choose 'Next'
  11. Answer the "Replication Synchronisation" question with "Yes"
  12. Fill in the IP address of the Behind system and enter the root password of the Behind system.
  13. Choose 'Ok'
  14. Wait
  15. Choose "Cross Master" on the Behind system.
  16. Fill in the data of the master dialogue
    1. Enter Unique Server ID (now 2 for the Behind)
    2. Enter correct number of servers (like on the Origin)
    3. Enter IP address/subnet of the Origin system
    4. Enter the same credentials as in the dialogue on the Origin
    5. Choose 'Next'
  17. In the slave dialogue
    1. Enter the IP of the Origin system (already filled in)
    2. Enter the same credentials as in the dialogue before
    3. Choose 'Next'
  18. Answer the "Replication Synchronisation" question with "No" (important!)
  19. Wait
  20. Done
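
Because the procedure overwrites data on the Behind system, it is worth confirming that the backup from step 2 was written completely before changing the replication settings. The following is a minimal sketch: `dump_looks_complete` is a hypothetical helper name, and it relies on mysqldump's default behaviour of ending a successful dump with a "-- Dump completed" comment line (this line is absent if comments were disabled, for example with --skip-comments).

```shell
#!/bin/sh
# Sketch: succeed if a mysqldump file exists, is not empty, and ends with
# the "-- Dump completed" marker written by default after a successful dump.
dump_looks_complete() {
    dumpfile="$1"
    # -s: the file exists and has a size greater than zero
    [ -s "$dumpfile" ] || return 1
    # The marker is the last line; checking the last few lines tolerates
    # trailing blank lines.
    tail -n 5 "$dumpfile" | grep -q '^-- Dump completed'
}

# Possible usage on both machines (not executed here):
#   dump_looks_complete /tmp/mysql_backup.sql || echo "backup incomplete, do not continue"
```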