Difference between revisions of "Troubleshooting Provider Appliance"


Revision as of 14:16, 1 February 2019

Process of registration, invitation and password reset

Many people complain that they do not receive the portal e-mails

There are hundreds of possible reasons, but in more than 90% of such cases the message is simply stuck in the recipient's spam/junk folder. If many users are complaining, you should test the following:

  • Go to mail-tester.com
  • Use the shown address to send a regify invitation to (or register as new user).
  • Now, back on that mail-tester page, you can click the blue button "Then check your score" and get a result for the received email (make sure it is the same email address in the tree).

This tool rates from 1 to 10 points. With a regify provider, 9 is the maximum you can get (DKIM not supported). If you score less than 9, please try to fix the issues the tool points to. It gives very good explanations.

Please also note the tips here: Regify_provider_appliance_tech#PostFix_E-Mail_Service

If all looks good, there is nothing more you can do. Please note that a few missing e-mails out of several hundred or even thousands is normal. Even the most perfectly configured e-mail sending system cannot prevent your e-mails from being sorted out as spam. Spam filters are mainly based on training. Therefore, whether and why regify messages are classified as spam is very specific to the recipient.

What exactly happens during registration and invitation?

The following chart shows you the different processes in simplified manner (click to zoom in):

Processes drawing

Appliance diagnostics

If you want regify support to help you with a specific appliance issue (e.g. IP addresses, SSL certificates, e-mail), please go to your SSH appliance menu and visit Appliance -> Other Settings -> Support Diagnostics. Enter an e-mail address as the destination. This sends the appliance configuration to the given address. Passwords and sensitive information are not part of this report.

Common messages in appliance error log

Clearing Related

CRIT: Failed URL: https://dc1.regify....  [Error executing curl call (0)....]
CRIT: Locked connection-id 1. Check /opt/provider/REGIFY_INCLUDE/../ClearingLock.dat [....]
CRIT: clearing connection 1 not available [....]

This happens if a clearing connection is not functional. Most providers today use two connections (1 and 2). In the above case, connection 1 failed to connect to the clearing service. The connection was locked and is no longer used.

If both connections fail, the regify provider enters maintenance mode. In our experience, this happens every now and then, because Internet connections are not as reliable as you might think. If it happens more often than once a month, you should examine your Internet connection or ask your Internet provider about the reason, especially if the maintenance mode lasts longer than 30 minutes.

Every five minutes, the provider appliance tries to re-use locked clearing connections. If they are working, they are unlocked automatically. If at least one clearing connection is working, the maintenance mode is removed automatically.
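To judge whether maintenance mode occurs "more often than once a month", it can help to count the relevant entries in the error log. The following is only a minimal sketch: the function names are made up for this example, the match strings are the log messages quoted in this article, and the log file location is not stated here, so the path in the usage note is a placeholder.

```shell
# Count log lines (read from stdin) that report the start of maintenance mode.
# The match string is taken from the CRIT message quoted in this article.
count_maintenance_entries() {
    # grep -c prints 0 and returns non-zero when nothing matches,
    # so "|| true" keeps the exit status clean.
    grep -c 'No more clearing connections available' || true
}

# Count log lines (read from stdin) that report a locked clearing connection.
count_locked_connections() {
    grep -c 'Locked connection-id' || true
}
```

Possible usage, with /path/to/error.log standing in for the real log file: count_maintenance_entries < /path/to/error.log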

CRIT: No more clearing connections available. Set provider to maintain-mode... [....]

This means that no functional clearing connection is available any more. In this case, the regify provider appliance automatically enters maintenance mode.

While the regify provider is in maintenance mode, end users see a message saying that the regify provider is currently under maintenance. During maintenance mode, users can no longer open or send any regify messages.

CRIT: removed maintenance-state of provider [....]

This message indicates that the provider was in maintenance mode and that this has now ended. As soon as the provider appliance finds that at least one clearing connection is working again, it removes the maintenance mode automatically and triggers this message. Users can use the system again.

CRIT: all clearing connections are reactivated now [/]
CRIT: removed clearing lock file (optional)

This means that all defined clearing connections were found to be working (usually after one or more connections had been locked).

Security Related

CRIT: Access blocked (PHPIDS treshold X reached) for the following request:
-------------------------------------------------------------------------------------------
Security Alert (PHPIDS):
IMPACT: 36
TAGS: xss, csrf, id, rfe, lfi
URL: /XXXXX.php........
...

This means that the internal IDS (Intrusion Detection System) found something suspicious and blocked the call. The value after IMPACT: shows how dangerous the attack was. Values between 16 and 35 indicate a lower threat potential; higher values usually indicate a serious attack. The message is followed by the parameters (PARAMETERS: and FIELDS:) that led to the alert, showing all fields and values received during the attack. If you consider these values very suspicious or you see repeated alerts, please contact regify support and send us the complete log information.
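When the log contains many such alerts, the IMPACT values can be pulled out and rated according to the ranges described above. This is a rough sketch only: the function name and the labels "serious", "lower" and "unrated" are invented for this example, and values below 16 are labelled "unrated" because this article does not describe them.

```shell
# Read log text from stdin and print each IMPACT value with a rough rating.
rate_phpids_impact() {
    # Keep only the number following "IMPACT:" on lines that carry one.
    sed -n 's/^[[:space:]]*IMPACT:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' |
    while read -r impact; do
        if [ "$impact" -gt 35 ]; then
            echo "$impact serious"      # above 35: usually a serious attack
        elif [ "$impact" -ge 16 ]; then
            echo "$impact lower"        # 16 to 35: lower threat potential
        else
            echo "$impact unrated"      # below 16: not covered by this article
        fi
    done
}
```

Possible usage, with /path/to/error.log standing in for the real log file: rate_phpids_impact < /path/to/error.log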

CRIT: Wrong session-host for url /downloads.php?lg=EN / /downloads.php called by xxx

We currently do not know the origin of this problem. It looks like someone with an old cookie (maybe wrong system time) is trying to connect to your provider with this old cookie. As the session is already finished on the provider, this entry is created. You can ignore this. But if you know the origin of this, please contact us. We are interested in investigating this issue.

PHP Warning:  session_start(): The session id is too long or contains illegal characters, valid characters are a-z, A-Z, 0-9...
PHP Warning:  Unknown: The session id is too long or contains illegal characters, valid characters are a-z, A-Z, 0-9...
PHP Warning:  Unknown: Failed to write session data (files). Please verify that the current setting of session.save_path is correct
PHP Warning:  Unknown: Input variables exceeded 1000.

All these entries come from intrusion attempts. In most cases, automated hacking tools are testing whether systems are vulnerable to such attacks. Currently, the regify provider is not affected, but the attempts still trigger entries of this kind. You can ignore and delete such entries.

Other log entries

PHP Warning:  htmlentities(): Invalid multibyte sequence...
PHP Warning:  htmlspecialchars(): Invalid multibyte sequence...

This happens more often since provider V3.4.0. We are investigating the problem.

Handle known appliance problems

The web server does not start

The web server fails to start and reports an error message like this or similar:

(98)Address already in use: make_sock: could not bind to address 0.0.0.0:80

This means that some httpd processes are hanging. Please use the following to identify the processes:

[root@clearing ~]# ps -ax | grep httpd
22680 ?        S      0:06 /usr/sbin/httpd -k start
22681 ?        S      0:03 /usr/sbin/httpd -k start
22682 ?        S      0:05 /usr/sbin/httpd -k start

In this case, three processes are hanging. To get rid of them, try to kill the one with the lowest process ID first (22680):

[root@clearing ~]# kill 22680

Check whether the others are gone as well. If not, repeat this with the other process IDs, too. Once all httpd entries are gone, start the web server using

[root@clearing ~]# apachectl start
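The process IDs can also be extracted from the ps output automatically. The sketch below assumes the column layout shown above (PID, TTY, STAT, TIME, COMMAND); the function name httpd_pids is made up for this example. It only prints IDs and does not kill anything.

```shell
# Read `ps -ax` output from stdin and print the PIDs of httpd processes,
# lowest first. Field 5 is the command, so "grep httpd" itself is skipped.
httpd_pids() {
    awk '$5 ~ /httpd$/ { print $1 }' | sort -n
}
```

Possible usage: ps -ax | httpd_pids | head -n 1 prints the lowest process ID, which can then be passed to kill as described above.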

Error during update

If you encounter this problem, your boot partition has become too small:

Disk Requirements:
 At least 14MB more space needed on the /boot filesystem

We fixed this bug for everything newer than V4, but if you migrated from some 3.1 provider or even earlier, this might happen.

To solve this, please follow these steps:

  • Log in to your regify provider appliance using SSH and the root account.
    Note: you defined the root password during setup.
  • Get the name of your currently used kernel
# uname -r
2.6.32-504.12.2.el6.i686

The result may be something like 2.6.32-504.12.2.el6.i686.

  • Now, list all existing kernels by calling
# rpm -q kernel
kernel-2.6.28-398.10.9.el6.i686
kernel-2.6.31-422.11.1.el6.i686
kernel-2.6.32-431.11.2.el6.i686
kernel-2.6.32-504.12.2.el6.i686
  • Finally, remove all unused kernels by calling the following for every unused kernel:
# rpm -e kernel-2.6.28-398.10.9.el6.i686
# rpm -e kernel-2.6.31-422.11.1.el6.i686
# rpm -e kernel-2.6.32-431.11.2.el6.i686
  • Now retry whatever you wanted to do (e.g. "Check for updates")
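The list of unused kernels can be derived from the two commands above instead of being compared by eye. This is a sketch under the assumption that package names have the form kernel-<output of uname -r>, as in the example output above; the function name unused_kernels is made up. It only prints candidates, so review the list before passing anything to rpm -e.

```shell
# $1:    name of the running kernel (output of `uname -r`)
# stdin: installed kernel packages (output of `rpm -q kernel`)
# Prints every package that is not the running kernel.
unused_kernels() {
    # -F: fixed string, -x: whole-line match, -v: invert the match
    grep -v -x -F "kernel-$1" || true
}
```

Possible usage: rpm -q kernel | unused_kernels "$(uname -r)"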

Handle known Internet Explorer issues

Under certain conditions, Microsoft IE version 10 and below will interpret web pages as if it were an outdated version of IE. In some cases, it interprets web pages as Internet Explorer version 7 would. This causes the provider to work incorrectly, since regify supports IE8 and later.

Is there a fix for this?

Yes, the next provider update will solve this issue by forcing your browser to always interpret pages in "Standard Document Mode".

Until the next update is available, the following manual fix can be applied:

1- While in the browser, press F12 on your keyboard to open the Internet Explorer developer tools.

2- The developer tools panel appears. Click "Document Mode" and select the "Standards" option.


What parts of the provider website are affected?

We have received several calls from clients concerning the issue. All of them have pointed out that the group management page is not working properly.

However, any JavaScript-powered page can be affected by this issue.

What about future Internet Explorer versions?

Microsoft has decided to make its browsers follow web standards starting with IE version 11.

What about other browsers?

Other browsers follow standards and are therefore not affected by the issue.

Why does regify not support Internet Explorer 7?

Internet Explorer 7 is an outdated browser that does not follow web standards. All major websites have dropped support for it since its usage has dropped to less than 1 percent this year.

How to fix broken cross-master replication

  1. Log in to both systems using SSH (PuTTY) with the user root and the password you set during installation
  2. Make a copy of your databases on both machines by using mysqldump
    mysqldump --databases regify > /tmp/mysql_backup.sql
  3. Enter provider configuration on both systems by typing providerConfig
  4. On both systems, navigate to "Database..." -> "View Database Status"
  5. Determine which system has the most recent data
    1. One of the two systems shows "Master Status" and "Slave Status". The other one shows only "Master Status".
    2. If this is not the case, this tutorial will not help you! Please contact regify support!
    3. The system showing both "Master Status" and "Slave Status" is the one with the most recent data. We now call it Origin and the other one Behind.
  6. Navigate to "Database..." -> "Configure Replication..."
  7. Choose "no replication" on both systems ("master" and "slave")
  8. Choose "Cross Master" on the Origin system
  9. Fill in the data of the master dialogue
    1. Enter Unique Server ID (now 1 for the Origin)
    2. Enter correct number of servers (2 in most cases)
    3. Enter IP address/subnet of the Behind system
    4. Enter a username like 'providerRep' and a password (note the password!)
    5. Choose 'Next'
  10. In slave dialogue
    1. Enter the IP of the Behind system (already filled in)
    2. Enter the same credentials as in the dialogue before
    3. Choose 'Next'
  11. Answer the "Replication Synchronisation" question with "Yes"
  12. Fill in the IP address of the Behind system and enter the root password of the Behind system.
  13. Choose 'Ok'
  14. Wait
  15. Choose "Cross Master" on the Behind system.
  16. Fill in the data of the master dialogue
    1. Enter Unique Server ID (now 2 for the Behind)
    2. Enter correct number of servers (like on the Origin)
    3. Enter IP address/subnet of the Origin system
    4. Enter the same credentials as in the dialogue on the Origin
    5. Choose 'Next'
  17. In slave dialogue
    1. Enter the IP of the Origin system (already filled in)
    2. Enter the same credentials as in the dialogue before
    3. Choose 'Next'
  18. Answer the "Replication Synchronisation" question with "No" (important!)
  19. Wait
  20. Done
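After the last step, the replication state can be double-checked from the command line. The sketch below rests on assumptions not stated in this tutorial: that the appliance uses standard MySQL replication, that the mysql client can be run as root on the appliance, and that the server reports the classic field names Slave_IO_Running and Slave_SQL_Running. The function name replication_running is made up for this example. The "View Database Status" menu entry mentioned above remains the documented way to check.

```shell
# Read the output of SHOW SLAVE STATUS\G from stdin.
# Succeeds only if both replication threads report "Yes".
replication_running() {
    status=$(cat)
    printf '%s\n' "$status" | grep -q '^[[:space:]]*Slave_IO_Running: Yes' &&
    printf '%s\n' "$status" | grep -q '^[[:space:]]*Slave_SQL_Running: Yes'
}
```

Possible usage on each of the two systems: mysql -e 'SHOW SLAVE STATUS\G' | replication_running && echo "replication is running"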