Troubleshooting Provider Appliance

From regify WIKI

Process of registration, invitation and password reset

Many people complain that they do not receive the portal e-mails

There are hundreds of possible reasons, but in 90% of all such cases the message is simply stuck in the recipient's spam/junk folder. If you have many users complaining, you should test the following:

  1. Is Reverse-DNS set correctly? Use [this tool], enter your provider IP address and check whether it points to the correct provider URL.
  2. Is SPF enabled for your provider IP? Go [here], invite the address given there, and then check the results.
  3. If both are fine and you still do not see a reason, test [with this tool], as it returns detailed information. A score of 7 or 8 out of 10 is fine.
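
The first two checks can also be run from a shell. The following is a minimal sketch, assuming the standard dig utility is installed. The helper names rdns_matches and has_spf_record are made up for this illustration, and 192.0.2.10, provider.example.com and example.com are placeholders for your own provider IP, provider host name and sender domain. The helpers only evaluate dig output and do not replace the tools linked above.

```shell
# Succeeds if a PTR answer read from stdin matches the expected host name ($1).
rdns_matches() {
    expected=$1
    while IFS= read -r ptr; do
        # dig prints fully qualified names with a trailing dot; strip it.
        [ "${ptr%.}" = "$expected" ] && return 0
    done
    return 1
}

# Succeeds if one of the TXT records read from stdin is an SPF record.
# "dig +short TXT" prints each record in double quotes.
has_spf_record() {
    grep -q '^"v=spf1[ "]'
}
```

Possible usage, with the placeholders replaced by your own values: dig +short -x 192.0.2.10 | rdns_matches provider.example.com, and dig +short TXT example.com | has_spf_record. Both commands return exit status 0 when the check passes.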

If all looks good, contact regify support for more help. Please keep in mind, though, that a few missing e-mails out of several hundred or even thousands is normal.

What exactly happens during registration and invitation?

The following chart shows the different processes in a simplified manner:

(Figure: Processes drawing)

Appliance diagnostics

If you want regify support to help you with a specific appliance issue (e.g. IP addresses, SSL certificates, e-mail), please go to your SSH appliance menu and open Appliance -> Other Settings -> Support Diagnostics. Enter an e-mail address as the destination. The appliance configuration is then sent to that address. Passwords and sensitive information are not part of this report.

Common messages in appliance error log

Clearing Related

CRIT: 0 [....]
CRIT: Error executing curl call. Error:connect() timed out! [....]
CRIT: Locked connection-id 1. Check /opt/provider/REGIFY_INCLUDE/../ClearingLock.dat [....]
CRIT: clearing connection 1 not available [....]

This happens if a clearing connection was not functional. Most providers today use two connections (1 and 2). In this case, connection 1 failed to connect to the clearing service. The connection was locked and is no longer used. If both connections fail, your internet connection has most likely failed. In our experience, this happens every now and then; internet connections are not as reliable as you might think. If this happens more often than once a month, and especially if the maintenance mode lasts longer than 5 minutes, you should examine your internet connections or ask your internet provider about the cause.

Every five minutes, the provider appliance tries to re-use such locked connections. If they work, they are unlocked automatically. As soon as at least one connection works, the maintenance mode is removed automatically.
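
To see which connections were locked, the messages above can be filtered from the error log. This is a minimal sketch: the function name locked_connections is made up for illustration, and the log is read from stdin because the location of the log file is not stated here.

```shell
# Print the distinct connection ids reported as locked in the log text on stdin.
locked_connections() {
    sed -n 's/.*Locked connection-id \([0-9][0-9]*\)\..*/\1/p' | sort -un
}
```

Feed it the error log, for example locked_connections < /path/to/error.log (the path is a placeholder). If it prints both 1 and 2 on an appliance with two connections, both were locked at some point, which points to the internet connection rather than a single clearing connection.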

CRIT: No more clearing connections available. Set provider to maintain-mode... [....]

This means that no functional clearing connection is left. In this case, the regify provider appliance automatically enters maintenance mode.

WARN: removed maintenance-state of provider [....]

This message indicates that the provider has been in maintenance mode before. If the provider appliance finds out that at least one clearing connection is functional again, the maintenance mode is removed automatically and this message is triggered.
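
Since the two messages above always come in pairs, the last one in the log tells you the current state. The following sketch assumes the log is ordered oldest entry first and is passed on stdin; the function name maintenance_state is made up for illustration.

```shell
# Print "maintenance" if the most recent matching entry in the log text on
# stdin put the provider into maintenance mode, "normal" if the most recent
# one removed it, and "unknown" if neither message occurs.
maintenance_state() {
    awk '
        /Set provider to maintain-mode/         { state = "maintenance" }
        /removed maintenance-state of provider/ { state = "normal" }
        END { print (state == "" ? "unknown" : state) }
    '
}
```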

WARN: all clearing connections are reactivated now [/]

This means that all defined clearing connections have been found to be working again (usually after one or more connections had been locked).

CRIT: ERROR: Error while clearing connection: '<html><body><h1>503 Service Unavailable</h1> 
No server is available to handle this request.</body></html>' [....]

This may happen from time to time during maintenance intervals of the regify clearing service. We are trying to reduce these, and the technical team is working on a solution to prevent such messages. The system keeps working.

Security Related

CRIT: Access blocked (PHPIDS treshold X reached) for the following request:
-------------------------------------------------------------------------------------------
Security Alert (PHPIDS):
IMPACT: 36
TAGS: xss, csrf, id, rfe, lfi
URL: /XXXXX.php........
...

This means that the internal IDS (Intrusion Detection System) found something suspicious and blocked this call. The value after IMPACT: indicates how dangerous the attack was. Values between 16 and 35 indicate lower potential; higher values usually indicate a serious attack. The message is followed by the parameters (PARAMETERS: and FIELDS:) that led to the alert, showing all fields and values received during the attack. If you consider these values very suspicious or you see repeated alerts, please contact regify support and send us the complete log information.
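
The impact ranges can be expressed as a small helper. This is a sketch only: the function names phpids_impact and impact_severity and the labels they print are made up for illustration, and values below 16 are not covered by the description above, so they get their own label.

```shell
# Print the numeric value of the first "IMPACT:" line in the log text on stdin.
phpids_impact() {
    sed -n 's/^IMPACT: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Map an impact value ($1) to the rough categories described above.
impact_severity() {
    if [ "$1" -gt 35 ]; then
        echo "serious"
    elif [ "$1" -ge 16 ]; then
        echo "lower"
    else
        echo "below-range"   # not covered by the description above
    fi
}
```

For the alert shown above, phpids_impact extracts 36 and impact_severity 36 prints serious.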

CRIT: Wrong session-host for url /downloads.php?lg=EN / /downloads.php called by xxx

We currently do not know the origin of this problem. It looks like someone is trying to connect to your provider with an old cookie (maybe because of a wrong system time). As the session has already ended on the provider, this entry is created. You can ignore it. If you know the origin, however, please contact us; we are interested in investigating this issue.

PHP Warning:  session_start(): The session id is too long or contains illegal characters, valid characters are a-z, A-Z, 0-9...
PHP Warning:  Unknown: The session id is too long or contains illegal characters, valid characters are a-z, A-Z, 0-9...
PHP Warning:  Unknown: Failed to write session data (files). Please verify that the current setting of session.save_path is correct
PHP Warning:  Unknown: Input variables exceeded 1000.

All these entries come from intrusion attempts. In most cases, automated hacking tools are testing whether systems are vulnerable to such attacks. Currently, the regify provider is not affected, but the attempts still trigger these entries. You can ignore and delete them.
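
When reading a long log, it can help to hide these entries so that the remaining messages stand out. The sketch below filters a copy of the log text on stdin and changes nothing on the appliance; the function name drop_scanner_noise is made up for illustration.

```shell
# Print the log text from stdin without the PHP warnings listed above.
drop_scanner_noise() {
    grep -v \
        -e 'The session id is too long or contains illegal characters' \
        -e 'Failed to write session data' \
        -e 'Input variables exceeded 1000'
}
```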

Other log entries

PHP Warning:  htmlentities(): Invalid multibyte sequence...
PHP Warning:  htmlspecialchars(): Invalid multibyte sequence...

This happens more often since provider V3.4.0. We are investigating the problem.

Handle known appliance problems

The web server does not start

The web server fails to start and shows an error message like this or similar:

(98)Address already in use: make_sock: could not bind to address 0.0.0.0:80

This means that some httpd processes are hanging. Please use the following to identify the processes:

[root@clearing ~]# ps -ax | grep httpd
22680 ?        S      0:06 /usr/sbin/httpd -k start
22681 ?        S      0:03 /usr/sbin/httpd -k start
22682 ?        S      0:05 /usr/sbin/httpd -k start

In this case, three processes are hanging. To get rid of them, try to kill the one with the lowest process ID first (22680):

[root@clearing ~]# kill 22680

Check whether the others are gone as well. If not, repeat this with the other process IDs. Once all httpd entries are gone, start the web server using

[root@clearing ~]# apachectl start
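
Picking the lowest process ID can also be scripted. This is a minimal sketch, assuming the ps -ax column layout shown above (PID first, command in the fifth column); the function name lowest_httpd_pid is made up for illustration.

```shell
# Print the lowest PID among the httpd processes in "ps -ax" output on stdin.
# Matching on the command column skips the "grep httpd" process itself.
lowest_httpd_pid() {
    awk '$5 ~ /httpd$/ { print $1 }' | sort -n | head -n 1
}
```

Run as ps -ax | lowest_httpd_pid, it prints 22680 for the listing above. An empty result means no httpd process is left and the web server can be started.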

Error during update

If you encounter this problem, your boot partition has become too small:

Disk Requirements:
 At least 14MB more space needed on the /boot filesystem

We fixed this bug for everything newer than V4, but if you migrated from a 3.1 provider or even earlier, this might still happen.

To solve this, please follow these steps:

  • Log in to your regify provider appliance by using SSH and the root account.
    Note: You defined the root password during setup.
  • Get the name of the kernel currently in use:
# uname -r
2.6.32-504.12.2.el6.i686

The result may be something like 2.6.32-504.12.2.el6.i686.

  • Now, list all existing kernels by calling
# rpm -q kernel
kernel-2.6.28-398.10.9.el6.i686
kernel-2.6.31-422.11.1.el6.i686
kernel-2.6.32-431.11.2.el6.i686
kernel-2.6.32-504.12.2.el6.i686
  • Finally, remove all unused kernels by calling the following for every unused kernel:
# rpm -e kernel-2.6.28-398.10.9.el6.i686
# rpm -e kernel-2.6.31-422.11.1.el6.i686
# rpm -e kernel-2.6.32-431.11.2.el6.i686
  • Now retry whatever you wanted to do (e.g. "Check for updates")
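
The list of removable kernels can be derived from the two commands above. This is a minimal sketch that relies on the output formats shown in the steps, where the running release printed by uname -r is the installed package name without the kernel- prefix. The function name unused_kernels is made up for illustration.

```shell
# Print every installed kernel package (one per line on stdin, as printed by
# "rpm -q kernel") that does not belong to the running kernel release.
# $1 is the running release as printed by "uname -r".
unused_kernels() {
    grep -v -x -F "kernel-$1"
}
```

Run as rpm -q kernel | unused_kernels "$(uname -r)". Review the printed list before passing any entry to rpm -e, and make sure the running kernel is not among them.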

Handle known Internet Explorer issues

Under certain conditions, Microsoft IE version 10 and below renders web pages the way an outdated version of IE would, in some cases the way Internet Explorer version 7 would. This causes the provider to work incorrectly, since regify supports IE8 and later.

Is there a fix for this?

Yes, the next provider update will solve this issue by forcing your browser to always interpret pages in "Standard Document Mode".

Until the next update is available, the following manual fix can be applied:

1- While in the browser, press F12 on your keyboard to start Internet Explorer developer tools.

2- The developer tools panel appears. Click on Document Mode and select the "Standards" option.


What parts of the provider website are affected?

We have received several calls from clients concerning the issue. All of them have pointed out that the group management page is not working properly.

However, any JavaScript-powered page can be affected by this issue.

What about future Internet Explorer versions?

Microsoft has decided to make its browsers follow the web standards starting with IE version 11.

What about other browsers?

Other browsers follow the standards and are therefore not affected by the issue.

Why does regify not support Internet Explorer 7?

Internet Explorer 7 is an outdated browser that does not follow web standards. All major websites have dropped support for it, since its usage fell to less than 1 percent this year.

How to fix broken cross-master replication

  1. Log in to both systems by using SSH (PuTTY) with user root and the password you set during installation
  2. Make a copy of your databases on both machines by using mysqldump
    mysqldump --databases regify > /tmp/mysql_backup.sql
  3. Enter provider configuration on both systems by typing providerConfig
  4. On both systems, navigate to "Database..." -> "View Database Status"
  5. Determine which system has the most recent data
    1. One of the two systems shows "Master Status" and "Slave Status". The other one shows only "Master Status".
    2. If it is not as described above, this tutorial will not help you! Please contact regify support!
    3. The one with both "Master Status" and "Slave Status" is the one with the most recent data. We now call it Origin; the other one is called Behind.
  6. Navigate to "Database..." -> "Configure Replication..."
  7. Choose "no replication" on both systems ("master" and "slave")
  8. Choose "Cross Master" on the Origin system
  9. Fill in the data of the master dialogue
    1. Enter Unique Server ID (now 1 for the Origin)
    2. Enter correct number of servers (2 in most cases)
    3. Enter IP address/subnet of the Behind system
    4. Enter a username like 'providerRep' and a password (note the password!)
    5. Choose 'Next'
  10. In slave dialogue
    1. Enter the IP of the Behind system (already filled in)
    2. Enter the same credentials as in the dialogue before
    3. Choose 'Next'
  11. Answer the "Replication Synchronisation" question with "Yes"
  12. Fill in the IP address of the Behind system and enter the root password of the Behind system.
  13. Choose 'Ok'
  14. Wait
  15. Choose "Cross Master" on the Behind system.
  16. Fill in the data of the master dialogue
    1. Enter Unique Server ID (now 2 for the Behind)
    2. Enter correct number of servers (like on the Origin)
    3. Enter IP address/subnet of the Origin system
    4. Enter the same credentials as in the dialogue on the Origin
    5. Choose 'Next'
  17. In slave dialogue
    1. Enter the IP of the Origin system (already filled in)
    2. Enter the same credentials as in the dialogue before
    3. Choose 'Next'
  18. Answer the "Replication Synchronisation" question with "No" (important!)
  19. Wait
  20. Done
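
Because the procedure overwrites the data on the Behind system, it is worth checking the backups from step 2 before changing the replication settings. The following is a minimal sketch, assuming mysqldump was run with --databases and with its default comments enabled, so that the file contains a CREATE DATABASE statement and ends with a "-- Dump completed" line. The function name dump_looks_complete is made up for illustration.

```shell
# Succeeds if the dump file ($1) is non-empty, contains a CREATE DATABASE
# statement for the database ($2) and ends with mysqldump's
# "-- Dump completed" comment.
dump_looks_complete() {
    [ -s "$1" ] || return 1
    grep -q "^CREATE DATABASE .*\`$2\`" "$1" || return 1
    tail -n 1 "$1" | grep -q '^-- Dump completed'
}
```

For the backup from step 2, the call would be dump_looks_complete /tmp/mysql_backup.sql regify on each machine. A non-zero exit status suggests the dump was interrupted and should be repeated before continuing.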