Troubleshooting Provider Appliance

From regify WIKI
Revision as of 08:17, 12 March 2024 by Regify (talk | contribs) (→‎Check bandwidth usage)

Process of registration, invitation and password reset

Many people complain that they do not receive the portal e-mails

There are hundreds of possible reasons, but in more than 90% of such cases the message is simply stuck in the recipient's spam/junk folder. If many users are complaining, you should test the following:

  • Go to mail-tester.com
  • Send a regify invitation to the address shown there (or use it to register as a new user).
  • Back on the mail-tester page, click the blue button "Then check your score" to get a result for the received e-mail (make sure it is the same e-mail address you sent to).

This tool rates from 1 to 10 points. With a regify provider, 9 is the maximum you can get (DKIM is not supported). If your score is below 9, please try to fix the issues the tool points to. It explains them very well.

Please also note the tips here: Regify_provider_appliance_tech#PostFix_E-Mail_Service

If all looks good, there is nothing more you can do. Please note that a few missing e-mails out of several hundred or even thousands is normal. Even a perfectly configured e-mail system cannot prevent your e-mails from being sorted out as spam. Spam filters are mainly based on training. Therefore, whether and why regify messages are classified as spam is very specific to the recipient.

Reverse DNS

Please note that reverse DNS records must be managed by the owner of the IP address. Therefore, you most likely cannot handle this in your own DNS. Contact the hoster of your solution or your company's internet provider to get such a reverse DNS record.
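
To tell the hoster exactly which record is needed, it can help to know the reverse name that belongs to your IP address. The following is a minimal sketch for IPv4 only; the helper name ptr_name and the address 192.0.2.10 are placeholders chosen for illustration and are not part of the regify appliance.

```shell
# Hypothetical helper: print the reverse-DNS (PTR) name for an IPv4 address.
ptr_name() {
    # Split the address at the dots and print the octets in reverse order
    # below the in-addr.arpa zone.
    printf '%s\n' "$1" |
        awk -F. '{ printf "%s.%s.%s.%s.in-addr.arpa\n", $4, $3, $2, $1 }'
}

# 192.0.2.10 is a documentation address; replace it with your public IP.
ptr_name 192.0.2.10
```

If the dig tool is installed (an assumption), dig +short -x 192.0.2.10 shows whether a PTR record already exists for the address; an empty result means that none is set.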

DMARC

You may want to add a _dmarc TXT record to your DNS server. To get basic functionality (passing everything), you can start with a TXT record like

v=DMARC1; p=none

for the domain _dmarc.[yourdomain]. More information is here at MX toolbox.
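
Once the record is published, a rough plausibility check can be scripted. The sketch below only tests that a TXT value starts with the version tag and contains a p= tag; it is not a full DMARC validator, and the helper name is_basic_dmarc is made up for this example.

```shell
# Hypothetical helper: succeed if a TXT value looks like a basic DMARC policy,
# i.e. it starts with the version tag and carries a p= tag somewhere after it.
is_basic_dmarc() {
    case "$1" in
        "v=DMARC1;"*"p="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the starter record shown above.
if is_basic_dmarc "v=DMARC1; p=none"; then
    echo "record looks like a DMARC policy"
fi
```

To check the live record, the value can be fetched with dig +short TXT _dmarc.example.com (assuming dig is installed; example.com stands for your domain) and passed to the helper without the surrounding quotes.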

What exactly happens during registration and invitation?

The following chart shows the different processes in a simplified manner:

[Figure: Processes drawing]

Appliance diagnostics

If you want regify support to help you with a specific appliance issue (e.g. IP addresses, SSL certificates, e-mail), please go to your SSH appliance menu and open Appliance -> Other Settings -> Support Diagnostics. Enter a destination e-mail address. This sends the appliance configuration to the given address. Passwords and sensitive information are not part of this report.

How are the regify invoices calculated?

IMPORTANT: Please note that our invoices represent the status and numbers from the last day of the previous month, shortly before midnight. So, if a few days have gone by, the current numbers will differ if users have registered or left in the meantime. The same applies to transaction numbers.

Get monthly report

First, please activate the sending of the monthly report to your e-mail address. To activate it, see the regify provider documentation (Web-Admin -> Documentation and manuals -> regify_provider_documentation.pdf) for the value REPORTCOPYADDRESS. This gives you a detailed and explained copy of all the numbers in your regify provider.

Short explanation about the counting

  • We do not bill regimail private users.
  • We do not bill for users in regimail private groups.
  • We bill for
    • regimail professional users
    • mass sendings (price per message):
      1. For regimail mass sendings
      2. For regipay mass sendings
      3. For regibill mass sendings

This is how we count regimail professional users:

  • We count users if they are
    1. active regimail professional members and
    2. above the free 30 days period (regify does not charge for the first 30 days) and
    3. maxtransaction setting is less than or equal to the provider default and
    4. ungrouped and
    5. not a regimail mass sender
  • In addition, we also count
    1. The sum of all defined regimail professional groups (even if they are not full).

Common messages in appliance error log

Clearing Related

CRIT: Failed URL: https://dc1.regify....  [Error executing curl call (0)....]
CRIT: Locked connection-id 1. Check /opt/provider/REGIFY_INCLUDE/../ClearingLock.dat [....]
CRIT: clearing connection 1 still not available [....]

This happens if a clearing connection was not functional. Most providers today use two connections (1 and 2). In the above case, connection 1 failed to connect to the clearing. The connection was locked and is no longer used. A later check found that it was still not functional.

If both connections fail, the regify provider enters maintenance mode. In our experience, this happens every now and then. Internet connections are not as reliable as you might think. If this happens more often than once a month, and especially if maintenance mode lasts longer than 30 minutes, you should examine your Internet connections or ask your Internet provider about the reason.

Every five minutes, the provider appliance tries to re-use locked clearing connections. If they work, they are unlocked automatically. As soon as at least one clearing connection works, maintenance mode is removed automatically.

CRIT: No more clearing connections available. Set provider to maintain-mode... [....]

This means that no more functional clearing connection is available. In this case, the regify provider appliance automatically enters the maintenance mode.

If the regify provider is in maintenance mode, end users get a message saying that the regify provider is currently under maintenance. During maintenance mode, users can no longer open or send any regify messages.
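
To judge how often this happens, the corresponding entries can be counted in a saved copy of the error log. The sketch below assumes you have exported the log to a text file yourself; the helper name count_maintenance_entries is made up, and the sample lines are taken from the messages in this section.

```shell
# Hypothetical helper: count how often the provider entered maintenance mode
# according to a saved copy of the error log ($1 = path to that copy).
count_maintenance_entries() {
    grep -c 'No more clearing connections available' "$1"
}

# Demo on a small sample file built from the messages in this section.
sample=$(mktemp)
printf '%s\n' \
    'CRIT: clearing connection 1 still not available [....]' \
    'CRIT: No more clearing connections available. Set provider to maintain-mode... [....]' \
    'CRIT: clearing connection 1 seems now available and working' > "$sample"
count_maintenance_entries "$sample"
rm -f "$sample"
```

For the sample, the helper prints 1. If the count for one month of log data is higher than that, the advice above about examining the Internet connections applies.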

CRIT: clearing connection 1 seems now available and working

This message indicates that the provider was in maintenance mode and that this has now ended. As soon as the provider appliance finds that at least one clearing connection is working again, maintenance mode is removed automatically and this message is triggered. Users can use the system again.

CRIT: removed maintenance-state of provider (optional)
CRIT: removed clearing lock file (optional)
CRIT: all clearing connections are reactivated now

This means that all defined clearing connections were found to be working (usually after one or more connections had been locked).

Security Related

CRIT: Wrong session-host for url /downloads.php?lg=EN / /downloads.php called by xxx

We currently do not know the origin of this problem. It looks like someone with an old cookie (maybe wrong system time) is trying to connect to your provider with this old cookie. As the session is already finished on the provider, this entry is created. You can ignore this. But if you know the origin of this, please contact us. We are interested in investigating this issue.

PHP Warning:  session_start(): The session id is too long or contains illegal characters, valid characters are a-z, A-Z, 0-9...
PHP Warning:  Unknown: The session id is too long or contains illegal characters, valid characters are a-z, A-Z, 0-9...
PHP Warning:  Unknown: Failed to write session data (files). Please verify that the current setting of session.save_path is correct
PHP Warning:  Unknown: Input variables exceeded 1000.

All these entries come from intrusion attempts. Mostly, hacker tools are testing whether systems are vulnerable to such attacks. Currently, the regify provider is not affected, but the attempts trigger these entries. You can ignore and delete such entries.
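
When reviewing a saved copy of the error log, it can be convenient to hide these probe-related warnings so that other entries stand out. This is a minimal sketch; the helper name drop_probe_noise is made up, and the patterns are simply the message texts listed above.

```shell
# Hypothetical helper: read log lines on stdin and drop the probe-related
# PHP warnings listed above; everything else is passed through unchanged.
drop_probe_noise() {
    grep -v -F \
        -e 'The session id is too long or contains illegal characters' \
        -e 'Failed to write session data' \
        -e 'Input variables exceeded'
}

# Demo: the PHP warning is dropped, the clearing message is kept.
printf '%s\n' \
    'PHP Warning:  Unknown: Input variables exceeded 1000.' \
    'CRIT: clearing connection 1 still not available [....]' |
    drop_probe_noise
```

The helper only filters what you look at; it does not modify the log itself.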

Other log entries

PHP Warning:  htmlentities(): Invalid multibyte sequence...
PHP Warning:  htmlspecialchars(): Invalid multibyte sequence...

This happens more often since provider V3.4.0. We are investigating the problem.

Handle known appliance problems

The web server does not start

The web server does not start with an error message like this or similar:

(98)Address already in use: make_sock: could not bind to address 0.0.0.0:80

This means that some httpd processes are hanging. Please use the following to identify them:

[root@clearing ~]# ps -ax | grep httpd
22680 ?        S      0:06 /usr/sbin/httpd -k start
22681 ?        S      0:03 /usr/sbin/httpd -k start
22682 ?        S      0:05 /usr/sbin/httpd -k start

In this case, three processes are hanging. To get rid of them, try to kill the first one with the lowest process ID (22680):

[root@clearing ~]# kill 22680

Check whether the others are gone as well. If not, repeat this with the other process IDs. Once all httpd entries are gone, start the web server using

[root@clearing ~]# apachectl start
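
Picking the lowest process ID by eye is easy with three processes but error-prone with many. The following sketch extracts it from ps -ax output; the helper name lowest_httpd_pid is made up, and it assumes the usual ps -ax column layout in which the command is the fifth column.

```shell
# Hypothetical helper: read "ps -ax" output on stdin and print the lowest PID
# whose command (fifth column) is an httpd binary. The "grep httpd" line that
# ps itself lists is skipped because its command is grep, not httpd.
lowest_httpd_pid() {
    awk '$5 ~ /(^|\/)httpd$/ { print $1 }' | sort -n | head -n 1
}

# Demo with lines in the format shown above.
printf '%s\n' \
    '22681 ?        S      0:03 /usr/sbin/httpd -k start' \
    '22680 ?        S      0:06 /usr/sbin/httpd -k start' \
    '22700 pts/0    S+     0:00 grep httpd' |
    lowest_httpd_pid
```

On the appliance, the helper could be fed with ps -ax | lowest_httpd_pid and the printed ID passed to kill. If no httpd process is left, it prints nothing.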

The websock server does not start

Sometimes, after an upgrade, the websocket server is not restarted. In this case, regibox managers show connectivity issues and the built-in regify monitoring shows ERROR: Websock server for regibox not running.

In order to fix this, log in as root (or drop to shell and use su root). Now, execute the following:

svc -u /service/lwsws/
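
To confirm that the service actually came up, its status line can be checked. The sketch below assumes that the svstat tool, which belongs to the same daemontools family as svc, is available on the appliance and prints its usual "up (pid ...)" line; this page does not confirm that, and the helper name service_is_up is made up.

```shell
# Hypothetical helper: succeed if a svstat status line reports the service as up.
service_is_up() {
    case "$1" in
        *": up (pid "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample status line for illustration only, not real appliance output.
status='/service/lwsws/: up (pid 1234) 56 seconds'
if service_is_up "$status"; then
    echo "websock server is running"
fi
```

On the appliance, the sample line would be replaced by the output of svstat /service/lwsws/.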

How to fix broken cross-master replication

ATTENTION: During this recovery, the whole system is in maintenance mode and customers will not be able to use the system. This will affect all products like regimail (and regigates), regibox and regipay.

  1. Enter web administration of both main systems and enable maintenance mode (Provider maintenance -> Enable maintenance mode).
    • Do not login to a sub-provider administration because the maintenance function is only available from the main provider administration.
  2. Login to both systems by using SSH (PuTTY) with user root and the password you set during installation
  3. Make a backup of your databases and config on both machines by using
    /usr/libexec/regify-provider-app/makebackup -o ~/backup_preRecovery
  4. Enter provider configuration on both systems by typing providerConfig on the SSH command line
  5. On both systems, navigate to "Database..." -> "View Database Status"
  6. Determine which system has the most recent data
    1. One of the two systems shows "Master Status" and "Slave Status". The other one shows only "Master Status".
    2. If it is not as described above, this tutorial will not help you! Please contact regify support!
    3. The one with both "Master Status" and "Slave Status" is the one with the most recent data. We now call it Origin. The other one is now called Behind.
  7. Navigate to "Database..." -> "Configure Replication..." on both systems
  8. Choose "no replication" on both systems ("master" and "slave")
  9. Choose "Cross Master" on the Origin system
  10. Fill in the data of the master dialogue
    1. Enter Unique Server ID (now 1 for the Origin)
    2. Enter correct number of servers (2 in most cases)
    3. Enter IP address/subnet of the Behind system
    4. Enter a username like 'providerRep' and a password (note the password!)
    5. Choose 'Next'
  11. In slave dialogue
    1. Enter the IP of the Behind system (already filled in)
    2. Enter the same credentials as in the dialogue before
    3. Choose 'Next'
  12. Answer the "Replication Synchronisation" question with "Yes"
  13. Fill in the IP address of the Behind system and enter the root password of the Behind system.
  14. Choose 'Ok'
  15. Wait (may take a while)
    • NOTE: If this fails because of login problems on the Behind system, please enter "Network..." -> "Advanced Settings..." -> "SSH Settings" on the Behind system and check whether you need to whitelist the IP of your Origin system and allow root login. If in doubt, note your current settings, then temporarily clear the IP field and enable root login for this process.
  16. Choose "Cross Master" on the Behind system.
  17. Fill in the data of the master dialogue
    1. Enter Unique Server ID (now 2 for the Behind)
    2. Enter correct number of servers (like on the Origin)
    3. Enter IP address/subnet of the Origin system
    4. Enter the same credentials as in the dialogue on the Origin
    5. Choose 'Next'
  18. In slave dialogue
    1. Enter the IP of the Origin system (already filled in)
    2. Enter the same credentials as in the dialogue before
    3. Choose 'Next'
  19. Answer the "Replication Synchronisation" question with "No" (important!)
  20. Wait (may take a while)
  21. Enter web administration of both systems and disable maintenance mode (Provider maintenance -> Remove maintenance mode).
  22. Make sure to restore SSH settings on Behind system in case you changed them in step 15.
  23. Done

What if I need to restore the backup?

The backup created in step 3 of the above procedure can be restored by following this guide:

  1. Login using SSH to Origin system using root
  2. Execute
    apachectl -S
    and copy or screenshot the result
  3. Restore backup using
    /usr/libexec/regify-provider-app/restore ~/backup_preRecovery
  4. Upon restore, the SSL engine is offline because the system needs re-assignment of IPs and domains.
    • Run providerConfig and enter "Provider..." -> "Start SSL...".
    • There you have to select the IP address for all subproviders one after the other. The list from step 2 will help you to assign the correct ones.
  5. Now, follow above procedure to re-initialize the database replication.

Make and restore provider backup

Make backup

Login using SSH and make sure to be root. Now run the following:

/usr/libexec/regify-provider-app/makebackup -o /opt/provider/REGIFY_PUBLIC/backup.enc

Then download the backup using a web browser at https://<yourProvider>/backup.enc

IMPORTANT: Remove the backup after downloading:

rm /opt/provider/REGIFY_PUBLIC/backup.enc

In order to restore it, you also need your clearing ID and password. If you no longer know them, run the following:

cat /opt/provider/REGIFY_INCLUDE/menu_config.php

Note the values for PROVIDERID and CLEARINGPASS.
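
Instead of reading the whole file, you can show only the lines that mention these two values. This is a small sketch that makes no assumption about how the values are written inside the file; the helper name show_restore_credentials is made up.

```shell
# Hypothetical helper: print only the lines of the given file that mention
# the two values needed for a restore ($1 = path to menu_config.php).
show_restore_credentials() {
    grep -E 'PROVIDERID|CLEARINGPASS' "$1"
}
```

On the appliance, it would be called as show_restore_credentials /opt/provider/REGIFY_INCLUDE/menu_config.php. Keep in mind that the output contains a password, so do not paste it into tickets or chats.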

You can also show the assignment of domains and IP addresses (needed for later re-assignment):

apachectl -S | grep :443

Restore backup

Copy the backup to the new machine. We prefer rsync or scp. Here is an example scp call:

scp backup.enc root@<yourProvider>:/tmp
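
Before restoring, it can be worth verifying that the file arrived unchanged. The following sketch compares SHA-256 digests; it assumes that sha256sum (GNU coreutils) is installed on both machines, and the helper names file_digest and digest_matches are made up for this example.

```shell
# Hypothetical helper: print only the SHA-256 digest of a file.
file_digest() {
    sha256sum "$1" | awk '{ print $1 }'
}

# Hypothetical helper: succeed if the file ($1) has the expected digest ($2).
digest_matches() {
    [ "$(file_digest "$1")" = "$2" ]
}
```

On the old machine, run file_digest on the backup file before deleting it and note the result. On the new machine, call digest_matches /tmp/backup.enc with the noted digest as second argument; a non-zero exit status means the copy differs and should be transferred again.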

Now log in to your new appliance and make sure to be root.

Restore the backup using this command:

/usr/libexec/regify-provider-app/restore -i <PROVIDERID> /tmp/backup.enc

You will be prompted for the CLEARINGPASS password. Once the correct password has been entered, the backup is restored and the configuration is updated.

You now need to enter the SSH appliance menu and re-assign the provider's domains to the IP addresses.

After this, you can restart the web server using "Provider..." -> "Start SSL...".

Check bandwidth usage

Install the bandwhich tool in home folder (temporary use only, please):

cd ~
wget https://github.com/imsnif/bandwhich/releases/download/0.20.0/bandwhich-v0.20.0-x86_64-unknown-linux-musl.tar.gz -O - | tar -xz

NOTE: You cannot use a more recent version of bandwhich, because the newer versions require more recent glibc versions.

Now, run it like this:

./bandwhich -n

NOTE: The -n parameter stops the tool from querying DNS for every IP address. These lookups are of no use here and cause additional traffic.

You stop it with the Q key.

Please remove the tool after you have finished.