Connection to local server drops after a few minutes

Hi. I’m having an issue with my Nextcloud server, hosted on a machine running Proxmox. Nextcloud was installed using the Turnkey-Nextcloud LXC template. My desktop (the client) is running EndeavourOS (6.1 LTS kernel).

The issue happens randomly, after a few minutes (sometimes hours) of “intense” traffic. The Nextcloud server becomes inaccessible: I can no longer ssh into it (default port 22), but I can still ping it. Restarting the machine does not always fix the issue. The issue goes away by itself after some time (sometimes days). Note that I can still reach Nextcloud from other devices (even though I’ve never used Nextcloud that heavily on other devices…).

Thing is, I have no idea how to troubleshoot it. I disabled the firewall on my side and disabled fail2ban on the server (firewalld/ufw is not installed on the server); nothing changed. But I found a way to reestablish the connection.
On both desktop and server I have Tailscale installed (WireGuard VPN). If I access Nextcloud via the VPN it works, and immediately afterwards it starts working again even without the VPN (i.e. using the 192.168.* local address).

I guess the problem is on the desktop client, because the server is running very few processes, and none of them look like they could cause the issue (but I’m no expert, so I’ll list the processes below):

root@NextCloud ~# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.2 165860  9876 ?        Ss   15:42   0:00 /sbin/init
root          49  0.0  0.7 105748 32948 ?        Ss   15:42   0:00 /lib/systemd/systemd-journald
systemd+      66  0.0  0.1  16072  6960 ?        Ss   15:42   0:00 /lib/systemd/systemd-networkd
root         154  0.0  0.0   6740  2616 ?        Ss   15:42   0:00 /usr/sbin/cron -f
redis        160  0.0  0.2  67720 11736 ?        Ssl  15:42   0:02 /usr/bin/redis-server 127.0.0.1:6379
root         162  0.0  0.0 151160  3584 ?        Ssl  15:42   0:00 /usr/sbin/rsyslogd -n -iNONE
root         184  0.0  0.0   5476  2016 pts/0    Ss+  15:42   0:00 /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 linux
root         185  0.0  0.0   5476  2028 pts/2    Ss+  15:42   0:00 /sbin/agetty -o -p -- \u --noclear --keep-baud tty2 115200,38400,9600 linux
root         208  0.0  0.1  13376  7516 ?        Ss   15:42   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
stunnel4     216  0.0  0.0  86056  1320 ?        Ssl  15:42   0:00 /usr/bin/stunnel4 /etc/stunnel/shellinabox.conf
stunnel4     233  0.0  0.0  86056  1328 ?        Ssl  15:42   0:00 /usr/bin/stunnel4 /etc/stunnel/webmin.conf
mysql        316  0.0  3.3 1871748 140688 ?      Ssl  15:42   0:01 /usr/sbin/mariadbd
root         318  0.0  0.9 269288 41568 ?        Ss   15:42   0:00 /usr/sbin/apache2 -k start
www-data     322  0.0  0.0   3492   168 ?        Ss   15:42   0:00 /usr/bin/htcacheclean -d 120 -p /var/cache/apache2/mod_cache_disk -l 300M -n
www-data     327  0.0  1.3 272852 57720 ?        S    15:42   0:00 /usr/sbin/apache2 -k start
www-data     328  0.0  1.3 272716 56008 ?        S    15:42   0:00 /usr/sbin/apache2 -k start
www-data     329  0.0  1.7 292764 75472 ?        S    15:42   0:01 /usr/sbin/apache2 -k start
www-data     330  0.0  1.7 292888 74608 ?        S    15:42   0:02 /usr/sbin/apache2 -k start
root         393  0.0  0.7 730908 33172 ?        Ssl  15:42   0:02 /usr/sbin/tailscaled --state=/var/lib/tailscale/tailscaled.state --socket=/run/tailscale/tailscaled.sock --port=41641
shellin+     448  0.0  0.0   7212  2616 ?        Ss   15:42   0:00 /usr/bin/shellinaboxd -q --background=/var/run/shellinaboxd.pid -c /var/lib/shellinabox -p 12319 -u shellinabox -g shellinabox --user-css White On Black:+/etc/shellinabox/options-enabled/00+White On Black.css,Black 
shellin+     450  0.0  0.0   7212   664 ?        S    15:42   0:00 /usr/bin/shellinaboxd -q --background=/var/run/shellinaboxd.pid -c /var/lib/shellinabox -p 12319 -u shellinabox -g shellinabox --user-css White On Black:+/etc/shellinabox/options-enabled/00+White On Black.css,Black 
root         987  0.0  0.1  40052  4700 ?        Ss   15:42   0:00 /usr/lib/postfix/sbin/master -w
postfix      988  0.0  0.1  40072  6352 ?        S    15:42   0:00 pickup -l -t unix -u -c
postfix      989  0.0  0.1  40120  6292 ?        S    15:42   0:00 qmgr -l -t unix -u
root         991  0.0  0.5  31880 24196 ?        Ss   15:42   0:00 /usr/bin/perl /usr/share/webmin/miniserv.pl /etc/webmin/miniserv.conf
www-data    1345  0.0  1.3 272508 55252 ?        S    15:42   0:00 /usr/sbin/apache2 -k start
www-data    1346  0.0  1.7 292764 72140 ?        S    15:42   0:00 /usr/sbin/apache2 -k start
www-data    1347  0.0  1.3 272704 54844 ?        S    15:42   0:00 /usr/sbin/apache2 -k start
www-data    1348  0.0  1.2 272496 54508 ?        S    15:42   0:00 /usr/sbin/apache2 -k start
www-data    1349  0.0  1.3 272512 54784 ?        S    15:42   0:00 /usr/sbin/apache2 -k start
www-data    1350  0.0  1.4 273536 60380 ?        S    15:42   0:00 /usr/sbin/apache2 -k start
root        1458  0.0  0.1   9440  4296 pts/1    Ss   15:50   0:00 /bin/login -p --
root        1477  0.0  0.1   5556  4408 pts/1    S    15:50   0:00 -bash
root        1614  0.0  0.1  14156  8312 ?        Rs   15:56   1:05 sshd: root@notty

Is “the machine” in this case the client or the server?

In the past, rebooting the client didn’t work.
I was testing again just now, and this time rebooting worked. I can’t say whether the reboot reestablished the connection or something else did. Either way, when it works it’s only a temporary fix, because when I start uploading, the connection drops again after a few minutes.

Hope this makes sense, English is not my first language

A detail that may be relevant: the server is connected to the router via ethernet, the desktop client via wifi. Right now I have no easy way to connect the desktop via ethernet.

A cheap USB ethernet adapter might give you a way to rule out the wifi connection.
What about other devices on the local network?
Can you connect to NC from a phone while the desktop cannot?
I’m trying to figure out whether the problem is on the client (i.e. EOS or phone), the router, Proxmox, or the LXC.
Can you connect to Proxmox from your desktop?
Can you connect to other LXCs on Proxmox?

I’ve been doing a lot of tests since yesterday. It seems that rebooting either the client or the server is enough to reestablish the connection (but, as I said, it doesn’t fix the problem).

This is what I found:

  • I tried to create intense traffic from my laptop (wifi) to the same NC server, and the exact same issue is there. But the laptop is also running EndeavourOS, so…

  • I tried to flush all iptables rules and set up these 3 rules:

sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
  • I tried the same intense traffic from my desktop to another LXC container. There the connection dropped after more than 12 hours, but the error I received was different (before, it was ssh: client loop: connection refused: broken pipe).
    I think this hints that the issue may be with that specific LXC container.

Yes I can, and it looks like this reestablishes the connection on the desktop (but I need to reproduce it a few more times).

Yes

Yes, without a problem

I can even ping the NC server successfully from the desktop, but the web interface (port 80, if I’m not mistaken) and ssh (port 22) are not reachable.

Can you connect to NC from Proxmox? Can you run screen on Proxmox and then ssh into NC, then detach the screen session, leaving the ssh session running? That would allow you to ssh to Proxmox, reattach the screen session, and see inside NC.

From the desktop you can ping; can you use nmap to run a TCP SYN scan? And maybe use nmap to do a UDP scan as well?

I have seen cases before where I can ping a box but can’t get into it with ssh. If you have an existing ssh session inside a screen session on Proxmox that you can use to see inside, that might help with checking kernel logs.

You can check the network routing tables and the ARP cache.
Are there DNS issues?
Netstat can show connections and connection buffering.
Using netstat you can see whether there are lots of half-open or half-closed connections.
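
For reference, a rough sketch of those checks using the iproute2 tools (`ip`, `ss`), which stand in for netstat on systems where it is not installed. The helper name `count_tcp_states` is made up for illustration; it only tallies the first column of `ss -tan` output.

```shell
# Routing table and ARP/neighbour cache (run on the desktop and on the server):
#   ip route
#   ip neigh
#
# Count TCP sockets per state from `ss -tan` output read on stdin.
# Many SYN-SENT / SYN-RECV entries suggest half-open connections;
# many FIN-WAIT-* / CLOSE-WAIT entries suggest half-closed ones.
count_tcp_states() {
    # Skip the header line, tally the first column (the state), sort for stable output.
    awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}
```

Usage would be `ss -tan | count_tcp_states`, ideally once while things work and once while the connection is stuck, so the two can be compared.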

Can you run a regular curl from Proxmox or an LXC to poke NC, since the phone seems to poke NC awake? Or does the connection have to come from the wifi?
Does that mean that wifi is becoming wedged somehow?

Oh God, I need multiple translation layers to understand what you’re saying :sweat_smile: I’ll try

So, what you’re asking is:

  1. open the Proxmox shell (the normal web shell accessible from the desktop, or do you mean directly on the server?)
  2. ssh from Proxmox to the NC container (what do you mean by screen on?)
  3. close the Proxmox web tab?

OK, I’ll try that as soon as the connection drops again.

nmap output:

sudo nmap -sS -p- 192.168.159.101

Starting Nmap 7.94 ( https://nmap.org ) at 2023-07-16 15:21 CEST
Note: Host seems down. If it is really up, but blocking our ping probes, try -Pn
Nmap done: 1 IP address (0 hosts up) scanned in 1.51 seconds

So there was no reply to the ping probe, and it seems nmap won’t send SYNs without one; we didn’t get much from that.
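
nmap’s own hint in the output above is to retry with -Pn, which skips host discovery and scans the ports anyway. A sketch of that retry, plus a check that does not need nmap at all and relies on bash’s /dev/tcp redirection. `probe_port` is a made-up helper name, and ports 22 and 80 are simply the ones mentioned in this thread.

```shell
# Retry the scan without host discovery, as nmap itself suggests:
#   sudo nmap -Pn -sS -p 22,80 192.168.159.101

# Alternative without nmap: attempt a TCP connect via bash's /dev/tcp,
# giving up after 3 seconds so a silently dropped SYN does not hang the shell.
probe_port() {
    local host=$1 port=$2
    if timeout 3 bash -c 'exec 3<>/dev/tcp/$1/$2' _ "$host" "$port" 2>/dev/null; then
        echo "$host:$port open"
    else
        echo "$host:$port closed or filtered"
    fi
}
```

For example, `probe_port 192.168.159.101 22` and `probe_port 192.168.159.101 80`, run from the desktop while the connection is stuck and again from Proxmox for comparison.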

Any luck with screen on proxmox and ssh from within that into NC?

What’s your network setup? Wifi or wired?

But what do you mean by “screen on proxmox”? The web interface?

Wifi

Try to connect at least the server with a wire.

The server is connected with a wire; it’s the desktop client that’s connected via wifi.

Not web.
Screen is a detachable terminal multiplexer.
You can switch between multiple active sessions.
You can detach the screen session, then log out, then log in again and reattach. Good for long-running processes that you don’t want killed if you lose the network connection.
Check man screen, or look up man screen on the web. You might have to install it.
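
To make the screen idea concrete, a minimal sketch, assuming screen is installed on the Proxmox host and that root can ssh into the container at 192.168.159.101 (the address from the nmap post). The function names and the session name "nc" are made up for illustration.

```shell
# Run these on the Proxmox host while the container is still reachable.

# Start a detached screen session named "nc" that holds an ssh login to the container.
start_nc_session() {
    screen -dmS nc ssh "root@$1"
}

# Reattach to it later (also after logging out of Proxmox and back in).
# Inside the session, press Ctrl-a then d to detach again without closing ssh.
attach_nc_session() {
    screen -r nc
}
```

Run `start_nc_session 192.168.159.101`, then `attach_nc_session` once to enter the password and detach. When the desktop loses the connection, ssh to Proxmox and run `attach_nc_session` again to look around inside NC over the ssh session that was already open.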