Server running ESXi 6.5u2, SSH and shell enabled.
We periodically SSH into the shell to run some storage diagnostics (LSI storcli). After some time, SSH connections fail because ESXi seems to run out of pseudoterminals (possibly some process is leaking them). I see these errors in /var/log/auth.log:
2019-08-07T06:28:37Z sshd[127764]: /etc/ssh/sshd_config line 7: Deprecated option UsePrivilegeSeparation
2019-08-07T06:28:37Z sshd[127764]: /etc/ssh/sshd_config line 15: Unsupported option PrintLastLog
2019-08-07T06:28:37Z sshd[127764]: Connection from XXXXX port 43365
2019-08-07T06:28:38Z sshd[127764]: Accepted keyboard-interactive/pam for root from XXXXX port 43365 ssh2
2019-08-07T06:28:38Z sshd[127764]: pam_unix(sshd:session): session opened for user root by (uid=0)
2019-08-07T06:28:38Z sshd[127764]: error: openpty: No such file or directory
2019-08-07T06:28:38Z sshd[127764]: error: session_pty_req: session 0 alloc failed
2019-08-07T06:28:38Z sshd[127764]: pam_unix(sshd:session): session closed for user root
I checked /dev/char/pty and there are 64 pseudoterminals, but I haven't found any way to discover which process is using each one (lsof on ESXi does not behave like the Linux version).
Any ideas?
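To see when the pty allocation failures started and how often they occur, the auth.log lines can be filtered for the openpty error. The following is only a minimal sketch: it assumes the log line layout shown in the excerpt above, and the function names (`openpty_failures`, `failures_per_day`) are made up for illustration.

```python
import re
from collections import Counter

# Matches the failure line format quoted in the auth.log excerpt above:
#   <timestamp> sshd[<id>]: error: openpty: No such file or directory
OPENPTY_ERR = re.compile(r"^(\S+)\s+sshd\[(\d+)\]: error: openpty: ")

# One line copied from the excerpt above, plus one unrelated line.
SAMPLE = [
    "2019-08-07T06:28:38Z sshd[127764]: error: openpty: No such file or directory",
    "2019-08-07T06:28:38Z sshd[127764]: pam_unix(sshd:session): session closed for user root",
]


def openpty_failures(lines):
    """Return a list of (timestamp, sshd_id) for every openpty failure line."""
    hits = []
    for line in lines:
        match = OPENPTY_ERR.match(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits


def failures_per_day(hits):
    """Count failures per calendar day (the part of the timestamp before 'T')."""
    return Counter(ts.split("T", 1)[0] for ts, _sshd_id in hits)


if __name__ == "__main__":
    hits = openpty_failures(SAMPLE)
    print("first failure:", hits[0] if hits else None)
    print("per day:", dict(failures_per_day(hits)))
```

Against a real copy of the log, `openpty_failures(open("auth.log"))` would give the same kind of list; the earliest entry marks the point from which the pseudoterminals were already exhausted.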
Hello,
Did you change anything in sshd_config?
If you can, check whether you can correlate each pty/tX session with a world ID using this command:
grep -i pty /var/log/auth.log
It will return results like:
2018-10-10T08:44:20Z sshd[2989715]: Session opened for 'root' on /dev/char/pty/t0
where 2989715 is the world ID of the session on pty/t0.
Once you have collected all the world IDs, we can check where they come from.
PS: Can you attach your shell.log and auth.log (or send them by private message if you prefer)? I'll check them out.
Let me know.
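The correlation described above (pty/tX to world ID) can be automated instead of reading the grep output by eye. This is a rough sketch that assumes every relevant line follows the "Session opened for '<user>' on /dev/char/pty/tN" layout quoted above; `sessions_by_pty` is a hypothetical helper name, and the second sample line is invented for the example.

```python
import re
from collections import defaultdict

# Layout taken from the example line quoted above:
#   <timestamp> sshd[<world id>]: Session opened for '<user>' on /dev/char/pty/tN
SESSION_OPENED = re.compile(
    r"^(\S+)\s+sshd\[(\d+)\]: Session opened for '([^']*)' on (/dev/char/pty/t\d+)"
)

SAMPLE = [
    # Copied from the example above.
    "2018-10-10T08:44:20Z sshd[2989715]: Session opened for 'root' on /dev/char/pty/t0",
    # Invented line, only to show a second pty in the output.
    "2018-10-10T08:59:20Z sshd[2989999]: Session opened for 'root' on /dev/char/pty/t1",
]


def sessions_by_pty(lines):
    """Map each pty path to the list of (timestamp, world_id, user) that opened it."""
    table = defaultdict(list)
    for line in lines:
        match = SESSION_OPENED.match(line)
        if match:
            timestamp, world_id, user, pty = match.groups()
            table[pty].append((timestamp, int(world_id), user))
    return dict(table)


if __name__ == "__main__":
    for pty, sessions in sorted(sessions_by_pty(SAMPLE).items()):
        # The last entry is the most recent session logged for this pty.
        print(pty, "last opened by world", sessions[-1][1], "at", sessions[-1][0])
```

The last entry per pty is the most recent opener recorded in the log, which is a reasonable first candidate when looking for whatever still holds that terminal.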
Hi, thanks for your logs.
On August 8th alone there are around 100 SSH connections initiated by these 2 IPs:
10.1.36.58
10.33.158.183
Is there a script running or something like that? The following commands are launched every 15 minutes, day and night, and each run creates SSH connections:
2019-08-08T00:20:22Z sshd[133943]: User 'root' running command '/opt/lsi/storcli/storcli show J'
2019-08-08T00:20:22Z sshd[133943]: User 'root' running command 'ls /opt/lsi/storcli/storcli'
2019-08-08T00:20:22Z sshd[133943]: User 'root' running command '/opt/lsi/storcli/storcli /c0 show J'
You can correlate them: every time those commands are launched (in auth.log), an entry is added to shell.log with exactly the same timestamp.
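The per-IP connection count and the list of commands per session can be reproduced from auth.log with a short script. This is a sketch under assumptions: it relies on the "Connection from ... port ..." and "User '...' running command '...'" line layouts quoted in this thread, the helper names are hypothetical, and the two "Connection from" sample lines are composed for illustration rather than copied from the real log.

```python
import re
from collections import Counter, defaultdict

CONNECTION = re.compile(
    r"^(\d{4}-\d{2}-\d{2})T\S+\s+sshd\[\d+\]: Connection from (\S+) port \d+"
)
COMMAND = re.compile(
    r"^\S+\s+sshd\[(\d+)\]: User '[^']*' running command '(.*)'$"
)

SAMPLE = [
    # Illustrative connection lines (format from the thread, values made up).
    "2019-08-08T00:20:21Z sshd[133943]: Connection from 10.1.36.58 port 43365",
    "2019-08-08T00:35:21Z sshd[134001]: Connection from 10.33.158.183 port 51012",
    # Command lines copied from the post above.
    "2019-08-08T00:20:22Z sshd[133943]: User 'root' running command '/opt/lsi/storcli/storcli show J'",
    "2019-08-08T00:20:22Z sshd[133943]: User 'root' running command 'ls /opt/lsi/storcli/storcli'",
    "2019-08-08T00:20:22Z sshd[133943]: User 'root' running command '/opt/lsi/storcli/storcli /c0 show J'",
]


def connections_per_day_and_ip(lines):
    """Count 'Connection from' lines, keyed by (day, source address)."""
    counts = Counter()
    for line in lines:
        match = CONNECTION.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def commands_by_sshd(lines):
    """Group the logged commands by the sshd id that ran them."""
    grouped = defaultdict(list)
    for line in lines:
        match = COMMAND.match(line)
        if match:
            grouped[int(match.group(1))].append(match.group(2))
    return dict(grouped)


if __name__ == "__main__":
    for (day, ip), n in sorted(connections_per_day_and_ip(SAMPLE).items()):
        print(day, ip, n)
    for sshd_id, cmds in commands_by_sshd(SAMPLE).items():
        print(sshd_id, cmds)
```

With two pollers every 15 minutes, roughly 2 x 96 = 192 polling runs per day would be expected, so the per-day counts give a quick sanity check on whether something other than the pollers is also connecting.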
Yes. We are polling the server every 15 minutes (through SSH) from two different pollers to query the storage status (the storcli utility). We execute the script and then close the SSH connection. I don't see any evidence in the system that any of these connections is hung.
Yeah, that's weird. The start and end of each SSH session are logged, so we can trust that the connections close properly.
Have you tried setting a timeout for SSH sessions? (Even though the start and end of your SSH sessions are logged, which suggests they close properly, we can check whether a timeout changes anything.)
Do you mean on the server side?
Sorry, yes, on the server side.
I think there is already a timeout set. This is the output of sshd -T:
/etc/ssh/sshd_config line 7: Deprecated option UsePrivilegeSeparation
/etc/ssh/sshd_config line 15: Unsupported option PrintLastLog
port 22
addressfamily any
listenaddress 0.0.0.0:22
listenaddress [::]:22
usepam yes
logingracetime 120
x11displayoffset 10
maxauthtries 6
maxsessions 10
clientaliveinterval 200
clientalivecountmax 3
streamlocalbindmask 0177
permitrootlogin yes
ignorerhosts yes
ignoreuserknownhosts no
hostbasedauthentication no
hostbasedusesnamefrompacketonly no
pubkeyauthentication yes
passwordauthentication no
kbdinteractiveauthentication yes
challengeresponseauthentication yes
printmotd yes
x11forwarding no
x11uselocalhost yes
permittty yes
permituserrc yes
strictmodes yes
tcpkeepalive yes
permitemptypasswords no
permituserenvironment no
compression yes
gatewayports no
usedns no
allowtcpforwarding yes
allowagentforwarding yes
disableforwarding no
allowstreamlocalforwarding yes
streamlocalbindunlink no
fingerprinthash SHA256
fipsmode no
pidfile /var/run/sshd.pid
xauthlocation /usr/X11R6/bin/xauth
ciphers aes128-ctr,aes192-ctr,aes256-ctr,3des-cbc
macs hmac-sha2-256,hmac-sha2-512,hmac-sha1
banner /etc/issue
forcecommand none
chrootdirectory none
trustedusercakeys none
revokedkeys none
authorizedprincipalsfile none
versionaddendum none
authorizedkeyscommand none
authorizedkeyscommanduser none
authorizedprincipalscommand none
authorizedprincipalscommanduser none
hostkeyagent none
kexalgorithms diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha1
hostbasedacceptedkeytypes ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistcom,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519,rsa-sha2-512,rsa-sha2-256,ssh-rsa
hostkeyalgorithms ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cera-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519,rsa-sha2-512,rsa-sha2-256,ssh-rsa
pubkeyacceptedkeytypes ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp52,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519,rsa-sha2-512,rsa-sha2-256,ssh-rsa
loglevel INFO
syslogfacility AUTH
authorizedkeysfile /etc/ssh/keys-%u/authorized_keys
hostkey /etc/ssh/ssh_host_rsa_key
hostkey /etc/ssh/ssh_host_dsa_key
authenticationmethods any
subsystem sftp /usr/lib/vmware/openssh/bin/sftp-server -f LOCAL5 -l INFO
maxstartups 10:30:100
permittunnel no
ipqos lowdelay throughput
rekeylimit 0 0
permitopen any
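For reference, the timeout implied by the dump above is clientaliveinterval x clientalivecountmax = 200 x 3 = 600 seconds. It only applies to clients that stop answering the server's keepalive probes; a session that keeps responding is not closed by it. A small sketch that reads the two values out of `sshd -T` output (the helper names are hypothetical, and the sample text is a shortened copy of the output above):

```python
SAMPLE_SSHD_T = """\
/etc/ssh/sshd_config line 7: Deprecated option UsePrivilegeSeparation
port 22
clientaliveinterval 200
clientalivecountmax 3
hostkey /etc/ssh/ssh_host_rsa_key
hostkey /etc/ssh/ssh_host_dsa_key
"""


def parse_sshd_t(text):
    """Parse 'keyword value' lines into {keyword: [values]}.

    Lines starting with '/' are the config warnings that sshd prints
    ahead of the dump, so they are skipped.
    """
    options = {}
    for line in text.splitlines():
        if line.startswith("/"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            options.setdefault(parts[0].lower(), []).append(parts[1].strip())
    return options


def unresponsive_client_timeout(options):
    """Seconds before sshd drops a client that answers no keepalive probe."""
    interval = int(options["clientaliveinterval"][0])
    count_max = int(options["clientalivecountmax"][0])
    return interval * count_max


if __name__ == "__main__":
    opts = parse_sshd_t(SAMPLE_SSHD_T)
    print("timeout (s):", unresponsive_client_timeout(opts))
```

Repeated keywords such as hostkey are kept as lists, so nothing from the dump is silently overwritten.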
Thanks. Can you also provide the sshd config from the ESXi side?
Use "cat /etc/ssh/sshd_config".
Sorry, I was on vacation and just got back. Here it is:
[root@goliath-node-a:~] cat /etc/ssh/sshd_config
# running from inetd
# Port 2200
Protocol 2
HostKey /etc/ssh/ssh_host_rsa_key
HostKey /etc/ssh/ssh_host_dsa_key
UsePrivilegeSeparation no
SyslogFacility auth
LogLevel info
PermitRootLogin yes
PrintMotd yes
PrintLastLog no
TCPKeepAlive yes
X11Forwarding no
Ciphers aes128-ctr,aes192-ctr,aes256-ctr,3des-cbc
MACs hmac-sha2-256,hmac-sha2-512,hmac-sha1
KexAlgorithms diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha1
UsePAM yes
# only use PAM challenge-response (keyboard-interactive)
PasswordAuthentication no
Banner /etc/issue
Subsystem sftp /usr/lib/vmware/openssh/bin/sftp-server -f LOCAL5 -l INFO
AuthorizedKeysFile /etc/ssh/keys-%u/authorized_keys
# Timeout value of 10 mins. The default value of ClientAliveCountMax is 3.
# Hence, we get a 3 * 200 = 600 seconds timeout if the client has been
# unresponsive.
ClientAliveInterval 200
# sshd(8) will refuse connection attempts with a probability of “rate/100”
# (30%) if there are currently “start” (10) unauthenticated connections. The
# probability increases linearly and all connection attempts are refused if the
# number of unauthenticated connections reaches “full” (100)
MaxStartups 10:30:100
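As a side note on the MaxStartups comment at the end of that file: the "start:rate:full" rule can be written out as a small function. This is a sketch of the rule as the comment describes it (refusal probability rising linearly from rate% at "start" to 100% at "full"); the integer arithmetic is my assumption about how sshd rounds, and the function names are made up. It concerns unauthenticated connections only, so it is not the pty limit itself.

```python
def parse_max_startups(value):
    """Split a 'start:rate:full' MaxStartups value into three integers."""
    start, rate, full = (int(part) for part in value.split(":"))
    return start, rate, full


def refusal_percent(unauthenticated, start, rate, full):
    """Percent chance that a new connection attempt is refused.

    Below 'start' nothing is refused; at 'full' (or above) everything is.
    In between, the chance grows linearly from 'rate' to 100.
    """
    if unauthenticated < start:
        return 0
    if unauthenticated >= full or rate == 100:
        return 100
    return rate + (100 - rate) * (unauthenticated - start) // (full - start)


if __name__ == "__main__":
    start, rate, full = parse_max_startups("10:30:100")
    for n in (9, 10, 55, 100):
        print(n, "unauthenticated ->", refusal_percent(n, start, rate, full), "% refused")
```

With two pollers connecting every 15 minutes, the number of simultaneous unauthenticated connections should stay far below 10, so this setting is unlikely to be what rejects the sessions here.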