CSSD failed to start because it failed to change to real-time priority

OS: Red Hat Enterprise Linux Server 7.9
Oracle: 11.2.0.4 2-node RAC

Problem:

Immediately after `crsctl start crs', alert_<host>.log in the GI log directory shows:

2021-03-29 15:59:35.743:
[cssd(16077)]CRS-1713:CSSD daemon is started in clustered mode
2021-03-29 15:59:35.752:
[cssd(16077)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00011:) in /u01/app/grid/log/doprlhypdb1a/cssd/ocssd.log

and ocssd.log shows:

2021-03-29 15:59:35.746: [    CSSD][275134272]clssscGetParameterOLR: OLR fetch for parameter priority (15) failed with rc 21
2021-03-29 15:59:35.746: [    CSSD][275134272]clssscSetPrivEnv: Setting priority to 4
2021-03-29 15:59:35.752: [    CSSD][275134272]clssscSetPrivEnv: unable to set priority to 4
2021-03-29 15:59:35.752: [    CSSD][275134272]SLOS: cat=-2, opn=scls_set_priority_realtime, dep=1, loc=setsched
unable to escalate to real time

2021-03-29 15:59:35.752: [    CSSD][275134272](:CSSSC00011:)clssscExit: A fatal error occurred during initialization

Analysis and Solution:

Because cssd is a foundational component, its failure prevents crsd from starting, and starting crsd manually (crsctl start res ora.crsd) won't work either. Analysis should focus on the messages "unable to set priority to 4" and "unable to escalate to real time". A search on MOS brings us to

Linux: GI OCSSD Fails to Start After cgroups Setting Change (Doc ID 1577784.1)
Grid Infrastructure: CSSD Fails to Start on Solaris Local Containers (zones) (Doc ID 1340694.1)

But our servers run Linux, not Solaris, and we had not configured cgroups ourselves.

A Google search finds

(1) https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-a00069245en_us (Advisory: HPE Serviceguard for Linux - cmcld, cmproxyd, and qs Daemons May Fail To Run with Messages "Could Not/Failed To Set Realtime Priority")
(2) https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/7/html/tuning_guide/real_time_throttling
(3) https://access.redhat.com/articles/3696121 (How to configure a RHEL 7 system to be able to run programs requiring Real-Time Scheduling)
(4) https://access.redhat.com/solutions/2860951 (Processes requiring Real-Time Scheduling fail with "sched_setscheduler: Operation not permitted" error or similar)

These make it clear that CPU accounting is what prevents cssd from escalating to real-time priority, which Oracle RAC requires (single-node installations do not). The best article to follow is (4). Following its step-by-step troubleshooting procedure, we identified the cause of our problem as a newly installed package, insights-client:

doprlhypdb1a:~# egrep -ri "^(Startup)?CPU.*=(.*%|1|yes|true|on)" /usr/lib/systemd/system /etc/systemd/system
/usr/lib/systemd/system/insights-client.service:CPUQuota=30%

Note the string "CPUQuota=30%". Since there's no way to work around this on RHEL7 (the package has been improved on RHEL8, but not on RHEL7), we had to solve the problem by uninstalling the package (yum remove insights-client). After that and a reboot, cssd and the entire CRS stack started up.
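
To see the mechanism at work, one can look at the real-time budget of the CPU cgroup a process lives in. The following is a minimal sketch, assuming cgroup v1 as on RHEL7, a kernel that exposes cpu.rt_runtime_us, and the usual mount point /sys/fs/cgroup/cpu,cpuacct. The function names cpu_cgroup_of and rt_budget_of are made up for illustration. Our understanding of article (4) is that once CPU accounting is enabled for any unit, processes end up in child CPU cgroups whose cpu.rt_runtime_us is 0, and a process in such a group cannot switch to a real-time policy.

```shell
# Print the cpu-controller cgroup path found in a /proc/<pid>/cgroup style file.
# cgroup v1 lines look like "4:cpuacct,cpu:/system.slice/foo.service".
cpu_cgroup_of() {
    local cgroup_file=$1
    awk -F: '{
        n = split($2, ctrl, ",")
        for (i = 1; i <= n; i++)
            if (ctrl[i] == "cpu") {
                sub(/^[^:]*:[^:]*:/, "")   # drop "<id>:<controllers>:"
                print
                exit
            }
    }' "$cgroup_file"
}

# Print cpu.rt_runtime_us for the cgroup of a pid.
# The second argument (cgroup mount point) is an assumption; adjust if needed.
rt_budget_of() {
    local pid=$1
    local root=${2:-/sys/fs/cgroup/cpu,cpuacct}
    local cg
    cg=$(cpu_cgroup_of "/proc/$pid/cgroup")
    if [ -z "$cg" ]; then
        echo "no cpu cgroup found for pid $pid" >&2
        return 1
    fi
    cat "$root$cg/cpu.rt_runtime_us"
}
```

Calling rt_budget_of <pid> for a process that fails to escalate should then print 0 if this is indeed the cause; a process in the root cgroup would normally show the system-wide budget instead.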

Comments:

* Having this package installed doesn't mean this service exists. `systemctl list-units | grep insights-client' returns nothing (`systemctl list-units | grep insights' returns insights-client-results.path and insights-client.timer, which are related, though). But /var/log/messages shows lines like "Starting to collect Insights data" and "Uploading Insights data" at the time of reboot.

* Some webpages or forum answers, including
CSSD Daemon Fails to Start with Error CRS-1726 and CRS-8503 (Doc ID 2714854.1)
CRS Will Not Successfully Restart After Node Reboot (Doc ID 2720950.1)
solve the problem by disabling real-time throttling (`echo -1 > /proc/sys/kernel/sched_rt_runtime_us' or `sysctl -w kernel.sched_rt_runtime_us=-1'). But they fail to heed Red Hat's warning in http://access.redhat.com/solutions/1604133 that "setting sched_rt_runtime_us to -1 can be extremely dangerous": a badly written program running at real-time priority could hog the CPU to the extent that no human intervention is possible, and you have to power down the server to resolve the hang. The default value of sched_rt_runtime_us, 950000 microseconds or 0.95 seconds, leaves a gap of 0.05 seconds in every second for the server to do other work, including processing your commands. Unfortunately, merely raising this kernel parameter to a higher number such as 990000, without finding the real cause, doesn't solve the problem, which is why people, and Oracle support documents too, suggest disabling the throttle mechanism completely.
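
The arithmetic behind the throttle can be sketched in a few lines of shell. This is only an illustration: the function name non_rt_share_pct is made up, and it assumes the period is kernel.sched_rt_period_us at its default of 1000000 microseconds.

```shell
# Percentage of each scheduling period that is kept free for non-real-time tasks.
# Arguments: runtime and period in microseconds, as in kernel.sched_rt_runtime_us
# and kernel.sched_rt_period_us. A runtime of -1 means the throttle is disabled.
non_rt_share_pct() {
    local runtime_us=$1
    local period_us=$2
    if [ "$runtime_us" -lt 0 ]; then
        echo 0      # nothing is guaranteed to ordinary tasks
        return
    fi
    echo $(( (period_us - runtime_us) * 100 / period_us ))
}

non_rt_share_pct 950000 1000000    # default: prints 5
non_rt_share_pct 990000 1000000    # prints 1
non_rt_share_pct -1 1000000        # throttle disabled: prints 0
```

With the default, ordinary tasks (your login shell included) keep 5% of every second; with -1 they keep nothing if a real-time task never yields.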

* On RHEL7, you can experiment by increasing the priority of a process manually:
chrt -a -p <pid> #check current priority
chrt -r -p 99 <pid> #escalate it to round-robin real-time priority 99
chrt -a -p <pid> #check again, should show policy SCHED_RR and priority 99
If cssd cannot be escalated, the above test will likely fail, too.
Note: RHEL8 behaves differently; cssd runs in normal priority on single-node.
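
A less intrusive variant of the test above is to start a throwaway command under a real-time policy instead of touching a running process. This is a sketch only: the function name can_go_realtime is made up, and the result reflects the shell it is run from (as root), which may sit in a different cgroup than cssd.

```shell
# Try to run a harmless command under SCHED_RR priority 1.
# Prints "allowed" if the kernel accepted the policy change, "denied" if not.
can_go_realtime() {
    if ! command -v chrt >/dev/null 2>&1; then
        echo "chrt not found"
        return 2
    fi
    if chrt -r 1 true 2>/dev/null; then
        echo "allowed"
    else
        echo "denied"
        return 1
    fi
}
```

A "denied" from a root shell is a hint that something other than plain permissions, such as the CPU accounting described above, is in the way.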

* After the GI stack had started, we actually hit another problem: the DB instance failed to start, not even with `startup nomount', with this error in alert_<SID>.log:
No connectivity to other instances in the cluster during startup. Hence, LMON is terminating the instance.
It points to an interconnect problem, but ping and traceroute to the partner node's private network IP were fine. We solved the problem by completely turning off reverse path filtering on the networks:
net.ipv4.conf.<network interface>.rp_filter=0
in /etc/sysctl.conf, followed by `sysctl -p', on both nodes. After that the database instance could be started.
This problem is unrelated to the cssd failure, but it's documented here since we solved it at the same time.
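
To verify the reverse path filter setting after `sysctl -p', the values can be read back from /proc. The following is a minimal sketch; the function name effective_rp_filter is made up. It relies on the kernel's documented rule that the larger of conf/all/rp_filter and conf/<interface>/rp_filter is used for each interface, so setting only the per-interface key to 0 would not be enough if the "all" key were nonzero.

```shell
# Print "<interface> <effective rp_filter>" for every interface found under
# a sysctl conf directory (default: /proc/sys/net/ipv4/conf).
effective_rp_filter() {
    local conf_dir=${1:-/proc/sys/net/ipv4/conf}
    local all_val if_val dir name
    all_val=$(cat "$conf_dir/all/rp_filter")
    for dir in "$conf_dir"/*/; do
        name=$(basename "$dir")
        case "$name" in
            all|default) continue ;;    # not real interfaces
        esac
        if_val=$(cat "$dir/rp_filter")
        # The kernel uses the maximum of the "all" and per-interface values.
        if [ "$all_val" -gt "$if_val" ]; then
            echo "$name $all_val"
        else
            echo "$name $if_val"
        fi
    done
}
```

Running effective_rp_filter without arguments on each node should show 0 next to the interconnect interfaces once the change has taken effect.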