Patch-ID# 111605-01
Keywords: maintenance release automatic rescheduling
Synopsis: Sun Grid Engine: Sun Grid Engine 5.2.2 maintenance patch
Date: Jul/17/2001

Solaris Release: 7 8

SunOS Release: 5.7 5.8

Unbundled Product: Sun Grid Engine

Unbundled Release: 5.2.2

Xref:

Topic: suggested maintenance release for Sun Grid Engine 5.2.2

Relevant Architectures: sparc

BugId's fixed with this patch: 4396922 4398025 4399710 4404288 4407599
        4419370 4419384 4419390 4422660 4422667 4422676 4422678 4425985
        4426107 4426586 4428201 4432722 4432744 4433112 4437022 4437029
        4463581

Changes incorporated in this version: 4396922 4398025 4399710 4404288
        4407599 4419370 4419384 4419390 4422660 4422667 4422676 4422678
        4425985

Patches accumulated and obsoleted by this patch:

Patches which conflict with this patch:

Patches required with this patch:

Obsoleted by:

Files included with this patch:

/bin/solaris64/cod_commd
/bin/solaris64/cod_coshepherd
/bin/solaris64/cod_execd
/bin/solaris64/cod_qmaster
/bin/solaris64/cod_qstd
/bin/solaris64/cod_schedd
/bin/solaris64/cod_shadowd
/bin/solaris64/cod_shepherd
/bin/solaris64/codcommdcntl
/bin/solaris64/qacct
/bin/solaris64/qalter
/bin/solaris64/qconf
/bin/solaris64/qdel
/bin/solaris64/qhold
/bin/solaris64/qhost
/bin/solaris64/qlogin
/bin/solaris64/qmake
/bin/solaris64/qmod
/bin/solaris64/qmon
/bin/solaris64/qresub
/bin/solaris64/qrls
/bin/solaris64/qrsh
/bin/solaris64/qselect
/bin/solaris64/qsh
/bin/solaris64/qstat
/bin/solaris64/qsub
/bin/solaris64/qtcsh
/examples
/examples/jobsbin/solaris64/work
/utilbin/solaris64/adminrun
/utilbin/solaris64/checkprog
/utilbin/solaris64/checkuser
/utilbin/solaris64/filestat
/utilbin/solaris64/gethostbyaddr
/utilbin/solaris64/gethostbyname
/utilbin/solaris64/gethostname
/utilbin/solaris64/getservbyname
/utilbin/solaris64/loadcheck
/utilbin/solaris64/now
/utilbin/solaris64/qrsh_starter
/utilbin/solaris64/rlogin
/utilbin/solaris64/rsh
/utilbin/solaris64/rshd
/utilbin/solaris64/testsuidroot
/utilbin/solaris64/uidgid

Problem Description:

4396922 Submit option '-ac name=value' broken
4398025 Need more diagnostic information in qrsh error messages
4399710 cod_shadowd core dumps
4404288 Rescheduling of slave controlled DMP jobs causes jobs to hang
        in 't' state
4407599 Large slot ranges in PE request of jobs causes high cpu usage
        in scheduler
4419370 cannot restart qmaster when an execution host is removed from
        NIS
4419384 qmaster exits when queues with very long queue names are added
4419390 commd core dumps if commprocs IP address is not resolvable into
        hostname
4422660 Job restart is not triggered when jobs execution node becomes
        unavailable
4422667 qtcsh: Parsing of qtask file does not handle quotes correctly
4422676 queue in unknown/alarm state due to inconsistent hostname
        resolving
4422678 bug in parsing of -ac/-dc/-sc/-v/-V submit options with spaces
        in variable value
4425985 Solaris execd crashes at multiprocessor machines
4426107 job abort mail: job died through signal
4426586 default resource limit in effect even if resource is not
        managed at host/queue
4428201 Resource h_vmem/s_vmem not limited in Linux version
4432722 negative requests on a consumable resource confuse resource
        management
4432744 per process resource limits are not adjusted for multithreaded
        jobs
4433112 Environment variable QRSH_WRAPPER ignored by qmake
4437022 Admin defined -notify job option partially broken
4437029 Opening "Job Control" dialog can cause core dump of qmon
4463581 getpwnam() error should be more instructive

Patch Installation Instructions:
--------------------------------

For Solaris 2.0-2.6 releases, refer to the Install.info file and/or
the README within the patch for instructions on using the generic
'installpatch' and 'backoutpatch' scripts provided with each patch.

For Solaris 7-8 releases, refer to the man pages for instructions
on using 'patchadd' and 'patchrm' scripts provided with Solaris.
Any other special or non-generic installation instructions should be
described below as special instructions.

The following example installs a patch to a standalone machine:

        example# patchadd /var/spool/patch/104945-02

The following example removes a patch from a standalone system:

        example# patchrm 104945-02

For additional examples please see the appropriate man pages.

Special Install Instructions:
-----------------------------

Please visit our home page at

   http://www.sun.com/gridware

for more information about the patches which update your Sun Grid
Engine release 5.2.2 to 5.2.3. Make sure to install all patches for
this maintenance release, including the patches for the "doc" and
"common" packages and all binary sets (Solaris 32-bit and 64-bit
binaries) as needed.

Shutting down Sun Grid Engine
-----------------------------

You can upgrade from 5.2.2 with pending jobs, so you only need to
drain your cluster of running jobs. To do so, disable all queues:

   # qmod -d '*'

Shut down your cluster with the following commands:

   # qconf -kej        (shutdown execd and kill running jobs)
   (wait 1-2 minutes)
   # qstat -f          (verify the status of the cluster)
   # qconf -ks         (kill scheduler)
   # qconf -km         (kill qmaster)
   # $CODINE_ROOT/util/shutdown_commd.sh -all   (kill cod_commd's)
   (kill all cod_shadowd's)

Now verify that all Sun Grid Engine daemons (cod_qmaster, cod_schedd,
cod_execd, cod_commd, cod_shepherd, cod_shadowd) have terminated on
all hosts. If any are still running, terminate them with the 'kill'
command.

Remove your execd spool directories
-----------------------------------

This is a safe method to make sure that no hung jobs can cause any
problems after the upgrade. The execd spool directory is configured
through the global cluster configuration and has the unqualified host
name appended. By default it is located in

   $CODINE_ROOT/default/spool/

You can recursively delete all these directories, but please make
sure NOT to delete the qmaster spool directory.
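
The cleanup described above could be scripted roughly as follows. This
is only a sketch under stated assumptions, not part of the patch: the
function name clean_execd_spool is hypothetical, and it assumes that
the qmaster spool directory is named "qmaster" and sits beside the
per-host execd directories. Check your global cluster configuration
before relying on either assumption. The function only reports what it
would remove unless "delete" is passed as the second argument.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Sun Grid Engine).
# Removes every subdirectory of the given spool root except the one
# named "qmaster". ASSUMPTION: the qmaster spool directory is called
# "qmaster" and lives directly under the spool root.
clean_execd_spool() {
    spool_root=$1          # e.g. "$CODINE_ROOT/default/spool"
    mode=${2:-dry-run}     # pass "delete" to actually remove directories

    for dir in "$spool_root"/*; do
        # Skip plain files and the unexpanded pattern of an empty directory.
        [ -d "$dir" ] || continue
        name=$(basename "$dir")

        if [ "$name" = "qmaster" ]; then
            echo "skip $name"
            continue
        fi

        if [ "$mode" = "delete" ]; then
            rm -rf "$dir" && echo "removed $name"
        else
            echo "would remove $name"
        fi
    done
}
```

Run it first without the second argument, for example
clean_execd_spool "$CODINE_ROOT/default/spool", and review the output
before repeating the call with "delete".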

After installing the patches, read the file 'doc/UPGRADE' for more
information on how to update your startup script and restart Sun Grid
Engine.

README -- Last modified date: Friday, April 12, 2002