Hi,
while building and testing Solaris sigar, I frequently saw this test failure:
Assertion failed: IS_IMPL_U64(proc_mem.page_faults), file t_sigar_proc.c, line 91
sh: line 10: 13626: Abort(coredump)
some DTracing and general staring-at-code later I came to the conclusion that this (l 651 ff in solaris_sigar.c):
--- snip ---
int sigar_proc_mem_get(sigar_t *sigar, sigar_pid_t pid,
sigar_proc_mem_t *procmem)
{
int status = sigar_proc_psinfo_get(sigar, pid);
...
if (sigar_proc_usage_get(sigar, &usage, pid) == SIGAR_OK) {
procmem->minor_faults = usage.pr_minf;
procmem->major_faults = usage.pr_majf;
procmem->page_faults =
procmem->minor_faults +
procmem->major_faults;
}
else {
procmem->minor_faults = SIGAR_FIELD_NOTIMPL;
procmem->major_faults = SIGAR_FIELD_NOTIMPL;
procmem->page_faults = SIGAR_FIELD_NOTIMPL;
}
return SIGAR_OK;
--- snip ---
... is missing a check for ESRCH as a return code from sigar_proc_usage_get() - if the process under examination disappears from /proc after the call to sigar_proc_psinfo_get() but before the call to _usage_get(), the latter will return ESRCH which this code will happily ignore, and, even worse IMO, NOT IMPLEMENTED will be reported, which is plain wrong in this case.
I suggest the following patch to address this issue:
=== modified file 'src/os/solaris/solaris_sigar.c'
--- src/os/solaris/solaris_sigar.c 2009-09-15 21:54:00 +0000
+++ src/os/solaris/solaris_sigar.c 2010-05-31 15:42:20 +0000
@@ -663,13 +663,16 @@
procmem->resident = pinfo->pr_rssize << 10;
procmem->share = SIGAR_FIELD_NOTIMPL;
- if (sigar_proc_usage_get(sigar, &usage, pid) == SIGAR_OK) {
+ if ((status = sigar_proc_usage_get(sigar, &usage, pid)) == SIGAR_OK) {
procmem->minor_faults = usage.pr_minf;
procmem->major_faults = usage.pr_majf;
procmem->page_faults =
procmem->minor_faults +
procmem->major_faults;
}
+ else if (status == ESRCH) { /* pid disappeared ... just return error */
+ return status;
+ }
else {
procmem->minor_faults = SIGAR_FIELD_NOTIMPL;
procmem->major_faults = SIGAR_FIELD_NOTIMPL;