Friday, June 8, 2012

Using Solaris SMF

In The Name Of Allah The Beneficent The Merciful 


In most Unix environments, the startup process consists of a handful of autonomous boot scripts. They act independently of one another, unaware of which scripts have already run or which will run after them. When they are invoked, there is no serious error checking and no recourse if a script fails.

For Solaris 10, Sun introduced the Service Management Facility (SMF), a framework that handles system boot-up, process management, and self-healing. It addresses the shortcomings of startup scripts and provides an infrastructure for managing daemons after the host has booted.

A System V Unix host starts the sendmail daemon with the script S80sendmail in either the /etc/rc2.d or the /etc/rc3.d directory. The script contains commands to start or stop sendmail, depending on how it is invoked. The S in the filename marks it as a startup script, and 80 is a sequence number that determines when the script runs.

When S80sendmail runs, it won't be aware of earlier problems, such as an NIS failure or /var failing to mount. You could write tests into the script, but that increases both startup time and the complexity of each script.

In the SMF environment, sendmail is a service. Solaris 10 defines a service as a persistent program that handles system or user requests. Services are expected to be fault tolerant and manageable by the operating system.

Services are identified by a URI known as a Fault Management Resource Identifier (FMRI). The FMRI is organized as a category hierarchy that helps identify the service and what it is responsible for.

Here are the FMRIs for sendmail, ssh, and other services running on a host:

svc:/network/smtp:sendmail
svc:/network/ssh:default
svc:/system/filesystem/local:default
lrc:/etc/rc2_d/S99audit

Here is the breakdown of the FMRI structure:

Schema   Category              Service name   Instance
svc:     /network/             smtp           :sendmail
svc:     /network/             ssh            :default
svc:     /system/filesystem/   local          :default
lrc:     /etc/rc2_d/           S99audit       (none)
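To make the breakdown concrete, here is a minimal Python sketch that splits an FMRI string into the four parts in the table. The function split_fmri and the FMRI tuple are hypothetical names for illustration. The parsing rule (schema:[//host]/path[:instance]) is a simplification, not the full grammar Solaris uses.

```python
from typing import NamedTuple, Optional


class FMRI(NamedTuple):
    schema: str              # "svc" or "lrc"
    category: str            # e.g. "/network/" or "/system/filesystem/"
    service: str             # last path segment, e.g. "smtp"
    instance: Optional[str]  # e.g. "sendmail"; None when there is no instance


def split_fmri(fmri: str) -> FMRI:
    """Split an FMRI such as 'svc:/network/smtp:sendmail' into its parts.

    Simplified sketch: assumes the form schema:[//host]/path[:instance].
    """
    schema, sep, rest = fmri.partition(":")
    if not sep or not rest.startswith("/"):
        raise ValueError(f"not an FMRI: {fmri!r}")
    # Drop an optional authority such as //localhost
    if rest.startswith("//"):
        slash = rest.find("/", 2)
        if slash == -1:
            raise ValueError(f"no path in FMRI: {fmri!r}")
        rest = rest[slash:]
    path, _, instance = rest.partition(":")
    category, _, service = path.rpartition("/")
    return FMRI(schema, category + "/", service, instance or None)
```

For example, split_fmri("svc://localhost/network/ssh:default") yields schema "svc", category "/network/", service "ssh", and instance "default". The legacy script lrc:/etc/rc2_d/S99audit has no instance part.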

Each service has a manifest that describes the service and its management needs. It lists the service dependencies, the control scripts, and the actions to take when the service fails. The manifest starts out as an XML file that SMF imports into a central repository, which records the properties of all the services.

Sendmail will not run without the following dependencies:

  • Local filesystems are mounted
  • Basic network services are up
  • The host is aware of its domain name
  • The /etc/nsswitch.conf file exists
  • The /etc/mail/sendmail.cf file exists
  • Any nameservices in use (NIS, LDAP) are running
  • The auto filesystem, if in use, is running
  • Syslog, if in use, is running

Services in the SMF environment start up in parallel, but each service becomes available only when all of its listed dependencies are available. This gives the host a faster boot-up and reduces the chances of a cascading failure of services. There is no fixed startup order like the rc sequence numbers: independent services may start in any order, but sendmail will not come online until its dependencies are online.
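Dependency-based startup can be sketched with a small Python model that groups services into "waves". Every service in a wave has all of its dependencies in earlier waves, so the members of one wave could start in parallel. The service names and dependency map are a simplified, hypothetical subset of the sendmail list above. This is a layered topological sort that illustrates the idea, not how svc.startd is implemented.

```python
from typing import Dict, List, Set


def startup_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group services into waves that can start in parallel.

    deps maps each service to the set of services it requires.
    Raises ValueError if the dependencies contain a cycle.
    """
    remaining = {svc: set(reqs) for svc, reqs in deps.items()}
    # Services that appear only as dependencies have no requirements themselves
    for reqs in deps.values():
        for req in reqs:
            remaining.setdefault(req, set())
    online: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        # Everything whose dependencies are all online can start now
        ready = sorted(s for s, reqs in remaining.items() if reqs <= online)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        online.update(ready)
        for s in ready:
            del remaining[s]
    return waves


# Hypothetical, trimmed dependency map for sendmail
SENDMAIL_DEPS = {
    "sendmail": {"filesystem/local", "network", "name-services", "system-log"},
    "system-log": {"filesystem/local"},
    "name-services": {"network"},
    "network": {"loopback"},
}
```

With this map the waves are [filesystem/local, loopback], then [network, system-log], then [name-services], and finally [sendmail]. Sendmail is last because it is the only service that needs everything else.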

Almost all services under SMF are managed by a restarter. The master restarter, svc:/system/svc/restarter:default, is the svc.startd daemon. It starts the other services, checks their dependencies, and restarts them if they fail. When Solaris 10 boots, svc.startd is one of the first programs spawned by /sbin/init.

It's still possible to use rcN.d scripts under Solaris 10; however, the programs started from these scripts are not under SMF control. These are referred to as legacy run scripts. They have an FMRI, like normal services do, but the schema prefix is lrc:. Legacy run scripts are not run until all SMF services are up and running. When the host shuts down, their stop scripts run first, before the SMF services are disabled.


Administering SMF

The two most common commands used to administer services are svcs and svcadm. The svcs command reports on the state of configured services, while the svcadm command controls the services.


$ svcs
STATE          STIME    FMRI
...
legacy_run     Sep_22   lrc:/etc/rc2_d/S99audit
...
online         Sep_22   svc:/system/svc/restarter:default
online         Sep_22   svc:/system/filesystem/autofs:default
online         Sep_22   svc:/system/system-log:default
online         Sep_22   svc:/network/smtp:sendmail
online         Sep_22   svc:/system/filesystem/local:default
online         Sep_22   svc:/network/ssh:default
online         Sep_22   svc:/system/dumpadm:default
online         Sep_22   svc:/network/loopback:default
...
Running svcs without arguments lists the enabled services, including legacy run scripts; disabled services are left out. The STATE column reports the service status, STIME shows when the service state last changed, and FMRI identifies the service. To list all services, including disabled ones, use the -a option.
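Because svcs prints one service per line in fixed columns, its default output is easy to post-process in scripts. The Python sketch below parses the three-column layout into records and filters them by state. parse_svcs is a hypothetical helper, not part of Solaris. The sample text is copied from the listing above. On a real host you might feed it svcs -H output, which omits the header line.

```python
from typing import List, NamedTuple


class SvcLine(NamedTuple):
    state: str
    stime: str
    fmri: str


def parse_svcs(text: str) -> List[SvcLine]:
    """Parse default svcs output (STATE STIME FMRI) into records.

    Assumes plain output: the extra process lines printed by svcs -p
    would be misread, so don't feed it that format.
    """
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and elision markers such as "..."
        if len(fields) != 3 or fields[0] == "STATE":
            continue
        records.append(SvcLine(*fields))
    return records


SAMPLE = """\
STATE          STIME    FMRI
...
legacy_run     Sep_22   lrc:/etc/rc2_d/S99audit
...
online         Sep_22   svc:/system/svc/restarter:default
online         Sep_22   svc:/network/smtp:sendmail
online         Sep_22   svc:/network/ssh:default
"""

online = [r.fmri for r in parse_svcs(SAMPLE) if r.state == "online"]
```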
The svcs command can also examine a single service by using either a full or partial FMRI. You can add the -v or -x options for extended output on the service. The -d option will list all the dependencies of a service.
$ svcs svc://localhost/network/ssh:default
STATE          STIME    FMRI
online         Sep_22   svc:/network/ssh:default

$ svcs -v svc:/network/ssh
STATE          NSTATE        STIME    CTID   FMRI
online         -             Sep_22       52 svc:/network/ssh:default

$ svcs -x network/ssh
svc:/network/ssh:default (SSH server)
 State: online since Thu Sep 22 07:51:15 2005
   See: sshd(1M)
   See: /var/svc/log/network-ssh:default.log
Impact: None.

$ svcs -d ssh
STATE          STIME    FMRI
online         Sep_22   svc:/network/loopback:default
online         Sep_22   svc:/network/physical:default
online         Sep_22   svc:/system/cryptosvc:default
online         Sep_22   svc:/system/filesystem/local:default
online         Sep_22   svc:/system/utmp:default
online         Sep_22   svc:/system/filesystem/autofs:default
You can add the hostname localhost to an FMRI, or you can abbreviate it by removing the instance name and/or the categories. If the abbreviation results in multiple matches, they will all be listed. Here are two services that each have the name local in the last segment of the service name:
$ svcs local
STATE          STIME    FMRI
online         Sep_22   svc:/system/device/local:default
online         Sep_22   svc:/system/filesystem/local:default
You can also perform basic glob matching on service names:
$ svcs "*network*"
STATE          STIME    FMRI
disabled       Sep_22   svc:/network/rpc/keyserv:default
disabled       Sep_22   svc:/network/rpc/nisplus:default
disabled       Sep_22   svc:/network/nis/client:default
.....
online         Sep_22   svc:/network/nfs/client:default
online         Sep_22   svc:/network/security/ktkt_warn:default
online         Sep_22   svc:/network/telnet:default
online         Sep_22   svc:/network/nfs/rquota:default
$
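The abbreviation and glob behaviour described above can be approximated in Python with fnmatch. This is a hypothetical model, not the exact algorithm svcs uses. A glob is compared against the whole FMRI. An abbreviation matches whole trailing path segments, with or without the instance, and every matching FMRI is returned, as svcs lists all matches.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def matches(pattern: str, fmri: str) -> bool:
    """Approximate svcs-style matching of one pattern against a full FMRI."""
    if any(ch in pattern for ch in "*?["):
        # Glob patterns are compared against the whole FMRI string
        return fnmatchcase(fmri, pattern)
    if pattern == fmri:
        return True
    if pattern.startswith("svc:/"):
        pattern = pattern[len("svc:/"):]
    # Abbreviations: compare against trailing path segments, with and without
    # the instance, e.g. "local", "filesystem/local", "local:default"
    body = fmri.split(":/", 1)[-1]        # "system/filesystem/local:default"
    path, _, instance = body.partition(":")
    parts = path.split("/")
    for i in range(len(parts)):
        tail = "/".join(parts[i:])
        if pattern in (tail, f"{tail}:{instance}"):
            return True
    return False


def select(pattern: str, fmris: Iterable[str]) -> List[str]:
    """Return every FMRI the pattern matches."""
    return [f for f in fmris if matches(pattern, f)]
```

With the FMRIs from the listings above, select("local", ...) returns both svc:/system/device/local:default and svc:/system/filesystem/local:default, just as svcs local does.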
Services can manage a running process or an OS state. By using the -p option with svcs, you can identify the processes associated with a service.
$ svcs -p svc:/network/ssh
STATE          STIME    FMRI
online         Sep_22   svc:/network/ssh:default
               Sep_22        345 sshd
The time the process started is listed under the STIME column.
In some cases, services do not have running processes associated with them. Tasks such as bringing a network interface up or mounting a disk partition do not require a continuously running process. The svc:/system/filesystem/local:default service runs the mount command once to mount all local filesystems, and then its method script exits. SMF refers to these as transient services.
$ svcs -p svc:/system/filesystem/local:default
STATE          STIME    FMRI
online         Sep_22   svc:/system/filesystem/local:default
Finally, there are services that have running processes only while they are in use. When Sun designed the Service Management Facility, it absorbed inetd and the network daemons it launches into the framework. All the daemons that previously appeared in the /etc/inetd.conf file are now SMF-managed services. The difference is that these services use inetd as their restarter, instead of svc.startd.
$ svcs -p rlogin
STATE          STIME    FMRI
online         Sep_22   svc:/network/login:rlogin

$ rlogin localhost
Password:
Last login: Sun Feb 19 23:49:56 from localhost
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005

$ svcs -p rlogin
STATE          STIME    FMRI
online         Sep_22   svc:/network/login:rlogin
               23:50:41    23833 in.rlogind
               23:50:41    23836 bash
               23:50:48    23840 svcs

$ exit
logout
Connection to localhost closed.
$ svcs -p rlogin
STATE          STIME    FMRI
online         Sep_22   svc:/network/login:rlogin
If you kill a process under the control of service management, the program that originally started it will restart it. Here's an example of an Apache2 service that has been running since January 5. First, I double-checked the service by grepping for the process IDs, which match the ones listed with the service. Then, I sent the TERM signal to the parent of all of the child processes.
# svcs -p http
STATE          STIME    FMRI
online         Jan_05   svc:/application/http:apache2
               Jan_05      12377 httpd
               Jan_05      12378 httpd
               Jan_05      12379 httpd
               Jan_05      12380 httpd

# ps -ef | grep http
    root 12377     1   0   Jan 05 ?           2:14 /opt/apache2/bin/httpd -DPERL
    root 23521 23520   0 20:33:01 pts/1       0:00 grep http
    http 12378 12377   0   Jan 05 ?           0:00 /opt/apache2/bin/httpd -DPERL
    http 12380 12377   0   Jan 05 ?           0:00 /opt/apache2/bin/httpd -DPERL

# kill -TERM 12377

# ps -ef | grep http
    root 23527 23520   0 20:33:25 pts/1       0:00 grep http
    root 23580     1   0 20:33:09 ?           0:01 /opt/apache2/bin/httpd -DPERL
    http 23581 23580   0 20:33:10 ?           0:00 /opt/apache2/bin/httpd -DPERL
    http 23582 23580   0 20:33:12 ?           0:00 /opt/apache2/bin/httpd -DPERL
    http 23583 23580   0 20:33:12 ?           0:00 /opt/apache2/bin/httpd -DPERL

# svcs -p svc:/application/http:apache2
STATE          STIME    FMRI
online         20:33:09 svc:/application/http:apache2
               20:33:09    23580 httpd
               20:33:10    23581 httpd
               20:33:11    23582 httpd
               20:33:11    23583 httpd
I then rechecked the httpd processes and found that svc.startd had started new Apache servers. Finally, I examined the http service: its STIME had changed, and it listed the new process IDs.
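The restart behaviour above can be imitated, in a greatly reduced form, by a supervisor loop that starts a command again whenever it exits. The Python sketch below is hypothetical and is not how svc.startd works internally. Real SMF tracks a service's processes through contracts (note the CTID column and the "Killing contract" log messages in this post), which this sketch does not attempt. The max_restarts cap is an illustrative parameter.

```python
import subprocess
import sys
from typing import List


def supervise(cmd: List[str], max_restarts: int) -> int:
    """Run cmd and start it again whenever it exits, up to max_restarts times.

    Returns the total number of times the process was started.
    """
    starts = 0
    while True:
        proc = subprocess.Popen(cmd)
        starts += 1
        proc.wait()  # block until the process dies, like the killed httpd
        if starts > max_restarts:
            # Give up; loosely analogous to SMF moving a service to maintenance
            return starts


if __name__ == "__main__":
    # A command that exits immediately is started once, then restarted twice
    print(supervise([sys.executable, "-c", "raise SystemExit(1)"], max_restarts=2))
```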
The following table lists some SMF services, their associated processes, and their restarter FMRI:
Service                              Processes     Restarter
svc:/system/svc/restarter:default    svc.startd    none
svc:/network/smtp:sendmail           sendmail      svc:/system/svc/restarter:default
svc:/network/ssh:default             sshd          svc:/system/svc/restarter:default
svc:/system/sac:default              sac, ttymon   svc:/system/svc/restarter:default
svc:/network/inetd:default           inetd         svc:/system/svc/restarter:default
svc:/network/telnet:default          in.telnetd    svc:/network/inetd:default
If you want to know the restarter for a service, use svcs -l. Use svcs -R with a full FMRI to list all of the services a restarter service controls.
$ svcs -l network/ssh
fmri         svc:/network/ssh:default
name         SSH server
enabled      true
state        online
next_state   none
state_time   Thu Sep 22 07:51:15 2005
logfile      /var/svc/log/network-ssh:default.log
restarter    svc:/system/svc/restarter:default
contract_id  52
dependency   require_all/none svc:/system/filesystem/local (online)
dependency   optional_all/none svc:/system/filesystem/autofs (online)
dependency   require_all/none svc:/network/loopback (online)
dependency   require_all/none svc:/network/physical (online)
dependency   require_all/none svc:/system/cryptosvc (online)
dependency   require_all/none svc:/system/utmp (online)
dependency   require_all/restart file://localhost/etc/ssh/sshd_config (online)

$ svcs -R svc:/system/svc/restarter:default
STATE          STIME    FMRI
disabled       Sep_22   svc:/system/metainit:default
disabled       Sep_22   svc:/network/rpc/keyserv:default
online         Sep_22   svc:/system/svc/restarter:default
online         Sep_22   svc:/network/pfil:default
online         Sep_22   svc:/milestone/name-services:default
online         Sep_22   svc:/network/loopback:default
....

Controlling Services

Enable or disable a service using the svcadm command:


# svcs -x telnet
svc:/network/telnet:default (Telnet server)
 State: online since Thu Sep 22 07:51:11 2005
   See: in.telnetd(1M)
   See: telnetd(1M)
Impact: None.

# svcadm disable svc:/network/telnet:default

# svcs -x telnet
svc:/network/telnet:default (Telnet server)
 State: disabled since Sun Feb 19 23:32:40 2006
Reason: Disabled by an administrator.
   See: http://sun.com/msg/SMF-8000-05
   See: in.telnetd(1M)
   See: telnetd(1M)
Impact: This service is not running.
The configuration state of a service is recorded in the service repository, so changes to that state persist across reboots. If you disable telnet, rebooting the host won't bring it back up; you must explicitly re-enable it from the command line. To make a temporary change that lasts only until the next reboot, add the -t option to svcadm:
# svcadm disable -t network/telnet
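The difference between a persistent change and a temporary (-t) change can be modelled as two layers of state. One layer is stored in the repository and survives a reboot, and the other is held only until the next boot. The Python class below is a hypothetical illustration of that behaviour, not SMF's actual data structures. In this model, a persistent change also discards any temporary override for the same service.

```python
from typing import Dict


class EnabledState:
    """Toy model of persistent vs. temporary (svcadm -t) enable/disable."""

    def __init__(self) -> None:
        self.repository: Dict[str, bool] = {}  # survives reboots
        self.temporary: Dict[str, bool] = {}   # forgotten at reboot

    def set(self, fmri: str, enabled: bool, temporary: bool = False) -> None:
        if temporary:
            self.temporary[fmri] = enabled
        else:
            self.repository[fmri] = enabled
            self.temporary.pop(fmri, None)  # persistent change wins in this model

    def is_enabled(self, fmri: str) -> bool:
        if fmri in self.temporary:
            return self.temporary[fmri]
        return self.repository.get(fmri, False)

    def reboot(self) -> None:
        self.temporary.clear()
```

Disabling telnet with temporary=True makes is_enabled return False until reboot() is called, after which the repository value (enabled) applies again.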
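A configured SMF service is normally in one of the following six states. (SMF also defines an initial uninitialized state, which a service occupies only until its restarter has evaluated it.)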
online
The service is enabled and is running or available to run, or the tasks associated with this service are complete.
offline
The service is enabled but has not yet reached the online state. It is either in the process of starting up, or the dependencies of the service are not yet online.
disabled
The service is not enabled and should not be running.
degraded
The service is running but in a limited capacity. The Sun documentation is very vague about what "degraded" means, and suggests that the programs associated with the service are responsible for making that determination.
maintenance
The service has a problem, and it cannot continue to run or complete a task. A service in this state usually requires administrative intervention. The restarter for the service won't try to bring the service online until it has been cleared.
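legacy_run
This state is used only for legacy run scripts. SMF does not manage these programs; the state records only that the script was run at some point, not that the program is still running.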
The steps for starting and stopping a service are defined in the service manifest. Most services have a method script associated with them that handles starting and stopping the service, just like an rc script. The restarter runs this script to bring the service online or take it offline.
The svcadm command gives administrators a standard interface for controlling services. svcadm recognizes several service management commands:
enable
Brings the service online.
disable
Takes the service offline.
restart
Restarts the service process, either by performing a disable followed by an enable, or a specific programmed method to restart the service.
refresh
Updates the service's running configuration with the current property values from the repository. This is useful after someone has changed the service's configuration. If the service is managed by svc.startd and defines a refresh method, that method is also run; usually it makes the program reread its configuration file.
clear
Resets a service that is in the maintenance state.
mark (degraded or maintenance)
Deliberately sets the state of a service to either degraded or maintenance. This is usually used for debugging a service.
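The subcommands above can be summarized as a simplified state-transition model. The Python sketch below is hypothetical. Real restarters also weigh dependency types, method exit codes, and timeouts, and restart and refresh are omitted because a successful one leaves the service in the state it was already in. The deps_online flag stands in for the dependency check.

```python
from enum import Enum


class State(Enum):
    ONLINE = "online"
    OFFLINE = "offline"
    DISABLED = "disabled"
    DEGRADED = "degraded"
    MAINTENANCE = "maintenance"


def apply(state: State, command: str, deps_online: bool = True) -> State:
    """Return the state after a (simplified) svcadm subcommand."""
    ready = State.ONLINE if deps_online else State.OFFLINE
    if command == "enable":
        # Only a disabled service changes; maintenance stays until cleared
        return ready if state is State.DISABLED else state
    if command == "disable":
        return State.DISABLED
    if command == "clear":
        return ready if state is State.MAINTENANCE else state
    if command == "mark maintenance":
        return State.MAINTENANCE
    if command == "mark degraded":
        return State.DEGRADED if state is State.ONLINE else state
    raise ValueError(f"unsupported command in this model: {command!r}")
```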
Unlike svcs, the svcadm command is picky about wildcards. You can still use abbreviated FMRIs and wildcards, as long as they match only one full FMRI.
# svcadm refresh svc:/network/login
svcadm: Pattern 'svc:/network/login' matches multiple instances:
        svc:/network/login:rlogin
        svc:/network/login:klogin
        svc:/network/login:eklogin

# svcadm refresh "svc:/*rlogin"

# svcs "*rlogin"
STATE          STIME    FMRI
online         23:24:17 svc:/network/login:rlogin
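The "exactly one match" rule can be sketched as a small Python function built on fnmatch. It is hypothetical and covers only exact FMRIs, service FMRIs without an instance, and glob patterns, not every abbreviation svcadm accepts. The instance list comes from the error message above.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def resolve_one(pattern: str, fmris: Iterable[str]) -> str:
    """Return the single FMRI matching pattern, or raise LookupError."""
    hits = [
        f for f in fmris
        if f == pattern
        or f.startswith(pattern + ":")   # service FMRI without an instance
        or fnmatchcase(f, pattern)       # glob such as "svc:/*rlogin"
    ]
    if not hits:
        raise LookupError(f"no instance matches {pattern!r}")
    if len(hits) > 1:
        raise LookupError(
            f"Pattern {pattern!r} matches multiple instances: " + ", ".join(hits)
        )
    return hits[0]


LOGIN_INSTANCES = [
    "svc:/network/login:rlogin",
    "svc:/network/login:klogin",
    "svc:/network/login:eklogin",
]
```

Here resolve_one("svc:/*rlogin", LOGIN_INSTANCES) returns svc:/network/login:rlogin, while resolve_one("svc:/network/login", LOGIN_INSTANCES) raises an error, mirroring the two svcadm refresh attempts above.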

Boot-up and Runlevels

Because rc scripts are no longer the preferred method used to manage programs, Sun has enhanced the runlevel model with service milestones.


In Unix, runlevel one is single user mode, two is multiuser mode, and three is multiuser mode with file sharing or network services. In each runlevel, there is a core set of services that must be brought online.
For example, levels one, two, and three all require a minimum amount of local filesystems to be mounted, and network interfaces to be online. Runlevel two requires all internet services to be online, and users must be able to log on to the host. Runlevel three requires everything level two does, plus the ability to share files by NFS.
Milestones are services that don't run any applications but do have a list of dependencies. Once all of those services are online, the milestone is marked online. A milestone guarantees that an expected group of services is up and running, so you don't have to check each service individually.
Here is a list of milestones currently online. In this case, seven milestones are online because they all had their dependencies met.
$ svcs "svc:/milestone/*"
online         Sep_22   svc:/milestone/name-services:default
online         Sep_22   svc:/milestone/network:default
online         Sep_22   svc:/milestone/devices:default
online         Sep_22   svc:/milestone/single-user:default
online         Sep_22   svc:/milestone/sysconfig:default
online         Sep_22   svc:/milestone/multi-user:default
online         Sep_22   svc:/milestone/multi-user-server:default
Here is a list of milestones and their equivalent rc levels:
Milestone                                   RC level   Description
svc:/milestone/devices:default              (none)     Devices
svc:/milestone/network:default              (none)     Network interfaces online
svc:/milestone/single-user:default          1          Single-user mode
svc:/milestone/sysconfig:default            (none)     Basic system configuration
svc:/milestone/name-services:default        (none)     Any one of the NIS, NIS+, DNS, or LDAP services
svc:/milestone/multi-user:default           2          Multiuser mode
svc:/milestone/multi-user-server:default    3          Multiuser server mode
Consider the dependencies for svc:/milestone/multi-user:default:
$ svcs -d milestone/multi-user
STATE          STIME    FMRI
disabled       Sep_22   svc:/network/smtp:sendmail
online         Sep_22   svc:/milestone/name-services:default
online         Sep_22   svc:/milestone/single-user:default
online         Sep_22   svc:/system/filesystem/local:default
online         Sep_22   svc:/network/rpc/bind:default
online         Sep_22   svc:/milestone/sysconfig:default
online         Sep_22   svc:/system/utmp:default
online         Sep_22   svc:/network/inetd:default
online         Sep_22   svc:/network/nfs/client:default
online         Sep_22   svc:/system/system-log:default
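Milestones are checkpoints in the operating system. Before multiuser mode can come online, its required dependencies, such as milestone/name-services, milestone/single-user, and rpc/bind, must be online as well. Not every dependency is mandatory, though: network/smtp is disabled in the listing above, yet the milestone is online, because sendmail is only an optional dependency.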
One of the dependent services listed is milestone/single-user, which has its own list of dependencies:
$ svcs -d milestone/single-user
STATE          STIME    FMRI
disabled       Sep_22   svc:/system/metainit:default
online         Sep_22   svc:/network/loopback:default
online         Sep_22   svc:/milestone/network:default
online         Sep_22   svc:/milestone/devices:default
online         Sep_22   svc:/system/filesystem/minimal:default
online         Sep_22   svc:/system/manifest-import:default
online         Feb_21   svc:/system/identity:node
Instead of making all milestones dependent on common services, the milestones are set up as cascading checkpoints. When you change the dependency list for milestone/single-user, you don't need to change the dependencies for milestone/multi-user-server.
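The cascading arrangement can be illustrated with a recursive check in Python. A milestone counts as online only when every service it depends on is online, and a dependency that is itself a milestone is checked the same way. The dependency map is a trimmed, hypothetical subset of the listings above. The model treats every dependency as required, so optional ones such as network/smtp are left out.

```python
from typing import Dict, List, Set


def is_online(fmri: str, deps: Dict[str, List[str]], running: Set[str]) -> bool:
    """True if fmri is a running service, or a milestone whose dependencies are all online."""
    if fmri in deps:  # milestone: online only when all of its dependencies are
        return all(is_online(d, deps, running) for d in deps[fmri])
    return fmri in running


MILESTONE_DEPS = {
    "milestone/multi-user": ["milestone/single-user", "milestone/name-services",
                             "network/rpc/bind", "system/system-log"],
    "milestone/single-user": ["milestone/network", "milestone/devices",
                              "system/filesystem/minimal"],
    "milestone/network": ["network/loopback", "network/physical"],
    "milestone/devices": [],
    "milestone/name-services": [],
}
```

Because multi-user refers to single-user rather than to its individual services, a change to single-user's list automatically affects everything above it.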
To change the milestone level of the host, use the svcadm command:
$ svcadm milestone -d [milestone FMRI]
The -d option makes your choice the default milestone, so the setting persists across reboots.
To shut down or reboot the host safely, the shutdown and init commands are still the preferred methods.

Debugging Problems with Services

Sometimes services fail. For example, a bad configuration file will prevent the Apache process from starting. A service that fails usually ends up in the maintenance state, and to correct the problem, you need to know where to look.
# svcs http
STATE          STIME    FMRI
maintenance    20:51:31 svc:/application/http:apache2

# svcs -x http
svc:/application/http:apache2 (Apache2 Server)
 State: maintenance since Mon Feb 20 20:51:31 2006
Reason: Method failed.
   See: http://sun.com/msg/SMF-8000-8Q
   See: httpd(8)
   See: /var/svc/log/application-http:apache2.log
Impact: This service is not running.
Each service keeps a log with the output from the method script. Most errors will appear in this file, as long as the program writes out errors to stdout or stderr.
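Judging from the examples in this post, the log file name is the FMRI without its svc:/ prefix, with each / in the service path replaced by -. The Python helper below encodes that observed pattern. smf_log_path is a hypothetical name, and the rule is an inference from the paths shown here (/var/svc/log/network-ssh:default.log and /var/svc/log/application-http:apache2.log), not a documented guarantee. The authoritative path is the logfile line printed by svcs -l, or the "See:" line in svcs -x.

```python
def smf_log_path(fmri: str, log_dir: str = "/var/svc/log") -> str:
    """Guess the SMF log path for a service FMRI (pattern inferred from examples)."""
    prefix = "svc:/"
    if not fmri.startswith(prefix):
        raise ValueError(f"expected an svc:/ FMRI, got {fmri!r}")
    # "network/ssh:default" -> "network-ssh:default"
    name = fmri[len(prefix):].replace("/", "-")
    return f"{log_dir}/{name}.log"
```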
# tail /var/svc/log/application-http\:apache2.log
Syntax error on line 23 of /etc/opt/apache2/httpd.conf:
Invalid command 'Kisten', perhaps mis-spelled or defined by a module not included in the server configuration
[ Feb 20 20:50:30 Method "stop" exited with status 0 ]
[ Feb 20 20:51:31 Method or service exit timed out.  Killing contract 957 ]
[ Feb 20 20:51:31 Rereading configuration. ]
Another option is to check the log of svc.startd, as it is the restarter process for the Apache service.
# tail /var/svc/log/svc.startd.log
Feb 20 20:51:31/3: svc:/application/http:apache2: Method or service exit timed
    out.  Killing contract 957.
Feb 20 20:51:31/520: application/http:apache2 failed
After you have corrected the error, use the svcadm command to clear the maintenance state.
# svcadm clear application/http:apache2

# svcs -x http
svc:/application/http:apache2 (Apache2 Server)
 State: online since Mon Feb 20 21:00:22 2006
   See: httpd(8)
   See: /var/svc/log/application-http:apache2.log
Impact: None.
The important thing to remember is that the Service Management Facility isn't designed to block normal access to programs or processes. If you need to do serious testing of Apache httpd or other programs, you can still run them from the command line. If a service is in the maintenance state, go ahead and run httpd -t, sendmail -bD, or whatever command you need. SMF will not interfere with processes that it did not start itself.
