SAP HANA Database Alerts Overview
Alert ID | Alert Name | Description | How to handle |
0 | Internal statistics
server problem | Identifies internal
statistics server problem | |
1 | Host physical
memory usage | Determines what
percentage of total physical memory available on the host is used. All
processes consuming memory are considered, including non-SAP HANA processes | |
2 | Disk usage | Determines what
percentage of each disk containing data, log, and trace files is used. This
includes space used by non-SAP HANA files | |
3 | Inactive services | Identifies inactive
services | |
4 | Restarted services | Identifies services
that have restarted since the last time the check was performed | |
5 | Host CPU Usage | Determines the
percentage CPU idle time on the host and therefore whether or not CPU
resources are running low | |
10 | Delta merge
(mergedog) configuration | Determines whether
or not the 'active' parameter in the 'mergedog' section of system
configuration file(s) is 'yes'. mergedog is the system process that
periodically checks column tables to determine whether or not a delta merge
operation needs to be executed | |
12 | Memory usage of
name server | Determines what
percentage of allocated shared memory is being used by the name server on a host | |
16 | Lock wait timeout
configuration | Determines whether
the 'lock_wait_timeout' parameter in the 'transaction' section of the
indexserver.ini file is between 100,000 and 7,200,000 | |
17 | Record count of
non-partitioned column-store tables | Determines the number
of records in non-partitioned column-store tables. Current table size is not
critical. Partitioning need only be considered if tables are expected to grow
rapidly (a non-partitioned table cannot contain more than 2,147,483,648 (2
billion) rows) | |
20 | Table growth of
non-partitioned column-store tables | Determines the
growth rate of non-partitioned column-store tables | |
21 | Internal event | Identifies internal
database events | |
22 | Notification of all
alerts | Determines whether
or not there have been any alerts since the last check and if so, sends a
summary e-mail to specified recipients | Investigate the alerts |
23 | Notification of
medium and high priority alerts | Determines whether
or not there have been any medium and high priority alerts since the last
check and if so, sends a summary e-mail to specified recipients | Investigate the alerts |
24 | Notification of
high priority alerts | Determines whether
or not there have been any high priority alerts since the last check and if
so, sends a summary e-mail to specified recipients | Investigate the alerts |
25 | Open connections | Determines what
percentage of the maximum number of permitted SQL connections are open. The
maximum number of permitted connections is configured in the
"session" section of the indexserver.ini file | |
26 | Unassigned volumes | Identifies volumes
that are not assigned to a service | |
27 | Record count of
column-store table partitions | Determines the
number of records in the partitions of column-store tables. A table partition
cannot contain more than 2,147,483,648 (2 billion) rows. | |
28 | Most recent
savepoint operation | Determines how long
ago the last savepoint was defined, that is, how long ago a complete,
consistent image of the database was persisted to disk | |
29 | Size of delta
storage of column-store tables | Determines the size
of the delta storage of column tables | |
30 | Check internal disk
full event | Determines whether
or not the disks to which data and log files are written are full. A
disk-full event causes your database to stop and must be resolved | |
31 | License expiry | Determines how many
days until your license expires. Once your license expires, you can no longer
use the system, except to install a new license | |
32 | Log mode LEGACY | Determines whether
or not the database is running in log mode "legacy". Log mode
"legacy" does not support point-in-time recovery and is not recommended
for productive systems | See SAP Notes 1900296 and 2803904 |
33 | Log mode OVERWRITE | Determines whether
or not the database is running in log mode "overwrite". Log mode
"overwrite" does not support point-in-time recovery (only recovery to a
data backup) and is not recommended for productive systems | |
34 | Unavailable volumes | Determines whether
or not all volumes are available | |
35 | Existence of data
backup | Determines whether
or not a data backup exists. Without a data backup, your database cannot be
recovered | |
36 | Status of most
recent data backup | Determines whether
or not the most recent data backup was successful | |
37 | Age of most recent
data backup | Determines the age
of the most recent successful data backup | |
38 | Status of most
recent log backups | Determines whether
or not the most recent log backups for services and volumes were successful | |
39 | Long-running
statements | Identifies
long-running SQL statements | |
40 | Total memory usage
of column-store tables | Determines what
percentage of the effective allocation limit is being consumed by individual
column-store tables as a whole (that is, the cumulative size of all of a
table's columns and internal structures) | |
41 | In-memory DataStore
activation | Determines whether
or not there is a problem with the activation of an in-memory DataStore
object | |
42 | Long-idling cursors | Identifies
long-idling cursors | |
43 | Memory usage of
services | Determines what
percentage of its effective allocation limit a service is using | |
44 | Licensed memory
usage | Determines what
percentage of licensed memory is used | |
45 | Memory usage of
main storage of column-store tables | Determines what
percentage of the effective allocation limit is being consumed by the main
storage of individual column-store tables | |
46 | RTEdump files | Identifies new
runtime dump files (*rtedump*) that have been generated in the trace directory of
the system. These contain information about, for example, build, loaded
modules, running threads, CPU, and so on | |
47 | Long-running
serializable transactions | Identifies
long-running serializable transactions | Close the serializable transaction in the application or kill the
connection by executing the SQL statement ALTER SYSTEM DISCONNECT SESSION
<LOGICAL_CONNECTION_ID>. For more information, see the table
HOST_LONG_SERIALIZABLE_TRANSACTION (_SYS_STATISTICS). |
48 | Long-running
uncommitted write transactions | Identifies
long-running uncommitted write transactions | |
49 | Long-running
blocking situations | Identifies
long-running blocking situations | |
50 | Number of diagnosis
files | Determines the
number of diagnosis files written by the system (excluding zip-files). An
unusually large number of files can indicate a problem with the database (for
example, problem with trace file rotation or a high number of crashes) | |
51 | Size of diagnosis
files | Identifies large
diagnosis files. Unusually large files can indicate a problem with the
database | |
52 | Crashdump files | Identifies new
crashdump files that have been generated in the trace directory of the system | |
53 | Pagedump files | Identifies new
pagedump files that have been generated in the trace directory of the system | |
54 | Savepoint duration | Identifies
long-running savepoint operations | |
55 | Columnstore unloads | Determines how many
columns in columnstore tables have been unloaded from memory. This can
indicate performance issues | |
56 | Python trace
activity | Determines whether
or not the python trace is active and for how long. The python trace affects
system performance | |
57 | Instance secure
store file system (SSFS) inaccessible | Determines if the
instance secure store in the file system (SSFS) of your SAP HANA system is
accessible to the database | |
58 | Plan cache size | Determines whether
or not the plan cache is too small | |
59 | Percentage of
transactions blocked | Determines the
percentage of transactions that are blocked | |
60 | Sync/Async read
ratio | Identifies a bad
trigger asynchronous read ratio. This means that asynchronous reads are
blocking and behave almost like synchronous reads. This might have negative
impact on SAP HANA I/O performance in certain scenarios | |
61 | Sync/Async write
ratio | Identifies a bad
trigger asynchronous write ratio. This means that asynchronous writes are
blocking and behave almost like synchronous writes. This might have negative
impact on SAP HANA I/O performance in certain scenarios | |
62 | Expiration of
database user passwords | Identifies database
users whose password is due to expire in line with the configured password
policy. If the password expires, the user will be locked. If the user in
question is a technical user, this may impact application availability. It is
recommended that you disable the password lifetime check of technical users
so that their password never expires (ALTER USER <username> DISABLE
PASSWORD LIFETIME) | |
63 | Granting of
SAP_INTERNAL_HANA_SUPPORT role | Determines if the
internal support role (SAP_INTERNAL_HANA_SUPPORT) is currently granted to any
database users | |
64 | Total memory usage
of table-based audit log | Determines what
percentage of the effective memory allocation limit is being consumed by the
database table used for table-based audit logging. If this table grows too
large, the availability of the database could be impacted | |
65 | Runtime of the log
backups currently running | Determines whether
or not the most recent log backup terminates in the given time | |
66 | Storage snapshot is
prepared | Determines whether
or not the period, during which the database is prepared for a storage
snapshot, exceeds a given threshold | |
67 | Table growth of
rowstore tables | Determines the
growth rate of rowstore tables | |
68 | Total memory usage
of row store | Determines the
current memory size of a row store used by a service | |
69 | Enablement of
automatic log backup | Determines whether
automatic log backup is enabled | |
70 | Consistency of
internal system components after system upgrade | Verifies the
consistency of schemas and tables in internal system components (for example,
the repository) after a system upgrade | Contact SAP support |
71 | Row store
fragmentation | Check for
fragmentation of row store | |
72 | Number of log
segments | Determines the
number of log segments in the log volume of each service | Make sure that log backups are being automatically created and that
there is enough space available for them. Check whether the system has been
frequently and unusually restarting services. If it has, then resolve the
root cause of this issue and create log backups as soon as possible. |
73 | Overflow of
rowstore version space | Determines the
overflow ratio of the rowstore version space | Identify the connection or transaction that is blocking version
garbage collection. You can do this in the SAP HANA studio by executing
"MVCC Blocker Statement" and "MVCC Blocker Transaction"
available on the System Information tab of the Administration editor. If
possible, kill the blocking connection or cancel the blocking
transaction. Alert is relevant only to HANA < 2.0 SPS 03 |
74 | Overflow of
metadata version space | Determines the
overflow ratio of the metadata version space | Identify the connection or transaction that is blocking version
garbage collection. You can do this in the SAP HANA studio by executing
"MVCC Blocker Statement" and "MVCC Blocker Transaction"
available on the System Information tab of the Administration editor. If
possible, kill the blocking connection or cancel the blocking
transaction. Alert is relevant only to HANA < 2.0 SPS 03 |
75 | Rowstore version
space skew | Determines whether
the rowstore version chain is too long | Identify the connection or transaction that is blocking version
garbage collection. You can do this in the SAP HANA studio by executing
"MVCC Blocker Statement" and "MVCC Blocker Transaction"
available on the System Information tab of the Administration editor. If
possible, kill the blocking connection or cancel the blocking transaction.
For your information, you can find table information by using query
"SELECT * FROM TABLES WHERE TABLE_OID = <table object ID>". |
76 | Discrepancy between
host server times | Identifies
discrepancies between the server times of hosts in a scale-out system | Check operating system time settings. |
77 | Database disk usage | Determines the
total used disk space of the database. All data, logs, traces and backups are
considered | |
78 | Connection between
systems in system replication setup | Identifies closed
connections between the primary system and a secondary system. If connections
are closed, the primary system is no longer being replicated | Investigate why connections are closed (for example, network problem)
and resolve the issue. |
79 | Configuration
consistency of systems in system replication setup | Identifies
configuration parameters that do not have the same value on the primary
system and a secondary system. Most configuration parameters should have the
same value on both systems because the secondary system has to take over in
the event of a disaster | If the identified configuration parameter(s) should have the same
value in both systems, adjust the configuration. If different values are
acceptable, add the parameter(s) as an exception in
global.ini/[inifile_checker]. |
80 | Availability of
table replication | Monitors error
messages related to table replication | Determine which tables encountered the table replication error using
system view M_TABLE_REPLICAS, and then check the corresponding indexserver
alert traces. |
81 | Cached view size | Determines how much
memory is occupied by cached view | Increase the size of the cached view. In the "result_cache"
section of the indexserver.ini file, increase the value of the
"total_size" parameter. |
82 | Timezone conversion | Compares SAP HANA
internal timezone conversion with Operating System timezone conversion | Update SAP HANA internal timezone tables (refer to SAP
notes 1791342, 1932132). |
83 | Table consistency | Identifies the
number of errors and affected tables detected by
_SYS_STATISTICS.Collector_Global_Table_Consistency | Contact SAP Support |
84 | Insecure instance
SSFS encryption configuration | Determines whether
the master key of the instance secure store in the file system (SSFS) of your
SAP HANA system has been changed. If the SSFS master key is not changed after
installation, it cannot be guaranteed that the initial key is unique | Change the instance SSFS master key as soon as possible. For more
information, see the SAP HANA Administration Guide. |
85 | Insecure systemPKI
SSFS encryption configuration | Determines whether
the master key of the secure store in the file system (SSFS) of your system's
internal public key infrastructure (system PKI) has been changed. If the SSFS
master key is not changed after installation, it cannot be guaranteed that
the initial key is unique | Change the system PKI SSFS master key as soon as possible. For more
information, see the SAP HANA Administration Guide. |
86 | Internal
communication is configured too openly | Determines whether
the ports used by SAP HANA for internal communication are securely
configured. If the "listeninterface" property in the
"communication" section of the global.ini file does not have the
value ".local" for single-host systems and ".all" or
".global" for multiple-host systems, internal communication
channels are externally exploitable | |
87 | Granting of SAP
HANA DI support privileges | Determines if
support privileges for the SAP HANA Deployment Infrastructure (DI) are
currently granted to any database users or roles | Check if the corresponding users still need the privileges. If not,
revoke the privileges from them. |
88 | Auto merge for
column-store tables | Determines if the
delta merge of a table was executed successfully or not | The delta merge was not executed successfully for a table. Check the
error description in view M_DELTA_MERGE_STATISTICS and also Indexserver
trace. |
89 | Missing volume
files | Determines if there
is any volume file missing | A volume file is missing and the database instance is broken. Immediately
stop all operations on this instance. |
90 | Status of HANA
platform lifecycle management configuration | Determines if the
system was not installed/updated with the SAP HANA Database Lifecycle Manager
(HDBLCM) | Install/update
SAP HANA Database Lifecycle Manager (HDBLCM). Implement SAP note 2078425 |
91 | Plan cache hit
ratio | Determines whether
or not the plan cache hit ratio is too low | Increase the size of the plan cache. In the "sql" section of
the indexserver.ini file, increase the value of the
"plan_cache_size" parameter. |
92 | Root keys of
persistent services are not properly synchronized | Not all services
that persist data could be reached the last time the root key of the
data volume encryption service was changed. As a result, at least one service
is running with an old root key | Trigger a savepoint for this service or flush the SSFS cache using
hdbcons. |
93 | Streaming License
expiry | Determines how many
days until your streaming license expires. Once your license expires, you can
no longer start streaming projects | Obtain a valid license and install it. For the exact expiration date,
see the monitoring view M_LICENSES. |
94 | Log replay backlog
for system replication secondary | System Replication
secondary site has a higher log replay backlog than expected | Investigate on the secondary site why the log replay backlog has increased. |
95 | Availability of
Data Quality reference data (directory files) | Determine the Data
Quality reference data expiration dates | Download the latest Data Quality reference data files and update the
system. (For more details about updating the directories, see the
Administration Guide.) |
96 | Long-running tasks | Identifies all
long-running tasks | Investigate the long-running tasks. For more information, see the task
statistics tables or views in _SYS_TASK schema and trace log. |
97 | Granting of SAP
HANA DI container import privileges | Determines if the
container import feature of the SAP HANA Deployment Infrastructure (DI) is
enabled and if import privileges for SAP HANA DI containers are currently
granted to any database users or roles | Check if the identified users still need the privileges. If not,
revoke the privileges from them and disable the SAP HANA DI container import
feature. |
98 | LOB garbage
collection activity | Determines whether
or not the LOB garbage collection is activated | Activate the LOB garbage collection using the corresponding
configuration parameters. |
99 | HANA version | Checks the
installed HANA version | Please check if your SAP HANA system can be upgraded to a newer
version. |
100 | Unsupported
operating system in use | Determines if the
operating system of the SAP HANA Database hosts is supported | Upgrade the operating system to a supported version (see SAP HANA
Master Guide for more information). |
101 | SQL access for SAP
HANA DI technical users | Determines if SQL
access has been enabled for any SAP HANA DI technical users. SAP HANA DI
technical users are either users whose names start with '_SYS_DI' or SAP HANA
DI container technical users (<container name>, <container
name>#DI, <container name>#OO) | Check if the identified users ('_SYS_DI*' users or SAP HANA DI
container technical users) still need SQL access. If not, disable SQL access
for these users and deactivate the users. |
102 | Existence of system
database backup | Determines whether
or not a system database backup exists. Without a system database backup,
your system cannot be recovered | Perform a backup of the system database as soon as possible. |
103 | Usage of deprecated
features | Determines if any
deprecated features were used in the last interval | Check
the view M_FEATURE_USAGE to see which features were used. See SAP
Note 2425002 |
104 | System replication:
increased log shipping backlog | Monitors log
shipping backlog. Alert is raised when threshold value is reached (priority
dependent on threshold values). | To identify the reason for the increased system replication log
shipping backlog, check the status of the secondary system. Possible causes
for the increased system replication log shipping backlog can be a slow
network performance, connection problems, or other internal issues. |
105 | Total Open
Transactions Check | Monitors the number
of open transactions per service | Double check if the application is closing the connection correctly,
and whether the high transaction load on the system is expected. |
106 | ASYNC replication
in-memory buffer overflow | Checks if local
in-memory buffer in ASYNC replication mode runs full | Check buffer size, peak loads, network, IO on secondary |
107 | Inconsistent
fallback snapshot | Checks for
inconsistent fallback snapshots | Drop the inconsistent snapshot |
108 | Old fallback
snapshot | Checks for out of
date fallback snapshots (older than the defined thresholds) | Check the age and possibly drop the snapshot |
109 | Backup history
broken | Checks if the
backup history is incomplete or inconsistent (the log_mode is internally set
to overwrite, it is not ensured that the service is fully recoverable via
backup) | Perform a data backup as soon as possible to ensure that the service
is fully recoverable |
110 | Catalog Consistency | An alert is raised
if the Catalog Consistency Check detects errors (identifies the number of
errors and affected objects) | Contact SAP Support |
111 | Replication status
of replication log | Check whether the
status of replication log is disabled | Truncate replication log table and enable replication log |
112 | Missing STONITH
with shared storage | Check whether a
STONITH provider is configured in a scale-out system with shared basepaths | |
113 | Open file count | Determines what
percentage of total open file handles are in use. All processes are
considered, including non-SAP HANA processes. | You can configure the Linux file-max parameter using the following
commands: 1. Check the current maximum number of allowed file handles:
cat /proc/sys/fs/file-max; 2. Extend the maximum number of file handles in
/etc/sysctl.conf: fs.file-max = 20000000; 3. Activate the changes for the
operating system: sysctl -p /etc/sysctl.conf |
114 | Active async IO
count | Determines what
percentage of total asynchronous input/output requests are in use. All
processes are considered, including non-SAP HANA processes. | |
115 | Timezone
environment variable verification | Determines if the
timezone environment variable TZ can be interpreted. See M_TIMEZONE_ALERTS.
Otherwise HANA falls back to a system call which can cause significant
performance issues | |
116 | Transparent huge
pages status | Determines if
Transparent Huge Pages (THP) are activated which can cause issues for the
database | Deactivate
THP by setting the kernel parameter to "[never]". See SAP
Note 2031375 |
118 | Port ephemeral max
count | Checks for free
local ports by referring to following kernel parameters: net.ipv4.ip_local_port_range
and net.ipv4.ip_local_reserved_ports. The alert is raised if the
number of free local ports is below the configured minimum. | Increase the local port range by setting profile parameters as
described in SAP Note 401162, or free ports which do not have to be
reserved. The number of free local ports and ports which should be reserved
for HANA services can be found in the system view M_HOST_INFORMATION, keys:
net_port_ephemeral_max_count and net_port_ranges. See also Linux
kernel parameters in the SAP HANA System Administration Guide. |
119 | Required local SAP
HANA port ranges | Checks for local
ports which are required but which have not been reserved (see Alert 118). If
not reserved there is a risk that these ports could be automatically assigned
to other applications. The alert is raised if any ports in the local ports
range are not reserved. | Reserve ports by setting profile parameters as described in SAP
Note 401162. Details of unreserved ports can be retrieved from the
system view M_HOST_INFORMATION key: net_port_unreserved_ranges. See
also Linux kernel parameters in the SAP HANA System Administration
Guide. |
128 | LDAP Enabled Users
without SSL | Checks for the
vulnerability where users may be enabled for LDAP Authentication but SSL is
not enabled | |
129 | Check trusted
certificate expiration date | Determines if there
are any trusted certificates that will expire soon or have already expired | |
130 | Check own
certificate expiration date | Determines if there
are any own or chained certificates that will expire soon or have already
expired | |
131 | Session requests
queued by admission control | Determines the
number of session requests waiting in the admission control queue. This can
indicate an issue with the response time of the request | Investigate why session requests were newly queued by admission
control. Refer to M_ADMISSION_CONTROL_EVENTS for more information |
132 | Session requests
rejected by admission control | Determines the
number of session requests newly rejected by admission control. This can
indicate an issue with the availability of the database | Investigate why session requests were newly rejected by admission
control. Refer to M_ADMISSION_CONTROL_EVENTS for more information |
135 | Checks configuration
for SAP HANA SLD Data Supplier | For system
replication, virtual and physical databases must be configured to send the
correct landscape data to the Landscape Management Database via the System
Landscape Directory (SLD) | |
136 | Unsupported
Parameter Values Set | Checks if
configuration parameters are set to an unsupported value | Correct the values of configuration parameters as necessary |
137 | Restart Required
for Configuration Change | Check if a restart
is required for a configuration change to become effective | If necessary, restart the system |
500 | Dbspace usage | Checks for the
dbspace size usage | Investigate the usage of dbspace and increase the size. |
501 | Dbspace status | Determines whether
or not all dbspaces are available | Investigate why the dbspace is not available. |
502 | Dbspace file status | Determines whether
or not all dbspace files are available | Investigate why the dbspace file is not available. |
600 | Inactive Streaming
applications | Identifies inactive
Streaming applications | Investigate why the Streaming application is inactive, for example, by
checking the Streaming application's trace files. |
601 | Inactive Streaming
project managed adapters | Identifies inactive
Streaming project managed adapters | Investigate why the Streaming project managed adapter is inactive, for
example, by checking the trace files. |
602 | Streaming project
physical memory usage | Determines what
percentage of total physical memory available on the host is used for the
streaming project | Investigate memory usage of the streaming project. |
603 | Streaming project
CPU usage | Determines the
percentage CPU usage for a streaming project on the host and therefore
whether or not CPU resources are running out | Investigate CPU usage. |
604 | Number of
publishers of streaming project | Identifies a large number of
publishers of a streaming project. Make sure that they will not break the
streaming project | Investigate whether these publishers are created intentionally. |
605 | Number of
subscribers of streaming project | Identifies a large number of
subscribers of a streaming project. Make sure that they will not break the
streaming project | Investigate whether these subscribers are created intentionally. |
606 | Row throughput of
subscriber of streaming project | Identify which
subscriber of streaming project has low throughput measured in rows per
second | Investigate why the subscriber works slowly. |
607 | Transaction
throughput of subscriber of streaming project | Identify which
subscriber of streaming project has low transaction throughput measured in
transactions per second | Investigate why the subscriber works slowly. |
608 | Row throughput of
publisher of streaming project | Identify which
publisher of streaming project has low throughput measured in rows per second | |
609 | Transaction
throughput of publisher of streaming project | Identify which
publisher of streaming project has low transaction throughput measured in
transactions per second | |
610 | Bad rows of project
managed adapter | Identify which
project managed adapter has many rows with errors | Investigate why the adapter has so many rows with errors. |
611 | High latency of
project managed adapter | Identify which
project managed adapter has high latency | Investigate why the adapter has high latency. |
612 | Large queue of
stream of streaming project | Identify which
stream of streaming project has a large queue | Investigate why the stream has a large queue. |
613 | Large store of
stream of streaming project | Identify which
stream of streaming project has a large store | Investigate why the stream has a large store. |
700 | Agent availability | Determines how many
minutes the agent has been inactive | Investigate connection of agent and check if agent is up and running. |
701 | Agent memory usage | Determines what
percentage of total available memory on agent is used | Investigate which adapter or processes use a lot of memory. |
710 | Remote Subscription
exception | Checks for recent
exceptions in remote subscriptions and remote sources | Investigate the error message and the error code and restart the
remote subscription if necessary. |
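
Alert 113 (Open file count) in the table above is described as the percentage of total open file handles in use on the host. The following is a minimal Python sketch of how such a percentage could be checked by hand on Linux. It assumes the standard `/proc/sys/fs/file-nr` file, which holds three integers (allocated handles, allocated but unused handles, and the maximum from `fs.file-max`). The function names are hypothetical, and this is not necessarily how SAP HANA itself computes the alert value.

```python
def open_file_usage_percent(file_nr_text: str) -> float:
    """Return the percentage of system-wide file handles in use.

    file_nr_text is the content of /proc/sys/fs/file-nr: three integers
    (allocated handles, allocated-but-unused handles, maximum handles).
    """
    allocated, unused, maximum = (int(field) for field in file_nr_text.split())
    if maximum <= 0:
        raise ValueError("maximum number of file handles must be positive")
    # Handles actually in use are the allocated ones minus the unused ones.
    return 100.0 * (allocated - unused) / maximum


def read_open_file_usage(path: str = "/proc/sys/fs/file-nr") -> float:
    """Read the kernel counters from `path` (Linux only) and return the usage."""
    with open(path) as handle:
        return open_file_usage_percent(handle.read())
```

On a Linux host, calling `read_open_file_usage()` returns the current usage as a percentage. If the value is persistently high, raising `fs.file-max` as described for Alert 113 is one option.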