Coverage for watcher/decision_engine/model/collector/base.py: 88%
56 statements
coverage.py v7.8.2, created at 2025-06-17 12:22 +0000
# -*- encoding: utf-8 -*-
# Copyright (c) 2015 b<>com
#
# Authors: Jean-Emile DARTOIS <jean-emile.dartois@b-com.com>
#          Vincent FRANCOISE <vincent.francoise@b-com.com>
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

"""
A :ref:`Cluster Data Model <cluster_data_model_definition>` (or CDM) is a
logical representation of the current state and topology of the
:ref:`Managed resources <managed_resource_definition>` of the :ref:`Cluster
<cluster_definition>`.

It is represented as a set of :ref:`Managed resources
<managed_resource_definition>` (organized, for example, as a simple tree or as
a flat list of key-value pairs). This representation lets Watcher
:ref:`Strategies <strategy_definition>` know the current relationships between
the different :ref:`resources <managed_resource_definition>` of the
:ref:`Cluster <cluster_definition>` during an :ref:`Audit <audit_definition>`,
and lets a :ref:`Strategy <strategy_definition>` request information such as:

- What compute nodes are in a given :ref:`Audit Scope
  <audit_scope_definition>`?
- What :ref:`Instances <instance_definition>` are hosted on a given compute
  node?
- What is the current load of a compute node?
- What is the current free memory of a compute node?
- What is the network link between two compute nodes?
- What is the available bandwidth on a given network link?
- What is the current space available on a given virtual disk of a given
  :ref:`Instance <instance_definition>`?
- What is the current state of a given :ref:`Instance <instance_definition>`?
- ...
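
The following minimal sketch shows roughly how such questions map onto a
tree-shaped model, using plain Python dataclasses. The ``Instance``,
``ComputeNode`` and ``ToyClusterModel`` classes and their methods are
hypothetical stand-ins rather than Watcher's ``ModelRoot`` API. An audit
scope is assumed to be a simple set of host names.

```python
# Hypothetical toy model for illustration only; NOT Watcher's ModelRoot API.
from dataclasses import dataclass, field


@dataclass
class Instance:
    uuid: str
    memory_mb: int
    state: str = 'active'


@dataclass
class ComputeNode:
    hostname: str
    memory_mb: int
    instances: dict = field(default_factory=dict)

    def free_memory_mb(self):
        # Free memory = node capacity minus memory used by hosted instances.
        used = sum(i.memory_mb for i in self.instances.values())
        return self.memory_mb - used


@dataclass
class ToyClusterModel:
    nodes: dict = field(default_factory=dict)

    def add_node(self, node):
        self.nodes[node.hostname] = node

    def map_instance(self, hostname, instance):
        # Record the "hosted on" relationship (a parent/child tree edge).
        self.nodes[hostname].instances[instance.uuid] = instance

    def nodes_in_scope(self, scope):
        # Assumption: a scope is just a set of host names.
        return [n for name, n in self.nodes.items() if name in scope]

    def instances_on(self, hostname):
        return list(self.nodes[hostname].instances.values())


model = ToyClusterModel()
model.add_node(ComputeNode('node-1', memory_mb=8192))
model.add_node(ComputeNode('node-2', memory_mb=4096))
model.map_instance('node-1', Instance('vm-a', memory_mb=2048))
model.map_instance('node-1', Instance('vm-b', memory_mb=1024, state='stopped'))

print([i.uuid for i in model.instances_on('node-1')])  # ['vm-a', 'vm-b']
print(model.nodes['node-1'].free_memory_mb())  # 5120
print([n.hostname for n in model.nodes_in_scope({'node-2'})])  # ['node-2']
print(model.nodes['node-1'].instances['vm-b'].state)  # stopped
```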

In short, this data model enables the :ref:`Strategy <strategy_definition>`
to know:

- the current topology of the :ref:`Cluster <cluster_definition>`
- the current capacity for each :ref:`Managed resource
  <managed_resource_definition>`
- the current amount of used/free space for each :ref:`Managed resource
  <managed_resource_definition>`
- the current state of each :ref:`Managed resource
  <managed_resource_definition>`

In the Watcher project, we aim to provide a generic and basic
:ref:`Cluster Data Model <cluster_data_model_definition>` for each :ref:`Goal
<goal_definition>`, usable in the associated :ref:`Strategies
<strategy_definition>` through a plugin-based mechanism called cluster data
model collectors (or CDMCs). These CDMCs are responsible for loading their
associated CDM and keeping it up to date, both by listening to events and by
periodically rebuilding it from the ground up. They are also directly
accessible from the strategy classes. These CDMs are used to:

- simplify the development of a new :ref:`Strategy <strategy_definition>` for a
  given :ref:`Goal <goal_definition>` when there already are some existing
  :ref:`Strategies <strategy_definition>` associated with the same :ref:`Goal
  <goal_definition>`
- avoid duplicating the same code in several :ref:`Strategies
  <strategy_definition>` associated with the same :ref:`Goal <goal_definition>`
- ensure better consistency between the different :ref:`Strategies
  <strategy_definition>` for a given :ref:`Goal <goal_definition>`
- avoid any strong coupling with any external :ref:`Cluster Data Model
  <cluster_data_model_definition>` (the proposed data model acts as a pivot
  data model)
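
The collector mechanism described above can be sketched as follows. This is
only an illustration. ``ToyCollector``, ``register`` and the
``fetch_inventory`` callable are hypothetical, with ``fetch_inventory``
standing in for queries to external services. A plain dictionary replaces
Watcher's real plugin loader.

```python
# Hypothetical sketch for illustration only; not Watcher's plugin machinery.
import copy
import threading


class ToyCollector:
    # Owns one model: here a dict mapping node names to attribute dicts.

    def __init__(self, fetch_inventory):
        # fetch_inventory stands in for queries to external services.
        self._fetch = fetch_inventory
        self._lock = threading.RLock()
        self._model = None

    def execute(self):
        # Rebuild the whole model from the ground up (copying the inputs).
        return {name: dict(attrs) for name, attrs in self._fetch().items()}

    def synchronize(self):
        # Periodic full rebuild: drop-in replacement of the model.
        new_model = self.execute()
        with self._lock:
            self._model = new_model

    def on_event(self, node, attrs):
        # Incremental update triggered by a (hypothetical) notification.
        with self._lock:
            if self._model is None:
                self._model = self.execute()
            self._model.setdefault(node, {}).update(attrs)

    def get_model(self):
        # Lazily build the model, then hand out a private copy.
        with self._lock:
            if self._model is None:
                self._model = self.execute()
            return copy.deepcopy(self._model)


# Plugin-like registry: strategies look collectors up by name.
COLLECTORS = {}


def register(name, collector):
    COLLECTORS[name] = collector


inventory = {'node-1': {'vcpus': 16}, 'node-2': {'vcpus': 8}}
register('compute', ToyCollector(lambda: inventory))

collector = COLLECTORS['compute']
print(sorted(collector.get_model()))  # ['node-1', 'node-2']
inventory['node-2']['disabled'] = True  # change on the external side...
collector.on_event('node-2', {'disabled': True})  # ...reported by an event
inventory['node-3'] = {'vcpus': 4}  # change whose event was missed
print(collector.get_model()['node-2'])  # {'vcpus': 8, 'disabled': True}
print(sorted(collector.get_model()))  # ['node-1', 'node-2']
collector.synchronize()  # the periodic rebuild catches up
print(sorted(collector.get_model()))  # ['node-1', 'node-2', 'node-3']
```

Events keep the model current between rebuilds. The periodic rebuild catches
changes whose notification was missed, which is why the two mechanisms are
combined.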

There may be various :ref:`generic and basic Cluster Data Models
<cluster_data_model_definition>` provided by Watcher helpers, each of them
adapted to a given :ref:`Goal <goal_definition>`:

- For example, for a :ref:`Goal <goal_definition>` which aims at optimizing
  the network :ref:`resources <managed_resource_definition>`, the
  :ref:`Strategy <strategy_definition>` may need to know which :ref:`resources
  <managed_resource_definition>` are communicating with each other.
- By contrast, for a :ref:`Goal <goal_definition>` which aims at optimizing
  thermal and power conditions, the :ref:`Strategy <strategy_definition>` may
  need to know the location of each compute node in the racks and the location
  of each rack in the room.

Note, however, that developers can use their own :ref:`Cluster Data Model
<cluster_data_model_definition>` if the proposed data model does not fit
their needs, as long as the :ref:`Strategy <strategy_definition>` is able to
produce a :ref:`Solution <solution_definition>` for the requested :ref:`Goal
<goal_definition>`. For example, a developer could rely on the Nova Data Model
to optimize some compute resources.

The :ref:`Cluster Data Model <cluster_data_model_definition>` may be persisted
in any appropriate storage system (SQL database, NoSQL database, JSON file,
XML file, in-memory database, ...). Currently, an in-memory model is built and
maintained in the background in order to accelerate the execution of
strategies.
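
Here is a minimal standard-library sketch of such a background-maintained
model. A daemon thread rebuilds it every ``period`` seconds and swaps it in
under a lock. Readers receive a deep copy, so a rebuild cannot modify a model
that a strategy is working on. ``InMemoryModel`` and its methods are
hypothetical names. ``period`` only mirrors the collector option of the same
name.

```python
# Hypothetical class for illustration only; not part of Watcher.
import copy
import itertools
import threading
import time


class InMemoryModel:

    def __init__(self, build, period):
        # build: hypothetical callable that rebuilds the model from scratch.
        # period: seconds between two rebuilds.
        self._build = build
        self._period = period
        self._lock = threading.Lock()
        self._model = build()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() is set.
        while not self._stop.wait(self._period):
            new_model = self._build()  # build outside the lock
            with self._lock:
                self._model = new_model  # drop-in replacement

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def latest(self):
        # Strategies get a deep copy, so later rebuilds cannot alter it.
        with self._lock:
            return copy.deepcopy(self._model)


generations = itertools.count(1)
model = InMemoryModel(lambda: {'generation': next(generations)}, period=0.05)
snapshot = model.latest()
model.start()
time.sleep(0.3)
model.stop()
print(snapshot['generation'])  # 1
print(model.latest()['generation'])
```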
"""

import abc
import copy
import threading
import time

from oslo_config import cfg
from oslo_log import log

from watcher.common import clients
from watcher.common.loader import loadable
from watcher.decision_engine.model import model_root

LOG = log.getLogger(__name__)
CONF = cfg.CONF


class BaseClusterDataModelCollector(loadable.LoadableSingleton,
                                    metaclass=abc.ABCMeta):

    STALE_MODEL = model_root.ModelRoot(stale=True)

    def __init__(self, config, osc=None):
        super(BaseClusterDataModelCollector, self).__init__(config)
        self.osc = osc if osc else clients.OpenStackClients()
        self.lock = threading.RLock()
        self._audit_scope_handler = None
        self._cluster_data_model = None
        self._data_model_scope = None

    @property
    def cluster_data_model(self):
        if self._cluster_data_model is None:
            # Build the model lazily. Re-check under the lock so that two
            # threads racing here do not both run execute(), and use the
            # lock as a context manager so it is released even if
            # execute() raises.
            with self.lock:
                if self._cluster_data_model is None:
                    self._cluster_data_model = self.execute()

        return self._cluster_data_model

    @cluster_data_model.setter
    def cluster_data_model(self, model):
        with self.lock:
            self._cluster_data_model = model

    @property
    @abc.abstractmethod
    def notification_endpoints(self):
        """Associated notification endpoints

        :return: Associated notification endpoints
        :rtype: List of :py:class:`~.EventsNotificationEndpoint` instances
        """
        raise NotImplementedError()

    def set_cluster_data_model_as_stale(self):
        self.cluster_data_model = self.STALE_MODEL

    @abc.abstractmethod
    def get_audit_scope_handler(self, audit_scope):
        """Get audit scope handler"""
        raise NotImplementedError()

    @abc.abstractmethod
    def execute(self):
        """Build a cluster data model"""
        raise NotImplementedError()

    @classmethod
    def get_config_opts(cls):
        return [
            cfg.IntOpt(
                'period',
                default=3600,
                help='The time interval (in seconds) between each '
                     'synchronization of the model'),
        ]

    def get_latest_cluster_data_model(self):
        LOG.debug("Creating copy")
        LOG.debug(self.cluster_data_model.to_xml())
        return copy.deepcopy(self.cluster_data_model)

    def synchronize(self):
        """Synchronize the cluster data model

        Whenever called, this method rebuilds the model with
        :py:meth:`execute` and performs a drop-in replacement of the existing
        cluster data model.
        """
        self.cluster_data_model = self.execute()


class BaseModelBuilder(object):

    def call_retry(self, f, *args, **kwargs):
        """Attempts to call external service

        Attempts to access data from the external service and handles
        exceptions. The call is retried up to the number of times given by
        the ``api_call_retries`` option, waiting ``api_query_timeout``
        seconds after each failure.

        :param f: The method that performs the actual querying for metrics
        :param args: Positional arguments supplied to the method
        :param kwargs: Keyword arguments supplied to the method
        :return: The value as retrieved from the external service
        :raises: The last exception raised by ``f`` once all retries have
                 failed
        """

        num_retries = CONF.collector.api_call_retries
        timeout = CONF.collector.api_query_timeout

        last_exc = None
        for i in range(num_retries):
            try:
                return f(*args, **kwargs)
            except Exception as e:
                last_exc = e
                LOG.exception(e)
                self.call_retry_reset(e)
                LOG.warning("Retry %d of %d, error while calling service "
                            "retry in %s seconds",
                            i+1, num_retries, timeout)
                time.sleep(timeout)
        # A bare ``raise`` here would fail with "No active exception to
        # reraise", because the except block has already been left.
        if last_exc is None:
            raise RuntimeError("No attempt was made to call the service: "
                               "api_call_retries is %d" % num_retries)
        raise last_exc

    @abc.abstractmethod
    def call_retry_reset(self, exc):
        """Attempt to recover after encountering an error

        Recover from errors while calling external services. The exception
        can be used to make a better decision on how best to recover.
        """
        pass

    @abc.abstractmethod
    def execute(self, model_scope):
        """Build the cluster data model limited to the scope and return it

        Builds the cluster data model with respect to the supplied scope. The
        schema of this scope will depend on the type of ModelBuilder.
        """
        raise NotImplementedError()