Coverage for watcher/decision_engine/model/collector/base.py: 88%

56 statements  

coverage.py v7.8.2, created at 2025-06-17 12:22 +0000

1# -*- encoding: utf-8 -*- 

2# Copyright (c) 2015 b<>com 

3# 

4# Authors: Jean-Emile DARTOIS <jean-emile.dartois@b-com.com> 

5# Vincent FRANCOISE <vincent.francoise@b-com.com> 

6# 

7# Licensed under the Apache License, Version 2.0 (the "License"); 

8# you may not use this file except in compliance with the License. 

9# You may obtain a copy of the License at 

10# 

11# http://www.apache.org/licenses/LICENSE-2.0 

12# 

13# Unless required by applicable law or agreed to in writing, software 

14# distributed under the License is distributed on an "AS IS" BASIS, 

15# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 

16# implied. 

17# See the License for the specific language governing permissions and 

18# limitations under the License. 

19# 

20 

21""" 

22A :ref:`Cluster Data Model <cluster_data_model_definition>` (or CDM) is a 

23logical representation of the current state and topology of the :ref:`Cluster 

24<cluster_definition>` :ref:`Managed resources <managed_resource_definition>`. 

25 

26It is represented as a set of :ref:`Managed resources 

27<managed_resource_definition>` (which may be a simple tree or a flat list of 

28key-value pairs) which enables Watcher :ref:`Strategies <strategy_definition>` 

29to know the current relationships between the different :ref:`resources 

30<managed_resource_definition>` of the :ref:`Cluster <cluster_definition>` 

31during an :ref:`Audit <audit_definition>` and enables the :ref:`Strategy 

32<strategy_definition>` to request information such as: 

33 

34- What compute nodes are in a given :ref:`Audit Scope

35  <audit_scope_definition>`?

36- What :ref:`Instances <instance_definition>` are hosted on a given compute

37  node?

38- What is the current load of a compute node?

39- What is the current free memory of a compute node?

40- What is the network link between two compute nodes?

41- What is the available bandwidth on a given network link?

42- What is the current space available on a given virtual disk of a given

43  :ref:`Instance <instance_definition>`?

44- What is the current state of a given :ref:`Instance <instance_definition>`?

45- ...

46 

47In short, this data model enables the :ref:`Strategy <strategy_definition>`

48to know: 

49 

50- the current topology of the :ref:`Cluster <cluster_definition>`

51- the current capacity for each :ref:`Managed resource

52  <managed_resource_definition>`

53- the current amount of used/free space for each :ref:`Managed resource

54  <managed_resource_definition>`

55- the current state of each :ref:`Managed resource

56  <managed_resource_definition>`

57 

58In the Watcher project, we aim to provide a generic and basic

59:ref:`Cluster Data Model <cluster_data_model_definition>` for each :ref:`Goal

60<goal_definition>`, usable in the associated :ref:`Strategies

61<strategy_definition>` through plugins called

62cluster data model collectors (or CDMCs). These CDMCs are responsible for

63loading their associated CDM and keeping it up to date by listening to events

64and by periodically rebuilding it from the ground up. They are also

65directly accessible from the strategy classes. These CDMs are used to:

66 

67- simplify the development of a new :ref:`Strategy <strategy_definition>` for a

68  given :ref:`Goal <goal_definition>` when there are already

69  :ref:`Strategies <strategy_definition>` associated with the same :ref:`Goal

70  <goal_definition>`

71- avoid duplicating the same code in several :ref:`Strategies

72  <strategy_definition>` associated with the same :ref:`Goal <goal_definition>`

73- improve consistency between the different :ref:`Strategies

74  <strategy_definition>` for a given :ref:`Goal <goal_definition>`

75- avoid any strong coupling with any external :ref:`Cluster Data Model

76  <cluster_data_model_definition>` (the proposed data model acts as a pivot

77  data model)

78 

79There may be various :ref:`generic and basic Cluster Data Models

80<cluster_data_model_definition>` proposed in Watcher helpers, each of them

81adapted to achieving a given :ref:`Goal <goal_definition>`:

82

83- For example, for a :ref:`Goal <goal_definition>` which aims at optimizing

84  the network :ref:`resources <managed_resource_definition>`, the :ref:`Strategy

85  <strategy_definition>` may need to know which :ref:`resources

86  <managed_resource_definition>` are communicating with each other.

87- Whereas for a :ref:`Goal <goal_definition>` which aims at optimizing thermal

88  and power conditions, the :ref:`Strategy <strategy_definition>` may need to

89  know the location of each compute node in the racks and the location of each

90  rack in the room.

91 

92Note, however, that developers can use their own :ref:`Cluster Data Model

93<cluster_data_model_definition>` if the proposed data model does not fit

94their needs, as long as the :ref:`Strategy <strategy_definition>` is able to

95produce a :ref:`Solution <solution_definition>` for the requested :ref:`Goal

96<goal_definition>`. For example, a developer could rely on the Nova Data Model

97to optimize some compute resources.

98 

99The :ref:`Cluster Data Model <cluster_data_model_definition>` may be persisted 

100in any appropriate storage system (SQL database, NoSQL database, JSON file, 

101XML file, in-memory database, ...). As of now, an in-memory model is built and

102maintained in the background in order to accelerate the execution of 

103strategies. 

104""" 

105 

106import abc 

107import copy 

108import threading 

109import time 

110 

111from oslo_config import cfg 

112from oslo_log import log 

113 

114from watcher.common import clients 

115from watcher.common.loader import loadable 

116from watcher.decision_engine.model import model_root 

117 

118LOG = log.getLogger(__name__) 

119CONF = cfg.CONF 

120 

121 

122class BaseClusterDataModelCollector(loadable.LoadableSingleton,

123                                    metaclass=abc.ABCMeta):

124

125    STALE_MODEL = model_root.ModelRoot(stale=True)

126

127    def __init__(self, config, osc=None):

128        super(BaseClusterDataModelCollector, self).__init__(config)

129        self.osc = osc if osc else clients.OpenStackClients()

130        self.lock = threading.RLock()

131        self._audit_scope_handler = None

132        self._cluster_data_model = None

133        self._data_model_scope = None

134

135    @property

136    def cluster_data_model(self):

137        if self._cluster_data_model is None:

138            self.lock.acquire()

139            self._cluster_data_model = self.execute()

140            self.lock.release()

141

142        return self._cluster_data_model

143

144    @cluster_data_model.setter

145    def cluster_data_model(self, model):

146        self.lock.acquire()

147        self._cluster_data_model = model

148        self.lock.release()

149 

150    @property

151    @abc.abstractmethod

152    def notification_endpoints(self):

153        """Associated notification endpoints

154

155        :return: Associated notification endpoints

156        :rtype: List of :py:class:`~.EventsNotificationEndpoint` instances

157        """

158        raise NotImplementedError()

159

160    def set_cluster_data_model_as_stale(self):

161        self.cluster_data_model = self.STALE_MODEL

162

163    @abc.abstractmethod

164    def get_audit_scope_handler(self, audit_scope):

165        """Get audit scope handler"""

166        raise NotImplementedError()

167

168    @abc.abstractmethod

169    def execute(self):

170        """Build a cluster data model"""

171        raise NotImplementedError()

172

173    @classmethod

174    def get_config_opts(cls):

175        return [

176            cfg.IntOpt(

177                'period',

178                default=3600,

179                help='The time interval (in seconds) between each '

180                     'synchronization of the model'),

181        ]

182

183    def get_latest_cluster_data_model(self):

184        LOG.debug("Creating copy")

185        LOG.debug(self.cluster_data_model.to_xml())

186        return copy.deepcopy(self.cluster_data_model)

187

188    def synchronize(self):

189        """Synchronize the cluster data model

190

191        Whenever called this synchronization will perform a drop-in replacement

192        with the existing cluster data model

193        """

194        self.cluster_data_model = self.execute()

195 

196 
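The lazy-build, replace, and copy pattern of the collector above can be illustrated in isolation. The following is only a minimal sketch that does not import Watcher: `ToyModel` and `SketchCollector` are hypothetical stand-ins for `model_root.ModelRoot` and a concrete collector, and the `source` callable is an assumed placeholder for whatever service a real collector would query. Unlike the listing, the sketch checks for a missing model inside the lock and uses `with`, so the lock is released even if the build raises.

```python
import copy
import threading


class ToyModel:
    """Hypothetical stand-in for a cluster data model (not Watcher's class)."""

    def __init__(self, nodes=None, stale=False):
        self.nodes = dict(nodes or {})
        self.stale = stale


class SketchCollector:
    """Hypothetical collector illustrating lazy build, sync, and copy."""

    def __init__(self, source):
        self._source = source            # assumed callable: {node: [instances]}
        self._lock = threading.RLock()
        self._model = None
        self.builds = 0                  # counts full rebuilds, for illustration

    def execute(self):
        # Rebuild the model from the ground up.
        self.builds += 1
        return ToyModel(self._source())

    @property
    def cluster_data_model(self):
        # Check and build under the lock so two threads cannot both build.
        with self._lock:
            if self._model is None:
                self._model = self.execute()
            return self._model

    def synchronize(self):
        # Build first, then swap the reference in under the lock.
        new_model = self.execute()
        with self._lock:
            self._model = new_model

    def get_latest_cluster_data_model(self):
        # Hand out a deep copy so callers cannot mutate the shared model.
        return copy.deepcopy(self.cluster_data_model)
```

A caller would typically work on the copy returned by `get_latest_cluster_data_model()`, while a background task calls `synchronize()` at the configured period.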

197class BaseModelBuilder(object):

198

199    def call_retry(self, f, *args, **kwargs):

200        """Attempts to call external service

201

202        Attempts to access data from the external service and handles

203        exceptions. The retrieval should be retried in accordance

204        with the value of api_call_retries.

205        :param f: The method that performs the actual querying for metrics

206        :param args: Positional arguments supplied to the method

207        :param kwargs: Keyword arguments supplied to the method

208        :return: The value as retrieved from the external service

209        """

210

211        num_retries = CONF.collector.api_call_retries

212        timeout = CONF.collector.api_query_timeout

213

214        for i in range(num_retries):  # 214 ↛ 224: line 214 didn't jump to line 224 because the loop on line 214 didn't complete

215            try:

216                return f(*args, **kwargs)

217            except Exception as e:

218                LOG.exception(e)

219                self.call_retry_reset(e)

220                LOG.warning("Retry %d of %d, error while calling service "

221                            "retry in %s seconds",

222                            i+1, num_retries, timeout)

223                time.sleep(timeout)

224        raise

225 

226    @abc.abstractmethod

227    def call_retry_reset(self, exc):

228        """Attempt to recover after encountering an error

229

230        Recover from errors while calling external services, the exception

231        can be used to make a better decision on how to best recover.

232        """

233        pass

234

235    @abc.abstractmethod

236    def execute(self, model_scope):

237        """Build the cluster data model limited to the scope and return it

238

239        Builds the cluster data model with respect to the supplied scope. The

240        schema of this scope will depend on the type of ModelBuilder.

241        """

242        raise NotImplementedError()
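The partial branch reported on line 214 means the retry loop never ran to exhaustion during the test run, so line 224 was never reached from it. Note that a bare `raise` placed after a loop runs outside any `except` block, so in Python 3 it raises `RuntimeError` ("No active exception to reraise") rather than the last service error. The sketch below shows one possible way to keep the last exception and re-raise it. `call_with_retries` and its `retries`, `delay`, and `on_error` parameters are hypothetical names chosen for illustration; they are not part of Watcher, which reads its values from `CONF.collector`.

```python
import time


def call_with_retries(func, *args, retries=3, delay=0.0, on_error=None,
                      **kwargs):
    """Call func, retrying on failure (hypothetical helper, not Watcher's API).

    :param func: callable that queries the external service
    :param retries: total number of attempts (assumed to be at least 1)
    :param delay: seconds to wait between attempts
    :param on_error: optional callback given the exception, for recovery
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")

    last_exc = None
    for attempt in range(1, retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Remember the failure so it can be re-raised after the loop.
            last_exc = exc
            if on_error is not None:
                on_error(exc)
            if attempt < retries:
                time.sleep(delay)

    # All attempts failed: surface the last real error to the caller.
    raise last_exc
```

A test that makes the wrapped callable fail on every attempt would exercise the path after the loop, which is the branch the report marks as not taken.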