Open
Labels
enhancement: New feature or request
Description
@maxstack @bnlawrence @davidhassell as you folks are already aware, we have an issue when returning the larger payloads that are produced by a stat with axis: those payloads can take up quite a bit of memory once unpacked by cbor. I've run a number of tests in which I "hijack" the payload as it is decoded, using the object_hook argument of the native cbor.CBORDecoder(), which lets me manipulate each decoded map, like so:
from io import BytesIO

import cbor2 as cbor  # assumption: `cbor` is the cbor2 package
import numpy as np


def object_hook(decoder, obj):
    """Called by the decoder for every decoded map; returns its replacement."""
    if isinstance(obj, dict):
        # converting to arrays doesn't really impact much
        # apart from, of course, counts which is HUGE - even so, this below works
        # but only for when counts is not needed
        # dict_keys(['bytes', 'dtype', 'shape', 'count', 'byte_order'])
        # [<class 'bytes'>, <class 'str'>, <class 'list'>, <class 'list'>, <class 'str'>]
        new_obj = {}
        new_obj["bytes"] = obj["bytes"]
        new_obj["dtype"] = obj["dtype"]
        new_obj["count"] = []  # np.array(obj["count"]): worse than list
        new_obj["shape"] = obj["shape"]
        new_obj["byte_order"] = obj["byte_order"]
        return new_obj
    # Anything that is not a dict is passed through unchanged
    return obj


def decode_result(response):
    """Decode a successful response, return as a 2-tuple of (numpy array or scalar, count)."""
    decoder = cbor.CBORDecoder(BytesIO(response.content), object_hook=object_hook)
    # The decoder already wraps the payload, so decode() is enough
    reduction_result = decoder.decode()
    dtype = reduction_result['dtype']
    shape = reduction_result['shape'] if "shape" in reduction_result else None
    # Result
    result = np.frombuffer(reduction_result['bytes'], dtype=dtype)
    if shape is not None:
        result = result.reshape(shape)
    # Counts
    count = reduction_result['count']
    # TODO: When reductionist is ready, we need to fix 'count'
    # Mask the result (count is a plain list here, so `count == 0` is False
    # and nothing gets masked)
    result = np.ma.masked_where(count == 0, result)
    return result, count

object_hook() is a callable that lets you do things with the decoded cbor obj; in this case I keep the original obj as it is, apart from making the count list zero-length.
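For anyone who wants to repeat this kind of comparison, here is a minimal sketch of one way to measure peak memory around a decode call, using the standard library's tracemalloc. It is not necessarily how the numbers in this issue were obtained. The names peak_memory, with_counts and without_counts are hypothetical, and the two stand-in functions only imitate "decode with the full count list" and "decode with count emptied"; in practice you would pass decode_result and a real response instead.

```python
import tracemalloc


def peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak bytes traced by tracemalloc during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Hypothetical stand-ins for a decoded payload with and without the count list.
def with_counts(n):
    return {"bytes": bytes(8 * n), "count": [i + 1000 for i in range(n)]}


def without_counts(n):
    return {"bytes": bytes(8 * n), "count": []}


_, peak_full = peak_memory(with_counts, 100_000)
_, peak_empty = peak_memory(without_counts, 100_000)
print(f"with count list: {peak_full} bytes, without: {peak_empty} bytes")
```

tracemalloc only sees allocations made through Python's memory allocators, so the figures are indicative rather than a full process-level measurement.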
Conclusions
- The count list (obj["count"]) is the MAIN memory consumer: in the many cases where counts are not needed (stats that don't need counts), setting it to zero length can buy us 30-40% memory.
- Trying to manipulate it in the object_hook callable, e.g. converting it to a NumPy array, had no effect or a mildly adverse one, since the interpreter has to load the whole list into memory before it can convert it; there is no lazy conversion.
- We could try smarter things with Dask for such a conversion, but I don't think that would get us anywhere significant.
- It'd be great if Reductionist encoded it differently, perhaps as an array or bytes struct, so that it doesn't decode to a list.
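To make the last point concrete, here is a rough sketch of what the client side could look like if count arrived as raw bytes, in the same way the result already does. The nested layout for count ('bytes', 'dtype', 'shape') and the helper decode_array are purely hypothetical; this is not Reductionist's actual response format, and byte order is assumed to be native.

```python
import numpy as np


def decode_array(field):
    """Rebuild an array from a hypothetical {'bytes', 'dtype', 'shape'} mapping."""
    arr = np.frombuffer(field["bytes"], dtype=field["dtype"])
    return arr.reshape(field["shape"])


# Hypothetical decoded payload: 'count' carries raw bytes instead of a list.
values = np.array([[1.5, 0.0], [2.5, 3.5]], dtype="float64")
counts = np.array([[3, 0], [2, 5]], dtype="int64")
payload = {
    "bytes": values.tobytes(),
    "dtype": "float64",
    "shape": [2, 2],
    "count": {"bytes": counts.tobytes(), "dtype": "int64", "shape": [2, 2]},
}

result = decode_array(payload)
count = decode_array(payload["count"])

# count is now a NumPy array, so the element-wise mask works as intended.
masked = np.ma.masked_where(count == 0, result)
print(masked)
```

np.frombuffer creates a view over the decoded bytes object without copying it, so with an encoding like this count would never be expanded into a Python list of int objects.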