memory.enable_memory_arena_shrinkage is not working in python #23339

Open
vsbaldeev opened this issue Jan 13, 2025 · 1 comment

@vsbaldeev

Describe the issue

I'm trying to reduce the memory consumption of a process that uses an onnxruntime InferenceSession. To achieve this, I call InferenceSession.run with the memory.enable_memory_arena_shrinkage run option, but it doesn't seem to have any effect. How do I do this in Python?

To reproduce

import uuid

import numpy
import onnxruntime
import psutil
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import StringTensorType
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

def create_trained_model():
    x = [str(uuid.uuid4()) for _ in range(100)]
    y = ["a" for _ in range(50)] + ["b" for _ in range(50)]

    pipeline = Pipeline(
        steps=[
            ("vectorizer", CountVectorizer()),
            ("classifier", RandomForestClassifier())
        ]
    )

    pipeline.fit(x, y)
    onnx_model_proto_string = convert_sklearn(
        pipeline,
        initial_types=[('features', StringTensorType((None,)))],
        verbose=False
    ).SerializeToString()

    return onnxruntime.InferenceSession(onnx_model_proto_string, providers=["CPUExecutionProvider"])


def get_one_prediction(onnx_model, input_data, shrink_arena=False):
    if shrink_arena:
        run_options = onnxruntime.RunOptions()
        # Request arena shrinkage after this run; the value is a
        # semicolon-separated device list, e.g. "cpu:0" or "cpu:0;gpu:0".
        run_options.add_run_config_entry("memory.enable_memory_arena_shrinkage", "cpu:0")
    else:
        run_options = None

    return onnx_model.run(
        [onnx_model.get_outputs()[1].name],
        {onnx_model.get_inputs()[0].name: input_data},
        run_options=run_options
    )


def get_used_megabytes():
    # USS (unique set size): memory private to this process, in MiB.
    return psutil.Process().memory_full_info().uss / (1024 * 1024)


def run_model_many_times(onnx_model, count):
    print("Memory in the beginning = ", get_used_megabytes())

    for i in range(count):
        input_data = numpy.array([str(uuid.uuid4()) for _ in range(40)])

        get_one_prediction(onnx_model, input_data)

        if i % 10000 == 0:
            # Periodically run once with shrinkage requested and compare USS.
            print("Memory before shrinkage = ", get_used_megabytes())
            get_one_prediction(onnx_model, input_data, True)
            print("Memory after shrinkage = ", get_used_megabytes())

    print("Memory in the end = ", get_used_megabytes())


def main():
    print("Memory before model's creation =  ", get_used_megabytes())
    onnx_model = create_trained_model()
    count = 100000
    run_model_many_times(onnx_model, count)

if __name__ == '__main__':
    main()

This code produces the following output:

Memory before model's creation = 98.734375
Memory in the beginning = 104.390625
Memory before shrinkage = 104.46875
Memory after shrinkage = 104.46875
Memory before shrinkage = 104.859375
Memory after shrinkage = 104.859375
Memory before shrinkage = 104.859375
Memory after shrinkage = 104.859375
Memory before shrinkage = 104.90625
Memory after shrinkage = 104.90625
Memory before shrinkage = 104.9375
Memory after shrinkage = 104.9375
Memory before shrinkage = 104.953125
Memory after shrinkage = 104.953125
Memory before shrinkage = 104.953125
Memory after shrinkage = 104.953125
Memory before shrinkage = 104.984375
Memory after shrinkage = 104.984375
Memory before shrinkage = 105.109375
Memory after shrinkage = 105.109375
Memory before shrinkage = 105.125
Memory after shrinkage = 105.125
Memory in the end = 105.140625

Urgency

No response

Platform

Linux

OS Version

Ubuntu 20.04.6 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.20.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@yuslepukhin
Member

This feature was primarily introduced for GPU memory.
For CPU, we recommend disabling the arena altogether and seeing whether the default allocator does a better job (it often does).
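A minimal sketch of that suggestion, assuming the same onnx_model_proto_string bytes as in the repro above; enable_cpu_mem_arena is the SessionOptions flag that controls the CPU arena:

import onnxruntime

# Create the session with the CPU memory arena disabled, so allocations
# go through the default allocator instead of being cached by the arena.
sess_options = onnxruntime.SessionOptions()
sess_options.enable_cpu_mem_arena = False

onnx_model = onnxruntime.InferenceSession(
    onnx_model_proto_string,
    sess_options=sess_options,
    providers=["CPUExecutionProvider"],
)

With the arena disabled there is nothing to shrink, so the memory.enable_memory_arena_shrinkage run option is no longer needed.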
