Django MPTT efficiently serializing relational data with DRF

2024/9/29 0:49:11

I have a Category model that is an MPTT model. It has a many-to-many relation to Group, and I need to serialize the tree with related counts. Imagine my Category tree is this:

Root (related to 1 group)
 - Branch (related to 2 groups)
    - Leaf (related to 3 groups)
...

So the serialized output would look like this:

{
    id: 1,
    name: 'root1',
    full_name: 'root1',
    group_count: 6,
    children: [{
        id: 2,
        name: 'branch1',
        full_name: 'root1 - branch1',
        group_count: 5,
        children: [{
            id: 3,
            name: 'leaf1',
            full_name: 'root1 - branch1 - leaf1',
            group_count: 3,
            children: []
        }]
    }]
}

This is my current super inefficient implementation:

Model

class Category(MPTTModel):
    name = ...
    parent = ... (related_name='children')

    def get_full_name(self):
        names = self.get_ancestors(include_self=True).values('name')
        full_name = ' - '.join(map(lambda x: x['name'], names))
        return full_name

    def get_group_count(self):
        cats = self.get_descendants(include_self=True)
        return Group.objects.filter(categories__in=cats).count()

View

class CategoryViewSet(ModelViewSet):
    def list(self, request):
        tree = cache_tree_children(Category.objects.filter(level=0))
        serializer = CategorySerializer(tree, many=True)
        return Response(serializer.data)

Serializer

class RecursiveField(serializers.Serializer):
    def to_native(self, value):
        return self.parent.to_native(value)


class CategorySerializer(serializers.ModelSerializer):
    children = RecursiveField(many=True, required=False)
    full_name = serializers.Field(source='get_full_name')
    group_count = serializers.Field(source='get_group_count')

    class Meta:
        model = Category
        fields = ('id', 'name', 'children', 'full_name', 'group_count')

This works, but it hits the DB with an insane number of queries. There are also additional relations to count, not just Group. Is there a way to make this efficient? How can I write my own serializer?

Answer

You are definitely running into an N+1 query issue, which I have covered in detail in another Stack Overflow answer. I would recommend reading up on optimizing queries in Django, as this is a very common issue.

Django MPTT also has a few problems of its own that you will need to work around where N+1 queries are concerned. Both the self.get_ancestors and self.get_descendants methods create a new queryset, which in your case happens for every object that you are serializing. You may want to look into ways to avoid these calls; I've described possible improvements below.

In your get_full_name method, you are calling self.get_ancestors to build the chain of names. Since you always have the parent at hand when you are generating the output, you may benefit from moving this to a SerializerMethodField that reuses the parent object to generate the name. Something like the following may work:

from rest_framework import serializers


class RecursiveField(serializers.Serializer):
    def to_native(self, value):
        # Pass the parent object and its serializer down through the context
        # so the child can build its full name without querying ancestors.
        context = {"parent": self.parent.object, "parent_serializer": self.parent}
        return CategorySerializer(value, context=context).data


class CategorySerializer(serializers.ModelSerializer):
    children = RecursiveField(many=True, required=False)
    full_name = serializers.SerializerMethodField("get_full_name")
    group_count = serializers.Field(source='get_group_count')

    class Meta:
        model = Category
        fields = ('id', 'name', 'children', 'full_name', 'group_count')

    def get_full_name(self, obj):
        name = obj.name

        # Root categories have no parent in the context and keep their own name.
        if "parent" in self.context:
            parent = self.context["parent"]
            parent_name = self.context["parent_serializer"].get_full_name(parent)
            name = "%s - %s" % (parent_name, name)

        return name

You may need to edit this code slightly, but the general idea is that you don't always need to get the ancestors because you will have the ancestor chain already.
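The same idea can be sketched outside the serializer in plain Python. This is only an illustration under stated assumptions: `Node` and `build_full_names` are hypothetical names that are not part of Django MPTT or DRF, and the list is assumed to be ordered so that every parent comes before its children (as it would be if the categories were loaded in tree order).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    # Hypothetical stand-in for a Category row loaded in a single query.
    id: int
    name: str
    parent_id: Optional[int]


def build_full_names(nodes: List[Node], sep: str = " - ") -> Dict[int, str]:
    """Map node id to full name; assumes parents appear before their children."""
    full_names: Dict[int, str] = {}
    for node in nodes:
        if node.parent_id is None:
            full_names[node.id] = node.name
        else:
            # The parent's full name is already known, so no ancestor lookup is needed.
            full_names[node.id] = full_names[node.parent_id] + sep + node.name
    return full_names


nodes = [
    Node(1, "root1", None),
    Node(2, "branch1", 1),
    Node(3, "leaf1", 2),
]
full_names = build_full_names(nodes)
print(full_names[3])
```

Each name is built from the parent's already computed name, so the cost is one pass over the loaded rows rather than one ancestor query per category.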

This doesn't solve the Group queries, which you may not be able to optimize, but it should at least reduce your queries. Recursive queries are incredibly difficult to optimize, and they usually take a lot of planning to figure out how you can best get the required data without falling back to N+1 situations.
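One direction that might be worth trying for the Group counts is to load the (category, group) pairs once and roll them up the tree in memory. The sketch below is plain Python and makes several assumptions: `count_groups` is a hypothetical helper, the pairs are assumed to come from a single query against the many-to-many through table, and groups are counted as distinct per subtree, which may differ from the original `.count()` if one group is linked to several categories in the same subtree.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def count_groups(
    parents: Dict[int, Optional[int]],
    links: Iterable[Tuple[int, int]],
) -> Dict[int, int]:
    """Return the number of distinct groups per category, descendants included.

    parents: category id -> parent id (None for a root category).
    links:   (category_id, group_id) pairs. Assumption: these could be read
             in one query from the m2m through table of Category and Group.
    """
    groups: Dict[int, Set[int]] = {cat_id: set() for cat_id in parents}
    for cat_id, group_id in links:
        # Credit the group to the category itself and to every ancestor.
        current: Optional[int] = cat_id
        while current is not None:
            groups[current].add(group_id)
            current = parents[current]
    return {cat_id: len(ids) for cat_id, ids in groups.items()}


# Root has 1 group, branch has 2, leaf has 3, as in the question.
parents = {1: None, 2: 1, 3: 2}
links = [(1, 10), (2, 20), (2, 21), (3, 30), (3, 31), (3, 32)]
counts = count_groups(parents, links)
print(counts)
```

With this shape the number of queries no longer depends on the number of categories, at the cost of holding the link pairs in memory. The other relations mentioned in the question could be handled by calling the same helper once per relation.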
