Back to Home

How To Handle Large Lists Or QuerySets In Python

February 24, 2022

This is a good technique for managing memory when working with large lists or QuerySets. However, there are a few changes that could make this code more effective.

First, the batch_qs and batch_list functions could be made more general by using a typing.List or typing.Iterable type hint for the list_arr parameter instead of a list type hint. This would allow them to be used with any iterable type, not just lists.

Second, the batch_list function could be simplified by using the built-in range function to generate the indices for each batch. Instead of using a for loop with a start and end variable, you could use a range object with a step value equal to the batch size.

Here’s an example of how this could be implemented:

from typing import Any, Iterator, List, Iterable def batch_list(list_arr: Iterable[Any], batch_size: int) -> Iterator[Any]: for start in range(0, len(list_arr), batch_size): yield list_arr[start:start+batch_size]

Finally, the batch_qs function could be made more efficient by using the QuerySet’s iterator method to iterate over the queryset in batches. This would avoid loading the entire queryset into memory at once, which can be slow and use a lot of memory for large querysets.

Here’s an example of how this could be implemented:

from django.db.models.query import QuerySet def batch_qs(qs: QuerySet, batch_size: int) -> Iterator[QuerySet]: return qs.iterator(batch_size=batch_size)

Overall, these changes would make the code more general and efficient, allowing it to be used in a wider range of scenarios.

© 2024, Vinit Kumar