How To Handle Large Lists Or QuerySets In Python
February 24, 2022
This is a good technique for managing memory when working with large lists or QuerySets. However, there are a few changes that could make this code more effective.
First, the batch_qs and batch_list functions could be made more general by using a typing.List or typing.Iterable type hint for the list_arr parameter instead of a list type hint. This would allow them to be used with any iterable type, not just lists.
Second, the batch_list function could be simplified by using the built-in range function to generate the indices for each batch. Instead of using a for loop with a start and end variable, you could use a range object with a step value equal to the batch size.
Here’s an example of how this could be implemented:
from typing import Any, Iterator, List, Iterable
def batch_list(list_arr: Iterable[Any], batch_size: int) -> Iterator[Any]:
for start in range(0, len(list_arr), batch_size):
yield list_arr[start:start+batch_size]
Finally, the batch_qs function could be made more efficient by using the QuerySet’s iterator method to iterate over the queryset in batches. This would avoid loading the entire queryset into memory at once, which can be slow and use a lot of memory for large querysets.
Here’s an example of how this could be implemented:
from django.db.models.query import QuerySet
def batch_qs(qs: QuerySet, batch_size: int) -> Iterator[QuerySet]:
return qs.iterator(batch_size=batch_size)
Overall, these changes would make the code more general and efficient, allowing it to be used in a wider range of scenarios.
I’m a Software Engineer passionate about solving problems and pushing tech boundaries. When I’m not coding, I enjoy reading, listening to music, appreciating art, and having a good cup of coffee. Check out my latest resume and Github profile, and I hope you enjoy reading my essays.
You can connect with me on twitter at @vinitkme. or drop me an email at mail@vinitkumar.me