Even Easier Testing with Data Seeding in Django Web Applications

Muhamad Yoga Mahendra
Jun 8, 2021
Software Testing 230167 Vector Art at Vecteezy

This article makes its points from the perspective of the Django web framework, drawing on real cases from the development of Crowd+ by the PPL Nice PeoPLe team.

In software development, the use of tests to detect and reduce errors, from modular unit tests to integration tests, has grown considerably over the last few years. One of the more significant improvements is the ability to automate initial data preparation, both for testing and for running in production. Why is this an improvement, and how does it differ from writing specific setUp functions that do essentially the same thing? We’ll find out below.

Automatic Data Seeding

Automatic Data Seeding is the process of automating the creation of “seeds”, that is, initial data (pretty self-explanatory, right?). Here are some major advantages of switching from Manual Data Seeding to Automatic Data Seeding:

Enables a Centralized and Synchronized Dataset

Web applications usually have a complex model structure in which the models form a dependency tree. In Crowd+, the model structure isn’t that complex, but its dependencies still make automatic data seeding worthwhile. Here’s an illustration made by my teammate that describes the model structure well:

(Illustration source: “Automate Data Seeding in Tests for Django Applications” by Rafi Muhammad Daffa, Medium, June 2021)

The illustration above is a model dependency diagram. Both Annotator and Project Supplier depend on the User model, and Project depends on Project Supplier, which in turn depends on User. This means that whenever we write a new test for the Project model, we first have to create the User and Project Supplier objects it depends on.

The following snippet shows exactly that case.

Model Testing before using Automated Data Seeding

Before running any test, we need to create all of the model’s dependencies. In a single TestCase this isn’t very troublesome, but across a larger test suite it quickly becomes a real problem.

Supplies tests with reliable initial data

This initial data can be reused for further testing. Django’s test databases are destroyed automatically once the test run finishes, which keeps test data isolated and is more secure. In the previous part we talked about creating the dependencies before the model actually under test. Those are only a small part of the whole web application, and much more requires testing. Imagine the Project model had ten different states and features, each requiring extensive testing. For each of those cases we would have to duplicate all that data preparation in setUp(), ten times. That is a direct violation of the DRY (Don’t Repeat Yourself) principle, I’d say.
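The DRY point can be made concrete without a database. Below is a minimal sketch that uses only Python’s standard unittest module. It is an analogy, not Crowd+ code. The SEED dictionary and the test class names are hypothetical, and the dictionary stands in for a fixture file. The seed is defined once and shared by every test class through setUpClass. This loosely mirrors how Django’s TestCase loads the fixtures named in its fixtures attribute once per class, instead of each test rebuilding them in setUp().

```python
import copy
import unittest

# Hypothetical seed data standing in for a fixture file. It is defined once,
# in one place, instead of being rebuilt inside every setUp().
SEED = {
    "users": [{"pk": 1, "username": "supplier_one"}],
    "suppliers": [{"pk": 1, "user": 1}],
    "projects": [{"pk": 1, "supplier": 1, "title": "Demo project"}],
}


class SeededTestCase(unittest.TestCase):
    """Base class: every subclass gets its own copy of the shared seed."""

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        # Deep copy so one test class cannot mutate another class's data.
        cls.seed = copy.deepcopy(SEED)


class ProjectTitleTests(SeededTestCase):
    def test_title_is_seeded(self):
        self.assertEqual(self.seed["projects"][0]["title"], "Demo project")


class ProjectSupplierTests(SeededTestCase):
    def test_project_points_at_existing_supplier(self):
        supplier_pks = {s["pk"] for s in self.seed["suppliers"]}
        self.assertIn(self.seed["projects"][0]["supplier"], supplier_pks)
```

Run it with `python -m unittest <module>`. Adding a tenth test class costs one subclass, not another copy of the data preparation.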

Enables better collaboration and communication

We use Google Sheets to maintain the data seeds, which makes collaborating on them very easy. We can also communicate on the fly within the sheets themselves.

A fellow developer and I collaboratively creating the data seeds while communicating inside the sheet itself.

This real-time communication, combined with Google Sheets’ powerful tools, lets us finish the data seeds faster. We can also discuss anything related to the data seeds and adjust them accordingly.
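The article does not describe how the sheets become fixture files. One plausible route is to export a sheet tab as CSV and convert each row into an entry in Django’s JSON fixture format. That format is a list of objects with model, pk, and fields keys. The hypothetical sketch below shows this route. The column names, the sample rows, and the fields used for the repository.project label are assumptions for illustration only.

```python
import csv
import io
import json

# Hypothetical CSV export of one sheet tab; the column names are illustrative.
SHEET_CSV = """pk,title,supplier
1,Image labelling,1
2,Sentiment tagging,2
"""


def rows_to_fixture(csv_text, model_label, int_fields=()):
    """Convert CSV rows into Django's JSON fixture structure."""
    fixture = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        pk = int(row.pop("pk"))
        # CSV values are strings; foreign keys (like "supplier") hold the
        # related object's primary key, so convert the listed columns to int.
        fields = {
            name: int(value) if name in int_fields else value
            for name, value in row.items()
        }
        fixture.append({"model": model_label, "pk": pk, "fields": fields})
    return fixture


fixture = rows_to_fixture(SHEET_CSV, "repository.project", int_fields={"supplier"})
print(json.dumps(fixture, indent=4))
```

Writing the printed JSON to a file gives a fixture that loaddata can read. The sheet stays the single place where the team edits the data.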

Automating Data Seeding in Django

Now we know the advantages, what automatic seeding is capable of, and some actual examples. Django already ships with an automatic data seeding mechanism, so we only need to configure a handful of files. Django calls these seeds fixtures. Fixtures are data files in a specific format that Django can bulk-import into the chosen database. They can be written as JSON, XML, or YAML (YAML requires PyYAML). Our Crowd+ application uses JSON fixtures.
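To illustrate that format, here is a small, hypothetical pre-check. It is not part of Django. It catches the most common structural mistakes before running loaddata. The top level must be a list. Each entry needs a model label of the form app_label.model_name. The fields value must be an object. The pk key may be omitted in Django fixtures, so the sketch does not check it.

```python
import json
import re

# Django fixture entries name their model as "app_label.model_name",
# e.g. "repository.project".
MODEL_LABEL = re.compile(r"^[A-Za-z_]\w*\.[A-Za-z_]\w*$")


def validate_fixture(text):
    """Return a list of structural problems in a JSON fixture (empty if none)."""
    data = json.loads(text)
    if not isinstance(data, list):
        return ["top level must be a list of objects"]
    problems = []
    for i, obj in enumerate(data):
        if not isinstance(obj, dict):
            problems.append(f"entry {i}: not an object")
            continue
        if not MODEL_LABEL.match(str(obj.get("model", ""))):
            problems.append(f"entry {i}: missing or malformed 'model' label")
        if not isinstance(obj.get("fields"), dict):
            problems.append(f"entry {i}: 'fields' must be an object")
    return problems


good = '[{"model": "repository.project", "pk": 1, "fields": {"title": "Demo"}}]'
bad = '[{"model": "project", "fields": []}]'
print(validate_fixture(good))  # []
print(validate_fixture(bad))
```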

Wow, that’s nice, but how do we actually do it? The process can be broken down into a few steps:

1. Create the dataset/model

Since Django provides all the tools we need, we should use them. First, we create either the data seeds or the Django model. Both must exist before continuing to step 2, and it doesn’t really matter which comes first. Our team chose to create the Django model first, then the data seeds.

Note that the format and content of fixture files depend on the application’s requirements. In our case, we use Django’s ORM to simplify fixture creation.

After finishing both, make sure they match: no unknown fields, mismatched data types, or broken relations. Then migrate the Django models. makemigrations generates migration files from the models we’ve just written, and migrate applies them to the database.

python manage.py makemigrations
python manage.py migrate

If everything went smoothly, then it’s time to do step 2.
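Part of the “make sure they match” check from step 1 can be automated. The sketch below is a hypothetical helper, not a Django API. It compares the field names used in fixture entries against the fields the model declares and reports unknown names per primary key. PROJECT_FIELDS is an assumed example. In a real project those names could be derived from the model’s _meta API. The helper only catches unknown field names. Wrong data types and broken relations still surface as errors when loaddata runs.

```python
# Hypothetical field names declared by a Project model.
PROJECT_FIELDS = {"title", "description", "supplier"}


def unknown_fields(fixture, model_label, declared_fields):
    """Map each entry's pk to the fixture field names the model does not declare."""
    report = {}
    for obj in fixture:
        if obj["model"] != model_label:
            continue  # entries for other models are checked separately
        unknown = set(obj["fields"]) - declared_fields
        if unknown:
            report[obj.get("pk")] = sorted(unknown)
    return report


fixture = [
    {"model": "repository.project", "pk": 1, "fields": {"title": "A", "supplier": 1}},
    # "titel" is a typo; loaddata rejects unknown fields unless run with --ignorenonexistent.
    {"model": "repository.project", "pk": 2, "fields": {"titel": "B", "supplier": 1}},
]
print(unknown_fields(fixture, "repository.project", PROJECT_FIELDS))  # {2: ['titel']}
```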

2. Export the dataset

Again, Django provides the tools, so we should use them. If you don’t need to keep the previous database state and want to start fresh with the fixtures you made in step 1, skip to step 3.

Django provides the dumpdata management command to export data from the whole database, a single app, or a specific model. With it we can export existing data and merge it into our data seeds. The command looks like this:

python manage.py dumpdata --format <output format> --indent <number of spaces, e.g. 4> --output <output file name> <app label>.<model name>

# Example:
python manage.py dumpdata --format json --indent 4 --output project.json repository.project
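Crowd+ keeps each model’s fixture in its own file, so a single dumpdata export covering several models may need splitting. The following hypothetical helper is not something the article shows. It groups dumped entries by model label and writes one file per model, named after the model, so repository.project becomes project.json. This naming assumes no two apps share a model name. The sample entries are made up.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def split_by_model(dump, out_dir):
    """Write one fixture file per model label and return the file names written."""
    groups = defaultdict(list)
    for obj in dump:
        groups[obj["model"]].append(obj)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for label, objects in sorted(groups.items()):
        # "repository.project" -> "project.json"; assumes model names are unique.
        path = out_dir / f"{label.split('.')[-1]}.json"
        path.write_text(json.dumps(objects, indent=4))
        written.append(path.name)
    return written


# Made-up entries in the shape dumpdata produces.
sample_dump = [
    {"model": "auth.user", "pk": 1, "fields": {"username": "alice"}},
    {"model": "repository.project", "pk": 1, "fields": {"title": "Demo"}},
    {"model": "repository.project", "pk": 2, "fields": {"title": "Demo 2"}},
]

with tempfile.TemporaryDirectory() as tmp:
    written = split_by_model(sample_dump, tmp)
    project_count = len(json.loads((Path(tmp) / "project.json").read_text()))

print(written, project_count)  # ['user.json', 'project.json'] 2
```

With a real export, pass `json.loads(Path("dump.json").read_text())` as the first argument. Pass the project’s fixtures folder as the second.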

By default, Django looks for fixtures in a fixtures folder inside each installed app (/projectroot/application/fixtures). This is troublesome for us because our project has many features that could each act as a separate application, so the fixtures would end up scattered across many folders. We can change this behaviour by adding the following code to settings.py, the Django project settings file:

# Test Fixtures
# BASE_DIR is the pathlib.Path defined near the top of the default settings.py (Django 3.1+).
FIXTURE_DIRS = [
    BASE_DIR / "assets" / "fixtures",
]

This tells Django to also search this folder for fixtures, in addition to each app’s own fixtures folder.

3. Load the Fixtures

Django test cases can use fixtures out of the box, but they never load fixtures unless told to. This lets us load only the fixtures a particular test needs. To load fixtures, create the test class and give it a fixtures class attribute, a list of all the fixtures to load. For example, suppose we want to write tests for the Project model. From the diagram above, the dependency chain is User -> ProjectSupplier -> Project, plus Annotator, which can be registered to a project. So we would do the following:

Load data seeds from fixtures

In Crowd+, each model’s fixture lives in its own dedicated file, so we can load only the fixtures we need. The load order also matters. The user fixture should always come first, followed by annotator and project supplier. Everything else follows according to its own dependencies.
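The “according to their own dependencies” rule can be derived mechanically instead of maintained by hand. A minimal sketch, assuming Python 3.9+ for the standard-library graphlib module. The dependency map mirrors the diagram: Annotator and ProjectSupplier depend on User, and Project depends on ProjectSupplier. The file names follow a hypothetical one-file-per-model convention. The resulting list is what could go into a test class’s fixtures attribute.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each model maps to the models it depends on, following the article's diagram.
# File names assume a hypothetical one-fixture-file-per-model convention.
DEPENDENCIES = {
    "user": set(),
    "annotator": {"user"},
    "projectsupplier": {"user"},
    "project": {"projectsupplier"},
}


def fixture_order(dependencies):
    """Return fixture file names so every model is loaded after its dependencies."""
    order = TopologicalSorter(dependencies).static_order()
    return [f"{model}.json" for model in order]


fixtures = fixture_order(DEPENDENCIES)
print(fixtures)
```

The relative order of independent models (annotator and projectsupplier) is not fixed. User always comes first, and Project always follows ProjectSupplier. If the dependencies ever become circular, static_order raises graphlib.CycleError instead of silently producing a bad order.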

4. Use the Fixtures

If the tests run without problems, the fixtures loaded successfully, and we can now reuse those fixture files across the whole project. Here’s an example from our ProjectTestCase class, which already implements automatic data seeding:

Looks very clean. We don’t need to create a new user or a new project supplier in every test. We just load the fixtures and voilà, we’re done!

Conclusion

  • Automatic Data Seeding is a technique that makes application testing and deployment more effective: it reduces repetitive code, provides reliable test and starting data, and more.
  • Maximize effectiveness by using the tools and resources the framework already provides.
  • Automatic Data Seeding may require about as much effort up front as Manual Data Seeding, but it wins in the long run in maintainability, scalability, flexibility, and capability.

See you later :D
