Recently we ran into a classic situation that more than one of us will recognize: while writing code, someone suggests an improvement and we think: ah, that's right, we hadn't thought of that!
Opening the test code and finding a pile of data in the given section can be painful. That data makes noise and confuses you more than it should, since most probably your program logic does nothing with it: it just carries it along and/or checks that it appears in the output.
To give an example, here is some made-up data:
def test_create_report_of_available_products_should_return_available_products(self, config, config_key):
    inventory_dataframe = pd.DataFrame(
        [
            {
                "name": "Kalia Vanish!",
                "create_date": "irrelevant-date",
                "is_out_of_date": "irrelevant-date",
                "color": "irrelevant-color",
                "id_of_product": 1,
                "internal_id": 111,
                "barcode": 7777777,
                "availability": "Yes"
            },
            {
                "name": "Kalia Vanish!",
                "create_date": "irrelevant-date",
                "is_out_of_date": "irrelevant-date",
                "color": "irrelevant-color",
                "id_of_product": 2,
                "internal_id": 222,
                "barcode": 7777777,
                "availability": "No"
            },
            {
                "name": "Kalia Vanish!",
                "create_date": "irrelevant-date",
                "is_out_of_date": "irrelevant-date",
                "color": "irrelevant-color",
                "id_of_product": 3,
                "internal_id": 333,
                "barcode": 7777777,
                "availability": "No"
            },
        ]
    ).astype("string")
    expected_report = pd.DataFrame(
        [
            {
                "name": "Kalia Vanish!",
                "create_date": "irrelevant-date",
                "is_out_of_date": "irrelevant-date",
                "color": "irrelevant-color",
                "id_of_product": 1,
                "internal_id": 111,
                "barcode": 7777777,
                "availability": "Yes"
            }
        ]
    ).astype("string")
    generated_report = create_report_of_available_products_from(inventory_dataframe)
    assert_that(generated_report).is_equal_to(expected_report, check_like=True)
The first thing that comes to mind is surely:
Uffffffffffff, let’s see what data this object has…?
And you spend the afternoon reading the fields in that data set, trying to understand why all of it is there and what it does.
But really, all your logic does is separate the products that are available from those that are not. Let's imagine that the logic of create_report_of_available_products_from only uses the availability and internal_id properties. The rest of the information is not used in the method and contributes nothing to this particular test… But, since "it has always been done this way in the other tests", we tend to err on the side of copy-paste, and that really hurts us.
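The article never shows the production code, but a minimal sketch helps make the point. Assuming create_report_of_available_products_from is nothing more than a pandas filter (this is a hypothetical implementation, not the real one), it might look like this:

```python
import pandas as pd


def create_report_of_available_products_from(inventory: pd.DataFrame) -> pd.DataFrame:
    # Keep only the rows marked as available; every other column just
    # travels through untouched, which is exactly why the test does not
    # need realistic values for them.
    return inventory[inventory["availability"] == "Yes"].reset_index(drop=True)
```

Notice that only the availability column is ever inspected; the rest of the row is passed through as-is.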
For the previous case, it is enough to remove from the given section the properties that the method under test does not use:
def test_create_report_of_available_products_should_return_available_products(self, config, config_key):
    inventory_dataframe = pd.DataFrame(
        [
            {
                "internal_id": 111,
                "availability": "Yes"
            },
            {
                "internal_id": 222,
                "availability": "No"
            },
            {
                "internal_id": 333,
                "availability": "No"
            },
        ]
    ).astype("string")
    expected_report = pd.DataFrame(
        [
            {
                "internal_id": 111,
                "availability": "Yes"
            }
        ]
    ).astype("string")
    generated_report = create_report_of_available_products_from(inventory_dataframe)
    assert_that(generated_report).is_equal_to(expected_report, check_like=True)
Much better, isn't it? But… what about the data we deleted? Where is it? It is 100% necessary for the report! Right, we do have to add it back to the test, but in a subtle way, so that it adds no noise and so that, on a single read of the test, we know which fields we are actually using:
def test_create_report_of_available_products_should_return_available_products(self, config, config_key):
    inventory_dataframe = pd.DataFrame(
        [
            {
                "internal_id": 111,
                "availability": "Yes"
            },
            {
                "internal_id": 222,
                "availability": "No"
            },
            {
                "internal_id": 333,
                "availability": "No"
            },
        ]
    ).astype("string")
    expected_report = pd.DataFrame(
        [
            {
                "internal_id": 111,
                "availability": "Yes"
            }
        ]
    ).astype("string")
    expected_report_with_extra_data = self._fill_irrelevant_test_fields(expected_report)
    generated_report = create_report_of_available_products_from(inventory_dataframe)
    assert_that(generated_report).is_equal_to(expected_report_with_extra_data, check_like=True)
You would create the method self._fill_irrelevant_test_fields (you can improve the name; it is just a small example) in the same test class, and there you do something like this:
def _fill_irrelevant_test_fields(self, expected: pd.DataFrame) -> pd.DataFrame:
    expected["name"] = "irrelevant-name"
    expected["create_date"] = "irrelevant-date"
    expected["is_out_of_date"] = "irrelevant-date"
    expected["color"] = "irrelevant-color"
    expected["id_of_product"] = "irrelevant-id_of_product"
    expected["barcode"] = "irrelevant-barcode"
    return expected.astype("string")
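A detail worth knowing about why a single line per field is enough there: in pandas, assigning a scalar to a DataFrame column broadcasts the value to every row, so the helper works no matter how many rows the expected report has. A minimal sketch (the column names are just examples):

```python
import pandas as pd

expected = pd.DataFrame(
    [
        {"internal_id": 111, "availability": "Yes"},
        {"internal_id": 444, "availability": "Yes"},
    ]
).astype("string")

# A scalar assignment broadcasts to every row of the column.
expected["name"] = "irrelevant-name"
```

After the assignment, both rows carry the same "irrelevant-name" filler value.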
You think it looks nice but want to take it a step further? Don't worry, we can create a builder for our objects:
def ItemBuilder(
    name: str = "irrelevant-name",
    create_date: str = "irrelevant-create_date",
    is_out_of_date: str = "irrelevant-is_out_of_date",
    color: str = "irrelevant-color",
    id_of_product: str = "irrelevant-id_of_product",
    barcode: str = "irrelevant-barcode",
    internal_id: str = "irrelevant-internal_id",
    availability: str = "No"
) -> dict:
    return {
        "name": name,
        "create_date": create_date,
        "is_out_of_date": is_out_of_date,
        "color": color,
        "id_of_product": id_of_product,
        "barcode": barcode,
        "internal_id": internal_id,
        "availability": availability
    }
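As a design note, the same builder could also be written as a dataclass, so the defaults live on typed fields and asdict() produces the plain dict that pd.DataFrame expects. This is just a sketch of an alternative shape (the Item class and item_builder name are made up for illustration), not something from the original article:

```python
from dataclasses import dataclass, asdict


@dataclass
class Item:
    # Every field carries an "irrelevant" default, so a test only spells
    # out the values it actually cares about.
    name: str = "irrelevant-name"
    create_date: str = "irrelevant-create_date"
    is_out_of_date: str = "irrelevant-is_out_of_date"
    id_of_product: str = "irrelevant-id_of_product"
    barcode: str = "irrelevant-barcode"
    internal_id: str = "irrelevant-internal_id"
    availability: str = "No"


def item_builder(**overrides) -> dict:
    # asdict() turns the dataclass instance back into a plain dict.
    return asdict(Item(**overrides))
```

Keeping every default a string also gives the resulting DataFrame a uniform dtype without needing astype("string") in each test.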
Thus, our test would finally look like this:
def test_create_report_of_available_products_should_return_available_products(self, config, config_key):
    inventory_dataframe = pd.DataFrame(
        [
            ItemBuilder(internal_id="111", availability="Yes"),
            ItemBuilder(internal_id="222", availability="No"),
            ItemBuilder(internal_id="333", availability="No")
        ]
    )
    expected_report = pd.DataFrame(
        [
            ItemBuilder(internal_id="111", availability="Yes")
        ]
    )
    generated_report = create_report_of_available_products_from(inventory_dataframe)
    assert_that(generated_report).is_equal_to(expected_report, check_like=True)
What do we achieve with this? We can remove the self._fill_irrelevant_test_fields method, because our ItemBuilder already fills in every field of the object with an irrelevant default value. In the test, when calling ItemBuilder, we pass only the fields we actually want to use, and we are left with a clean, tidy, functional, and easily readable test.
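A side note on the assertion: the examples use an assert_that helper from the codebase, but the check_like=True behaviour it forwards comes straight from pandas. pandas.testing.assert_frame_equal with check_like=True ignores the order of columns and index, which is handy when a builder and the production code declare columns in different orders:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

left = pd.DataFrame([{"internal_id": "111", "availability": "Yes"}]).astype("string")
right = pd.DataFrame([{"availability": "Yes", "internal_id": "111"}]).astype("string")

# Passes even though the columns are declared in a different order;
# without check_like=True the same comparison would raise AssertionError.
assert_frame_equal(left, right, check_like=True)
```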
Many thanks to Sara Revilla for suggesting a few improvements to the original article, including the final ItemBuilder part.
This little tip, which our colleagues at Clarity.AI taught us after seeing our tests, is very cool. The truth is that it helps a lot when you dive into someone else's code and everything is clean, well separated, and contains just what is necessary for you to understand it.
Do you want more? We invite you to subscribe to our newsletter to get the most relevant articles.
If you enjoy reading our blog, can you imagine how much fun it would be to work with us? Let's do it!
But wait a second 🖐 we've got a conflict here. Newsletters are often 💩👎👹 to us. That's why we've created the LEAN LIST: the first zen, enjoyable, rocker, and reggaeton list in the IT industry. We've all subscribed to newsletters beyond our limits 😅 so we are serious about this.