Dataclasses in Python
Published: Wed Oct 25 2023
Dataclasses are classes designed primarily to hold data. Let’s see how to use them in this byte!
Dataclasses
Let me define a class named User and a constructor that creates a member name and assigns the name argument to it.
class User:
    def __init__(self, name):
        self.name = name
When I create an object user of the class User and try to print it, the default string representation (the class name and the memory address of the object) is printed.
user = User("Roopesh")
print(user)
>> <__main__.User object at 0x7f53c09c6510>
The task is to print a meaningful statement. To tackle this, the __str__ method can be defined in the class like this:
class User:
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return f"[User] {self.name}"
user = User("Roopesh")
print(user)
>> [User] Roopesh
As you can see above, a meaningful message is printed. The next task is comparing two objects. If I create another user object with the same name, user1 = User("Roopesh"), and compare it with the user object using print(user == user1), I’ll get False, because Python’s default equality comparison checks object identity (whether both names refer to the same object in memory). To tackle this, an __eq__ method can be defined to compare objects based on the name attribute, like this:
class User:
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return f"[User] {self.name}"
    def __eq__(self, other):
        return self.name == other.name
user = User("Roopesh")
user1 = User("Roopesh")
print(user == user1)
>> True
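The __eq__ above assumes that other always has a name attribute, so comparing a User with something like a plain string raises an AttributeError. A slightly more defensive sketch, following one common convention rather than the only possible one, returns NotImplemented for unrelated types:

```python
class User:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Only compare against other User objects; for anything else,
        # let Python fall back to its default handling.
        if not isinstance(other, User):
            return NotImplemented
        return self.name == other.name


user = User("Roopesh")
user1 = User("Roopesh")
print(user == user1)      # True
print(user == "Roopesh")  # False
```

Returning NotImplemented (instead of False) gives the other operand a chance to handle the comparison before Python settles on False.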
So far, we have defined special methods (__init__, __str__, and __eq__) by hand. Dataclasses are designed to simplify the creation of classes that primarily store data: they automatically add commonly used special methods, reducing the boilerplate we need to write.
from dataclasses import dataclass
@dataclass
class User:
    name: str
user = User("Roopesh")
print(user)
>> User(name='Roopesh')
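As you can see above, no special methods (not even the constructor) are defined for the User class, which is declared with the @dataclass decorator. By default the decorator generates __init__, __repr__, and __eq__ from the annotated fields; print falls back to the generated __repr__ because no __str__ is defined. In summary, dataclasses give us a concise way to build data structures out of ordinary classes.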
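To connect this back to the earlier comparison task, here is a small sketch showing that two dataclass instances with the same field values compare equal without any hand-written method, even though they are distinct objects:

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str


user = User("Roopesh")
user1 = User("Roopesh")

print(user)           # User(name='Roopesh')
print(user == user1)  # True: field values are compared
print(user is user1)  # False: still two separate objects
```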
While skimming the documentation, I found a couple of interesting things. I’ll share those below and leave the rest for you to explore.
asdict
Dataclasses can be converted to dictionaries using the asdict function:
from dataclasses import dataclass, asdict
@dataclass
class Point:
    x: int
    y: int
@dataclass
class C:
    mylist: list[Point]
p = Point(10, 20)
c = C([Point(0, 0), Point(10, 4)])
print(asdict(c))
>> {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
As you can see, each field name is used as a key in the resulting dict, and nested dataclasses (here, the Point objects inside the list) are converted recursively.
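One detail worth seeing in action: asdict builds a new dictionary, so changing the result does not touch the original instance. The sketch below uses a flat Point only; unpacking the dict back into the constructor is a convenience that works here because there are no nested dataclasses.

```python
from dataclasses import dataclass, asdict


@dataclass
class Point:
    x: int
    y: int


p = Point(10, 20)
d = asdict(p)
d["x"] = 99  # changes only the dict, not the Point

print(p)  # Point(x=10, y=20)
print(d)  # {'x': 99, 'y': 20}

# For a flat dataclass, the dict can be unpacked back into the constructor.
p2 = Point(**asdict(p))
print(p2 == p)  # True
```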
make_dataclass
A dataclass can also be created dynamically at run time using make_dataclass:
from dataclasses import dataclass, make_dataclass, field
Point = make_dataclass('Point',
                       [('x', int, field(default=0)),
                        ('y', int, field(default=0))],
                       namespace={'add_one_to_x': lambda self: setattr(self, 'x', self.x + 1)})
origin = Point()
print(origin)
origin.add_one_to_x()
print(origin)
>> Point(x=0, y=0)
>> Point(x=1, y=0)
In the above code snippet, the namespace argument attaches an add_one_to_x method to the generated class; when it is called on an instance of Point, it increments the x value by 1.
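For comparison, here is the same class written with the decorator. This is only a sketch of the equivalent declaration; make_dataclass is mainly useful when the field names are not known until run time.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int = 0
    y: int = 0

    def add_one_to_x(self):
        self.x += 1


origin = Point()
print(origin)  # Point(x=0, y=0)
origin.add_one_to_x()
print(origin)  # Point(x=1, y=0)
```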
@classmethod decorator
A method decorated with @classmethod is bound to the class itself rather than to a particular instance, and receives the class as its first argument (conventionally named cls). This means a classmethod can act as an alternative constructor for creating instances.
from dataclasses import dataclass
@dataclass
class Point:
    x: int
    y: int
    @classmethod
    def create_origin(cls):
        return cls(0, 0)
origin = Point.create_origin()
point = Point(3, 4)
print(origin)
print(point)
>> Point(x=0, y=0)
>> Point(x=3, y=4)
As you can see, the create_origin method, defined with the @classmethod decorator, acts as an alternative constructor that creates an instance of the class.
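A small sketch of why the method calls cls(...) instead of hard-coding Point(...): a subclass that inherits the classmethod gets instances of its own type. The names from_tuple and Point3D are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int

    @classmethod
    def from_tuple(cls, pair):
        # from_tuple is a hypothetical helper name, not part of dataclasses.
        x, y = pair
        return cls(x, y)


@dataclass
class Point3D(Point):
    # Hypothetical subclass; z needs a default so cls(x, y) still works.
    z: int = 0


print(Point.from_tuple((3, 4)))    # Point(x=3, y=4)
print(Point3D.from_tuple((3, 4)))  # Point3D(x=3, y=4, z=0)
```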