An ORM can bite you

How implicit queries can influence an application performance
Jarosław Orzeł
2022-05-14

Introduction

Object Relational Mappers (ORMs) are widely used in software development to abstract database operations in our application code by providing a layer between object-oriented programming language and relational tables in a database. However we should be conscious that simple and inconspicuous expressions provided by our ORM can lead to heavy actions underhood. To present it I will take SQLAlchemy, one of the most popular ORMs in Python world.

Suppose we have a set of simplified models representing a User in a Company:

1class Person(Base): 2 __tablename__ = "person" 3 id = Column(Integer, primary_key=True, autoincrement=True) 4 name = Column(String) 5 user = relationship("User", uselist=False) 6 7class User(Base): 8 __tablename__ = "user" 9 id = Column(Integer, primary_key=True, autoincrement=True) 10 person_id = Column(Integer, ForeignKey("person.id")) 11 person = relationship("Person") 12 my_accounts = relationship("UserAccount") 13 14class Company(Base): 15 __tablename__ = "company" 16 id = Column(Integer, primary_key=True, autoincrement=True) 17 name = Column(String) 18 19class Account(Base): 20 __tablename__ = "account" 21 id = Column(Integer, primary_key=True, autoincrement=True) 22 status = Column(String) 23 company_id = Column(Integer, ForeignKey("company.id")) 24 company = relationship("Company") 25 26class UserAccount(Base): 27 __tablename__ = "user_account" 28 account_id = Column(Integer, ForeignKey("account.id"), primary_key=True) 29 account = relationship("Account") 30 user_id = Column(Integer, ForeignKey("user.id"), primary_key=True) 31 user = relationship("User") 32

Query counting

Now we would like to perform a query to get a Person matching provided filters and print Company.name that is corresponding to the retrieved Person object.

1with Session() as session: 2 with DBStatementCounter(session.connection()) as ctr: 3 person = ( 4 session.query(Person) 5 .join("user", "my_accounts", "account", "company") 6 .filter( 7 Person.name.ilike("test%"), 8 Account.status.ilike("x%"), 9 Company.name.ilike("company%"), 10 ).first() 11 ) 12 if person: 13 print(person.user.my_accounts[0].account.company.name) 14

DBStatementCounter is a helper class that counts how many database statements was executed within a given context. Assuming that a Person was found, what number of queries do you expect from above part of code?

Correct answer is: 5. Surprised?

One query is explicit:

1person = session.query(Person) 2 .join("user", "my_accounts", "account", "company") 3 .filter( 4 Person.name.ilike("test%"), 5 Account.status.ilike("x%"), 6 Company.name.ilike("company%"), 7 ).first() 8

Four remaining queries are implicit:

  • person.user
  • user.my_accounts[0]
  • my_accounts[0].account.company.name
  • account.company

It results from default lazy loading strategy of our ORM. When we load Person object it does not automatically load objects through defined foreign keys. We can see how SQL looks in that case:

1SELECT person.id AS person_id, person.name AS person_name 2FROM person 3JOIN user ON person.id = user.person_id 4JOIN user_account ON user.id = user_account.user_id 5JOIN account ON account.id = user_account.account_id 6JOIN company ON company.id = account.company_id 7WHERE lower(person.name) LIKE lower(?) AND lower(account.status) LIKE lower(?) AND lower(company.name) LIKE lower(?) 8

The query has properly joined tables, however in SELECT section there are only attributes associated to person table, so to retrieve column values from joined tables we need another query (or queries).

Eager loading

We can change that behaviour by passing to relationship an expression: lazy="joined". However, it seems reasonable to not load all connected tables every time we need only a Person's columns. The better option is to do it on demand, when we are sure that columns corresponding to linked tables would be used. In SQLAlchemy we can do it using joinedload function that provide attributes from joined tables in SELECT results.

1person = session.query(Person) 2 .join("user", "my_accounts", "account", "company") 3 .options( 4 joinedload("user"), 5 joinedload("user", "my_accounts"), 6 joinedload("user", "my_accounts", "account"), 7 joinedload("user", "my_accounts", "account", "company"), 8 ) 9 .filter( 10 Person.name.ilike("test%"), 11 Account.status.ilike("x%"), 12 Company.name.ilike("company%"), 13 ).first() 14

That query results in only one query hitting the database. Great! But, when we look at the SQL statement, there is something weird in it:

1SELECT 2person.id AS person_id, person.name AS person_name, 3company_1.id AS company_1_id, company_1.name AS company_1_name, 4account_1.id AS account_1_id, account_1.status AS account_1_status, account_1.company_id AS account_1_company_id, user_account_1.account_id AS user_account_1_account_id, user_account_1.user_id AS user_account_1_user_id, 5user_1.id AS user_1_id, user_1.person_id AS user_1_person_id 6FROM person 7JOIN user ON person.id = user.person_id 8JOIN user_account ON user.id = user_account.user_id 9JOIN account ON account.id = user_account.account_id 10JOIN company ON company.id = account.company_id 11LEFT OUTER JOIN user AS user_1 ON person.id = user_1.person_id 12LEFT OUTER JOIN user_account AS user_account_1 ON user_1.id = user_account_1.user_id 13LEFT OUTER JOIN account AS account_1 ON account_1.id = user_account_1.account_id 14LEFT OUTER JOIN company AS company_1 ON company_1.id = account_1.company_id 15WHERE lower(person.name) LIKE lower(?) AND lower(account.status) LIKE lower(?) AND lower(company.name) LIKE lower(?) 16

To get additional attributes we have our tables joined twice... If we do not have many records in the database, this would not be a problem. Otherwise, we can encounter huge performance issue like described here. The solution here is to replace joinedload with contains_eager, because joinedload basically should not be used with filtering.

1person = session.query(Person) 2 .join("user", "my_accounts", "account", "company") 3 .options( 4 contains_eager("user"), 5 contains_eager("user", "my_accounts"), 6 contains_eager("user", "my_accounts", "account"), 7 contains_eager("user", "my_accounts", "account", "company"), 8 ) 9 .filter( 10 Person.name.ilike("test%"), 11 Account.status.ilike("x%"), 12 Company.name.ilike("company%"), 13 ).first() 14

Now we are happy, because there is only one database query execution and the SQL statement looks correctly:

1SELECT 2company.id AS company_id, company.name AS company_name, account.id AS account_id, account.status AS account_status, account.company_id AS account_company_id, user_account.account_id AS user_account_account_id, 3user_account.user_id AS user_account_user_id, 4user.id AS user_id, user.person_id AS user_person_id, 5person.id AS person_id, person.name AS person_name 6FROM person 7JOIN user ON person.id = user.person_id 8JOIN user_account ON user.id = user_account.user_id 9JOIN account ON account.id = user_account.account_id 10JOIN company ON company.id = account.company_id 11WHERE lower(person.name) LIKE lower(?) AND lower(account.status) LIKE lower(?) AND lower(company.name) LIKE lower(?) 12

Conclusion

This was SQLAlchemy and Python example. However, no matter you use Django ORM, Java Hibernate or .NET Nhibernate, you will encounter the same issues and should take proper decisions about lazy loading of you objects. So, be careful with your ORM.

Ta strona korzysta z ciasteczek opartych na sztucznej inteligencji, aby zapewnić najlepsze doświadczenia zapoznaj sie z Ciasteczkową polityką