NamedRow a Better Sqlite3 row_factory for Python

2019-10-24

Currently, sqlite3 returns rows by tuple or sqlite3.Row for dict-style, index access. I constantly find myself wanting attribute access like namedtuple for rows. I find attribute access cleaner without the brackets and quoting field names. Unlike previous discussions, I don't want to use the namedtuple object. I appreciate the simple API and minimal memory consumption of sqlite3.Row and used it as my guide in creating sqlite3.NamedRow to allow attribute and dict-style access faster than sqlite3.Row.

Why a new object instead of adding attribute access to the existing sqlite3.Row? .. TEASER_END There is an existing member method keys and any table with the field "keys" would cause a hard to debug, easily avoidable, collision.

Features

Optimized in C, so it will be faster than any python implementation.
Access columns by attribute for all valid names and by index for all names.
Iterate over fields by name/value pairs.
Works with standard functions len and contains.
Identical memory consumption to sqlite3.Row with two references: the data tuple and the cursor description.
Identical speed to sqlite3.Row if not faster. Timing usually has it slightly faster for index by name or attribute, but it is almost identical.

Compare to sqlite3.Row

NamedRow only has class dunder methods to allow any valid field name. When the field name would be an invalid attribute name, you have two options: either use the SQL AS in the select statement or index by name. To get the field names use the iterator or to the same from the cursor.description.

titles = [x[0] for x in named_row]
titles = [x[0] for x in cursor.description]
titles = dict(named_row).keys()

Name and dict access are no longer case-insensitive. There are three reasons for this.

Case-insensitive comparison only works well for ASCII characters. In a Unicode world, case-insensitive edge cases create unnecessary errors. Looking at a several existing codebases, this feature of Row is almost never used and I believe is not needed in NamedRow.
Case-insensitivity is not "pythonic": "special cases aren't special enough to break the rules". Where row.name, row.Name, and row.NAME are all the same it gives off the faint code smell of something wrong. When case-insensitivity is needed and the query SELECT can not be modified, sqlite3.Row is still there.
Code is simpler and easier to maintain without edge cases.
It is faster.

Timing Results

NamedRow is faster than sqlite3.Row for index-by-name access. As the number of fields in the row increases, NamedRow significantly out performs sqlite3.Row in the graph below. Yes, we are already talking about small units of time, but when working with thousands of rows, the cumulative savings can be noticable.

Methodology

I wanted to time the creation of the row_factory object and accessing each field within that object. I started by creating some repeatably random length fields.

FIELD_MAP = "abcdefghijklmno pqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def make_field(letter_idx: int) -> str:
    """Make a string of length between 4 and 15."""
    return FIELD_MAP[letter_idx] * rando.randrange(4, 15)

random.seed(101038)
fields = [make_field(i) for i in range(number_of_fields)]

I use this to create my setup code.

import sqlite3
cursor = sqlite3.Connection(':memory:').cursor()
# fields a sequence of fieldnames
data = tuple(1 for x in range(len(fields)))

# Setup the factory_row
cursor.factory_row = sqlite3.NamedRow
select = ",".join([f"{i} as {field}" for i, field in enumerate(fields)])
cursor.execute("SELECT {select}")

NamedRow is significanly faster than sqlite3.Row for index-by-name and attribute access.
NamedRow attribute access is faster than index-by-name. The index incures a bit of overhead in determing if the index is by integer or by string which we see.
NamedRow and sqlite3.Row have near identical performance when indexing by integer.

Usage Examples

>>> import sqlite3
>>> c = sqlite3.Connection(":memory:").cursor()
>>> c.row_factory = sqlite3.NamedRow
>>> named_row = c.execute("SELECT 'A' AS letter, '.-' AS morse, 65 AS ord").fetchone()

>>> len(named_row)
3
>>> 'letter' in named_row
true
>>> row == row
true
>>> hash(row)
5512444875192833987

#Index by number and range.
>>> named_row[0]
'A'
>>> named_row[1:]
('.-', 65)

#Index by column name.
>>> named_row[ord]
65

#Access by attribute.
>>> named_row.morse
'.-'

#Iterate row for name/value pairs.
>>> dict(named_row)
{'letter': 'A', 'morse': '.-', 'ord': 65}
>>> tuple(named_row)
(('letter', 'A'), ('morse', '.-'), ('ord', 65))

Develop in Linux

Features

Compare to sqlite3.Row

Timing Results

Methodology

Usage Examples