API documentation for the database interface

This API is thread safe (it uses a multiple reader, single writer locking scheme). You can access this API like this:

from calibre.library import db
db = db('Path to calibre library folder').new_api

If you are in a calibre plugin that is part of the main calibre GUI, you get access to it like this instead:

db = self.gui.current_db.new_api

class calibre.db.cache.Cache(backend, library_database_instance=None)[source]

An in-memory cache of the metadata.db file from a calibre library. This class also serves as a threadsafe API for accessing the database. The in-memory cache is maintained in normal form for maximum performance.

SQLite is used simply as a way to read from and write to metadata.db robustly. All table reading/sorting/searching/caching logic is re-implemented. This was necessary for maximum performance and flexibility.

class EventType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
book_created = 4

When a new book record is created in the database, with the book id as the only argument

book_edited = 8

When a book format is edited, with arguments: (book_id, fmt)

books_removed = 5

When books are removed from the database with the list of book ids as the only argument

format_added = 2

When a format is added to a book, with arguments: (book_id, format)

formats_removed = 3

When formats are removed from a book, with arguments: (mapping of book id to set of formats removed from the book)

indexing_progress_changed = 9

When the indexing progress changes

items_removed = 7

When items such as tags or authors are removed from some books. Arguments: (field_name, affected book ids, ids of removed items)

items_renamed = 6

When items such as tags or authors are renamed in some or all books. Arguments: (field_name, affected book ids, map of old item id to new item id)

metadata_changed = 1

When some metadata is changed for some books, with arguments: (name of changed field, set of affected book ids)

add_books(books, add_duplicates=True, apply_import_tags=True, preserve_uuid=False, run_hooks=True, dbapi=None)[source]

Add the specified books to the library. Books should be an iterable of 2-tuples, each 2-tuple of the form (mi, format_map) where mi is a Metadata object and format_map is a dictionary of the form {fmt: path_or_stream}, for example: {'EPUB': '/path/to/file.epub'}.

Returns a pair of lists: ids, duplicates. ids contains the book ids for all newly created books in the database. duplicates contains the (mi, format_map) for all books that already exist in the database as per the simple duplicate detection heuristic used by has_book().
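
The shape of the books argument and of the returned pair can be sketched as follows. This is a minimal illustration, not part of calibre: the helper name import_files and the (mi, path, fmt) triples are hypothetical, db is assumed to be a Cache instance obtained as shown at the top of this page, and each mi is assumed to be a Metadata object built by the caller.

```python
def import_files(db, entries, add_duplicates=False):
    """Add books through db.add_books() and summarize the outcome.

    `entries` is an iterable of (mi, path, fmt) triples. This argument
    shape is an assumption made for the example only.
    """
    # Build the iterable of 2-tuples (mi, format_map) that add_books() expects.
    books = [(mi, {fmt.upper(): path}) for mi, path, fmt in entries]
    ids, duplicates = db.add_books(books, add_duplicates=add_duplicates)
    # duplicates holds the (mi, format_map) pairs that were reported as
    # already present in the database.
    skipped = [mi for mi, _format_map in duplicates]
    return list(ids), skipped
```

Passing add_duplicates=False here is a choice made for the sketch so that the duplicates list is meaningful; the documented default is True.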

add_custom_book_data(name, val_map, delete_first=False)[source]

Add data for name where val_map is a map of book_ids to values. If delete_first is True, all previously stored data for name will be removed.

add_extra_files(book_id, map_of_relpath_to_stream_or_path, replace=True, auto_rename=False)[source]

Add extra data files

add_format(book_id, fmt, stream_or_path, replace=True, run_hooks=True, dbapi=None)[source]

Add a format to the specified book. Return True if the format was added successfully.

Parameters:
  • replace – If True replace existing format, otherwise if the format already exists, return False.

  • run_hooks – If True, file type plugins are run on the format before and after being added.

  • dbapi – Internal use only.

add_listener(event_callback_function, check_already_added=False)[source]

Register a callback function that will be called after certain actions are taken on this database. The function must take three arguments: (EventType, library_id, event_type_specific_data)
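
A callback with the required three-argument shape might look like the sketch below. The factory name make_event_logger is hypothetical, and the contents of event_data depend on the event type as listed under EventType above.

```python
def make_event_logger(log):
    """Return a callback suitable for add_listener() that appends to `log`."""
    def on_event(event_type, library_id, event_data):
        # The callback receives exactly three arguments. event_data is
        # event-type specific, for example (book_id, fmt) for format_added.
        name = getattr(event_type, 'name', event_type)
        log.append((name, library_id, event_data))
    return on_event

# Usage, assuming `db` is a Cache instance:
#   events = []
#   listener = make_event_logger(events)
#   db.add_listener(listener)
```

It is prudent to keep your own reference to the callback (listener above) for as long as it should stay registered, rather than relying on the database to keep it alive.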

add_notes_resource(path_or_stream_or_data, name: str, mtime: float = None) → int[source]

Add the specified resource so it can be referenced by notes and return its content hash

all_annotation_types()[source]

Return a tuple of all annotation types in the database.

all_annotation_users()[source]

Return a tuple of all (user_type, user name) that have annotations.

all_annotations(restrict_to_user=None, limit=None, annotation_type=None, ignore_removed=False, restrict_to_book_ids=None)[source]

Return a tuple of all annotations matching the specified criteria. ignore_removed controls whether removed (deleted) annotations are also returned. Removed annotations are just a skeleton used for merging of annotations.

all_annotations_for_book(book_id)[source]

Return a tuple containing all annotations for the specified book_id as a dict with keys: format, user_type, user, annotation. Here, annotation is the annotation data.

all_book_ids(type=<class 'frozenset'>)[source]

Frozen set of all known book ids.

all_field_for(field, book_ids, default_value=None)[source]

Same as field_for, except that it operates on multiple books at once

all_field_ids(name)[source]

Frozen set of ids for all values in the field name.

all_field_names(field)[source]

Frozen set of all field names (should only be used for many-one and many-many fields)

annotation_count_for_book(book_id)[source]

Return the number of annotations for the specified book available in the database.

annotations_map_for_book(book_id, fmt, user_type='local', user='viewer')[source]

Return a map of annotation type -> annotation data for the specified book_id, format, user and user_type.

author_data(author_ids=None)[source]

Return author data as a dictionary with keys: name, sort, link

If no authors with the specified ids are found an empty dictionary is returned. If author_ids is None, data for all authors is returned.

author_sort_from_authors(authors, key_func=<function make_change_case_func.<locals>.change_case>)[source]

Given a list of authors, return the author_sort string for the authors, preferring the author sort associated with the author over the computed string.

books_for_field(name, item_id)[source]

Return all the books associated with the item identified by item_id, where the item belongs to the field name.

Returned value is a set of book ids, or the empty set if the item or the field does not exist.

books_in_virtual_library(vl, search_restriction=None, virtual_fields=None)[source]

Return the set of books in the specified virtual library

compress_covers(book_ids, jpeg_quality=100, progress_callback=None)[source]

Compress the cover images for the specified books. A compression quality of 100 will perform lossless compression, otherwise lossy compression.

The progress callback will be called with the book_id and the old and new sizes for each book that has been processed. If an error occurs, the new size will be a string with the error details.

copy_cover_to(book_id, dest, use_hardlink=False, report_file_size=None)[source]

Copy the cover to the file like object dest. Returns False if no cover exists or dest is the same file as the current cover. dest can also be a path in which case the cover is copied to it if and only if the path is different from the current path (taking case sensitivity into account).

copy_format_to(book_id, fmt, dest, use_hardlink=False, report_file_size=None)[source]

Copy the format fmt to the file like object dest. If the specified format does not exist, raises NoSuchFormat error. dest can also be a path (to a file), in which case the format is copied to it, iff the path is different from the current path (taking case sensitivity into account).
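
Copying a format to a path could be sketched as below. The helper name export_format and the output file naming are hypothetical. The sketch checks has_format() first instead of catching the NoSuchFormat error; the check and the copy are two separate calls, so this is not atomic.

```python
import os

def export_format(db, book_id, fmt, dest_dir):
    """Copy one format of a book into dest_dir and return the new path.

    Returns None if the book does not have the format.
    """
    if not db.has_format(book_id, fmt):
        return None
    # The file name used here is an arbitrary choice for the example.
    dest = os.path.join(dest_dir, f'{book_id}.{fmt.lower()}')
    # dest may be a path or a file-like object.
    db.copy_format_to(book_id, fmt, dest)
    return dest
```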

cover(book_id, as_file=False, as_image=False, as_path=False, as_pixmap=False)[source]

Return the cover image or None. By default, returns the cover as a bytestring.

WARNING: Using as_path will copy the cover to a temp file and return the path to the temp file. You should delete the temp file when you are done with it.

Parameters:
  • as_file – If True return the image as an open file object (a SpooledTemporaryFile)

  • as_image – If True return the image as a QImage object

  • as_pixmap – If True return the image as a QPixmap object

  • as_path – If True return the image as a path pointing to a temporary file
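
The cleanup required by as_path can be sketched as follows. The helper name cover_size_via_temp_file is hypothetical; the point of the example is only the try/finally that deletes the temp file.

```python
import os

def cover_size_via_temp_file(db, book_id):
    """Return the size in bytes of a book's cover, or None if it has no cover."""
    path = db.cover(book_id, as_path=True)
    if path is None:
        return None
    try:
        return os.path.getsize(path)
    finally:
        # as_path=True hands back a temp file that the caller must delete.
        os.remove(path)
```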

data_for_find_identical_books()[source]

Return data that can be used to implement find_identical_books() in a worker process without access to the db. See db.utils for an implementation.

data_for_has_book()[source]

Return data suitable for use in has_book(). This can be used for an implementation of has_book() in a worker process without access to the db.

delete_annotations(annot_ids)[source]

Delete annotations with the specified ids.

delete_custom_book_data(name, book_ids=())[source]

Delete data for name. By default, all data is deleted. To delete data for only some books, pass in a list of book ids.

delete_trash_entry(book_id, category)[source]

Delete an entry from the trash. Here category is ‘b’ for books and ‘f’ for formats.

embed_metadata(book_ids, only_fmts=None, report_error=None, report_progress=None)[source]

Update metadata in all formats of the specified book_ids to current metadata in the database.

expire_old_trash()[source]

Expire entries from the trash that are too old

export_note(field, item_id) → str[source]

Export the note as a single HTML document with embedded images as data: URLs

fast_field_for(field_obj, book_id, default_value=None)[source]

Same as field_for, except that it avoids the extra lookup to get the field object

field_for(name, book_id, default_value=None)[source]

Return the value of the field name for the book identified by book_id. If no such book exists or it has no defined value for the field name or no such field exists, then default_value is returned.

default_value is not used for title, title_sort, authors, author_sort and series_index. This is because these always have values in the db. default_value is used for all custom columns.

The returned values for is_multiple fields are always tuples, even when no values are found (in other words, default_value is ignored). The exception is identifiers, for which the returned value is always a dictionary. The returned tuples are always in link order, that is, the order in which they were created.
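
The return-value rules above can be illustrated with a short sketch. The helper name summarize_book and the custom column lookup name #read_status are hypothetical; db is assumed to be a Cache instance and book_id an existing book.

```python
def summarize_book(db, book_id):
    """Collect a few fields for one book using field_for()."""
    # title always has a value in the db, so no default is passed.
    title = db.field_for('title', book_id)
    # is_multiple fields such as tags come back as tuples, even when empty.
    tags = db.field_for('tags', book_id)
    # default_value is used for custom columns. '#read_status' is a
    # made-up lookup name for this example.
    status = db.field_for('#read_status', book_id, default_value='unknown')
    # identifiers is returned as a dictionary.
    isbn = db.field_for('identifiers', book_id).get('isbn')
    return {'title': title, 'tags': list(tags), 'status': status, 'isbn': isbn}
```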

field_ids_for(name, book_id)[source]

Return the ids (as a tuple) for the values that the field name has on the book identified by book_id. If there are no values, or no such book, or no such field, an empty tuple is returned.

field_supports_notes(field=None) → bool[source]

Return True iff the specified field supports notes. If field is None, return a frozenset of all fields that support notes.

find_identical_books(mi, search_restriction='', book_ids=None)[source]

Finds books that have a superset of the authors in mi and the same title (title is fuzzy matched). See also data_for_find_identical_books().

format(book_id, fmt, as_file=False, as_path=False, preserve_filename=False)[source]

Return the e-book format as a bytestring or None if the format doesn’t exist, or we don’t have permission to write to the e-book file.

Parameters:
  • as_file – If True the e-book format is returned as a file object. Note that the file object is a SpooledTemporaryFile, so if what you want to do is copy the format to another file, use copy_format_to() instead for performance.

  • as_path – Copies the format file to a temp file and returns the path to the temp file

  • preserve_filename – If True and returning a path the filename is the same as that used in the library. Note that using this means that repeated calls yield the same temp file (which is re-created each time)

format_abspath(book_id, fmt)[source]

Return the absolute path to the e-book file of format fmt. You should almost never use this, as it breaks the threadsafe promise of this API. Instead, use copy_format_to().

Currently used only in calibredb list, the viewer, edit book, compare_format to original format, open with, bulk metadata edit and the catalogs (via get_data_as_dict()).

Apart from the viewer, open with and edit book, I don’t believe any of the others do any file write I/O with the results of this call.

format_hash(book_id, fmt)[source]

Return the hash of the specified format for the specified book. The kind of hash is backend dependent, but is usually SHA-256.

format_metadata(book_id, fmt, allow_cache=True, update_db=False)[source]

Return the path, size and mtime for the specified format for the specified book. You should not use path unless you absolutely have to, since accessing it directly breaks the threadsafe guarantees of this API. Instead use the copy_format_to() method.

Parameters:
  • allow_cache – If True cached values are used, otherwise a slow filesystem access is done. The cache values could be out of date if access was performed to the filesystem outside of this API.

  • update_db – If True, the max_size field of the database is updated for this book.

formats(book_id, verify_formats=True)[source]

Return tuple of all formats for the specified book. If verify_formats is True, verifies that the files exist on disk.

get_all_items_that_have_notes(field_name=None) → set[int] | dict[str, set[int]][source]

Return all item_ids for items that have notes in the specified field or all fields if field_name is None

Returns all links for all fields referenced by book identified by book_id. If book_id doesn’t exist then the method returns {}.

Example: Assume author A has link X, author B has link Y, tag S has link F, and tag T has link G. If book 1 has author A and tag T, this method returns {'authors': {'A': 'X'}, 'tags': {'T': 'G'}}. If book 2's author is neither A nor B and it has no tags, this method returns {}.

Parameters:

book_id – the book id in question.

Returns:

{field: {field_value: link_value}, …} for all fields with a field_value having a non-empty link value for that book

get_categories(sort='name', book_ids=None, already_fixed=None, first_letter_sort=False)[source]

Used internally to implement the Tag Browser

get_custom_book_data(name, book_ids=(), default=None)[source]

Get data for name. By default, returns data for all book_ids. Pass in a list of book ids if you only want some of the data. Returns a map of book_id to values. If a particular value could not be decoded, default is used for it.
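
A read-modify-write round trip through get_custom_book_data() and add_custom_book_data() might look like the sketch below. The helper name bump_counter and the idea of storing an integer counter are hypothetical, and the sketch assumes the stored values are plain integers.

```python
def bump_counter(db, name, book_id):
    """Increment a per-book integer stored as custom book data under `name`."""
    current = db.get_custom_book_data(name, book_ids=[book_id], default=0)
    # Treat a missing or empty entry as zero.
    value = (current.get(book_id) or 0) + 1
    # val_map is a map of book ids to values.
    db.add_custom_book_data(name, {book_id: value})
    return value
```

The read and the write are separate API calls, so concurrent callers could overwrite each other's updates.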

get_id_map(field)[source]

Return a mapping of id numbers to values for the specified field. The field must be a many-one or many-many field, otherwise a ValueError is raised.

get_ids_for_custom_book_data(name)[source]

Return the set of book ids for which name has data.

get_item_id(field, item_name, case_sensitive=False)[source]

Return the item id for item_name or None if not found. This function is very slow. If you are doing lookups for multiple names, use either get_item_ids() or get_item_name_map() instead. Similarly, case sensitive lookups are faster than case insensitive ones.

get_item_ids(field, item_names, case_sensitive=False)[source]

Return a dict mapping item_name to the item id or None

get_item_name(field, item_id)[source]

Return the item name for the item specified by item_id in the specified field. See also get_id_map().

get_item_name_map(field, normalize_func=None)[source]

Return mapping of item values to ids

Return a dictionary of links for the supplied field.

Parameters:

for_field – the lookup name of the field for which the link map is desired

Returns:

{field_value:link_value, …} for non-empty links

get_metadata(book_id, get_cover=False, get_user_categories=True, cover_as_data=False)[source]

Return metadata for the book identified by book_id as a calibre.ebooks.metadata.book.base.Metadata object. Note that the list of formats is not verified. If get_cover is True, the cover is returned, either a path to temp file as mi.cover or if cover_as_data is True then as mi.cover_data.

get_next_series_num_for(series, field='series', current_indices=False)[source]

Return the next series index for the specified series, taking into account the various preferences that control next series number generation.

Parameters:
  • field – The series-like field (defaults to the builtin series column)

  • current_indices – If True, returns a mapping of book_id to current series_index value instead.

get_notes_resource(resource_hash) → dict | None[source]

Return a dict containing the resource data and name or None if no resource with the specified hash is found

get_proxy_metadata(book_id)[source]

Like get_metadata() except that it returns a ProxyMetadata object that only reads values from the database on demand. This is much faster than get_metadata when only a small number of fields need to be accessed from the returned metadata object.

get_usage_count_by_id(field)[source]

Return a mapping of id to usage count for all values of the specified field, which must be a many-one or many-many field.

has_book(mi)[source]

Return True iff the database contains an entry with the same title as the passed in Metadata object. The comparison is case-insensitive. See also data_for_has_book().

has_format(book_id, fmt)[source]

Return True iff the format exists on disk

has_id(book_id)[source]

Return True iff the specified book_id exists in the db

import_note(field, item_id, path_to_html_file, path_is_data=False)[source]

Import a previously exported note or an arbitrary HTML file as the note for the specified item

init()[source]

Initialize this cache with data from the backend.

items_with_notes_in_book(book_id: int) → dict[str, dict[int, str]][source]

Return a dict of field to items that have associated notes for that field for the specified book

Return the link, if any, for the specified item or None if no link is found

list_extra_files(book_id, use_cache=False, pattern='') → Tuple[ExtraFile, ...][source]

Get information about extra files in the book’s directory.

Parameters:
  • book_id – the database book id for the book

  • pattern – the pattern of filenames to search for. Empty pattern matches all extra files. Patterns must use / as separator. Use the DATA_FILE_PATTERN constant to match files inside the data directory.

Returns:

A tuple of all extra files matching the specified pattern. Each element of the tuple is ExtraFile(relpath, file_path, stat_result). Where relpath is the relative path of the file to the book directory using / as a separator. stat_result is the result of calling os.stat() on the file.
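
Iterating over the returned tuple can be sketched as follows. The helper name extra_file_sizes is hypothetical; the attribute names relpath and stat_result are taken from the ExtraFile description above.

```python
def extra_file_sizes(db, book_id, pattern=''):
    """Map each extra file's relative path to its size in bytes."""
    return {
        # relpath uses / as the separator; stat_result is an os.stat() result.
        extra.relpath: extra.stat_result.st_size
        for extra in db.list_extra_files(book_id, pattern=pattern)
    }
```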

merge_annotations_for_book(book_id, fmt, annots_list, user_type='local', user='viewer')[source]

Merge the specified annotations into the existing annotations for book_id, fmt, user_type, and user.

merge_extra_files(dest_id, src_ids, replace=False)[source]

Merge the extra files from src_ids into dest_id. Conflicting files are auto-renamed unless replace=True in which case they are replaced.

move_book_from_trash(book_id)[source]

Undelete a book from the trash directory

move_format_from_trash(book_id, fmt)[source]

Undelete a format from the trash directory

multisort(fields, ids_to_sort=None, virtual_fields=None)[source]

Return a list of sorted book ids. If ids_to_sort is None, all book ids are returned.

fields must be a list of 2-tuples of the form (field_name, ascending=True or False). The most significant field is the first 2-tuple.
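
Combining search() with multisort() could look like the sketch below. The helper name sorted_matches is hypothetical, and the choice of authors and pubdate as sort fields is an assumption made for the example.

```python
def sorted_matches(db, query, restriction=''):
    """Return the ids of books matching `query`, sorted for display."""
    matched = db.search(query, restriction=restriction)
    # Each entry is (field_name, ascending). The first tuple is the most
    # significant: authors ascending, then newest publication date first.
    fields = [('authors', True), ('pubdate', False)]
    return db.multisort(fields, ids_to_sort=matched)
```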

notes_data_for(field, item_id) → str[source]

Return all notes data as a dict or None if note does not exist

notes_for(field, item_id) → str[source]

Return the notes document or an empty string if not found

notes_resources_used_by(field, item_id)[source]

Return the set of resource hashes of all resources used by the note for the specified item

pref(name, default=None, namespace=None)[source]

Return the value for the specified preference or the value specified as default if the preference is not set.

read_backup(book_id)[source]

Return the OPF metadata backup for the book as a bytestring or None if no such backup exists.

remove_books(book_ids, permanent=False)[source]

Remove the books specified by the book_ids from the database and delete their format files. If permanent is False, then the format files are placed in the per-library trash directory.

remove_formats(formats_map, db_only=False)[source]

Remove the specified formats from the specified books.

Parameters:
  • formats_map – A mapping of book_id to a list of formats to be removed from the book.

  • db_only – If True, only remove the record for the format from the db, do not delete the actual format file from the filesystem.

Returns:

A map of book id to set of formats actually deleted from the filesystem for that book
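
Building a formats_map might be sketched as below. The helper name remove_format_from_all is hypothetical, and the sketch removes files, so it is meant as an illustration of the argument shape rather than something to run against a library you care about.

```python
def remove_format_from_all(db, fmt):
    """Remove `fmt` from every book that has it; return the API's result map."""
    fmt = fmt.upper()
    formats_map = {}
    for book_id in db.all_book_ids():
        # Compare case-insensitively in case format names differ in case.
        present = {f.upper() for f in db.formats(book_id, verify_formats=False)}
        if fmt in present:
            # formats_map maps each book id to a list of formats to remove.
            formats_map[book_id] = [fmt]
    if not formats_map:
        return {}
    return db.remove_formats(formats_map)
```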

remove_items(field, item_ids, restrict_to_book_ids=None)[source]

Delete all items in the specified field with the specified ids. Returns the set of affected book ids. restrict_to_book_ids is an optional set of book ids. If specified, the items will only be removed from those books.

rename_extra_files(book_id, map_of_relpath_to_new_relpath, replace=False)[source]

Rename extra data files

rename_items(field, item_id_to_new_name_map, change_index=True, restrict_to_book_ids=None)[source]

Rename items from a many-one or many-many field such as tags or series.

Parameters:
  • change_index – When renaming in a series-like field also change the series_index values.

  • restrict_to_book_ids – An optional set of book ids for which the rename is to be performed, defaults to all books.

restore_book(book_id, mi, last_modified, path, formats, annotations=())[source]

Restore the book entry in the database for a book that already exists on the filesystem

restore_original_format(book_id, original_fmt)[source]

Restore the specified format from the previously saved ORIGINAL_FORMAT, if any. Return True on success. The ORIGINAL_FORMAT is deleted after a successful restore.

property safe_read_lock

A safe read lock is a lock that does nothing if the thread already has a write lock, otherwise it acquires a read lock. This is necessary to prevent DowngradeLockErrors, which can happen when updating the search cache in the presence of composite columns. Updating the search cache holds an exclusive lock, but searching a composite column involves reading field values via ProxyMetadata which tries to get a shared lock. There may be other scenarios that trigger this as well.

This property returns a new lock object on every access. This lock object is not recursive (for performance) and must only be used in a with statement of the form "with cache.safe_read_lock:"; otherwise, bad things will happen.

save_original_format(book_id, fmt)[source]

Save a copy of the specified format as ORIGINAL_FORMAT, overwriting any existing ORIGINAL_FORMAT.

search(query, restriction='', virtual_fields=None, book_ids=None)[source]

Search the database for the specified query, returning a set of matched book ids.

Parameters:
  • restriction – A restriction that is ANDed to the specified query. Note that restrictions are cached, therefore the search for a AND b will be slower than a with restriction b.

  • virtual_fields – Used internally (virtual fields such as on_device to search over).

  • book_ids – If not None, a set of book ids for which books will be searched instead of searching all books.

search_annotations(fts_engine_query, use_stemming=True, highlight_start=None, highlight_end=None, snippet_size=None, annotation_type=None, restrict_to_book_ids=None, restrict_to_user=None, ignore_removed=False)[source]

Return a tuple of annotations matching the specified full-text query.

search_notes(fts_engine_query='', use_stemming=True, highlight_start=None, highlight_end=None, snippet_size=None, restrict_to_fields=(), return_text=True, result_type=<class 'tuple'>, process_each_result=None, limit=None)[source]

Search the text of notes using an FTS index. If the query is empty return all notes.

set_annotations_for_book(book_id, fmt, annots_list, user_type='local', user='viewer')[source]

Set all annotations for the specified book_id, fmt, user_type and user.

set_conversion_options(options, fmt='PIPE')[source]

options must be a map of the form {book_id:conversion_options}

set_cover(book_id_data_map)[source]

Set the covers for the books in book_id_data_map, a mapping of book ids to cover data. The data can be either a QImage, QPixmap, file object or bytestring. It can also be None, in which case any existing cover is removed.

set_field(name, book_id_to_val_map, allow_case_change=True, do_path_update=True)[source]

Set the values of the field specified by name. Returns the set of all book ids that were affected by the change.

Parameters:
  • book_id_to_val_map – Mapping of book_ids to values that should be applied.

  • allow_case_change – If True, the case of many-one or many-many fields will be changed. For example, if a book has the tag tag1 and you set the tag for another book to Tag1, then both books will have the tag Tag1 if allow_case_change is True, otherwise they will both have the tag tag1.

  • do_path_update – Used internally, you should never change it.
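
A typical book_id_to_val_map for set_field() can be sketched as follows. The helper name add_tag is hypothetical; it relies on field_for() returning a tuple for the tags field, as documented above.

```python
def add_tag(db, book_ids, tag):
    """Add `tag` to each listed book that lacks it; return the changed ids."""
    new_values = {}
    for book_id in book_ids:
        current = db.field_for('tags', book_id)
        if tag not in current:
            # Values for is_multiple fields are given as a sequence.
            new_values[book_id] = tuple(current) + (tag,)
    if not new_values:
        return set()
    # set_field() returns the set of book ids affected by the change.
    return db.set_field('tags', new_values)
```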

Sets links for item values in field. Note: this method doesn’t change values not in the value_to_link_map

Parameters:
  • field – the lookup name

  • value_to_link_map – dict(field_value:link, …). Note that these are values, not field ids.

Returns:

books changed by setting the link

set_metadata(book_id, mi, ignore_errors=False, force_changes=False, set_title=True, set_authors=True, allow_case_change=False)[source]

Set metadata for the book id from the Metadata object mi

Setting force_changes=True will force set_metadata to update fields even if mi contains empty values. In this case, ‘None’ is distinguished from ‘empty’. If mi.XXX is None, the XXX is not replaced, otherwise it is. The tags, identifiers, and cover attributes are special cases. Tags and identifiers cannot be set to None so they will always be replaced if force_changes is true. You must ensure that mi contains the values you want the book to have. Covers are always changed if a new cover is provided, but are never deleted. Also note that force_changes has no effect on setting title or authors.

set_notes_for(field, item_id, doc: str, searchable_text: str = '', resource_hashes=(), remove_unused_resources=False) → int[source]

Set the notes document. If the searchable text is different from the document, specify it as searchable_text. If the document references resources, their hashes must be present in resource_hashes. Set remove_unused_resources to True to clean up unused resources. Note that updating a note automatically cleans up resources pertaining to that note anyway.

set_pref(name, val, namespace=None)[source]

Set the specified preference to the specified value. See also pref().

split_if_is_multiple_composite(f, val)[source]

If f is a composite column lookup key and the column is is_multiple, then split val into unique non-empty values. The comparison is case sensitive. Order is not preserved. Return a list() for compatibility with proxy metadata field getters, for example tags.

tags_older_than(tag, delta=None, must_have_tag=None, must_have_authors=None)[source]

Return the ids of all books having the tag tag that are older than the specified time. tag comparison is case insensitive.

Parameters:
  • delta – A timedelta object or None. If None, then all ids with the tag are returned.

  • must_have_tag – If not None the list of matches will be restricted to books that have this tag

  • must_have_authors – A list of authors. If not None the list of matches will be restricted to books that have these authors (case insensitive).
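
Passing a timedelta as delta might look like the sketch below. The helper name stale_book_ids and the example tag News are hypothetical.

```python
from datetime import timedelta

def stale_book_ids(db, tag, days=30):
    """Return the set of ids of books with `tag` that are older than `days` days."""
    # delta=None would instead return all ids that have the tag.
    return set(db.tags_older_than(tag, delta=timedelta(days=days)))

# Usage, assuming `db` is a Cache instance:
#   old_news = stale_book_ids(db, 'News', days=7)
```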

unretire_note_for(field, item_id) → int[source]

Unretire a previously retired note for the specified item. Notes are retired when an item is removed from the database

update_annotations(annot_id_map)[source]

Update annotations.

user_categories_for_books(book_ids, proxy_metadata_map=None)[source]

Return the user categories for the specified books. proxy_metadata_map is optional and is useful for a performance boost, in contexts where a ProxyMetadata object for the books already exists. It should be a mapping of book_ids to their corresponding ProxyMetadata objects.