
Reducing Adobe Lightroom catalog size by automatically purging big history states

Recently, while backing up my pictures folder, I noticed my Lightroom catalog had increased notably in size, even though I couldn't remember adding that many pictures. I decided to investigate.

I remembered reading a few years ago, on Jeffrey Friedl's blog, that he'd had a great experience purging his library's edit history. As his library is considerably larger than mine (~132k photos at the time, whereas I only have about 9k RAWs in mine), his gains were quite substantial. Since I didn't know the source of the increase, I couldn't simply delete the history for a single photo, and I didn't want to part with all my edit history quite yet. Still, it turned out to be a great lead.

In this post, I'll do my best to explain my reasoning, but if you're just interested in the results, there's a wrap-up at the bottom.

First off: Adobe Lightroom's catalog is an SQLite database

Although Adobe Lightroom's catalog files use an *.lrcat extension, inspecting one reveals it to be an ordinary SQLite 3 database. This is great, because it allows us to edit it using the SQLite toolchain. I used the sqlite3 command-line tool, but there are great GUI tools available as well.

It should go without saying, but I feel I should mention it anyway: make a backup of your Lightroom catalog before fiddling with it!
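
For reference, here's roughly what opening the catalog with the sqlite3 command-line tool looks like. The catalog file name below is just an example; substitute your own, and work on a copy if you want to be extra careful. The .headers and .mode commands merely make query output easier to read, and .tables lists the tables in the catalog.

sqlite3 "Lightroom Catalog.lrcat"
sqlite> .headers on
sqlite> .mode column
sqlite> .tables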

I won't be going into detail on how the database is laid out — that's beyond the scope of this post. However, the SQL queries below should provide some insight into its structure, if that's what you're looking for.

Finding images with many edit states

Exploring the database, one particular table quickly caught my eye: Adobe_libraryImageDevelopHistoryStep. As it turns out, it contains the edit history for all photos.

I used the following query to find images with loads of history states:

SELECT hi.image, COUNT(*) AS numStates,
       rf.absolutePath || fo.pathFromRoot || fi.baseName || '.' || fi.extension AS fullName
FROM Adobe_libraryImageDevelopHistoryStep AS hi
INNER JOIN Adobe_images AS im ON im.id_local = hi.image
INNER JOIN AgLibraryFile fi ON im.rootFile = fi.id_local
INNER JOIN AgLibraryFolder fo ON fi.folder = fo.id_local
INNER JOIN AgLibraryRootFolder rf ON fo.rootFolder = rf.id_local
GROUP BY hi.image
ORDER BY numStates DESC

An example of what the output will look like:

image    num   fullName
304727   363   D:/Pictures/Adobe/2016/2016-01-11 HashRU bestuur/hashru-bestuur-9.jpg
157091   146   D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8860.CR2
128589   136   D:/Pictures/Adobe/2013/2013-06-01 Description/IMG_7868.CR2
157877   126   D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8916.CR2
156158   115   D:/Pictures/Adobe/2014/2014-06-13 Description/IMG_8794.CR2
157365   101   D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8882.CR2
88511    95    D:/Pictures/Adobe/2012/2012-08-25 Description/IMG_6269.CR2
156172   95    D:/Pictures/Adobe/2014/2014-06-13 Description/IMG_8795.CR2
156533   89    D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8822.CR2
50781    70    D:/Pictures/Adobe/2012/2012-05-23 Sunset/IMG_3406.CR2
...

So, how large are they?

Curiously, Lightroom saves a plaintext history state in the text column of this table; these appear to be serialised Lua objects. If you use loads of brushes, as it turned out I had on the images in question, this quickly adds up. Just how much, I wondered?
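
If you're curious what such a state looks like, you can take a peek at the start of the largest one with something like the query below. It's only a preview; the full blobs can run into the megabytes.

SELECT image, LENGTH(text) AS sizeInBytes, SUBSTR(text, 1, 200) AS preview
FROM Adobe_libraryImageDevelopHistoryStep
ORDER BY LENGTH(text) DESC
LIMIT 1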

The query below yields the number of states, as well as their combined size in megabytes.

SELECT hi.image, COUNT(*) AS numStates, SUM(LENGTH(text)) / (1024 * 1024) AS sizeInMB,
       rf.absolutePath || fo.pathFromRoot || fi.baseName || '.' || fi.extension AS fullName
FROM Adobe_libraryImageDevelopHistoryStep AS hi
INNER JOIN Adobe_images AS im ON im.id_local = hi.image
INNER JOIN AgLibraryFile fi ON im.rootFile = fi.id_local
INNER JOIN AgLibraryFolder fo ON fi.folder = fo.id_local
INNER JOIN AgLibraryRootFolder rf ON fo.rootFolder = rf.id_local
GROUP BY hi.image
ORDER BY sizeInMB DESC

An example of what the output will look like:

image    num   MB   fullName
304727   363   65   D:/Pictures/Adobe/2016/2016-01-11/hashru-bestuur-9.jpg
157877   126   24   D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8916.CR2
128589   136   14   D:/Pictures/Adobe/2013/2013-06-01 Description/IMG_7868.CR2
157091   146   10   D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8860.CR2
156172    95    5   D:/Pictures/Adobe/2014/2014-06-13 Description/IMG_8795.CR2
156533    89    4   D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8822.CR2
156158   115    3   D:/Pictures/Adobe/2014/2014-06-13 Description/IMG_8794.CR2
 88511    95    2   D:/Pictures/Adobe/2012/2012-08-25 Description/IMG_6269.CR2
160062    42    2   D:/Pictures/Adobe/2014/2014-06-14 Phoenix Down/IMG_9057.CR2
135868    16    1   D:/Pictures/Adobe/2013/2013-08-14 Amsterdam/IMG_8204.CR2
...

So, my recent file size increase was likely due to my editing of a single photo, adding a whopping 65MB!

Purging the history for this photo as Jeffrey suggests frees the disk space again (after optimising the catalog), but I thought I'd go one step further — and automate this for the bigger photos in my catalog.

Intermission: can history states be purged safely?

As it turns out, yes, we can safely delete history states. The Adobe_imageDevelopSettings and Adobe_imageDevelopSettingsBefore tables respectively contain a copy of the current settings and the 'before' settings. Great!
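
If you'd like to convince yourself of this before deleting anything, a quick sanity check is to confirm that every photo with history steps also has a row of current develop settings. The sketch below counts history steps whose photo lacks such a row; it should come back as zero.

SELECT COUNT(*) AS orphanedHistorySteps
FROM Adobe_libraryImageDevelopHistoryStep AS hi
LEFT JOIN Adobe_imageDevelopSettings AS ds ON ds.image = hi.image
WHERE ds.image IS NULL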

Out of interest, you can use this query to see which photos' develop settings take up the most space:

SELECT ds.image, LENGTH(text) / 1024 AS sizeInKiB,
       rf.absolutePath || fo.pathFromRoot || fi.baseName || '.' || fi.extension AS fullName
FROM Adobe_imageDevelopSettings AS ds
INNER JOIN Adobe_images AS im ON im.id_local = ds.image
INNER JOIN AgLibraryFile fi ON im.rootFile = fi.id_local
INNER JOIN AgLibraryFolder fo ON fi.folder = fo.id_local
INNER JOIN AgLibraryRootFolder rf ON fo.rootFolder = rf.id_local
GROUP BY ds.image
ORDER BY sizeInKiB DESC

Again, an example of what the output will look like:

image   size  fullName
304727   410   D:/Pictures/Adobe/2016/2016-01-11/hashru-bestuur-9.jpg
157877   298   D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8916.CR2
157091   189   D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8860.CR2
128589   186   D:/Pictures/Adobe/2013/2013-06-01 Description/IMG_7868.CR2
156533   103   D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8822.CR2
160062   103   D:/Pictures/Adobe/2014/2014-06-14 Phoenix Down/IMG_9057.CR2
156172   102   D:/Pictures/Adobe/2014/2014-06-13 Description/IMG_8795.CR2
135868    91   D:/Pictures/Adobe/2013/2013-08-14 Amsterdam/IMG_8204.CR2
157337    84   D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8877.CR2
258363    65   D:/Pictures/Adobe/2016/2016-02-21 Jiufen/IMG_2474.CR2
...

Finding the biggest photos

As we've already established the combined size of the history states, we can easily find just the big ones by adding a HAVING clause to the earlier query. This gives us a list of photos whose edit history amounts to more than 10MB of data:

SELECT hi.image, COUNT(*) AS numStates, SUM(LENGTH(text)) / (1024 * 1024) AS sizeInMB,
       rf.absolutePath || fo.pathFromRoot || fi.baseName || '.' || fi.extension AS fullName
FROM Adobe_libraryImageDevelopHistoryStep AS hi
INNER JOIN Adobe_images AS im ON im.id_local = hi.image
INNER JOIN AgLibraryFile fi ON im.rootFile = fi.id_local
INNER JOIN AgLibraryFolder fo ON fi.folder = fo.id_local
INNER JOIN AgLibraryRootFolder rf ON fo.rootFolder = rf.id_local
GROUP BY hi.image
HAVING sizeInMB >= 10
ORDER BY sizeInMB DESC

Without the fancy clauses to make the output human readable, this is what the query looks like:

SELECT hi.image
FROM Adobe_libraryImageDevelopHistoryStep AS hi
GROUP BY hi.image
HAVING SUM(LENGTH(text)) / (1024 * 1024) >= 10

Deleting history states for the biggest photos

We can use the output of the previous query by wrapping it into a subquery. Trivially, this looks like:

SELECT image
FROM Adobe_libraryImageDevelopHistoryStep
WHERE image IN (
    SELECT hi.image
    FROM Adobe_libraryImageDevelopHistoryStep AS hi
    GROUP BY hi.image
    HAVING SUM(LENGTH(text)) / (1024 * 1024) >= 10
)

If, on inspection, this looks good, we can get more serious by replacing the SELECT ... FROM clause with a DELETE FROM clause:

DELETE FROM Adobe_libraryImageDevelopHistoryStep
WHERE image IN (
    SELECT hi.image
    FROM Adobe_libraryImageDevelopHistoryStep AS hi
    GROUP BY hi.image
    HAVING SUM(LENGTH(text)) / (1024 * 1024) >= 10
)

This will delete the history states for all photos whose states amount to more than 10MB of data. Note: this will not delete the current develop settings or the 'before' snapshot.
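
To verify, you can re-run the condensed size query from before; after the DELETE, it should no longer return any rows at the 10MB threshold.

SELECT hi.image
FROM Adobe_libraryImageDevelopHistoryStep AS hi
GROUP BY hi.image
HAVING SUM(LENGTH(text)) / (1024 * 1024) >= 10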

Finally, optimise the database to reclaim your disk space. You can do this through SQLite, but also by clicking 'Optimise Catalog' in Adobe Lightroom.
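
Through SQLite, that boils down to a simple VACUUM. If you're curious about the effect, compare the file size (or the page_count pragma) before and after.

-- How many pages the database file currently occupies (pages * page size = bytes)
PRAGMA page_count

-- Rebuild the database file, reclaiming the space freed by the DELETE above
VACUUM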

Using the method described here, my Lightroom catalog size decreased by 33%. Very satisfying!

Comments

Pretty impressive!

I went a step further and deleted the entire contents of Adobe_libraryImageDevelopHistoryStep, reducing my catalog from over 7GB to 760MB!

Thanks :)
