Reducing Adobe Lightroom catalog size by automatically purging big history states
Recently, when backing up my pictures folder, I noticed my Lightroom catalog had notably increased in size, even though I couldn't remember adding that many pictures. I decided to investigate.
I remembered reading a few years ago, on Jeffrey Friedl's blog, that he had a great experience purging his library's edit history. As his library is considerably larger than mine (~132k photos at the time, wheras I only have about ~9k RAWs in it), his gains were quite substantial. Given that I didn't know the source of the increase, I couldn't delete the history for a single photo — and I don't want to part with all my edit history quite yet. However, it turned out to be a great lead.
In this blog, I'll do my best to explain my reasoning, but if you're just interested in the results: there's a wrap-up at the bottom of the post.
First off: Adobe Lightroom's catalog is an SQLite database
Even though Adobe Lightroom's catalog files use an *.lrcat
extension, inspecting it reveals it to be an ordinary SQLite 3 database. however, this is great, because it allows us to edit it using the SQLite toolchain. I used the sqlite3
command-line tool, but there are great GUI tools available as well.
It should go without saying, but I feel I should mention it anyway: make a backup of your Lightroom catalog before fiddling with it!
I won't be going into detail on how the database is laid out — that's beyond the scope of this post. However, the SQL queries below should provide some insight into its structure, if that's what you're looking for.
Finding images with many edit states
Exploring the database, one particular table quickly catches my eye: Adobe_libraryImageDevelopHistoryStep
. As it turns out, this contains the edit history for all photos.
I used the following query to find images with loads of history states:
SELECT hi.image, COUNT(*) AS numStates, rf.absolutePath || fo.pathFromRoot || fi.baseName || '.' || fi.extension AS fullName FROM Adobe_libraryImageDevelopHistoryStep AS hi INNER JOIN Adobe_images AS im ON im.id_local = hi.image INNER JOIN AgLibraryFile fi ON im.rootFile = fi.id_local INNER JOIN AgLibraryFolder fo ON fi.folder = fo.id_local INNER JOIN AgLibraryRootFolder rf ON fo.rootFolder = rf.id_local GROUP BY hi.image ORDER BY numStates DESC
An example of what the output will look like:
image num fullName 304727 363 D:/Pictures/Adobe/2016/2016-01-11 HashRU bestuur/hashru-bestuur-9.jpg 157091 146 D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8860.CR2 128589 136 D:/Pictures/Adobe/2013/2013-06-01 Description/IMG_7868.CR2 157877 126 D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8916.CR2 156158 115 D:/Pictures/Adobe/2014/2014-06-13 Description/IMG_8794.CR2 157365 101 D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8882.CR2 88511 95 D:/Pictures/Adobe/2012/2012-08-25 Description/IMG_6269.CR2 156172 95 D:/Pictures/Adobe/2014/2014-06-13 Description/IMG_8795.CR2 156533 89 D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8822.CR2 50781 70 D:/Pictures/Adobe/2012/2012-05-23 Sunset/IMG_3406.CR2 ...
So, how large are they?
Curiously, Lightroom saves a plaintext history state in the text
column of this table — it appears these are serialised LUA objects. If you're using loads of brushes, as it turned out I did in the images in question, this quickly adds up. To how much, I wondered?
The query below yields the number of states, as well as their combined size in megabytes.
SELECT hi.image, COUNT(*) AS numStates, SUM(LENGTH(text)) / (1024 * 1024) AS sizeInMB, rf.absolutePath || fo.pathFromRoot || fi.baseName || '.' || fi.extension AS fullName FROM Adobe_libraryImageDevelopHistoryStep AS hi INNER JOIN Adobe_images AS im ON im.id_local = hi.image INNER JOIN AgLibraryFile fi ON im.rootFile = fi.id_local INNER JOIN AgLibraryFolder fo ON fi.folder = fo.id_local INNER JOIN AgLibraryRootFolder rf ON fo.rootFolder = rf.id_local GROUP BY hi.image ORDER BY sizeInMB DESC
An example of what the output will look like:
image num MB fullName 304727 363 65 D:/Pictures/Adobe/2016/2016-01-11/hashru-bestuur-9.jpg 157877 126 24 D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8916.CR2 128589 136 14 D:/Pictures/Adobe/2013/2013-06-01 Description/IMG_7868.CR2 157091 146 10 D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8860.CR2 156172 95 5 D:/Pictures/Adobe/2014/2014-06-13 Description/IMG_8795.CR2 156533 89 4 D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8822.CR2 156158 115 3 D:/Pictures/Adobe/2014/2014-06-13 Description/IMG_8794.CR2 88511 95 2 D:/Pictures/Adobe/2012/2012-08-25 Description/IMG_6269.CR2 160062 42 2 D:/Pictures/Adobe/2014/2014-06-14 Phoenix Down/IMG_9057.CR2 135868 16 1 D:/Pictures/Adobe/2013/2013-08-14 Amsterdam/IMG_8204.CR2 ...
So, my recent filesize increase was likely due to me editing a single photo, adding a whopping 65MB!
Purging the history for this photo as Jeffrey suggests frees the disk space again (after optimising the catalog), but I thought I'd go one step further — and automate this for the bigger photos in my catalog.
Intermission: can history states be purged safely?
As it turns out, yes, we can safely delete history states. The Adobe_imageDevelopSettings
and Adobe_imageDevelopSettingsBefore
tables respectively contain a copy of the current settings and the 'before' settings. Great!
Out of interest, using this query you can see what photos' develop settings take up most space:
SELECT ds.image, LENGTH(text) / 1024 AS sizeInKiB, rf.absolutePath || fo.pathFromRoot || fi.baseName || '.' || fi.extension AS fullName FROM Adobe_imageDevelopSettings AS ds INNER JOIN Adobe_images AS im ON im.id_local = ds.image INNER JOIN AgLibraryFile fi ON im.rootFile = fi.id_local INNER JOIN AgLibraryFolder fo ON fi.folder = fo.id_local INNER JOIN AgLibraryRootFolder rf ON fo.rootFolder = rf.id_local GROUP BY ds.image ORDER BY sizeInKiB DESC
Again, an example of what the output will look like:
image size fullName 304727 410 D:/Pictures/Adobe/2016/2016-01-11/hashru-bestuur-9.jpg 157877 298 D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8916.CR2 157091 189 D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8860.CR2 128589 186 D:/Pictures/Adobe/2013/2013-06-01 Description/IMG_7868.CR2 156533 103 D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8822.CR2 160062 103 D:/Pictures/Adobe/2014/2014-06-14 Phoenix Down/IMG_9057.CR2 156172 102 D:/Pictures/Adobe/2014/2014-06-13 Description/IMG_8795.CR2 135868 91 D:/Pictures/Adobe/2013/2013-08-14 Amsterdam/IMG_8204.CR2 157337 84 D:/Pictures/Adobe/2014/2014-06-14 Description/IMG_8877.CR2 258363 65 D:/Pictures/Adobe/2016/2016-02-21 Jiufen/IMG_2474.CR2 ...
Finding the biggest photos
As we've already established the combined sized of history states, we can easily find just the big ones by adding a HAVING
clause to the query we described earlier to get a list of photos whose edit history amounts to more than 10MB of data:
SELECT hi.image, COUNT(*) AS numStates, SUM(LENGTH(text)) / (1024 * 1024) AS sizeInMB, rf.absolutePath || fo.pathFromRoot || fi.baseName || '.' || fi.extension AS fullName FROM Adobe_libraryImageDevelopHistoryStep AS hi INNER JOIN Adobe_images AS im ON im.id_local = hi.image INNER JOIN AgLibraryFile fi ON im.rootFile = fi.id_local INNER JOIN AgLibraryFolder fo ON fi.folder = fo.id_local INNER JOIN AgLibraryRootFolder rf ON fo.rootFolder = rf.id_local GROUP BY hi.image HAVING sizeInMB >= 10 ORDER BY sizeInMB DESC
Without the fancy clauses to make the output human readable, this is what the query looks like:
SELECT hi.image FROM Adobe_libraryImageDevelopHistoryStep AS hi GROUP BY hi.image HAVING SUM(LENGTH(text)) / (1024 * 1024) >= 10
Deleting history states for the biggest photos
We can use the output of the previous query by wrapping it into a subquery. Trivially, this looks like:
SELECT image FROM Adobe_libraryImageDevelopHistoryStep WHERE image IN ( SELECT hi.image FROM Adobe_libraryImageDevelopHistoryStep AS hi GROUP BY hi.image HAVING SUM(LENGTH(text)) / (1024 * 1024) >= 10 )
If, on inspection, this looks good, we can get more serious by replacing the SELECT ... FROM
clause with a DELETE FROM
clause:
DELETE FROM Adobe_libraryImageDevelopHistoryStep WHERE image IN ( SELECT hi.image FROM Adobe_libraryImageDevelopHistoryStep AS hi GROUP BY hi.image HAVING SUM(LENGTH(text)) / (1024 * 1024) >= 10 )
This will delete the history states for all photos whose states amount to more than 10MB of data. Note: this will not delete the initial ('before') or final develop settings.
Finally, optimise the database to reclaim your disk space. You can do this through SQLite, but also by clicking 'Optimise Catalog' in Adobe Lightroom.
Using the method described here, my Lightroom catalog size decreased by 33%. Very satisfying!
Comments
Pretty impressive!
I went a step further and deleted the entire contents of Adobe_libraryImageDevelopHistoryStep reducing my catalog from over 7GB to 760MB!
Thanks :)
Comments closed
This blog post has been archived; it is currently not possible to comment.