Introduction
Canon helps you understand and take control of digital assets spread across many drives, backups, and years.
The Problem
Over time, files accumulate across devices: old hard drives, backup folders, cloud downloads, phone exports. Finding what you have, identifying duplicates, and organizing everything into a coherent archive becomes overwhelming.
The Approach
Canon takes a methodical, incremental approach:
- Scan your devices to index files and compute content hashes
- Enrich with metadata extracted by external tools (EXIF, file types, etc.)
- Discover what you have using filters and queries
- Archive selected files to a canonical location, at your own pace
Each step is revisitable. You can scan new sources, add more metadata, refine your queries, and archive in small batches. Canon tracks what’s already archived, so you always know your progress.
Key Features
- Content-based deduplication: Files are identified by their hash, not location
- Flexible metadata: Import any key-value facts from external tools
- Powerful filtering: Query by any combination of facts using boolean expressions
- Safe archiving: Preview operations, validate integrity, and maintain audit trails
- Incremental workflow: Work at your own pace with full state persistence
Ready to get started? See Setup and Getting Started.
Setup
Installation
Install Canon from crates.io:
cargo install canon-archive
This installs the canon binary.
From Source
Alternatively, build from source:
git clone https://github.com/robklg/canon.git
cd canon
cargo install --path .
Database
Canon stores all state in a SQLite database. The default location is ~/.canon/canon.db.
You can override this with the --db flag:
canon --db /path/to/custom.db scan ...
The database is created automatically on first use. It contains:
- Registered roots and their scan state
- All indexed sources with metadata
- Content hashes and object references
- Imported facts from enrichment
Verify Installation
canon --help
You should see the list of available commands. You’re ready to start scanning your files.
Getting Started
This guide walks through a typical Canon workflow: scanning files, enriching with metadata, querying, and archiving.
Scanning
First, index your source files and existing archive:
# Add source roots (files you want to organize)
canon scan --add --role source /path/to/photos
canon scan --add --role source /path/to/backup-drive/photos
canon scan --add --role source --comment "Old backup, possibly duplicates" /Volumes/OldDrive
# Add an archive root (your organized destination)
canon scan --add --role archive /Volumes/Archive
By default, Canon computes content hashes during scanning. This enables deduplication and archive tracking.
Enriching
Use external tools to extract metadata. The example below uses exiftool to extract EXIF data including GPS-based geolocation:
canon worklist --where 'source.ext|lowercase IN (jpg, jpeg, heic, mov, mp4)' \
| ./scripts/exif-worklist.sh \
| canon import-facts
See Enriching for details on the worklist/import pipeline.
Querying
Discover what facts are available and explore your files:
# See all available facts
canon facts
# Check value distribution for a specific fact
canon facts --key content.geo.region # Where were photos taken?
canon facts --key "content.DateTimeOriginal|year" # Which years?
# List files matching filters
canon ls --where 'content.geo.city=Bletchley'
# Preview files (macOS)
canon ls -0 --where 'content.geo.city=Bletchley' | xargs -0 open -a Preview
Archiving
When you find a collection worth archiving, create a manifest:
canon cluster generate \
--where 'content.DateTimeOriginal|year=2023' \
--where 'content.geo.region="North Holland"' \
--dest /Volumes/Archive/Trips/2023-Amsterdam
This creates manifest.toml with the query parameters and a manifest.lock with matching sources.
Edit manifest.toml to customize the output pattern:
[output]
pattern = "{content.DateTimeOriginal|date}/{filename}"
base_dir = "/Volumes/Archive/Trips/2023-Amsterdam"
Preview and apply:
canon apply manifest.toml --dry-run # Preview what will happen
canon apply manifest.toml # Execute the copy
Files are copied to the archive with paths like:
/Volumes/Archive/Trips/2023-Amsterdam/2023-06-16/IMG_001.jpg
Next Steps
- Learn about Concepts to understand how Canon models your files
- Explore the full Commands reference
- See Filters for advanced query syntax
Concepts
Understanding these core concepts will help you use Canon effectively.
- Roots: Storage locations that Canon tracks
- Source: A file discovered on disk
- Object: Unique content identified by hash
- Source vs. Object: How files relate to content
- Facts: Metadata attached to sources or objects
Source
A source is a file discovered on disk during scanning. Canon tracks:
- Location: Root path + relative path within the root
- Identity: Device ID and inode for move detection
- Metadata: Size and modification time
- Integrity: Partial hash (first + last 8KB) for validation during transfers
- State: A basis_rev counter that increments when size or mtime changes
Sources represent where files are found. Multiple sources can point to the same content (see Object) when files are duplicated across locations.
When a source is scanned with hashing enabled (the default), Canon computes its SHA-256 hash and links it to an object. This enables deduplication and archive tracking.
Exclusion
Sources can be marked as excluded to skip them during archiving. A source is considered excluded if:
- The source itself is marked excluded, OR
- The source’s linked object is marked excluded
This two-level check means that excluding an object effectively excludes all sources with that content. Object-level exclusion is useful when you want to skip content regardless of where it appears.
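The two-level check can be pictured as a small predicate. The sketch below is illustrative only: the Source and ContentObject classes and their fields are hypothetical and are not Canon's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentObject:
    """Hypothetical stand-in for an object (unique content)."""
    sha256: str
    excluded: bool = False


@dataclass
class Source:
    """Hypothetical stand-in for a source (a file at a path)."""
    path: str
    excluded: bool = False
    obj: Optional[ContentObject] = None  # None while the source is unhashed


def is_excluded(source: Source) -> bool:
    """A source counts as excluded if it, or its linked object, is excluded."""
    if source.excluded:
        return True
    return source.obj is not None and source.obj.excluded
```

Under this model, marking one object as excluded hides every source that links to it, without touching the per-source flags.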
Object
An object represents unique content identified by its SHA-256 hash. Objects are content-addressed: two files with identical bytes will have the same hash and thus reference the same object.
Objects enable:
- Deduplication: Multiple sources can point to the same object
- Archive tracking: When content exists in an archive, all sources with that hash are marked as archived
- Fact sharing: Metadata attached to an object is available on all sources with that content
Objects are created automatically when sources are hashed during scanning or enrichment.
Source vs. Object
Understanding the relationship between sources and objects is key to how Canon handles deduplication and archive tracking.
Sources Are Locations
When a root is scanned, Canon indexes every file it finds as a source. Each source represents a specific file at a specific path.
Objects Are Content
When sources are hashed, Canon creates or links them to objects. An object represents the underlying content, independent of where it was found.
Source A: /backup1/photos/IMG_001.jpg ─┐
Source B: /backup2/old/IMG_001.jpg    ─┼─► Object (hash: abc123...)
Source C: /downloads/photo.jpg        ─┘
All three sources above have identical content, so they reference the same object.
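The idea of content addressing can be sketched in a few lines of Python. This is not Canon's implementation; the function names are made up for illustration, and the only thing taken from the text is that content is identified by its SHA-256 hash.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_content(paths):
    """Map each content hash to the list of paths holding that content."""
    groups = defaultdict(list)
    for p in paths:
        groups[sha256_of(Path(p))].append(str(p))
    return dict(groups)
```

Every key in the result plays the role of an object, and each list holds the sources that reference it. A list with more than one entry is a set of duplicates.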
Fact Sharing
When a source is linked to an object:
- Content facts (like EXIF metadata) can be stored on the object and become available to all sources with that hash
- Source facts (like file path) remain specific to each source
This allows metadata to flow between different copies of the same content. Import a fact once, and it’s available everywhere that content exists.
Archive Tracking
Canon uses the source-object relationship to track archiving progress:
- When you archive a file, Canon copies it to an archive root and records the object’s hash
- Any source with that same hash is now considered “archived”
- The coverage command shows how many of your sources exist in an archive
Hashing
By default, Canon hashes all files during scanning. Since hashing can be time-consuming for large collections, you can:
- Use --no-hash during scan to skip hashing initially
- Hash selectively via the enrichment pipeline, targeting specific file types
Unhashed sources cannot be linked to objects, so they cannot be deduplicated or tracked for archive coverage.
Facts
Facts are key-value metadata attached to sources or objects.
Types of Facts
Built-in facts are collected automatically during scanning:
- source.ext - File extension
- source.size - File size in bytes
- source.mtime - Modification timestamp
- content.hash.sha256 - Content hash (when computed)
Imported facts come from external tools via the enrichment pipeline:
- EXIF metadata: content.Make, content.Model, content.DateTimeOriginal
- Geolocation: content.geo.city, content.geo.country
- Media info: content.mime, content.duration
- Any custom key-value pairs you choose to import
Namespaces
Facts are namespaced:
- source.* - Facts about the file on disk (path, size, timestamps)
- content.* - Facts about the content itself (stored on objects when hashed)
When querying, the content. prefix is optional: --where 'Make=Apple' is equivalent to --where 'content.Make=Apple'.
Value Types
Canon stores facts as:
- Text: Strings like "Apple" or "image/jpeg"
- Numbers: Integers or decimals like 1024 or 3.14
- Timestamps: Unix timestamps, enabling date modifiers like |year and |month
Type hints can be provided during import to ensure correct parsing. See Enriching for details.
Roots
A root is a directory on a storage device that Canon tracks. Each root is identified by its absolute path and assigned a role.
Roles
Canon distinguishes two root roles:
Source roots contain assets you want to explore, reconcile, or archive. They may be unstructured, incomplete, or contain duplicates. Examples: old backup drives, phone exports, download folders.
Archive roots hold an intentional structure that you maintain. Files archived by Canon are placed here. Examples: your organized photo library, music collection, document archive.
Rules
- Roots may not overlap (one root cannot be inside another)
- A root can be any directory, not just a drive or mount point
- You can have multiple roots of each type
- Roots can be suspended to temporarily hide them from operations
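The no-overlap rule can be illustrated with a short Python sketch. It compares paths purely lexically and does not resolve symlinks or mount points, which a real implementation may well do; the function names are hypothetical.

```python
from pathlib import PurePosixPath


def overlaps(a: str, b: str) -> bool:
    """True if the two paths are equal or one lies inside the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def can_add_root(new_root: str, existing_roots) -> bool:
    """A new root is acceptable only if it overlaps no existing root."""
    return not any(overlaps(new_root, root) for root in existing_roots)
```

Because the comparison works on path components rather than string prefixes, /Volumes/Archive and /Volumes/ArchiveOld are treated as separate locations.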
Typical Setup
Source roots:
/Volumes/OldBackup (unorganized photos from 2015)
/Volumes/PhoneExport (recent phone backup)
~/Downloads/Photos (miscellaneous downloads)
Archive roots:
/Volumes/Archive/Photos (canonical photo library)
/Volumes/Archive/Music (canonical music library)
Canon Commands
Common Options
Most commands that operate on sources share these options:
Path scope — Limit a command to a specific directory by passing a path:
canon ls /path/to/photos
canon facts /path/to/photos
canon coverage /path/to/photos
Filters — Select sources using --where with boolean expressions:
canon ls --where 'source.ext=jpg'
canon facts --where 'source.size > 1000000'
canon cluster generate --where 'geo.country=Netherlands' --dest /archive
Multiple --where flags are combined with AND. See Filters for the full syntax.
Command Reference
- Managing Roots: Add and manage storage locations
- Enriching: Import metadata from external tools
- worklist: Output sources for external processing
- import-facts: Import processor output
- Writing Processors: Build custom extractors
- Querying: Explore your indexed files
- Managing Sources: Control which sources are processed
- exclude: Mark sources to skip during archiving
- Archiving: Organize files into your canonical archive
Managing Roots
To track files in Canon, you first add and scan roots. This makes their sources available for further enrichment or archive operations. You can suspend roots to temporarily mask them from Canon commands.
Adding new roots and re-scanning existing ones is done with the scan command.
Managing roots, such as listing or suspending them, is done with canon roots.
Scan
Scan directories and index files.
When you scan a particular root, Canon will walk the directory tree starting at the given path(s). For each file, basic metadata such as last modification time and size is collected, and (by default) the hash is computed. After scanning, Canon knows about the existence of all sources in that root. If the files were hashed they will be linked to objects.
Hashing can take a long time on large collections, so you can skip it with --no-hash.
Skipping is useful when you intend to hash selectively, for instance when you are only interested in certain types of files.
There is no real limit on how many roots you can add. It may be helpful to scan collections of files that belong together as separate roots. Each root can be given a comment, which helps you recall what it contains; you can also use the comment to keep notes about what you discovered there.
If you have an already organized location that you want Canon to treat as your canonical archive, scan it with --role archive from the start. The role is set when the root is added; to change it, you must remove the root and re-add it with the new role.
You can add multiple archive roots, for instance one for your music collection and another for your eBooks.
When to run scan
If your filesystem changes regularly, re-scan your roots so that Canon can detect the changes and you do not miss files for archiving. Note that, when archiving, Canon always checks the validity of the files to be archived.
Another use case is periodic integrity verification of your archives. Use --verify to recompute hashes for all files and detect corruption. Canon exits with a non-zero status if any mismatches are found, making it suitable for cron jobs that alert on failure.
Examples
# Add a new root and scan it (--add and --role required for new roots)
canon scan --add --role source /path/to/photos
# Scan multiple new roots
canon scan --add --role source /path/to/photos /path/to/more/photos
# Add with a descriptive comment
canon scan --add --role source --comment "Photos from 2020 trip" /path/to/photos
# Add as an archive root (for tracking already-organized files)
canon scan --add --role archive /path/to/archive
# Re-scan an existing root (--role optional, validated against existing)
canon scan /path/to/photos
# Scan just a subtree within an existing root
canon scan /path/to/photos/2024
# Scan without computing hashes (just index files)
canon scan --no-hash /path/to/photos
# Verify archive integrity by recomputing all hashes (good for cron jobs)
canon scan --verify /Volumes/Archive
Hash computation: By default, Canon computes content hashes for new and changed files during scan. This enables deduplication and archive tracking. Use --no-hash to skip hashing if you just want to index files quickly.
Integrity verification: Use --verify to recompute hashes for all files, even unchanged ones. Run periodically (e.g., via cron) to detect file corruption. If a file’s hash changes without its mtime changing, Canon warns about possible corruption and exits with an error.
Discovering untracked directories: Use --candidates to find directories with files that aren’t yet under any root. This is useful when exploring a drive or backup to see what could be added:
# Find candidate roots to add under a path
canon scan --candidates /Volumes/Backup
# Output shows directories with untracked files
Candidate roots to add:
/Volumes/Backup/photos (3 directories with files)
/Volumes/Backup/imports (1 directory with files)
Directories under existing roots are skipped. When multiple subdirectories share a common ancestor that could be added as a single root, they’re rolled up (unless that ancestor contains an existing root).
Output shows what was found:
Scanned 1234 files: 100 new, 5 updated, 2 moved, 1127 unchanged, 0 missing
Hashed 105 files
canon roots
List and manage registered roots.
Roots are added via scan and managed with the roots command. You can list, suspend/unsuspend, add comments, or remove roots.
Important notes:
- Removing a root also removes its sources and attached facts from the database
- Removing a root does not delete any files on disk
- If you re-add a removed root, you’ll need to re-enrich it
# List all roots with file counts and last scan time
canon roots
# List roots at or beneath a specific path
canon roots /path/to/photos
# List only suspended roots
canon roots --suspended
# Set a comment on a root (omit text to clear)
canon roots comment id:1 "Old backup, possibly duplicates"
canon roots comment id:1
# Suspend a root (hides from all operations without deleting data)
canon roots suspend id:1
canon roots suspend path:/path/to/photos
# Unsuspend a root (make visible again)
canon roots unsuspend id:1
# Remove a root by ID (files on disk are NOT deleted)
canon roots rm id:1
# Remove a root by path
canon roots rm path:/path/to/photos
# Skip confirmation prompt
canon roots rm id:1 --yes
Example output:
ID ROLE FILES LAST SCAN PATH
1 source 16635 2h ago /path/to/photos
2 archive 169941 5d ago /path/to/archive
3 source 1234 never /path/to/backup (Old backup, possibly duplicates)
Suspending Roots
Suspended roots are hidden from listings, excluded from scan --all, and their sources are excluded from all queries (ls, facts, coverage, worklist, etc.). Suspended roots still prevent overlapping (you cannot add a new root at a suspended root’s path). Use --suspended to list only suspended roots.
Removing Roots
When removing a root, Canon shows how many sources are “in archive” (same content exists in an archive) vs “not in archive”, and suggests using canon ls <path> to preview which sources will be forgotten.
Root Specs
Several commands accept root specifications in two formats:
| Format | Example | Description |
|---|---|---|
| id:N | id:1 | By database ID (shown in canon roots output) |
| path:/... | path:/path/to/photos | By exact path |
canon roots suspend id:1
canon roots suspend path:/path/to/photos
Enriching
Add metadata to indexed files using external processors.
Canon uses a pipeline model: worklist outputs sources as JSONL, an external processor extracts metadata, then import-facts stores the results.
canon worklist → processor → canon import-facts
A processor can be any CLI tool or script that extracts information from files: exiftool for EXIF data, file for MIME types, ffprobe for media info, or custom scripts you write yourself.
Basic Usage
Extract EXIF metadata from images:
canon worklist --where 'source.ext|lowercase IN (jpg, jpeg, heic)' \
| ./scripts/exif-worklist.sh \
| canon import-facts
Note the --where filter: it’s usually smart to limit the worklist to files the processor can actually handle.
Detect MIME types for all files:
canon worklist | canonargs --fact mime -- file -b --mime-type {} | canon import-facts
After enrichment, the imported facts become available for filtering and querying.
Provided Processors
Canon includes ready-to-use processors:
| Processor | Purpose | Requires |
|---|---|---|
| scripts/exif-worklist.sh | EXIF, GPS, and media metadata | exiftool, jq |
| scripts/hash-worklist.sh | SHA-256 content hashes | jq |
| canonargs --fact mime -- file -b --mime-type {} | MIME type detection | canonargs |
Install canonargs with: cargo install canonargs
Going Deeper
- worklist - Full options for generating worklists
- import-facts - Input format and type hints
- Writing Processors - Build your own enrichment scripts
Tip: Selective Hashing
Content hashing normally happens during scan. If you prefer to hash only specific file types, use --no-hash during scan and hash selectively via the pipeline:
canon scan --no-hash --add --role source /path/to/mixed-files
canon worklist --where 'mime~"image/*" OR mime~"video/*"' \
| ./scripts/hash-worklist.sh \
| canon import-facts
canon worklist
Output sources as JSONL for processing by external tools.
# All sources (from source roots only)
canon worklist
# Only sources missing a content hash
canon worklist --where 'NOT content.hash.sha256?'
# Only JPG files
canon worklist --where 'source.ext=jpg'
# Scope to a specific directory
canon worklist /path/to/photos
# Include sources from archive roots (for backfilling facts)
canon worklist --include-archived
# Include existing facts in output (for chained enrichment)
canon worklist --emit content.geo.lat --emit content.geo.lon
Output Format
Each line is a JSON object with source metadata:
{"source_id":123,"path":"/full/path/to/file.jpg","root_id":1,"size":1024,"mtime":1703980800,"basis_rev":0}
| Field | Description |
|---|---|
| source_id | Database ID (pass through to import-facts) |
| path | Full absolute path to the file |
| root_id | ID of the root containing this source |
| size | File size in bytes |
| mtime | Modification time (Unix timestamp) |
| basis_rev | Revision counter for staleness detection |
Emitting Existing Facts
With --emit, requested facts are included in the output (null if absent):
canon worklist --emit geo.lat --emit geo.lon
{"source_id":123,"path":"/...","basis_rev":0,"facts":{"geo.lat":52.37,"geo.lon":4.89}}
{"source_id":124,"path":"/...","basis_rev":0,"facts":{"geo.lat":null,"geo.lon":null}}
This enables processors to build on previous enrichment:
- Dependent enrichment: Use extracted coordinates to look up location names
- Fact combination: Merge data from multiple sources into derived facts
Example: reverse geocoding files that have coordinates but no city name:
canon worklist --emit geo.lat --emit geo.lon --where 'geo.lat? AND NOT geo.city?' \
| ./scripts/reverse-geocode.sh \
| canon import-facts
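A processor that builds on emitted facts might look roughly like the Python sketch below. It is not the reverse-geocode.sh script referenced above. The lookup_city function is a hypothetical placeholder for a real geocoding lookup, and the bounding box it uses is made up for the example.

```python
import json


def lookup_city(lat: float, lon: float):
    """Hypothetical stand-in for a real reverse-geocoding lookup."""
    if 52.2 <= lat <= 52.5 and 4.7 <= lon <= 5.1:
        return "Amsterdam"
    return None


def process_line(line: str):
    """Turn one worklist line into an import-facts line, or None to skip it."""
    entry = json.loads(line)
    facts = entry.get("facts") or {}
    lat, lon = facts.get("geo.lat"), facts.get("geo.lon")
    if lat is None or lon is None:
        return None  # --emit yields null for facts that are absent
    city = lookup_city(lat, lon)
    if city is None:
        return None
    return json.dumps({
        "source_id": entry["source_id"],  # passed through unchanged
        "basis_rev": entry["basis_rev"],  # passed through unchanged
        "facts": {"geo.city": city},
    })
```

To use something like this in a pipeline, call process_line on each line of standard input and print every result that is not None.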
Staleness Detection
The worklist is a snapshot of sources at a point in time. Each entry includes basis_rev which tracks file changes. Processors should pass this through to import-facts, which will skip the import if the file changed since the worklist was generated.
The size and mtime fields allow processors to verify a file hasn’t changed before extracting facts.
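A processor could perform that check along the following lines. This is a sketch with one assumption: that the worklist mtime is in whole seconds, as the integer in the example output suggests. The function name is hypothetical.

```python
import os


def unchanged_since_worklist(entry: dict) -> bool:
    """Compare the file's current size and mtime with the worklist snapshot."""
    try:
        st = os.stat(entry["path"])
    except OSError:
        return False  # missing or unreadable: treat as changed
    # Assumption: the worklist reports mtime in whole seconds.
    return st.st_size == entry["size"] and int(st.st_mtime) == entry["mtime"]
```

Skipping entries that fail this check avoids wasted extraction work. import-facts still performs its own basis_rev comparison either way.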
canon import-facts
Import facts from JSONL on stdin. Designed to receive output from a processor that consumed a worklist.
canon worklist | some-processor | canon import-facts
# Allow importing facts for sources in archive roots
canon worklist --include-archived | some-processor | canon import-facts --allow-archived
Input Format
Each line must be a JSON object with source_id, basis_rev, and facts:
{"source_id":123,"basis_rev":0,"facts":{"hash.sha256":"abc123...","mime":"image/jpeg"}}
| Field | Description |
|---|---|
| source_id | Source ID from the worklist (required) |
| basis_rev | Revision from the worklist for staleness check (required) |
| facts | Object mapping fact keys to values |
The processor must pass through source_id and basis_rev from the worklist entry. If basis_rev doesn’t match the source’s current value, the import is skipped (the file changed since the worklist was generated).
Fact Namespacing
Facts are automatically namespaced under content.*. For example, mime becomes content.mime.
The special key hash.sha256 creates or links an object, enabling deduplication and archive tracking.
Type Hints
Types matter. Canon stores facts as text, numbers, or timestamps. The type determines what operations work on a fact:
- Timestamps enable date modifiers (|year, |month, |date) and date comparisons (>=2024-01-01)
- Numbers enable numeric comparisons (>1000, <=5.0) and the |bucket modifier
- Text enables string matching (=, ~ glob) and string modifiers (|lowercase, |stem)
If a datetime like "2024:07:23 11:06:32" is stored as text instead of a timestamp, queries like --where 'DateTimeOriginal|year=2024' won’t work—the modifier expects a timestamp, not a string.
Providing Type Hints
Wrap values in an object with value and type:
{"source_id":123,"basis_rev":0,"facts":{
"DateTimeOriginal": {"value": "2024:07:23 11:06:32", "type": "datetime"},
"duration": {"value": "1:23:45", "type": "duration"},
"rating": 5
}}
| Type | Parses | Stored As |
|---|---|---|
| datetime | ISO dates, EXIF format, plain years (2024) | Unix timestamp |
| duration | "1:23:45", "5:30", or seconds as number | Seconds (number) |
| (none) | Strings as text, numbers as numbers | As-is |
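To make the duration row concrete, here is one way such a conversion could work. This is a guess at the behaviour, not Canon's parser: it assumes colon-separated parts are read as hours:minutes:seconds or minutes:seconds.

```python
def parse_duration(value) -> float:
    """Convert "H:MM:SS", "M:SS", or a plain number of seconds to seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    seconds = 0.0
    for part in value.split(":"):
        # Each further part is 60 times finer than the one before it.
        seconds = seconds * 60 + float(part)
    return seconds
```

Under these assumptions "1:23:45" becomes 5025 seconds and "5:30" becomes 330 seconds, so both can be compared numerically once stored.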
Common Pitfalls
Dates as strings: EXIF dates from tools like exiftool come as strings ("2024:07:23 11:06:32"). Without a type hint, they’re stored as text and time modifiers won’t work. Always use "type": "datetime" for date fields.
Mixed types: A fact key must have a consistent type across all sources. You cannot store DateTimeOriginal as text for some files and as a timestamp for others. If you initially imported facts with the wrong type and need to re-import with the correct type, first delete the existing entries:
# Delete all DateTimeOriginal facts that were stored as text
canon facts delete --key content.DateTimeOriginal --type text
Then re-run your processor with proper type hints.
Archive Sources
By default, importing facts for sources in archive roots is skipped. Use --allow-archived to enable this (useful for backfilling metadata on already-archived files).
Writing Processors
Processors are scripts or programs that read worklist entries, extract metadata from files, and output facts for import.
Input and Output
A processor reads JSONL from worklist and writes JSONL for import-facts.
Input (from worklist):
{"source_id":123,"path":"/photos/IMG_001.jpg","basis_rev":0,"size":1024,"mtime":1703980800}
Output (for import-facts):
{"source_id":123,"basis_rev":0,"facts":{"Make":"Apple","Model":"iPhone 12"}}
The processor must pass through source_id and basis_rev unchanged.
Custom Processors
Read JSONL from stdin, extract facts from each file, output JSONL to stdout:
#!/bin/bash
# Read worklist entries (JSONL) from stdin, one per line
while IFS= read -r line; do
  source_id=$(echo "$line" | jq -r '.source_id')
  basis_rev=$(echo "$line" | jq -r '.basis_rev')
  path=$(echo "$line" | jq -r '.path')
  # Extract facts (example: EXIF data)
  facts=$(exiftool -json -Make -Model "$path" 2>/dev/null | jq '.[0]')
  # Skip files that exiftool could not read
  if [ -z "$facts" ] || [ "$facts" = "null" ]; then
    continue
  fi
  # Emit one compact JSON object per line for import-facts
  jq -nc \
    --argjson source_id "$source_id" \
    --argjson basis_rev "$basis_rev" \
    --argjson facts "$facts" \
    '{source_id: $source_id, basis_rev: $basis_rev, facts: $facts}'
done
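The same read, extract, emit loop can be written in other languages. The Python sketch below follows the input and output shapes shown above. The line_count fact is a made-up example key, not something Canon defines.

```python
import json


def count_lines(path: str) -> int:
    """Example extractor: count newline-terminated lines in a file."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def process(lines):
    """Yield one import-facts JSON line per worklist entry that could be read."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        try:
            n = count_lines(entry["path"])
        except OSError:
            continue  # skip files that cannot be processed
        yield json.dumps({
            "source_id": entry["source_id"],  # passed through unchanged
            "basis_rev": entry["basis_rev"],  # passed through unchanged
            "facts": {"line_count": n},       # hypothetical fact key
        })
```

Used as a script, this would iterate over sys.stdin and print each yielded line, so it can sit between canon worklist and canon import-facts.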
The canonargs Helper
If you don’t want to handle JSONL parsing and output formatting yourself, canonargs takes care of that. You only provide a command that extracts data from a single file.
Installation
cargo install canonargs
Single Fact Mode
When your command outputs a single value:
canon worklist | canonargs --fact mime -- file -b --mime-type {} | canon import-facts
The {} is replaced with the file path. The command’s stdout becomes the fact value.
Default behavior: Values are stored as text. To specify a type, add --type:
# Store as datetime (enables |year, |month modifiers)
canon worklist | canonargs --fact DateTimeOriginal --type datetime -- exiftool -DateTimeOriginal -s3 {} | canon import-facts
# Store image width as number (using ImageMagick's identify)
canon worklist | canonargs --fact width --type number -- identify -format '%w' {} | canon import-facts
Valid types: datetime, duration, number
Key-Value Mode
When your command outputs key=value pairs (one per line):
canon worklist | canonargs --kv -- my-extractor {} | canon import-facts
Default behavior: All values are stored as text. To specify types, use key:type=value syntax:
width:number=1920
height:number=1080
DateTimeOriginal:datetime=2024:07:23 14:30:00
codec=h264
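An extractor written for this mode only has to print such lines. A minimal Python sketch of the formatting step follows. Passing a (value, type) tuple to request a type hint is a convention invented for this example.

```python
def format_kv(facts: dict) -> str:
    """Render facts as key=value lines, or key:type=value when a hint is given."""
    lines = []
    for key, value in facts.items():
        if isinstance(value, tuple):  # (value, type) pair carries a type hint
            val, type_hint = value
            lines.append(f"{key}:{type_hint}={val}")
        else:
            lines.append(f"{key}={value}")
    return "\n".join(lines)
```

Printing the result of format_kv({"width": (1920, "number"), "codec": "h264"}) produces the first and last lines of the example above.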
JSON Mode
When your command outputs a JSON object:
canon worklist | canonargs --json -- exiftool -json {} | canon import-facts
Example extractor output:
{"Make": "Apple", "Model": "iPhone 12", "DateTimeOriginal": "2024:07:23 14:30:00"}
JSON mode auto-detects numbers. If your command outputs "width": 1920 (a JSON number), it’s stored as a number. If it outputs "width": "1920" (a quoted string), it’s stored as text.
For datetime fields, you still need to use the typed hint format:
{"DateTimeOriginal": {"value": "2024:07:23 14:30:00", "type": "datetime"}}
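If the extractor emits plain strings, a small wrapper can add the hints before the output reaches canonargs or import-facts. The sketch below assumes you know which keys hold dates. The DATETIME_KEYS set is only an example and should be adjusted to the tags your extractor actually produces.

```python
# Example set of keys to treat as datetimes (an assumption, adjust as needed).
DATETIME_KEYS = {"DateTimeOriginal", "CreateDate", "ModifyDate"}


def add_type_hints(facts: dict, datetime_keys=DATETIME_KEYS) -> dict:
    """Wrap string values of known date keys in the typed hint format."""
    hinted = {}
    for key, value in facts.items():
        if key in datetime_keys and isinstance(value, str):
            hinted[key] = {"value": value, "type": "datetime"}
        else:
            hinted[key] = value  # numbers and other strings pass through as-is
    return hinted
```

Applied to the example extractor output above, Make and Model stay plain strings while DateTimeOriginal gains the datetime hint.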
Chaining
Processors can be chained since canonargs passes through the worklist entry and merges facts:
canon worklist \
| canonargs --fact mime -- file -b --mime-type {} \
| canonargs --json -- exiftool -json {} \
| canon import-facts
Using Existing Facts
Processors can access previously imported facts via the --emit flag on worklist. See Emitting Existing Facts for details.
Type Hints
Important: The type of a fact determines what operations work on it:
- Timestamps enable |year, |month modifiers and date comparisons (>=2024-01-01)
- Numbers enable numeric comparisons (>1000) and the |bucket modifier
- Text enables string matching and |lowercase, |stem modifiers
If your processor outputs dates as strings or numbers as strings, add type hints:
{"source_id":123,"basis_rev":0,"facts":{
"DateTimeOriginal": {"value": "2024:07:23 11:06:32", "type": "datetime"},
"duration": {"value": "1:23:45", "type": "duration"},
"width": 1920
}}
Without "type": "datetime", a date string like "2024:07:23 11:06:32" is stored as text and --where 'DateTimeOriginal|year=2024' won’t work.
Numbers from JSON are automatically stored as numbers. But if your extractor outputs "width": "1920" (a string), numeric comparisons like --where 'width>1000' won’t work as expected.
See import-facts for full details.
Tips
- Always pass through source_id and basis_rev unchanged
- Use jq -c for compact JSON output (one object per line)
- Handle errors gracefully: skip files that can’t be processed
- Use type hints for datetime fields so modifiers work correctly
- Ensure numbers are actual JSON numbers, not quoted strings
Querying
After scanning and enriching, you can explore your indexed files.
- ls - List sources matching filter expressions
- facts - Discover available facts and check coverage
- compare - Compare directories to find overlap
All query commands support path scoping (limit to a subdirectory) and --where filters.
canon ls
List sources matching filters. Useful for quick inspection and piping to other tools.
# List all sources in current directory
canon ls .
# List sources matching a filter
canon ls --where 'source.ext=jpg'
# Filter by source ID
canon ls --where 'source.id=12345'
# List only archived sources (content exists in an archive)
canon ls --archived
# List archived sources with their archive location(s)
# Output: source_path<TAB>archive_path (one line per archive location)
canon ls --archived=show
# List only unarchived sources (hashed but not in any archive)
canon ls --unarchived
# List only unhashed sources (no content hash yet)
canon ls --unhashed
# Show duplicate files (same content hash), grouped by hash
canon ls --duplicates
# Include sources from archive roots (automatic when scope is in an archive)
canon ls --include-archived
# Include excluded sources
canon ls --include-excluded
# Long format with size and date
canon ls -l
# Null-delimited output for xargs (handles spaces in paths, macOS)
canon ls -0 --where 'source.ext=jpg' | xargs -0 open -a Preview
Path display:
- Relative path input (., subdir) → relative output paths
- Absolute path input (/path/to/dir) → absolute output paths
Output is one path per line (stdout), with a count printed to stderr:
vacation/img001.jpg
vacation/img002.jpg
work/doc.pdf
3 sources
canon facts
Discover what metadata you have and check coverage.
# Overview of all facts (source roots only by default)
canon facts
# Scoped to a directory
canon facts /path/to/photos
# With filters
canon facts --where 'source.ext=jpg'
# Value distribution for a specific fact
canon facts --key content.Make
# With modifiers: group mtime by year-month
canon facts --key 'source.mtime|yearmonth'
# With accessors: distribution by top-level directory
canon facts --key 'source.rel_path[0]'
# Combine accessor and modifier: distribution by filename extension
canon facts --key 'source.rel_path[-1]|ext'
# Show hidden built-in facts
canon facts --all
# Unlimited results (default is 50)
canon facts --key content.hash.sha256 --limit 0
# Include sources from archive roots
canon facts --include-archived
# Group by root (see which roots contribute to each value)
canon facts --key source.ext --by-root
# Group by any fact key (with modifiers)
canon facts --key source.ext --group-by 'source.mtime|year'
# Compound grouping (root + another fact)
canon facts --key source.ext --by-root --group-by 'content.Make'
Example output:
Sources matching filters: 34692
Fact Count Coverage
────────────────────────────────────────────────────
source.ext 34692 100.0% (built-in)
source.size 34692 100.0% (built-in)
source.mtime 34692 100.0% (built-in)
source.path 34692 100.0% (built-in)
content.hash.sha256 34692 100.0%
content.mime 34692 100.0%
content.Model 7935 22.9%
content.Make 7935 22.9%
...
Example grouped output (--by-root):
source.ext (by root)
jpg (total: 12,500, 36.0%)
id:1 ...stack/Backup/Pictures 8,000 64.0%
id:2 ...castor-import/gringo 4,500 36.0%
png (total: 8,200, 23.6%)
id:1 ...stack/Backup/Pictures 5,000 61.0%
id:3 ...castor-import/hydra 3,200 39.0%
canon facts delete
Delete facts by key. Useful for removing incorrect or unwanted metadata.
# Preview deletion (dry-run by default)
canon facts delete content.mime --on object
canon facts delete content.Make --on source /path/to/photos --where 'source.ext=jpg'
# Execute deletion
canon facts delete content.mime --on object --yes
- --on source or --on object is required to specify entity type
- Protected namespaces (source.*) cannot be deleted
- Dry-run by default; use --yes to execute
canon prune
Clean up orphaned or stale data from the database.
# Preview stale facts (file changed since fact was recorded)
canon prune --stale-facts
# Preview orphaned objects (no present sources reference them)
canon prune --orphaned-objects
# Preview facts for excluded sources/objects
canon prune --excluded-facts
canon prune --excluded-facts=source # Only source facts
canon prune --excluded-facts=object # Only object facts
# Execute deletion
canon prune --stale-facts --yes
canon prune --orphaned-objects --yes
canon prune --excluded-facts --yes
Stale facts are those where observed_basis_rev no longer matches the source’s current basis_rev (meaning the file was modified after the fact was imported).
Orphaned objects are content entries with no remaining present sources. This can happen when files are deleted. You may want to keep them as a historical record, or delete them to clean up the database.
Excluded facts are metadata for sources or objects you’ve marked as excluded. Since you’ve decided not to archive them, you may want to remove their facts to free up database space.
canon compare
Compare two folders by content hash. Useful for verifying backups or finding differences between directories.
# Compare two directories
canon compare /path/to/folder_a /path/to/folder_b
# With filters
canon compare /path/to/folder_a /path/to/folder_b --where 'source.ext=jpg'
# Show file paths for differences
canon compare /path/to/folder_a /path/to/folder_b --verbose
Output shows:
- Files only in A (by content)
- Files only in B (by content)
- Files in both (matching content hash)
Exit code is 0 if identical, 1 if differences found.
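The three output groups correspond to plain set operations over content hashes. The following is a minimal Python sketch of that idea, not Canon's implementation: it assumes SHA-256 over full file contents and rehashes files on the fly, whereas Canon works from hashes already stored in its database. The names hash_tree, compare_dirs, and exit_code are illustrative only.

```python
import hashlib
from pathlib import Path


def hash_tree(root: Path) -> dict[str, list[Path]]:
    """Map SHA-256 hex digest -> relative paths of files with that content."""
    by_hash: dict[str, list[Path]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Assumption: hash the whole file; fine for a sketch, slow for big trees.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash.setdefault(digest, []).append(path.relative_to(root))
    return by_hash


def compare_dirs(a: Path, b: Path) -> tuple[set[str], set[str], set[str]]:
    """Return (hashes only in A, hashes only in B, hashes in both)."""
    hashes_a, hashes_b = set(hash_tree(a)), set(hash_tree(b))
    return hashes_a - hashes_b, hashes_b - hashes_a, hashes_a & hashes_b


def exit_code(only_a: set[str], only_b: set[str]) -> int:
    """0 if both folders hold identical content, 1 if anything differs."""
    return 0 if not only_a and not only_b else 1
```

Because the comparison is keyed on content, a file that was renamed or moved between the two folders still counts as present in both.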
Managing Sources
After scanning and enriching, you may want to control which sources are included in archiving operations.
The exclude command lets you mark sources to skip during cluster generate and apply. This is useful for:
- Ignoring temporary or system files
- Skipping known duplicates while keeping a preferred copy
- Filtering out small files below a size threshold
- Removing unwanted files from consideration without deleting them
Exclusions are stored directly on sources and can be cleared at any time.
canon exclude
Manage source exclusions. Excluded sources are skipped by most commands.
# Mark sources as excluded (e.g., small files, temp files)
canon exclude set --where 'source.size<1000'
canon exclude set /path/to/photos --where 'source.ext=tmp'
# Exclude a specific file by path
canon exclude set /path/to/photos/unwanted.jpg
# Exclude by source ID (shown in ls --duplicates output)
canon exclude set --id 12345
# Preview what would be excluded
canon exclude set --where 'source.ext=bak' --dry-run
# List currently excluded sources
canon exclude list
canon exclude list /path/to/photos
# Remove exclusions
canon exclude clear
canon exclude clear --where 'source.ext=tmp'
# Preview what would be cleared
canon exclude clear --where 'source.ext=tmp' --dry-run
canon exclude duplicates
Automatically exclude duplicate files while keeping copies in a preferred location.
# Exclude duplicates, keeping files under /preferred/path
canon exclude duplicates /scope/path --prefer /preferred/path
# Preview what would be excluded
canon exclude duplicates /scope/path --prefer /preferred/path --dry-run
# With filters
canon exclude duplicates /scope/path --prefer /preferred/path --where 'source.ext=jpg'
This is useful for deduplicating across backup drives while keeping the “canonical” copy in your preferred location.
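One plausible reading of this behavior, sketched in Python: group sources by content hash, and within each group exclude every copy that is not under the preferred path. This section does not say how Canon treats groups that have no copy under the preferred path, so the sketch assumes they are left untouched. plan_exclusions is a hypothetical helper, not a Canon API.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def plan_exclusions(sources: list[tuple[str, str]], prefer: str) -> list[str]:
    """sources: (path, content_hash) pairs. Returns paths to mark excluded."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path, content_hash in sources:
        groups[content_hash].append(path)

    excluded: list[str] = []
    for paths in groups.values():
        preferred = [p for p in paths if PurePosixPath(p).is_relative_to(prefer)]
        # Assumption: groups with no copy under the preferred path are left alone,
        # and every copy under the preferred path is kept.
        if len(paths) > 1 and preferred:
            excluded.extend(p for p in paths if p not in preferred)
    return sorted(excluded)
```

Running the real command with --dry-run first remains the reliable way to see what Canon itself would exclude.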
How exclusions affect other commands:
| Command | Default behavior | Override |
|---|---|---|
| worklist | Skips excluded | --include-excluded |
| facts | Skips excluded, shows count | --include-excluded |
| coverage | Stats on included only | --include-excluded shows excluded dimension |
| cluster generate | Always skips excluded | No override (hard gate) |
| apply | Blocks if manifest has excluded | No override (hard gate) |
Exclusions are stored directly on sources and objects in the database.
Archiving
When you find a collection of files to archive, Canon uses a two-step process:
- Generate a manifest with cluster - select files and define the destination
- Apply the manifest with apply - copy or move files to the archive
This workflow lets you review and customize the output before committing to any file operations.
- coverage - Check how much has been archived
- cluster - Generate a manifest for a set of files
- apply - Execute the manifest to copy/move files
canon coverage
Show archive coverage statistics - how many sources are hashed and how many are archived.
# Overview of all source roots
canon coverage
# Scoped to a specific directory
canon coverage /path/to/photos
# With filters
canon coverage --where 'source.ext=jpg'
# Coverage relative to a specific archive root
canon coverage --archive id:1
canon coverage --archive path:/path/to/archive
# Include archive roots in analysis
canon coverage --include-archived
Example output:
Archive Coverage Report
Root: /path/to/backup1 (source)
Total sources: 1,234
Hashed: 1,100 (89.1%)
Archived: 850 (77.3% of hashed)
Unarchived: 250
Root: /path/to/backup2 (source)
Total sources: 567
Hashed: 500 (88.2%)
Archived: 400 (80.0% of hashed)
Unarchived: 100
────────────────────────────────────────
Overall:
Total sources: 1,801
Hashed: 1,600 (88.8%)
Archived: 1,250 (78.1% of hashed)
Unarchived: 350
- Hashed: Sources with a content hash (ready for archiving)
- Archived: Sources whose content exists in an archive root
- With --archive: Shows “In this archive” vs “Not in archive” for that specific archive
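The percentages in the example report can be reproduced from the raw counts: "Hashed" is a share of total sources, "Archived" is a share of hashed sources, and "Unarchived" is hashed minus archived. A short Python sketch of that arithmetic (the function name and dictionary keys are made up for illustration):

```python
def coverage(total: int, hashed: int, archived: int) -> dict[str, float]:
    """Derive the report's percentages from raw counts."""
    return {
        # Share of all sources that have a content hash.
        "hashed_pct": round(100 * hashed / total, 1) if total else 0.0,
        # Share of hashed sources whose content exists in an archive root.
        "archived_pct": round(100 * archived / hashed, 1) if hashed else 0.0,
        # Hashed sources that are not yet archived.
        "unarchived": hashed - archived,
    }
```

For the first root in the example, coverage(1234, 1100, 850) gives 89.1% hashed, 77.3% archived, and 250 unarchived, matching the report.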
canon cluster generate
Generate a manifest of files matching filters. The --dest flag specifies where files will be copied and must be inside a registered archive root.
# All photos to an archive (unhashed sources are automatically skipped)
canon cluster generate --where 'source.ext IN (jpg, png, heic)' --dest /Volumes/Archive/Photos
# Destination can be a subdirectory within an archive
canon cluster generate --where 'source.ext IN (jpg, png, heic)' --dest /Volumes/Archive/Photos/2024
# Scope to a specific path
canon cluster generate /path/to/photos --dest /Volumes/Archive
# Custom output file
canon cluster generate --where 'source.ext=jpg' --dest /Volumes/Archive -o my-manifest.toml
# Include sources from archive roots
canon cluster generate --where 'source.ext=jpg' --dest /Volumes/Archive --include-archived
# Show which files were excluded (already archived)
canon cluster generate --where 'source.ext=jpg' --dest /Volumes/Archive --show-archived
# Overwrite existing manifest file
canon cluster generate --where 'source.ext=jpg' --dest /Volumes/Archive --force
The command generates two files: a manifest (.toml) that you edit, and a lock file (.lock) containing the source list.
Typical workflow:
canon cluster generate --where 'source.ext IN (jpg, png, heic)' --dest /Volumes/Archive
# Edit manifest.toml to customize the output pattern
canon apply manifest.toml --dry-run # Preview
canon apply manifest.toml # Execute
Manifest structure:
The generated manifest includes helpful comments listing all available pattern variables, modifiers, and aliases based on the facts present in your sources:
# Available facts for pattern (100% coverage on 1234 sources):
#
# Built-in:
# filename text - Filename (last path component)
# source.ext text - File extension
# source.mtime time - Modification time
# ...
#
# Content facts:
# content.Make text
# content.Model text
# ...
#
# Modifiers:
# Time: |year |month |day |date ...
# String: |stem |ext |lowercase ...
[output]
pattern = "{filename}" # ← Edit this to customize organization
base_dir = "/Volumes/Archive"
archive_root_id = 2
Common output patterns:
# Flat (default) - all files in base_dir
pattern = "{filename}"
# Preserve original folder structure (relocate as-is)
pattern = "{source.rel_path}"
# By EXIF date
pattern = "{content.DateTimeOriginal|year}/{content.DateTimeOriginal|month}/{filename}"
# By EXIF date with hash prefix (avoids collisions)
pattern = "{content.DateTimeOriginal|year}/{content.DateTimeOriginal|month}/{hash_short}_{filename}"
# By camera model
pattern = "{content.Make}/{content.Model}/{filename}"
# By file type
pattern = "{source.ext}/{filename}"
See Pattern Expressions for the full syntax reference, including modifiers, path accessors, and aliases.
Refreshing the Lock File
Use canon cluster refresh to update the lock file if sources have changed since the manifest was generated:
# Re-query and update the lock file
canon cluster refresh manifest.toml
This re-runs the manifest’s query and updates manifest.lock with the current matching sources. The manifest settings remain unchanged.
canon apply
Apply a manifest to copy/move files. Copied files are automatically registered in the database with the same content hash, so they’re immediately recognized as archived (no separate scan needed).
# Preview what would happen (fast - skips source existence checks)
canon apply manifest.toml --dry-run
# Copy files (default mode, preserves mtime/permissions on Unix)
canon apply manifest.toml
# Show per-file progress during transfer
canon apply manifest.toml --verbose
# Rename files instead of copying (Unix only, fails on cross-device)
canon apply manifest.toml --rename
# Move files: rename if same device, copy+delete if cross-device
canon apply manifest.toml --move --yes
# Only apply sources from specific roots
canon apply manifest.toml --root id:1 --root id:2
canon apply manifest.toml --root path:/path/to/source
# Allow duplicates across archives (but not within destination)
canon apply manifest.toml --allow-cross-archive-duplicates
Transfer modes:
| Flag | Behavior |
|---|---|
| (default) | Copy + preserve mtime/permissions (Unix) |
| --rename | Atomic rename; fails if cross-device (Unix only) |
| --move | Try rename; fallback to copy+delete on cross-device (Unix only, requires --yes) |
All modes use noclobber semantics: if a destination file exists, apply aborts with an error.
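To illustrate what noclobber means in practice, here is a small Python sketch of a copy that refuses to overwrite. It is not Canon's code: copy_noclobber is a hypothetical helper, and it relies on Python's exclusive-creation open mode so that the existence check and the file creation happen in one step.

```python
import shutil
from pathlib import Path


def copy_noclobber(src: Path, dest: Path) -> None:
    """Copy src to dest, raising FileExistsError if dest already exists."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Mode "xb" creates the file exclusively: the open fails if dest exists,
    # so there is no gap between checking for the file and writing it.
    with open(src, "rb") as fin, open(dest, "xb") as fout:
        shutil.copyfileobj(fin, fout)
    shutil.copystat(src, dest)  # carry over mtime and permission bits
```

An existing destination file is never modified; the caller sees an error instead and can resolve the conflict by changing the pattern.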
Integrity validation:
During transfer, Canon validates each source file’s partial hash (first 8KB + last 8KB) to detect file corruption or modification since the manifest was generated. If validation fails, the transfer is aborted.
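A partial hash of this kind can be sketched in a few lines of Python. The hash algorithm and the handling of files smaller than 16KB are not specified here, so this sketch assumes SHA-256 and hashes each byte at most once; partial_hash is an illustrative name.

```python
import hashlib
import os

CHUNK = 8 * 1024  # 8KB head and 8KB tail, as described above


def partial_hash(path: str) -> str:
    """Hash the first 8KB and last 8KB of a file (whole file if it is small)."""
    size = os.path.getsize(path)
    h = hashlib.sha256()  # assumption: the algorithm is not named in this section
    with open(path, "rb") as f:
        h.update(f.read(CHUNK))
        if size > CHUNK:
            # Start the tail after the head so no byte is hashed twice.
            f.seek(max(size - CHUNK, CHUNK))
            h.update(f.read(CHUNK))
    return h.hexdigest()
```

The trade-off is speed against coverage: the check reads at most 16KB per file, but a change confined to the middle of a large file does not alter the value.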
Root filtering:
Use --root to apply only a subset of sources from the manifest. Useful for staged application when sources are on different drives.
- --root id:N - Filter by root ID (shown in manifest as root_id)
- --root path:/path - Filter by root path (must match exactly)
Pre-flight checks (mandatory):
- Destination collisions - If multiple sources would map to the same destination path (e.g., using {filename} when sources have duplicate names), apply aborts with an error showing which files conflict.
- Archive conflicts - Checks if files already exist in the destination archive or other archives.
- Excluded sources - Blocks if any sources in the manifest are marked as excluded.
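The destination-collision check can be pictured as grouping sources by their rendered destination and reporting any group with more than one member. A minimal Python sketch for the default {filename} pattern follows; find_collisions is a hypothetical helper, and the real check would render whatever pattern the manifest uses.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_collisions(source_paths: list[str]) -> dict[str, list[str]]:
    """Group sources by destination under pattern "{filename}"; keep clashes."""
    by_dest: dict[str, list[str]] = defaultdict(list)
    for src in source_paths:
        # With "{filename}", the destination is just the last path component.
        by_dest[PurePosixPath(src).name].append(src)
    return {dest: srcs for dest, srcs in by_dest.items() if len(srcs) > 1}
```

A non-empty result is the situation in which prefixing the pattern with {hash_short} or adding a directory component resolves the conflict.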
Edit the manifest’s [output] section to customize the destination:
[output]
pattern = "{content.DateTimeOriginal|year}/{content.DateTimeOriginal|month}/{filename}"
base_dir = "/path/to/archive"
Pattern variables use fact keys with optional modifiers (see Pattern Expressions for the full syntax):
- {filename}, {stem}, {ext} - Filename aliases
- {hash}, {hash_short} - Content hash aliases
- {source.mtime|year}, {source.mtime|month} - File modification date
- {content.DateTimeOriginal|year} - EXIF date with modifier
- {content.Make}, {content.Model} - Any fact key
Facts Reference
Facts are key-value metadata. See Concepts: Facts for an overview.
Namespaces
| Namespace | Description |
|---|---|
| source.* | Facts about the file on disk (path, size, mtime) |
| content.* | Facts about the content (hash, EXIF, mime type) |
| object.* | Object-level properties |
The content. prefix is optional when querying. For example, Make=Apple is equivalent to content.Make=Apple.
Values
Facts can hold three value types:
| Type | Examples | Notes |
|---|---|---|
| Text | "Apple", "image/jpeg" | Strings; quote if contains spaces |
| Number | 1024, 3.14, -5 | Integers or decimals |
| Timestamp | 1704067200 | Unix timestamps; enable date modifiers |
Modifiers
Transform values using | syntax:
Time Modifiers
For timestamp values (like source.mtime or EXIF dates):
| Modifier | Output | Example |
|---|---|---|
| year | 4-digit year | 2024 |
| month | 2-digit month | 07 |
| day | 2-digit day | 23 |
| hour | 2-digit hour (24h) | 14 |
| minute | 2-digit minute | 30 |
| second | 2-digit second | 45 |
| date | ISO date | 2024-07-23 |
| time | ISO time | 14:30:45 |
| datetime | ISO datetime | 2024-07-23T14:30:45 |
| yearmonth | Year-month | 2024-07 |
| week | ISO week number | 30 |
| weekday | Day of week (Mon=1) | 2 |
| quarter | Quarter (1-4) | 3 |
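The example column can be cross-checked with Python's datetime module. This sketch assumes timestamps are interpreted in UTC, which is not stated here; time_modifiers is an illustrative helper that covers a subset of the modifiers in the table.

```python
from datetime import datetime, timezone


def time_modifiers(ts: int) -> dict[str, str]:
    """Render a Unix timestamp as several modifiers would (UTC assumed)."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {
        "year": f"{dt.year:04d}",
        "month": f"{dt.month:02d}",
        "date": dt.strftime("%Y-%m-%d"),
        "datetime": dt.strftime("%Y-%m-%dT%H:%M:%S"),
        "yearmonth": dt.strftime("%Y-%m"),
        "week": str(dt.isocalendar()[1]),         # ISO week number
        "weekday": str(dt.isoweekday()),          # Monday = 1
        "quarter": str((dt.month - 1) // 3 + 1),  # 1 through 4
    }


# The timestamp behind the table's examples: 2024-07-23 14:30:45 UTC.
example = int(datetime(2024, 7, 23, 14, 30, 45, tzinfo=timezone.utc).timestamp())
```

Calling time_modifiers(example) yields week 30, weekday 2 (a Tuesday), and quarter 3, in line with the table.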
String Modifiers
| Modifier | Description | Example |
|---|---|---|
| lowercase | Convert to lowercase | JPG → jpg |
| uppercase | Convert to uppercase | jpg → JPG |
| capitalize | Capitalize first letter | apple → Apple |
| stem | Filename without extension | photo.jpg → photo |
| ext | File extension | photo.jpg → jpg |
| short | First 8 characters | abc123def456 → abc123de |
Numeric Modifiers
| Modifier | Description |
|---|---|
| bucket | Group into ranges (1-10, 10-100, etc.) |
| bucket(a,b,c) | Custom ranges (<a, a-b, b-c, >c) |
Example: source.size|bucket groups file sizes into human-readable ranges.
Path Accessors
Python-style indexing for path values:
| Syntax | Meaning |
|---|---|
| key[-1] | Last segment (filename) |
| key[0] | First segment |
| key[1:3] | Slice segments 1 and 2 |
| key[:-1] | All but last segment |
Accessors can be combined with modifiers:
source.rel_path[-1] → IMG_001.jpg
source.rel_path[-1]|stem → IMG_001
source.rel_path[0] → photos
Pruning Facts
The canon prune command can delete facts to free database space.
Excluded Entity Facts
Delete facts for sources or objects you’ve excluded:
# Dry-run: show what would be deleted (default)
canon prune --excluded-facts
# Delete facts for both excluded sources and objects
canon prune --excluded-facts --yes
# Delete only source facts (excluded sources)
canon prune --excluded-facts=source --yes
# Delete only object facts (excluded objects)
canon prune --excluded-facts=object --yes
This is useful when you’ve excluded sources/objects you’re not interested in archiving and want to reclaim the database space used by their metadata.
Other Prune Options
| Flag | Description |
|---|---|
| --stale-facts | Delete source facts where the file changed since recording |
| --orphaned-objects | Delete objects with no present sources (and their facts) |
All prune operations are dry-run by default. Add --yes to execute.
See Also
- Built-in Facts - Complete list of automatic facts
- Filters - Using facts in queries
- Pattern Expressions - Using facts in archive patterns
Built-in Facts Reference
These facts are automatically available for all sources without enrichment.
Source Facts
| Fact | Type | Description |
|---|---|---|
| source.id | num | Database ID (hidden*) |
| source.ext | text | File extension (lowercase, no dot) |
| source.size | num | File size in bytes |
| source.mtime | time | Modification timestamp |
| source.path | path | Full absolute path |
| source.root | path | Root directory path (hidden) |
| source.rel_path | path | Path relative to root (hidden) |
| source.device | num | Device ID (hidden) |
| source.inode | num | Inode number (hidden) |
Content Facts
| Fact | Type | Description |
|---|---|---|
| content.hash.sha256 | text | SHA-256 content hash |
Pattern Aliases
These aliases are available in pattern expressions:
| Alias | Expands To |
|---|---|
| filename | source.rel_path[-1] |
| stem | source.rel_path[-1]\|stem |
| ext | source.rel_path[-1]\|ext |
| hash | content.hash.sha256 |
| hash_short | content.hash.sha256\|short |
| id | source.id |
*Hidden facts are not shown in canon facts by default. Use --all to include them.
Filter Syntax
Filters select sources based on facts using a boolean expression language. Most commands accept --where to filter which sources they operate on. Multiple --where flags are combined with AND.
Operators
Basic
| Syntax | Meaning |
|---|---|
| key? | Fact exists |
| key=value | Fact equals value (case-sensitive) |
| key!=value | Fact doesn’t equal value (case-sensitive) |
| key~pattern | Glob pattern match (case-sensitive) |
| key!~pattern | Glob pattern doesn’t match |
| key>value | Greater than (numbers/dates) |
| key>=value | Greater or equal |
| key<value | Less than |
| key<=value | Less or equal |
| key IN (v1, v2, ...) | Fact matches any value in list |
| key NOT IN (v1, v2, ...) | Fact doesn’t match any value in list |
Glob Patterns
The ~ operator supports shell-style glob patterns:
| Pattern | Meaning |
|---|---|
| * | Match zero or more characters |
| ? | Match exactly one character |
| [abc] | Match any character in set |
| [a-z] | Match character range |
| [!abc] | Match any character NOT in set |
| \* | Literal asterisk (escape) |
# Files starting with IMG_
--where 'filename~"IMG_*"'
# Files with 3-letter extension
--where 'source.ext~"???"'
# Files in a year subdirectory
--where 'source.rel_path~"*/2024/*"'
# Exclude temp files
--where 'filename!~"*.tmp"'
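These glob rules closely resemble Python's fnmatch module, which makes it a convenient place to experiment with patterns before using them in a filter. The sketch below is an analogy rather than Canon's matcher: fnmatch has no backslash escape (a literal asterisk is written [*] there), and whether Canon's * crosses "/" the way fnmatch's does is an assumption. Both helper names are illustrative.

```python
from fnmatch import fnmatchcase


def glob_match(value: str, pattern: str) -> bool:
    """Case-sensitive shell-style match, analogous to the ~ operator."""
    # fnmatchcase never normalizes case, regardless of platform.
    return fnmatchcase(value, pattern)


def glob_match_ci(value: str, pattern: str) -> bool:
    """Analogous to lowercasing the key first; pattern must be lowercase."""
    return fnmatchcase(value.lower(), pattern)
```

For example, glob_match("IMG_001.jpg", "IMG_*") is true while glob_match("img_001.jpg", "IMG_*") is false, and glob_match("jpeg", "???") is false because the pattern requires exactly three characters.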
Boolean Operators
| Syntax | Meaning |
|---|---|
| expr AND expr | Both conditions must match |
| expr OR expr | Either condition matches |
| NOT expr | Negates the condition |
| (expr) | Grouping for precedence |
Operator precedence (highest to lowest): NOT, AND, OR. Use parentheses to override.
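Python's not, and, and or have the same relative precedence, so the effect of leaving out parentheses can be demonstrated directly. In this sketch the two functions stand in for the filter source.ext=jpg OR source.ext=png AND source.size>1000000, written without and with grouping; the function names are illustrative.

```python
def matches(ext: str, size: int) -> bool:
    """Unparenthesized: AND binds tighter, so this is jpg OR (png AND large)."""
    return ext == "jpg" or ext == "png" and size > 1_000_000


def matches_grouped(ext: str, size: int) -> bool:
    """Parenthesized: (jpg OR png) AND large."""
    return (ext == "jpg" or ext == "png") and size > 1_000_000
```

A 10-byte jpg passes the first form but not the second, which is why the "large images" filter needs its parentheses.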
Using Modifiers
Modifiers can be applied to fact keys using the | syntax. See Facts for the complete list.
# Files from 2024
--where 'source.mtime|year=2024'
# January photos
--where 'content.DateTimeOriginal|month=1'
# Case-insensitive extension matching
--where 'source.ext|lowercase=jpg'
# Case-insensitive glob
--where 'filename|lowercase~"img_*"'
Examples
# Files with a content hash
--where 'content.hash.sha256?'
# Files missing a content hash
--where 'NOT content.hash.sha256?'
# JPG files only
--where 'source.ext=jpg'
# JPG or PNG files
--where 'source.ext=jpg OR source.ext=png'
# Common image formats
--where 'source.ext IN (jpg, png, gif, webp)'
# Exclude certain extensions
--where 'source.ext NOT IN (tmp, bak, log)'
# Not temporary files
--where 'NOT source.ext=tmp'
# iPhone photos (content. prefix is optional)
--where 'Make=Apple'
# Files larger than 1MB
--where 'source.size>1000000'
# Files modified in 2024 or later
--where 'source.mtime>=2024-01-01'
# Large images (combining with parentheses)
--where '(source.ext=jpg OR source.ext=png) AND source.size>1000000'
# Multiple --where flags combine with AND
--where 'source.ext=jpg' --where 'content.Make=Apple'
Pattern Expressions
Pattern expressions define how files are organized in archives. They use {expr} syntax to insert dynamic values based on facts.
Patterns are used in the pattern field of cluster manifests. When you run canon cluster generate, it creates a manifest with a default pattern = "{filename}" that you can customize.
Basic Syntax
Patterns consist of literal path segments and expressions in curly braces:
{content.DateTimeOriginal|year}/{content.DateTimeOriginal|month}/{filename}
This would produce paths like: 2024/07/IMG_001.jpg
Fact Keys
Any fact key can be used in a pattern:
- {source.ext} - File extension
- {source.mtime} - Modification time
- {content.Make} - Camera manufacturer (from EXIF)
- {content.hash.sha256} - Content hash
The content. prefix is optional for content facts, so {Make} is equivalent to {content.Make}.
Modifiers
Transform values using the | syntax. See Facts for the complete list.
{source.mtime|year} → 2024
{source.mtime|yearmonth} → 2024-07
{content.hash.sha256|short} → a1b2c3d4
{source.ext|uppercase} → JPG
Multiple modifiers can be chained:
{filename|stem|lowercase} → img_001
Path Accessors
Extract segments from path values using Python-style indexing:
| Syntax | Meaning |
|---|---|
| key[-1] | Last segment (filename) |
| key[0] | First segment |
| key[1:3] | Slice segments 1 and 2 |
| key[:-1] | All but last segment |
Examples with source.rel_path = "photos/2024/vacation/IMG_001.jpg":
{source.rel_path[-1]} → IMG_001.jpg
{source.rel_path[0]} → photos
{source.rel_path[1:-1]} → 2024/vacation
{source.rel_path[-1]|stem} → IMG_001
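Since the accessors follow Python's indexing rules, they can be tried out directly in Python. A small sketch, assuming path segments are split on "/"; access is an illustrative helper, not a Canon API.

```python
from typing import Union


def access(path: str, index: Union[int, slice]) -> str:
    """Apply a Python-style index or slice to the "/"-separated segments."""
    segments = path.split("/")
    picked = segments[index]
    # An index yields one segment; a slice yields a list to re-join.
    return picked if isinstance(picked, str) else "/".join(picked)


rel_path = "photos/2024/vacation/IMG_001.jpg"
```

Here access(rel_path, slice(1, -1)) corresponds to {source.rel_path[1:-1]} and returns 2024/vacation, while access(rel_path, -1) returns IMG_001.jpg.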
Aliases
Aliases provide shorthand for common expressions. Use canon facts --show-aliases to see all available aliases.
| Alias | Expands To |
|---|---|
| filename | source.rel_path[-1] |
| stem | source.rel_path[-1]\|stem |
| ext | source.rel_path[-1]\|ext |
| hash | content.hash.sha256 |
| hash_short | content.hash.sha256\|short |
| id | source.id |
Example using aliases:
{hash_short}_{filename} → a1b2c3d4_IMG_001.jpg
Missing Values
Canon requires all facts used in a pattern to have values for every source. If any source is missing a required fact, canon apply will refuse to proceed and report which facts are missing.
When you run canon cluster generate, the manifest includes comments listing all facts with 100% coverage—these are safe to use in your pattern.
If sources are missing required facts, you can:
- Filter them out during generation: --where 'DateTimeOriginal?'
- Import the missing facts via the enrichment pipeline
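To make the missing-value rule concrete, here is a toy pattern renderer in Python. It is a sketch under simplifying assumptions, not Canon's parser: facts are passed as a flat dictionary (with filename supplied directly rather than expanded from an alias), only four string modifiers are implemented, and path accessors are left out. render, MODIFIERS, and EXPR are illustrative names.

```python
import re

# A small subset of the string modifiers, for illustration only.
MODIFIERS = {
    "lowercase": str.lower,
    "uppercase": str.upper,
    "short": lambda s: s[:8],
    "stem": lambda s: s.rsplit(".", 1)[0],
}

EXPR = re.compile(r"\{([^{}]+)\}")


def render(pattern: str, facts: dict[str, str]) -> str:
    """Expand {key|mod|...} expressions; raise KeyError if a fact is missing."""

    def expand(match: re.Match) -> str:
        key, *mods = match.group(1).split("|")
        if key not in facts:
            # Mirrors the rule above: refuse rather than emit a partial path.
            raise KeyError(f"missing fact: {key}")
        value = facts[key]
        for mod in mods:  # chained modifiers apply left to right
            value = MODIFIERS[mod](value)
        return value

    return EXPR.sub(expand, pattern)
```

With facts = {"filename": "IMG_001.jpg", "content.hash.sha256": "a1b2c3d4e5f6"}, the pattern {content.hash.sha256|short}_{filename} renders to a1b2c3d4_IMG_001.jpg, while a pattern that uses {content.Make} raises an error instead of producing an incomplete destination.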
Common Patterns
# Flat (all files in one directory)
pattern = "{filename}"
# Preserve original structure
pattern = "{source.rel_path}"
# By EXIF capture date
pattern = "{content.DateTimeOriginal|year}/{content.DateTimeOriginal|month}/{filename}"
# By date with hash prefix (collision-safe)
pattern = "{content.DateTimeOriginal|date}/{hash_short}_{filename}"
# By camera
pattern = "{content.Make}/{content.Model}/{filename}"
# By file type and year
pattern = "{source.ext}/{source.mtime|year}/{filename}"