Automatically Categorizing Postings
- Investment accounts have a deterministic, finite set of transaction types, so it is easy to build correct and complete postings for them during import
- For credit card and bank accounts, I use the awesome smart_importer. ledger-guesser seems to be a similar tool for ledger/hledger users
- I used to use Yodlee, an account aggregator like Mint that downloads your credit card transactions from your bank and categorizes them using transaction metadata and other means. However, it caused more trouble than it was worth: frequently missing transactions, missed updates, login problems, and incomplete or mangled transaction data. Most credit card importers are [[Actual Importer Code Samples|trivial to write]] anyway
Making smart_importer fast
smart_importer needs training data. Feeding it my entire 20-year ledger across all accounts on every import slows the process down, since beancount must first be run on the whole source. Instead, since I use per-account ledger files, I send smart_importer only the transactions from the relevant account's file:
BEAN_SRC=$(bean-identify my.import "$file" | grep "^Account:" | sed 's/Account: *//' | sed 's#:#.#g')
BEAN_SRC="${INGEST_ROOT}/../source/${BEAN_SRC}.bc"
bean-extract my.import -f "$BEAN_SRC" "$file"
Optimizing performance is particularly valuable during importer development to keep the debug-fix-test loop fast.
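The account-to-file mapping can also be split into small helper functions, which makes it easier to check on its own. This is only a sketch: the function names `extract_account` and `account_to_ledger_path` are hypothetical, and it assumes one ledger file per account, named with dots in place of colons and stored under `${INGEST_ROOT}/../source/`.

```shell
# Hypothetical helper: read bean-identify style output on stdin and print
# the account name. Assumes a line that begins with "Account:"; only the
# first such line is used.
extract_account() {
    sed -n 's/^Account: *//p' | head -n 1
}

# Hypothetical helper: map an account name to its per-account ledger file.
# $1 = account name (e.g. Assets:US:Bank), $2 = ingest root directory.
# Assumes files are named after the account with ':' replaced by '.'.
account_to_ledger_path() {
    printf '%s/../source/%s.bc\n' "$2" "$(printf '%s' "$1" | tr ':' '.')"
}

# Usage sketch (same commands as above, just routed through the helpers):
#   account=$(bean-identify my.import "$file" | extract_account)
#   bean-extract my.import -f "$(account_to_ledger_path "$account" "$INGEST_ROOT")" "$file"
```

Keeping the string handling in functions means the path logic can be tested without running beancount at all.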
During normal operations, I use the line below (in zsh; use your shell's equivalent) to supply my whole ledger as training data, untransformed by plugins. This is faster, and it avoids the transformations that plugins perform, which in most cases should not be applied to imported data:
bean-extract my.import -f <(echo 'plugin "beancount.plugins.auto_accounts"'; cat "${INGEST_ROOT}"/../source/*) "$file"
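For shells without `<(...)` process substitution, one possible equivalent is to write the combined ledger to a temporary file first. This is a minimal sketch, not a tested part of my setup: `build_training_ledger` is a hypothetical name, and it assumes every file in the source directory is a ledger file.

```shell
# Hypothetical helper: print the training ledger to stdout, i.e. the
# auto_accounts plugin line followed by every file in the given directory.
# $1 = directory holding the per-account ledger files.
build_training_ledger() {
    echo 'plugin "beancount.plugins.auto_accounts"'
    cat "$1"/*
}

# Usage sketch with a temporary file instead of <(...):
#   tmp=$(mktemp)
#   build_training_ledger "${INGEST_ROOT}/../source" > "$tmp"
#   bean-extract my.import -f "$tmp" "$file"
#   rm -f "$tmp"
```

The temporary file costs one extra write, but it also leaves something to inspect when an import behaves unexpectedly.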