Reading and Manipulating Tables in CSVs via petl

petl is a Python ETL library for tabular data that beancount_reds_importers uses. It makes working with CSV files easy. Here is a walkthrough:

This example CSV importer uses petl to read a .csv into a table for manipulation. The output of this reader is a list of namedtuples corresponding roughly to OFX transactions. The following steps achieve this (based on my comments in the code):

  1. read csv into petl table:
    • rdr = etl.fromcsv(file.name)
  2. skip header and footer rows (configurable)
    • rdr = rdr.skip(self.skip_head_rows)
    • rdr = rdr.head(len(rdr) - self.skip_tail_rows - 1)
  3. prepare_raw_columns: an overridable method to help get the raw table into shape. As an example, the schwab importer does the following:
    • rdr = rdr.cutout('') # remove the last column, which is empty
    • for rows with interest, the date column contains text such as: ‘11/16/2018 as of 11/15/2018’. We convert these into a regular parseable date: ‘11/16/2018’
      def cleanup_date(d):
          """'11/16/2018 as of 11/15/2018' --> '11/16/2018'"""
          return d.split(' ', 1)[0]
      rdr = rdr.convert('Date', cleanup_date)
      
    • add a ‘tradeDate’ column, which is a copy of the ‘Date’ column, to correspond to the transaction builder fields
      • rdr = rdr.addfield('tradeDate', lambda x: x['Date'])
    • add a ‘total’ column, which is a copy of the ‘Amount’ column, to correspond to the transaction builder fields
      • rdr = rdr.addfield('total', lambda x: x['Amount'])
  4. rename columns: column headers are renamed by declaring a dictionary, to standardize them to the labels that the transaction builder expects. For the included schwab importer, that looks like:
      self.header_map = {
          "Action":      'type',
          "Date":        'date',
          "tradeDate":   'tradeDate',
          "Description": 'memo',
          "Symbol":      'security',
          # etc.
          }
      rdr = rdr.rename(self.header_map)
    

    Above, on the left are column names in the file as downloaded from the bank. On the right are standardized names as expected by the transaction builder.

  5. convert_columns: this fixes up the actual data in each column. The base class maps transaction types to a set of standard types, which are loosely based on OFX standards. For example, the schwab importer needs this mapping:
     self.transaction_type_map = {
         'Bank Interest':      'income',
         'Buy':                'buystock',
         'Cash Dividend':      'dividends',
         'MoneyLink Transfer': 'transfer',
         'Reinvest Dividend':  'dividends',
         'Reinvest Shares':    'buystock',
         'Sell':               'sellstock',
         }
     rdr = rdr.convert('type', self.transaction_type_map)
    
    • numbers are parsed from strings and converted into the Decimal type; non-numeric characters such as ‘$’ are removed.
    • dates are parsed and converted into the datetime type.
  6. the table is now ready for use by the importer. petl makes each row available to the importer code as a namedtuple.

Code link
