Reading and Manipulating Tables in CSVs via petl
petl is an ETL library for tables that beancount_reds_importers uses. It makes working with CSV files easy. Here is a walkthrough:
This example CSV importer uses petl to read a .csv
into a table for manipulation. The
output of this reader is a list of namedtuples corresponding roughly to OFX
transactions. The following steps achieve this (based on my comments in the code):
- read csv into petl table:
rdr = etl.fromcsv(file.name)
- skip header and footer rows (configurable)
rdr = rdr.skip(self.skip_head_rows)
rdr = rdr.head(len(rdr) - self.skip_tail_rows - 1)
- prepare_raw_columns: an overridable method to help get the raw table in shape. As an example, the schwab importer does the following
rdr.cutout('') # remove the last column, which is empty
- for rows with interest, the date column contains text such as: ‘11/16/2018 as of 11/15/2018’. We convert these into a regular parseable date: ‘11/16/2018’
def cleanup_date(d): """'11/16/2018 as of 11/15/2018' --> '11/16/2018'""" return d.split(' ', 1)[0] rdr = rdr.convert('Date', cleanup_date)
- add a ‘tradeDate’ column, which is a copy of the ‘Date’ column, to correspond to the transaction builder fields
rdr = rdr.addfield('tradeDate', lambda x: x['Date'])
- add a ‘total’ column, which is a copy of the ‘Amount’ column, to correspond to the transaction builder fields
rdr = rdr.addfield('total', lambda x: x['Amount'])
- rename columns: columns headers are renamed by declaring a dictionary, to standardize
them to the labels that the transaction builder expects. For the included schwab
importer, that looks like:
self.header_map = { "Action": 'type', "Date": 'date', "tradeDate": 'tradeDate', "Description": 'memo', "Symbol": 'security', #etc.} rdr = rdr.rename(self.header_map)
Above, on the left are column names in the file as downloaded from the bank. On the right are standardized names as expected by the transaction builder.
- convert_columns: this fixes up the actual data in each column. The base
class does the following: map types to standard types. The standard types
that the transaction builder uses are loosely based on OFX standards. For
example, the schwab importer needs this mapping:
self.transaction_type_map = { 'Bank Interest': 'income', 'Buy': 'buystock', 'Cash Dividend': 'dividends', 'MoneyLink Transfer': 'transfer', 'Reinvest Dividend': 'dividends', 'Reinvest Shares': 'buystock', 'Sell': 'sellstock', } rdr = rdr.convert('type', self.transaction_type_map)
- numbers are parsed from string and converted into Decimal type. Non-numeric characters like ‘$’ are removed.
- dates are parsed and converted into
datetime
type.
- table is now ready for use by the importer.
petl
makes each row available via petl’snamedtuple
s to the importer code.