Windows/Linux Block Size
- When choosing a block size, set rowsPerRead so that each block holds roughly 10M elements (rows x columns), or fewer
- With 20 columns, rowsPerRead=500e3 (10M elements)
- With 1000 cols, rowsPerRead=1000 (1M elements, on the smaller side)
- This tends to give a block size such that you can process multiple blocks per read
- Use blocksPerRead > 1
- The exact value depends on how much RAM you have available
- Generally having multiple blocks in memory simultaneously improves performance
- It is easy to increase blocksPerRead, but expensive to re-block, so err on the side of having smaller blocks
- If you use rxSplit() or rxDataStep() to create samples (e.g. training/validation sets), then use rxDataStep() to re-block the resulting files according to the sizing guideline above
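
The sizing rule above is plain arithmetic, so it can be sketched in a few lines. This is only an illustration: the helper name `rows_per_read` and the default `target_elements` value are assumptions introduced here, not part of RevoScaleR. It returns the largest rowsPerRead that keeps rows x columns at or below the target; picking a smaller value is consistent with the advice to err on the side of smaller blocks.

```python
def rows_per_read(n_cols, target_elements=10_000_000):
    """Largest rowsPerRead keeping rows * columns at or below the target.

    Hypothetical helper for illustration; 10M is the rough target
    suggested in the notes above, not a hard limit.
    """
    if n_cols <= 0:
        raise ValueError("n_cols must be a positive integer")
    # Integer division gives an upper bound; never return fewer than 1 row.
    return max(1, target_elements // n_cols)


if __name__ == "__main__":
    for n_cols in (20, 1000):
        rows = rows_per_read(n_cols)
        print(f"{n_cols} columns: rowsPerRead <= {rows} "
              f"({rows * n_cols} elements per block)")
```

The result is an upper bound. Because increasing blocksPerRead later is cheap and re-blocking is not, choosing a rowsPerRead below this bound is a reasonable default.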