Quote:A slightly-less-ugly way to do it is to use the constant Float.MAX_VALUE as your "NaN", which is even less likely than 98.7654 to be part of your data set
Thanks. That's certainly nicer, though I was hoping I was missing something. Seems daft that assigning to a custom number is 100 times more efficient than checking isNaN.
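For anyone following along, the two checks look roughly like this in plain Java. This is only a sketch: SentinelCheck, MISSING and isMissing are names made up for illustration, and it says nothing about the relative speed of the two checks.

```java
class SentinelCheck {
    // Sentinel standing in for "no value", as suggested in the quote above.
    static final float MISSING = Float.MAX_VALUE;

    // A sentinel can be tested with a plain equality comparison.
    static boolean isMissing(float v) {
        return v == MISSING;
    }

    public static void main(String[] args) {
        float nan = Float.NaN;
        // NaN never compares equal to anything, itself included,
        // so it has to be detected with Float.isNaN instead of ==.
        System.out.println(nan == nan);
        System.out.println(Float.isNaN(nan));
        System.out.println(isMissing(MISSING));
    }
}
```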
Quote:Beyond that, if you really need to speed up the import process, your best bet is probably to write your own parsing/conversion method
I'd like to avoid this as I'm more than likely to make a mistake. The file import doesn't happen too often, so it can be lived with, though it makes the initial load painful.
Quote:The standard conversion routines are written to handle numbers in scientific/engineering notation like "4.147e25". Are there any of those in your data set? If not, then you can write a routine that doesn't need to scan the string for occurrences of the letter e.
Unfortunately my data is unknown, and may well include engineering notation (indeed I was pleasantly surprised when I found it interpreting them).
Thanks for the advice.
Quote:Note that split(",|\\t") can be a culprit for your performance issue!
Don't think it's too bad. Running the same split assigned to String[] and not cast to float runs two orders of magnitude faster. As I don't know what my input data is, I'd like to catch as much as I can.
Quote:Also beware: you used Float[] pieces. That means each generated float will be wrapped in a Float object thanks (?) to auto-boxing.
I hope this isn't happening. My rough order of events:
- Am first declaring data[][] with length of input file and width of columns.
- Using float[] to populate an array of entries from a single row.
- Test each entry for isNaN. If false, data[0][a] = float[a]. If true, set data[0][a] = Float.MAX_VALUE.
- Repeat for each item in float[]
- Redeclare float[] for the next line of the source data.
- Repeat for all rows.
If I'm right, float[] should only ever exist for the period that the line is being parsed. After that it should be reused for the other lines.
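In plain Java terms, the steps above come out roughly like this. It is only a sketch, not my actual code: RowImport, parseRows, toFloat, MISSING and DELIMS are made-up names, and Float.parseFloat with a catch is a stand-in for the float conversion, so unparseable entries get the sentinel directly. Short rows are padded with the sentinel, which is also an assumption.

```java
import java.util.regex.Pattern;

class RowImport {
    // Sentinel used in place of NaN, per the Float.MAX_VALUE suggestion.
    static final float MISSING = Float.MAX_VALUE;

    // Same delimiters as split(",|\\t"), compiled once instead of per line.
    static final Pattern DELIMS = Pattern.compile(",|\\t");

    // lines: one string per row of the source file; columns: expected width.
    static float[][] parseRows(String[] lines, int columns) {
        // Primitive float[][], so no Float objects are created by boxing.
        float[][] data = new float[lines.length][columns];
        for (int r = 0; r < lines.length; r++) {
            // The pieces array only lives for the duration of this row.
            String[] pieces = DELIMS.split(lines[r]);
            for (int c = 0; c < columns; c++) {
                data[r][c] = c < pieces.length ? toFloat(pieces[c]) : MISSING;
            }
        }
        return data;
    }

    // Converts one entry; anything unparseable (or a literal NaN) becomes MISSING.
    static float toFloat(String s) {
        try {
            float f = Float.parseFloat(s.trim());
            return Float.isNaN(f) ? MISSING : f;
        } catch (NumberFormatException e) {
            return MISSING;
        }
    }
}
```

Float.parseFloat does accept engineering notation such as "4.147e25", so that part of the behaviour is kept.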
---
Thanks both for input. Other thoughts most welcome.