Hello Processing types, I have a query about my sketch. I am reading data from a text file and creating from it a new text file of comma-separated values. The input files are about 10MB and the output files about 50MB. The extra size comes from filling gaps in the data with zero values. Ultimately I want to do maths operations on two files, subtracting one from the other to determine the differences. For this purpose I adapted my sketch to also put the data into arrays, so that instead of a text file containing three columns of CSV values I would have three arrays of equal length.
However, the code below is now roughly 100x slower than it was without the array-writing code. Can someone suggest why this is, and hopefully a way of speeding the code up? Instead of taking 20 seconds to run, it has now been running for about 20 minutes and is barely 20% through.
Apologies for the crudeness of my code. I am very much an enthusiastic amateur at this game.
- /*
- This code takes a text file of LCMS data produced by MSConvertGUI and produces another text
- file in the same folder containing three columns of CSV data which hold time (seconds), m/z
- and signal of the LCMS TIC chromatogram.
- */
- import java.io.File;
- PrintWriter output;
- String folderPath; // folder path where logfiles will be saved
- int count;
- String time;
- String mz;
- String signal;
- int index = 0;
- int present = 1;
- Float floatTime;
- Float roundedTime;
- Float[] timeArray = new Float[0]; // start empty so the first element is not a stray null
- Float[] mzArray = new Float[0];
- String[] signalArray = new String[0];
- // ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
- void setup() {
- selectInput("Select a file to process:", "fileSelected");
- }
- // ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
- void fileSelected(File selection) {
- if (selection == null) {
- println("Window was closed or the user hit cancel.");
- exit();
- return; // stop here so the null selection is not dereferenced below
- }
- else {
- println("User selected " + selection.getAbsolutePath());
- }
- folderPath = selection.getParent(); // folder path to save to
- String name = selection.getName();
- // println(folderPath);
- String file = folderPath + "\\" + "LCMS ultimate .txt writer output - " + name; // text file to save to
- String lines[] = loadStrings(selection);
- output = createWriter(file);
- println(file);
- for(String i : lines) { // this is the main loop of the sketch
- int nums = i.indexOf("["); // identifies which lines hold m/z or signal data
- int secs = i.indexOf("cvParam: scan start time,"); // identifies which lines hold the time signature in seconds
- if(nums > 0 || secs > 0) { // what to do with those lines
- if(nums > 0){
- String[] Trim = split(i, ']'); // cuts out the text guff
- Trim = trim(Trim); // trims off whitespace to leave a number
- if(index == 2) { signal = Trim[1];} // saves the result in one of the strings
- else {mz = Trim[1];}
- index++;
- }
- else {
- String[] Trim = split(i, ','); // does same as above for time result
- time = trim(Trim[1]);
- floatTime = float(time) * 1000; // rounds to 3 decimal places
- roundedTime = round(floatTime) / 1000.0;
- index++;
- }
- }
- if (index == 3) { // every three lines write the data as a line of CSV values
- String[] mzData = split(mz," ");
- String[] signalData = split(signal," ");
- count = mzData.length;
- for(int u = 5000; u < 9501; u++) {
- output.print(roundedTime);
- output.print(", ");
- timeArray = (Float[]) append(timeArray,roundedTime); // this is new array writing code
- Float thisMZ = u / 10.0;
- output.print(thisMZ);
- output.print(", ");
- mzArray = (Float[]) append(mzArray, thisMZ); // this is new array writing code
- String mzString = str(thisMZ);
- present = 0;
- for(int e = 0; e < count; e++) {
- if(mzString.equals(mzData[e]) == true) {
- output.println(signalData[e]);
- signalArray = append(signalArray, signalData[e]); // this is new array writing code
- present = 1;
- }
- }
- if(present < 1) {
- output.println("0");
- signalArray = append(signalArray, "0"); // this is new array writing code
- }
- present = 1;
- }
- println(roundedTime);
- output.flush();
- }
- if(index > 2) {index = 0;}
- }
- output.flush();
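- output.println(" "); // this is all new array writing code
- output.println(java.util.Arrays.toString(timeArray)); // println(array) alone would only print the object reference
- output.flush();
- output.println(java.util.Arrays.toString(mzArray));
- output.flush();
- output.println(java.util.Arrays.toString(signalArray));
- output.flush();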
- output.close();
- exit();
- }
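One possible explanation, offered as a sketch to test rather than a diagnosis: as far as I can tell, Processing's append() makes a new array one element longer and copies the old contents across on every call, so building an N-element array this way costs roughly N*N/2 element copies. The plain-Java class below counts those copies and contrasts them with a pre-sized ArrayList. The names AppendCost, appendByCopy, copiesWhenGrowingByOne and fillList are hypothetical helpers for illustration, not Processing API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AppendCost {

    // Hypothetical stand-in for append(): allocate length + 1, copy everything, add one value.
    static float[] appendByCopy(float[] array, float value) {
        float[] grown = Arrays.copyOf(array, array.length + 1);
        grown[array.length] = value;
        return grown;
    }

    // Counts how many existing elements get copied while growing an array one element at a time.
    static long copiesWhenGrowingByOne(int n) {
        float[] data = new float[0];
        long copies = 0;
        for (int i = 0; i < n; i++) {
            copies += data.length;          // every existing element is copied on each append
            data = appendByCopy(data, i);
        }
        return copies;                      // equals n * (n - 1) / 2
    }

    // Same data collected in a growable list that is sized once up front.
    static List<Float> fillList(int n) {
        List<Float> data = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            data.add(i / 10.0f);
        }
        return data;
    }

    public static void main(String[] args) {
        int n = 4501 * 2;                   // two scans of 4501 m/z rows, as in the sketch's inner loop
        System.out.println("element copies when growing by one: " + copiesWhenGrowingByOne(n));
        System.out.println("list size with no repeated copying: " + fillList(n).size());
    }
}
```

If this is indeed the cause, the copy count grows with the square of the number of rows, which would fit a run that keeps getting slower the further it gets. Inside a sketch, Processing's own FloatList and StringList classes, or arrays sized once when the number of rows is known, should avoid the repeated copying.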
In hindsight I can probably see a way of doing the mathematical operations directly on the text files: use loadStrings() to put the CSV rows of each file into an array, then do the calculations one row at a time. The zero-filling is designed to make the layout of the data identical in every file, which is what makes this possible. Maybe I will just proceed along these lines, but I am still very interested to know why writing the data to arrays slows the sketch down so much.
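A minimal sketch of that row-at-a-time idea in plain Java, assuming both files have the same number of rows in the same order and three comma-separated columns (time, m/z, signal). The class and method names CsvDiff and subtractSignals are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvDiff {

    // Subtracts the signal column of rowsB from rowsA, keeping time and m/z from rowsA.
    static List<String> subtractSignals(List<String> rowsA, List<String> rowsB) {
        if (rowsA.size() != rowsB.size()) {
            throw new IllegalArgumentException("files must have the same number of rows");
        }
        List<String> result = new ArrayList<>(rowsA.size());
        for (int i = 0; i < rowsA.size(); i++) {
            String[] a = rowsA.get(i).split(",");
            String[] b = rowsB.get(i).split(",");
            float difference = Float.parseFloat(a[2].trim()) - Float.parseFloat(b[2].trim());
            result.add(a[0].trim() + ", " + a[1].trim() + ", " + difference);
        }
        return result;
    }

    public static void main(String[] args) {
        // Tiny made-up rows standing in for two zero-filled output files.
        List<String> first = Arrays.asList("0.512, 500.0, 10", "0.512, 500.1, 0");
        List<String> second = Arrays.asList("0.512, 500.0, 4", "0.512, 500.1, 0");
        for (String row : subtractSignals(first, second)) {
            System.out.println(row);
        }
    }
}
```

In a Processing sketch the two lists could come from something like Arrays.asList(loadStrings(path)), so no intermediate arrays of times or m/z values are needed.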
Many thanks in advance for your generously dispensed wisdom.