Media consultant: DAM solutions in music, broadcast and publishing. Musician: writing and producing my own noise. Traveler: going to new places and meeting new people, then eating their yummy food.
I need to parse data into multiple files based on the leading digits of a value, naming each file after those leading digits. This is just the output declaration, but I go through this each step of the way. I have to create these files and write data directly to them as I am processing ~90 GB of data each time, so arrays are not going to work.
I do this for the declaration, then the createWriter, println, flush and close, so there are literally hundreds of lines of code that could be condensed via a loop.
I'd like to do this, but haven't gotten it to successfully create the output files, then write to them, etc.:
for (int i = 0; i < 10; i++) {
output(i) = createWriter(outPath + i + ".txt");
}
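One likely reason the loop above fails is that output(i) is function-call syntax; array elements are indexed with square brackets, as in output[i]. Here is a minimal sketch of the loop idea in plain Java, assuming the writers are kept in a PrintWriter array and that each line starts with a digit 0-9. The class and method names (DigitWriters, open, route, closeAll) are made up for illustration. In a Processing sketch, createWriter(outPath + i + ".txt") could stand in for the new PrintWriter(...) call.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;

class DigitWriters {
  // Opens one writer per leading digit: <outPath>0.txt through <outPath>9.txt.
  static PrintWriter[] open(String outPath) throws IOException {
    PrintWriter[] output = new PrintWriter[10];
    for (int i = 0; i < output.length; i++) {
      output[i] = new PrintWriter(Files.newBufferedWriter(Paths.get(outPath + i + ".txt")));
    }
    return output;
  }

  // Writes the line to the file chosen by its first character.
  // Assumption: the first character is always a digit 0-9.
  static void route(PrintWriter[] output, String line) {
    int digit = line.charAt(0) - '0';
    output[digit].println(line);
  }

  // Flushes and closes every writer in one loop.
  static void closeAll(PrintWriter[] output) {
    for (PrintWriter writer : output) {
      writer.flush();
      writer.close();
    }
  }
}
```

Only the ten writers stay in memory; each line goes straight to disk, so the size of the input should not matter.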
In the book "Visualizing Data" it mentions that BZIP (.bz) files can be read directly by the Processing API methods loadStrings(), createReader(), etc.
Does this also apply to .bz2 files? I've not been able to crack it, but this would be tremendously helpful. Any insights much appreciated.
To pre-process a huge data file for SQL loading, I have to read a text file and parse each line. Everything works great on a test file, but the real file is over a GB with millions of lines and generates OutOfMemoryError. I assume the array is too large...
Is there an alternative method of reading one line at a time instead of loading them all into an array?
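A minimal sketch of reading one line at a time, written in plain Java with a BufferedReader. The class name LineByLine and the method process are placeholders, and the file is assumed to be UTF-8 text. In a Processing sketch, createReader() returns a BufferedReader that can be used in the same readLine() loop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class LineByLine {
  // Streams the file and returns how many lines were seen.
  // Only the current line is held in memory at any moment.
  static long process(String path) throws IOException {
    long count = 0;
    try (BufferedReader reader = Files.newBufferedReader(Paths.get(path))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // Parse the line here and write the result out immediately.
        count++;
      }
    }
    return count;
  }
}
```

Because nothing is accumulated in an array, memory use stays roughly constant no matter how many lines the file has.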
I've searched and can't find documentation or forum topics around formatting output using print() or println().
I need to emulate various existing reports, so need to add commas, slashes, etc. to integers, dates, etc. I assume there are arguments to send but can't find mention of them anywhere.
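As far as I know, print() and println() do not take formatting arguments; the usual approach is to build the formatted string first and then print it. A minimal sketch using standard Java classes, which are available inside Processing. The class and method names (ReportFormat, withCommas, slashDate, padded) are made up for illustration, and Locale.US is an assumption about the report style.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class ReportFormat {
  // Thousands separators: withCommas(1234567) gives "1,234,567".
  static String withCommas(long n) {
    return String.format(Locale.US, "%,d", n);
  }

  // Slash-separated date in month/day/year order, e.g. "04/01/2009".
  static String slashDate(Date d) {
    return new SimpleDateFormat("MM/dd/yyyy", Locale.US).format(d);
  }

  // Zero padding to a fixed width: padded(7, 4) gives "0007".
  static String padded(int n, int width) {
    return String.format(Locale.US, "%0" + width + "d", n);
  }
}
```

Usage would be along the lines of println(ReportFormat.withCommas(total)). Processing's own nf() and nfc() helpers cover the padding and comma cases too.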
I'm reading in log files to determine whether there are missing files in the set. They are consistently named, with servername at the front and hour at the end of the filenames. I need to check whether the current file is from the same server as the previous, and set "same" or "different" booleans to control the following actions. However, it is resisting such determination and never recognizing the servers as "same". DRIVING ME NUTS. I'm sure there's a simple mistake in the code somewhere. If anyone can give it a few minutes, I'd appreciate your insights.
In comparefiles() I make the comparison, and act accordingly. The following data produces the following output:
Different!
1 |servername1_log|0401| missed | first hour(s) 0 through 0
Different!
1 |servername1_log|0401| missed | last hour(s) 2 through 23
Different!
1 |servername1_log|0401| missed | last hour(s) 3 through 23
Different!
1 |servername1_log|0401| missed | last hour(s) 4 through 23
Different!
1 |servername2_log|0401| missed | last hour(s) 1 through 23
2 |servername2_log|0401| missed | last hour(s) 2 through 23
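The comparefiles() code is not shown here, so this is only a guess: a common cause of "never the same" in Processing/Java is comparing Strings with ==, which checks whether two variables point at the same object rather than whether the text matches. A minimal sketch using equals() instead. The helper names and the idea that the server name is everything before the first underscore are assumptions for illustration.

```java
class ServerCompare {
  // Hypothetical helper: treats everything before the first underscore
  // as the server name (assumed filename layout, not confirmed by the post).
  static String serverOf(String filename) {
    int cut = filename.indexOf('_');
    return cut < 0 ? filename : filename.substring(0, cut);
  }

  // Compares the text of the two server names with equals(), not ==.
  static boolean sameServer(String current, String previous) {
    return serverOf(current).equals(serverOf(previous));
  }
}
```

If the comparison in the sketch already uses equals(), the next things to check would be stray whitespace in the names and whether the "previous" value is updated before or after the comparison.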
I have more than one Sketch open (on Windows Vista), and one of them is executing a job which is updating the console via println(). If I want to do anything in the second Sketch, the println updates now show in the second console. Is there a way to isolate each of these? I'd like to be able to work on more than one Sketch at a time.
I haven't been able to find clear directions for sorting 2D arrays. I have to process data that is time based, but dependent on userIDs. What I am doing is putting userID, datetimestamp and a value in a 2D array. There are thousands of these rows:
userdata[i][1] = userID
userdata[i][2] = datetimestamp
userdata[i][3] = value
I need to sort on userID & datetimestamp in order to determine how to handle the associated values.
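A minimal sketch of one way to sort the rows, assuming userdata is a String[][] and that the timestamps are stored in a format whose text order matches time order (for example yyyyMMddHHmmss). The class name UserDataSort and the constants are placeholders; the column numbers follow the layout above. Arrays.sort treats each row as one element, so whole rows move together.

```java
import java.util.Arrays;
import java.util.Comparator;

class UserDataSort {
  static final int USER = 1;  // column holding userID
  static final int TIME = 2;  // column holding datetimestamp

  // Sorts rows in place by userID, then by datetimestamp within each user.
  static void sortByUserThenTime(String[][] userdata) {
    Arrays.sort(userdata, new Comparator<String[]>() {
      public int compare(String[] a, String[] b) {
        int byUser = a[USER].compareTo(b[USER]);
        if (byUser != 0) {
          return byUser;
        }
        return a[TIME].compareTo(b[TIME]);
      }
    });
  }
}
```

If the userIDs are numeric, comparing them as Strings would put "10" before "9"; parsing them to integers inside compare() would avoid that.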