Parsing badly formatted text files
Hello! I'm working on a project based on live traffic speed data from New York (not really live at the moment, for obvious hurricane-related reasons, but the last updated data is still there). The problem is that extra line breaks have been inserted at various points (you can see this happening about 75% of the way down the linked file), so the CSV-style structure breaks down. loadStrings() splits the text at every line break, so a record that contains a stray break ends up spread across several array elements.
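One common strategy is to glue the physical lines back together before splitting them into fields. The sketch below is plain Java (the language Processing is built on) and assumes that every stray line break falls inside a quoted field, so a record is complete once it holds an even number of " characters. The names RecordJoiner, joinRecords and countQuotes are hypothetical, and the three sample lines are invented for illustration, not taken from the NYC file.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordJoiner {
    // Counts the double-quote characters in one line
    static int countQuotes(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') {
                n++;
            }
        }
        return n;
    }

    // Glues physical lines back into logical records. Assumes every stray
    // line break sits inside a quoted field, so a record is complete once
    // it contains an even number of quote characters.
    static List<String> joinRecords(String[] lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int quotes = 0;
        for (String line : lines) {
            if (current.length() == 0 && line.isEmpty()) {
                continue;  // blank line between records
            }
            if (current.length() > 0) {
                current.append('\n');  // keep the break inside the field
            }
            current.append(line);
            quotes += countQuotes(line);
            if (quotes % 2 == 0) {  // no quoted field is left open
                records.add(current.toString());
                current.setLength(0);
                quotes = 0;
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());  // unterminated last record
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample: the first record is broken across two lines
        String[] lines = {
            "\"1\"\t\"35.5\"\t\"40.1,-73.9 40.2,",
            "-73.8\"",
            "\"2\"\t\"12.0\"\t\"40.3,-73.7\""
        };
        for (String record : joinRecords(lines)) {
            System.out.println(record.replace("\n", "\\n"));
        }
    }
}
```

The joined records can then go through the same splitTokens() step as before; the re-inserted newline ends up inside the coordinate field, where a delimiter list containing "\n" discards it. If a stray break ever falls outside the quotes, the parity test cannot see it, so checking each joined record's field count is still worthwhile.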
At the moment I have a version working that uses brute force and ignorance: it strips out any line that doesn't begin with a whole number (the ID field) and have at least 12 fields. That approach, however, throws away a significant portion of the useful data. Since the data is (theoretically) live, I can't clean it up by hand in Excel or similar, although opening it as a CSV in LibreOffice Calc with " as the field separator does a pretty good job.
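What LibreOffice does here can be approximated by a small quote-aware parser that walks the whole file one character at a time and treats delimiters and line breaks inside quotes as ordinary text. This is a Java sketch that assumes tab-separated fields wrapped in double quotes (pass ',' instead if the file turns out to be comma-separated) and reads a doubled quote ("") inside a field as one literal quote; QuotedFieldParser and parse are hypothetical names and the sample text is invented.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedFieldParser {
    // Splits the whole file text into records of fields. Delimiters and
    // line breaks inside "..." are kept as ordinary characters.
    static List<List<String>> parse(String text, char delimiter) {
        List<List<String>> records = new ArrayList<>();
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"');  // "" is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;   // closing quote
                    }
                } else {
                    field.append(c);        // includes stray line breaks
                }
            } else if (c == '"') {
                inQuotes = true;            // opening quote
            } else if (c == delimiter) {
                fields.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                if (!fields.isEmpty() || field.length() > 0) {  // skip blank lines
                    fields.add(field.toString());
                    field.setLength(0);
                    records.add(fields);
                    fields = new ArrayList<>();
                }
            } else if (c != '\r') {
                field.append(c);
            }
        }
        if (!fields.isEmpty() || field.length() > 0) {  // last record without a final newline
            fields.add(field.toString());
            records.add(fields);
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "\"Id\"\t\"Speed\"\t\"linkPoints\"\n"
                + "\"1\"\t\"35.5\"\t\"40.1,-73.9\n40.2,-73.8\"\n"
                + "\"2\"\t\"12.0\"\t\"40.3,-73.7\"\n";
        for (List<String> record : parse(text, '\t')) {
            System.out.println(record.size() + " fields, first = " + record.get(0));
        }
    }
}
```

In a sketch, the whole file could be rebuilt as one string with join(loadStrings(url), "\n") and handed to a function like this (after dropping static and importing java.util.List). Keeping only records whose field count matches the header's is then a much less lossy filter than testing the first token of each physical line.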
So, tl;dr: does anyone have advice or strategies for dealing with irregular data that contains erroneous line breaks?
Thanks!
Here's a stripped-down version of what I'm doing at the moment, if you want to punish yourself:
```java
String[] data;
String[][] splitData;
ArrayList<CamData> cams;
int activeCam = 0;

void setup() {
  size(100, 100);
  cams = new ArrayList<CamData>();
  getData();
  // Walk the list backwards: cams.remove(i) shifts later elements down,
  // so a forward loop would skip the element after each removal
  for (int i = cams.size() - 1; i >= 0; i--) {
    CamData c = cams.get(i);
    if (c.parse() && c.fixCoords()) {
      println("camera: " + i + ": ");
      println(c.coords);
    }
    else {
      cams.remove(i);
    }
  }
}

void draw() {
}

void getData() {
  // loadStrings() returns null (instead of throwing) if the file can't be read
  data = loadStrings("http://207.251.86.229/nyc-links-cams/LinkSpeedQuery.txt");
  if (data == null) {
    println("unable to load data");
    exit();
    return;  // make sure the code below never runs with data == null
  }
  splitData = new String[data.length][];  // the fields of each input line
  for (int i = 1; i < data.length; i++) {  // i = 1 skips the header line
    // Split on the quote character first, then trim the resulting fields
    splitData[i] = trim(splitTokens(data[i], "\""));
    // Index 12 is read below, so the line needs at least 13 fields
    if (splitData[i].length >= 13) {
      cams.add(new CamData(splitData[i][0], splitData[i][2], splitData[i][6], splitData[i][12]));
    }
  }
}

class CamData {
  String initid;
  String initspeed;
  String initcamStatus;
  String initcoords;
  int camStatus;
  int id;
  float speed;
  float[] coords;

  CamData(String theId, String theSpeed, String theCamStatus, String theCoords) {
    initid = theId;
    initspeed = theSpeed;
    initcamStatus = theCamStatus;
    initcoords = theCoords;
  }

  // Converts the raw strings; returns false for unusable records
  boolean parse() {
    id = int(initid);                // int() gives 0 for non-numeric text
    speed = float(initspeed);        // float() gives NaN for non-numeric text
    camStatus = int(initcamStatus);
    return !(Float.isNaN(speed) || camStatus == -101);
  }

  boolean fixCoords() {  // converts coordinate strings to lat/long floats
    // Delimiters: comma, parentheses, space and newline ("\n", not "/n")
    String[] splitCoords = trim(splitTokens(initcoords, ",() \n"));
    // Float.parseFloat() throws on bad input, so test with float() instead
    if (splitCoords.length == 0 || Float.isNaN(float(splitCoords[0]))) {
      return false;
    }
    coords = new float[splitCoords.length];
    for (int i = 0; i < splitCoords.length; i++) {
      coords[i] = float(splitCoords[i]);
    }
    return true;
  }
}  // end of class
```
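Removing elements from an ArrayList with remove(i) inside a for loop that counts upwards skips the element after each removal, because the next element shifts down into slot i just before i++ runs. Counting backwards avoids this; another option is an Iterator, whose remove() method deletes the current element safely. The Java sketch below uses negative integers as a stand-in for invalid CamData records; FilterList and removeNegatives are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FilterList {
    // Removes every negative value while iterating. Iterator.remove()
    // deletes the element last returned by next(), so nothing is skipped.
    static void removeNegatives(List<Integer> values) {
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) {
            if (it.next() < 0) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        // Adjacent negatives (-1, -2) are exactly the case a forward
        // remove(i) loop gets wrong
        List<Integer> values = new ArrayList<>(Arrays.asList(-1, -2, 3, -4, 5));
        removeNegatives(values);
        System.out.println(values); // prints [3, 5]
    }
}
```

Applied to the cams list, the loop body would take CamData c = it.next() and call it.remove() whenever !(c.parse() && c.fixCoords()).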