Processing Forum

Parsing .csv

in Programming Questions  •  1 year ago  
I am cleaning up some data and wanted to replace all the commas with tabs and save the result as a .tsv file, but for some reason the code didn't replace every comma, and I can't find a pattern to why it missed a few. Has anyone had this problem?

For example:
#2051 Marshfield Hills in the .tsv file has no tab or comma, but in the original .csv file there was a comma there. Why wasn't it replaced by a tab?

.csv file
2035,42.062204,-71.235774,FOXBORO,25
2038,42.08868,-71.404814,FRANKLIN,25
2040,41.970474,-70.701357,GREENBUSH,25
2041,42.069642,-70.649075,GREEN HARBOR,25
2043,42.212105,-70.884989,HINGHAM,25
2044,41.970474,-70.701357,HINGHAM,25
2045,42.284413,-70.873659,HULL,25
2047,42.142836,-70.69353,HUMAROCK,25
2048,42.013182,-71.218373,MANSFIELD,25
2050,42.111805,-70.710744,MARSHFIELD,25
2051,42.151202,-70.734146,MARSHFIELD HILLS,25
2052,42.181265,-71.309934,MEDFIELD,25
2053,42.156282,-71.427663,MEDWAY,25
2054,42.165249,-71.36126,MILLIS,25

.tsv file
2040    41.970474    -70.701357    GREENBUSH    25
2041    42.069642    -70.649075    GREEN HARBOR    25
2043    42.212105    -70.884989    HINGHAM    25
2044    41.970474    -70.701357    HINGHAM    25
2045    42.284413    -70.873659    HULL    25
2047    42.142836    -70.69353    HUMAROCK    25
2048    42.013182    -71.218373    MANSFIELD    25
2050    42.111805    -70.710744    MARSHFIELD    25
2051    42.151202    -70.734146    MARSHFIELD HILLS      25
2052    42.181265    -71.309934    MEDFIELD    25
2053    42.156282    -71.427663    MEDWAY    25
2054    42.165249    -71.36126    MILLIS    25
2055    41.970474    -70.701357    MINOT    25



//Clean zipnov99.csv and save as cleanZips.tsv
//load each line into an array of strings
String[] zipLines;
void setup() {
  zipLines = loadStrings("zipnov99.csv");
  PrintWriter tsv = createWriter("zips.tsv");
  makeTabs(zipLines);
  for(int i=0; i < zipLines.length; i++) {
    println(zipLines[i]);
    tsv.println(zipLines[i]);
  }
}
//change commas (",") to tabs (\t) and save as .tsv
void makeTabs(String[] dataArray) {
  for(int i=0; i < dataArray.length; i++) {
    dataArray[i] = dataArray[i].replaceAll(",","\t");
  }
}

Thank you!

Replies(5)

Re: Parsing .csv

1 year ago
This kind of parser does the same thing (meaning it looks like it doesn't read all the commas properly). But when I println the number of pieces, it says that every line has 5 pieces. Could Notepad just not be showing a tab when I look at the .tsv file for some reason? I am confused.

void makeTabs2(String[] dataArray) {
  for(int i=0; i<dataArray.length; i++) {
    //split each line on commas and rebuild it with tabs between the pieces
    String[] pieces = split(dataArray[i], ',');
    dataArray[i] = pieces[0] + "\t" + pieces[1] + "\t" + pieces[2] + "\t" + pieces[3] + "\t" + pieces[4];
    println(pieces.length);
  }
}
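
The split-and-rebuild approach above hard-codes five fields. As a minimal sketch, assuming plain Java rather than Processing's own `split()`, the same conversion can join however many pieces a line has and report the field count, so a line with an unexpected number of commas stands out. The class and method names (`SplitJoin`, `toTsvLine`, `fieldCount`) are made up for illustration.

```java
class SplitJoin {
    // Split on commas and rejoin with tabs.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    static String toTsvLine(String csvLine) {
        String[] pieces = csvLine.split(",", -1);
        return String.join("\t", pieces);
    }

    // Number of comma-separated fields in a line, counting empty ones.
    static int fieldCount(String csvLine) {
        return csvLine.split(",", -1).length;
    }

    public static void main(String[] args) {
        String line = "2051,42.151202,-70.734146,MARSHFIELD HILLS,25";
        System.out.println(fieldCount(line) + " fields");
        System.out.println(toTsvLine(line));
    }
}
```

This does not handle quoted fields that contain commas; the sample data in this thread has none, but other CSV files might.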

Re: Parsing .csv

1 year ago
" Marshfield Hills in the .tsv file has no tab or comma"
The line in the .tsv file looks like the surrounding ones. How can you tell it has no tabs?
I fear I can't understand your problem from the samples you provide. The code looks OK.

" Could notepad just not be putting a comma when I look at the .tsv file but they are really there?"
I understand even less. How could there be commas in the .tsv file?
Anyway, I advise using a real text editor instead of Notepad! Notepad++ is a good one, for example.

Re: Parsing .csv

1 year ago
Wow. So when I pasted the data above straight from Notepad, it pasted with a tab there ("MARSHFIELD HILLS      25"), but the three editors I used to view it (Word, WordPad and Notepad) showed it like:
2051    42.151202    -70.734146    MARSHFIELD HILLS25

So I guess it was the text editor.

Now that I am using Notepad++, it all looks perfect, except the loop didn't go through all of the data. In the conversion from .csv to .tsv I went from 99950 lines to 99658 lines.

I don't see a way to attach a file here, but I could link to it if you want to see it.

Thanks
McKinnley

Re: Parsing .csv

1 year ago
In Notepad++, you can jump to the faulty line (Go to line) and see what might be wrong. You can probably also display the spaces, tabs and end-of-line characters.
My editor is SciTE, which allows such operations, but it is based on the same view component as Notepad++, so I suppose they have similar functions.
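
If no editor that displays whitespace is at hand, the same check can be done in code. This is only a sketch in plain Java with made-up names (`ShowWhitespace`, `reveal`, `countTabs`): it replaces tabs and carriage returns with visible tokens and counts the tabs on a line, so the result does not depend on how a given editor renders them.

```java
class ShowWhitespace {
    // Replace invisible characters with visible tokens.
    static String reveal(String line) {
        return line.replace("\t", "<TAB>").replace("\r", "<CR>");
    }

    // Count the tab characters in a line.
    static int countTabs(String line) {
        int n = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '\t') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String line = "2051\t42.151202\t-70.734146\tMARSHFIELD HILLS\t25";
        System.out.println(reveal(line));
        System.out.println("tabs: " + countTabs(line));
    }
}
```

A five-field line should contain four tabs, so any line where `countTabs` returns something else is worth a closer look.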

Re: Parsing .csv

1 year ago
I see what you mean. A better editor is making life a lot easier.

I narrowed down the problem and found that I lose data when going from .csv to .tsv with the following code:

The file going in has 42195 lines and the file coming out has 42042 lines.
void makeTabs(String[] dataArray) {
  PrintWriter tsv = createWriter("zips.tsv");
  for(int i=0; i < dataArray.length; i++) {
    dataArray[i] = dataArray[i].replaceAll(",","\t");
    tsv.println(dataArray[i]);
    println(i + "  " + dataArray[i]);
  }
  println("length: " + dataArray.length);
}

I know the loop goes all the way through because when I print each line to the console I get all the data lines. They just don't all get to the file for some reason.
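
One likely cause, assuming the missing lines are all at the end of the output file: `createWriter()` returns a `java.io.PrintWriter`, which buffers its output, and the code above never calls `flush()` or `close()`, so whatever is still in the buffer when the sketch ends never reaches the disk. In the Processing sketch the fix would be to call `tsv.flush();` and `tsv.close();` after the loop. The sketch below shows the same idea in plain Java so it can run outside the Processing IDE; the names (`CsvToTsv`, `writeTsv`, `countLines`) and the generated sample data are made up for illustration.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class CsvToTsv {
    // Replace every comma with a tab. replace() takes literal text, not a regex.
    static String toTsvLine(String csvLine) {
        return csvLine.replace(",", "\t");
    }

    // Write all lines as TSV and return how many were written.
    // try-with-resources calls close(), which flushes the buffer to disk.
    static int writeTsv(List<String> csvLines, Path out) {
        try (PrintWriter tsv = new PrintWriter(Files.newBufferedWriter(out))) {
            for (String line : csvLines) {
                tsv.println(toTsvLine(line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return csvLines.size();
    }

    // Count the lines actually present in the file on disk.
    static int countLines(Path file) {
        try {
            return Files.readAllLines(file).size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Generated stand-in data; the real input would be the .csv file.
        List<String> csvLines = new ArrayList<>();
        for (int i = 0; i < 50000; i++) {
            csvLines.add(i + ",42.0,-71.0,TOWN " + i + ",25");
        }
        Path out = Files.createTempFile("zips", ".tsv");
        int written = writeTsv(csvLines, out);
        System.out.println("lines in:  " + written);
        System.out.println("lines out: " + countLines(out));
        Files.delete(out);
    }
}
```

Comparing the two counts after the writer is closed is a quick way to confirm whether buffering was the problem.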