Processing Forum

save bit array to hard drive?

in General Discussion  •  Other  •  1 month ago  
Hello all, I have an array of booleans with 30 or 50 elements. How can I convert it to an int or a bit array and store vast numbers of these arrays on my hard drive?
Edited: I have many (~1 million) different boolean arrays of 30 to 50 elements each. Can I store those arrays (30-50 bits each) in one file?

any thoughts? thank you!

Greetings, Chrisir    


Replies(18)

Hi,
I don't know if this is what you're asking, but
to convert variables to bytes Processing offers tools like byte().

To save data to the hard drive,
check the reference,
in the Output (Files) and/or Input (Files) sections.

I hope this helps
:D
Here's the code which does the job. 

Also here's link to github:

final int C_COUNT = 30; // number of booleans in the test array
final String C_FILENAME = "booleans_as_bytes.txt";

boolean[] myOriginalBooleans;

void setup(){
  // fill your boolean[] with some values
  myOriginalBooleans = makeRandomBooleanArray(C_COUNT);
  printBoolean(myOriginalBooleans);

  saveBooleansToFile(C_FILENAME, myOriginalBooleans);

  // now load bytes from file into boolean array
  boolean[] reloadedBooleanValues = loadBooleansFromFile(C_FILENAME);
  printBoolean(reloadedBooleanValues);

  noLoop();
}

void draw(){
}


/**
* Saves given booleans into file, storing them as bytes
* (one byte per boolean).
*/
void saveBooleansToFile(String destFile, boolean[] booleans){
  // convert boolean to byte
  byte[] myBytes = new byte[booleans.length];
  for(int i = 0 ; i < booleans.length ; i++){
     if ( booleans[i] ){
        myBytes[i] = 1; // true
     }
     else{
        myBytes[i] = 0; // false
     }
  }
  saveBytes(destFile, myBytes);
}


/**
* Loads booleans from binary file, where each
* byte represents a true/false value.
*/
boolean[] loadBooleansFromFile(String filename){
    byte[] fileBytes = loadBytes(filename);
    if ( fileBytes == null ){
       println("Error file: " + filename + " cannot be opened. Cannot load booleans");
       return null; // error: file could not be read
    }
    boolean[] booleans = new boolean[fileBytes.length];
    for(int i = 0 ; i < fileBytes.length ; i++){
       if ( fileBytes[i] == 0 ){
          booleans[i] = false;
       }
       else{
          booleans[i] = true;
       }
    }

    return booleans;
}


/**
* Generates boolean array of size c
* filled with booleans at random.
*/
boolean[] makeRandomBooleanArray(int c)
{
   boolean[] b = new boolean[c]; // all elements start out as false
   for(int i = 0 ; i < b.length; i++){
      if ( random(1) > 0.5 ){
         b[i] = true;
      }
   }

   return b;
}


void printBoolean(boolean[] b){
   for(int i = 0 ; i < b.length ; i++){
      if ( i % 8 == 0 && i > 0){
         print("|"); // divider of bytes
      }
      if (b[i]){
         print("*");
      }
      else{
         print(".");
      }
   }

   print("\t array size: " + b.length);

   println();
}

hello dimkir,

thank you so much for your huge effort.
I appreciate it a lot and I feel really great!

Nevertheless I still feel we are wasting a lot of space here.

Because when you have boolean array
true - false - true
it's like bits:
101
which would be 5.

So my thinking was to convert it to an int or long and save that.
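
A minimal sketch of that idea in plain Java, assuming each array has at most 64 elements so that it fits into one long. The class and method names (BoolPacking, pack, unpack) are made up for illustration, and mapping element 0 to the lowest bit is an arbitrary choice.

```java
import java.util.Arrays;

// Hypothetical helper: packs up to 64 booleans into a single long.
final class BoolPacking {

    // Packs the array into a long; element i becomes bit i (element 0 = lowest bit).
    static long pack(boolean[] flags) {
        if (flags.length > 64) {
            throw new IllegalArgumentException("at most 64 booleans fit into a long");
        }
        long bits = 0L;
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Restores an array of the given length from the packed value.
    static boolean[] unpack(long bits, int length) {
        boolean[] flags = new boolean[length];
        for (int i = 0; i < length; i++) {
            flags[i] = ((bits >>> i) & 1L) == 1L;
        }
        return flags;
    }

    public static void main(String[] args) {
        boolean[] original = {true, false, true};
        long packed = pack(original);
        System.out.println(packed);                            // prints 5
        System.out.println(Arrays.toString(unpack(packed, 3))); // prints [true, false, true]
    }
}
```

With this mapping a 50-element array takes 8 bytes as a long instead of 50 bytes as one byte per boolean. The array length is not part of the packed value, so it has to be known or stored separately.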

Because I have so many of those boolean-arrays, I'd like to store them in one file.

Thanks again!

Greetings, Chrisir    


Because when you have boolean array
true - false - true
it's like bits:
101
which would be 5.

I wish that were true but... AFAIK, a boolean takes up a whole byte in memory to store true or false (in typical JVMs each element of a boolean[] occupies one byte).
So 7 bits go to waste for each boolean anyway.  :P
It may be worth looking at Java's BitSet class...
@Chrisir,

 my initial effort was with the BitSet class, I even made a draft sketch

but eventually I got confused by the way it stores bits in memory and preallocates them. It didn't seem to store the number of bits properly. (E.g. if you have the sequence 1000000000000000000000000000000000000000000, BitSet would only store 1 bit, despite the sequence being ...hmm... 30-something bits long.) So after half an hour of experiments I dropped this idea.
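
The behaviour described here can be reproduced with a short plain-Java sketch. BitSet.length() reports the index of the highest set bit plus one, so trailing false values are not counted, and the logical length of the array has to be kept separately. The class and helper names below are hypothetical; BitSet.toByteArray() and BitSet.valueOf() require Java 7 or later.

```java
import java.util.BitSet;

// Hypothetical demo of why BitSet seems to "lose" trailing false values.
final class BitSetLengthDemo {

    // Copies a boolean[] into a BitSet; only the positions of true values matter.
    static BitSet toBitSet(boolean[] flags) {
        BitSet bits = new BitSet(flags.length);
        for (int i = 0; i < flags.length; i++) {
            bits.set(i, flags[i]);
        }
        return bits;
    }

    // Rebuilds a boolean[] of a known logical length from the stored bytes.
    static boolean[] fromBytes(byte[] stored, int logicalLength) {
        BitSet bits = BitSet.valueOf(stored);
        boolean[] flags = new boolean[logicalLength];
        for (int i = 0; i < logicalLength; i++) {
            flags[i] = bits.get(i);
        }
        return flags;
    }

    public static void main(String[] args) {
        boolean[] flags = new boolean[30];
        flags[0] = true; // one true followed by 29 false values

        BitSet bits = toBitSet(flags);
        System.out.println(bits.length());             // prints 1 (highest set bit + 1)
        System.out.println(bits.toByteArray().length); // prints 1 (trailing zero bytes are dropped)

        // The original length (30) is supplied from outside, not read from the BitSet.
        boolean[] restored = fromBytes(bits.toByteArray(), flags.length);
        System.out.println(restored.length);           // prints 30
    }
}
```

So BitSet can still be used for storage, as long as the number of booleans is saved next to the bytes or is fixed in advance.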

Taking into account modern computers with gigabytes of memory, and that a typical use case will maybe involve 100k true/false values, it doesn't really matter much whether you use 100 KB or 12.5 KB (100 KB / 8 = 12.5 KB).
Even if you have 1 million boolean values (1 MB stored as bytes vs. 125 KB stored as bits [1 MB / 8 = 125 KB]), the saving of 875 KB is minuscule in comparison with the typical amount of RAM computers have now (e.g. 8 GB = 8192 MB).

So luckily we're not in the '80s and not programming in C, so we have the luxury of keeping things simple. :)


UPDATE:
If you use BitSet then it will be much harder to convert your sketch to JavaScript. It seems that even saveBytes() isn't available in PJS and needs to be replaced with saveStrings().
Very true. The concept of "waste" is very relative, particularly today.

I started programming in the 80s, with computers having 16 KB of memory and 1 MHz CPUs, so I can understand the conservative mindset (but I try to shake it off...).


interesting discussion, thanks to you all.

It is true that space isn't so important anymore.

Another issue I have is that I'd like to store all the byte-arrays (probably one million+X) in one file.
As a String I would use one line per byte-array and then append the next line.
How can this be done with a byte-array?

Because I have so many of those boolean-arrays, I'd like to store them in one file.

Thanks again!

Greetings, Chrisir    



i'm a 'bit' late to this, ... here's my version that encodes boolean to bits.
i think it does quite the same as java's bitset.


 public void setup() {
   
    BitBool bitbool;
    
    // generate some data, and save it
    bitbool = new BitBool(19); // needs 3 bytes of memory.
    bitbool.set( 0, true);
    bitbool.set( 1, false);
    bitbool.set( 2, true);
    bitbool.set( 3, false);
    bitbool.set( 7, true);
    bitbool.set(18, true);
    bitbool.save("bitbool.dat");

    
    // load data from file
    bitbool = new BitBool("bitbool.dat");
    for(int i = 0; i < bitbool.length; i++){
      System.out.println("["+i+"] "+bitbool.get(i));
    }
    
       
    exit();
  }

 
 
 
  public class BitBool{
    
    public final byte[] bool;
    public final int length;
    
    
    // init with given size, and all "false"
    public BitBool(int length){
      this.length = length;
      bool = new byte[(length+7)>>3];
    }
    
    // init from given array
    public BitBool(boolean[] src){
      length = src.length;
      bool = new byte[(length+7)>>3];
      
      for(int i = 0; i < length; i++){
        bool[i>>3] |= (src[i]?1:0) << (i&7);
      }
    }
    
    // init from given file
    public BitBool(String filename){
      bool = loadBytes(filename);
      length = bool.length*8; 
      // TODO: maybe the exact length should be saved in the first 4 bytes of 
      // the file. Or, mark the end (=length+1) with 1.
    }
    
    
    
    // set boolean
    public void set(int i, boolean IO){
      int arr = i>>3;
      int bit = i&7;
      bool[arr] &= ~(1<<bit);         // clear old bit
      bool[arr] |= (IO?1:0) << bit;   // set new bit
    }
    
    // get boolean
    public boolean get(int i){
      int arr = i>>3;
      int bit = i&7;
      return ((bool[arr]>>bit)&1)==1;
    }
    
    
    
    // save bytes
    public void save(String filename){
      
      // version 1: 
//      try {
//        saveBytes(new FileOutputStream(new File(filename)), bool);
//      } catch (FileNotFoundException e) {
//        e.printStackTrace();
//      }
      
      // version 2: had an error when trying to overwrite an 
      //            existing file (processing 1.5.1)
       saveBytes(filename, bool);
      
      // version 3: saves a few bytes more, actually
      // saveStrings(filename, new String[]{ new String(bool) } );
    }
  }

... about the discussion in general.

E.g. if one wants to save a huge number of bit masks (stencil buffers from frames) or whatever, I think it makes total sense to represent booleans as bits. First, it saves HD space, and second, it reduces the time for reading/writing the files. Also, a lot more of such masks can be kept in memory, which reduces reading/writing again.


Thank you so much Thomas, great!

But here I am still at one array per file, right?

What would it look like with all arrays in one file?

Greetings, Chrisir    


If you want to save space on hard-disk, there's an undocumented processing feature:

If you add ".gz" to the filename you're saving with saveBytes() or saveStrings(), the file will automatically be gzipped (using built-in Java functionality). When you load a file with loadBytes() or loadStrings() and its name ends with ".gz", Processing recognizes that it is compressed and uncompresses it for you. From your sketch's perspective all of that happens transparently (meaning that your sketch's code doesn't see any difference).

Here's an example:

final int C_STRING_COUNT = 10 * 1000;

void setup(){
   String[] strings = new String[C_STRING_COUNT];
   for(int i = 0 ; i < C_STRING_COUNT ; i++){
      strings[i] = "Number " + i;
   }

   saveStrings("regular_file.txt", strings);
   saveStrings("gzipped_file.txt.gz", strings);

   // the compression would work also with saveBytes("my_byte_file.bin.gz", someByteArray);

   // and now let's load gzipped file back
   String[] comingBackFromGzipped = loadStrings("gzipped_file.txt.gz");
   println("Loaded back from gzipped: " + comingBackFromGzipped.length + " strings");

   println("Here's sample 10 items, to show that they were loaded back fine:" );
   for(int i = 0; i < 10; i++){
     println(comingBackFromGzipped[i]);
   }

   noLoop();
   exit();
}


void draw(){
}




After you run it, you can see in your data folder there will be two files: 
  • regular_file.txt (just regular file with one line per string, the way you're used to it)
  • gzipped_file.txt.gz (this is compressed file [my WinRar can open it and I can see inside the regular text file])



On the above picture my Windows Explorer has hidden actual extensions for files, so for clarity I enabled them, so that you can see REAL filenames and extensions:





I didn't know this

Thanks a lot

Can anybody answer how I can store multiple byte-arrays in one file?

I think even Thomas' solution does only one array per file, right?

Greetings, Chrisir


@Chrisir, 

  if you want to store multiple "bit-groups" in file, it's important to figure out how are you gonna use them? Are you gonna just generate them once (before you deploy the sketch)? And then only read them?

Or are you going to update their values after every run of the sketch?

And if you're going to update them:

are you ONLY (meaning that once you decide on size, it NEVER changes throughout the normal usage scenarios of your sketch) going to change values of the bits arrays (meaning never shrinking or expanding array, but just flipping bits)?

or are you going to change dimensions of arrays as well?

What is going to be the average size of one array? What's going to be the total size of all the arrays you will use? Will all the arrays need to stay in memory all the time?

What platforms? Are you only staying in Java desktop mode, or are you going to use Android? (I guess we dumped good ol' JavaScript when we started considering the BitSet class as a potential option.)





Honestly, if working with Processing, I would for simplicity just store one array per file. If you try to implement it "faster" via RandomAccessFile, you can see (from the questions I have above) how much harder it becomes, and it loses flexibility.
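
For reference, a rough sketch of what the RandomAccessFile route could look like, under the simplifying assumptions that the arrays never change size and that each one has at most 64 elements, so each can be packed into one long and stored as a fixed 8-byte record. The class name LongRecordFile and its methods are hypothetical and not part of Processing.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical helper: one file holding many fixed-size 8-byte records.
final class LongRecordFile {
    static final int RECORD_BYTES = 8; // one long per packed boolean array

    // Writes one packed array at the given record index (the file grows if needed).
    static void writeRecord(File file, long index, long packedBits) {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.seek(index * RECORD_BYTES);
            raf.writeLong(packedBits);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back the packed array stored at the given record index.
    static long readRecord(File file, long index) {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(index * RECORD_BYTES);
            return raf.readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Number of complete records currently in the file.
    static long recordCount(File file) {
        return file.length() / RECORD_BYTES;
    }
}
```

Record k lives at byte offset k * 8, so a single array can be read or overwritten without touching the rest of the file. This sketch reopens the file for every record to stay short; for a million records it would make more sense to keep one RandomAccessFile open.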

IMHO I would stay with the simplest way: one array per file.


PS: Or if anyone else knows a quick and simple way to implement it, I would be delighted to learn it as well (because I often tend to think up complicated solutions).
i too would go for one array per file.

but you can also use the class from yesterday, and initialize it with the correct size (number of all boolean of all arrays) and insert the values then at the right position.

anyway, if its really just about saving/loading a bunch of boolean arrays, then the following does the job.


... as you can see, most of the code is just copied from the BitBool class

  public void setup() {
    
    // some test-arrays, that we want to save
    boolean[] b1 = {true, true, false, false};
    boolean[] b2 = {true};
    boolean[] b3 = {false, true, false, true, false, true};

    
    
    // save: boolean -> bit -> file
    saveBooleanAsBit("data/tmp.dat", b1, b2, b3);
    
    
    
    // load: file -> bit -> boolean
    boolean[] bool = loadBitAsBoolean("data/tmp.dat");
    
    
    
    // check if everything is right
    for(int i = 0; i < bool.length; i++){
      System.out.println(bool[i]);
    }
    
    
    exit();
  }
  
  
  
  

  public boolean[] loadBitAsBoolean(String filename){
    // load bytes
    byte[] bytebool = loadBytes(filename);
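    // get number of boolean values (big-endian, matching saveBooleanAsBit);
    // each byte is masked with 0xFF because Java bytes are signed
    int num_bool = (bytebool[0]&0xFF)<<24 | (bytebool[1]&0xFF)<<16 | (bytebool[2]&0xFF)<<8 | (bytebool[3]&0xFF);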
    // alloc arrays
    boolean[] bool = new boolean[num_bool];
    // bit -> boolean
    for(int i = 0; i < num_bool; i++){
      bool[i] = ((bytebool[4 + (i>>3)]>>(i&7))&1)==1;
    }
    return bool;
  }
  
  
  
  
  
  public void saveBooleanAsBit(String filename, boolean[] ... bool){
    
    // get number of all boolean
    int num_bool = 0;
    for(int i = 0; i < bool.length; i++){
      num_bool += bool[i].length;
    }

    // alloc buffer
    byte[] bytebool = new byte[4 + ((num_bool+7)>>3)];

    // save number of boolean in first 4 bytes (TODO: byteorder)
    bytebool[0] = (byte) ((num_bool>>24)&0xFF);
    bytebool[1] = (byte) ((num_bool>>16)&0xFF);
    bytebool[2] = (byte) ((num_bool>> 8)&0xFF);
    bytebool[3] = (byte) ((num_bool    )&0xFF);
    
    // boolean -> bit
    int idx = 0;
    for(int i = 0; i < bool.length; i++){
      boolean[] bool_i = bool[i];
      for(int j = 0; j < bool_i.length; j++){
        bytebool[4+(idx>>3)] |= (bool_i[j]?1:0)<<(idx&7);
        idx++;
      }
    }
    
    
    
    // save 
    // version 1:
//    try {
//      saveBytes(new FileOutputStream(new File(filename)), bytebool);
//    } catch (FileNotFoundException e) {
//      e.printStackTrace();
//    }
    
    // version 2: // had an error when trying to overwrite a file
    saveBytes(filename, bytebool); 
    
    // version 3: seems to 
//    saveStrings(filename, new String[]{new String(bool)});
  }
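
Note that the code above concatenates all arrays into one long bit sequence, so the boundaries between b1, b2 and b3 are lost when loading. One way to keep them, sketched here in plain Java, is to store each array's length in front of its bits. The file layout and the names BoolArraysFile, save and load are assumptions made for this example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: many boolean arrays in one file, boundaries preserved.
final class BoolArraysFile {

    // Assumed layout: int arrayCount, then per array an int length
    // followed by ceil(length / 8) bytes, with element i in bit (i & 7) of byte (i >> 3).
    static void save(File file, boolean[][] arrays) {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(arrays.length);
            for (boolean[] flags : arrays) {
                out.writeInt(flags.length);
                byte[] packed = new byte[(flags.length + 7) >> 3];
                for (int i = 0; i < flags.length; i++) {
                    if (flags[i]) {
                        packed[i >> 3] |= 1 << (i & 7);
                    }
                }
                out.write(packed);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean[][] load(File file) {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            boolean[][] arrays = new boolean[in.readInt()][];
            for (int a = 0; a < arrays.length; a++) {
                boolean[] flags = new boolean[in.readInt()];
                byte[] packed = new byte[(flags.length + 7) >> 3];
                in.readFully(packed);
                for (int i = 0; i < flags.length; i++) {
                    flags[i] = ((packed[i >> 3] >> (i & 7)) & 1) == 1;
                }
                arrays[a] = flags;
            }
            return arrays;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The 4-byte length per array is a noticeable overhead for arrays of only 30 to 50 bits. If all arrays have the same length, that length could be written once at the start of the file instead.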
  

I thought the opening and closing of files would take way too much time if I had one file per array?

I'll stay with Java mode, thanks!

Greetings, Chrisir
If you just read files, then opening/closing them won't take too much time.

And again, it depends on the quantity of files: is it 100, 1,000 or 10,000? E.g. if you have 10k files in a single directory, then opening files may take a long time, just to process the directory listing. But then you just have to organize them into subdirectories, make sure each subdirectory doesn't hold more than about 500 files, and you're back on track with the speed.

And still: doing this in draw() is not recommended. Your IO (file) operations should always happen on a separate thread.
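
As a rough illustration of the separate-thread advice, here is a plain-Java sketch that starts reading a file on a worker thread and returns immediately. It assumes Java 8 or later; the class name BackgroundLoad and the method loadAsync are hypothetical. In a Processing sketch, the built-in thread() function is another way to run a method off the animation thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper: loads a file without blocking the calling thread.
final class BackgroundLoad {

    // Starts reading the whole file on a worker thread and returns right away.
    static CompletableFuture<byte[]> loadAsync(Path path) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return Files.readAllBytes(path);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

The idea would be to call loadAsync() once, for example in setup(), keep the returned future in a field, and in draw() only check isDone() before using the bytes, so that a slow disk never stalls a frame.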


ok, thanks a lot, I will try....  

Greetings, Chrisir