Sorting a hashmap by value

edited April 2018 in Questions about Code

Hi

I have a hashmap that represents word frequencies with String:Int combinations to represent the Word:count respectively. How can I sort the hashmap, so the most common words would be first and the least common would appear last? I've only found examples for earlier versions of processing on this forum.

Here's the code (also contains other things):

import java.util.Date;
import java.util.Map;

PImage[] img;

//===============  margins and stuff
int galleryImageHeight = 13;  


//=============== fonts and styles
PFont countryTitle;
PFont labels;

//=============== loading tables
Table wordsTable;

//============== Words Hashmaps
HashMap<String, Integer> hm = new HashMap<String, Integer>();

void setup() {
  size(1024, 768, P2D);
 ((PGraphicsOpenGL)g).textureSampling(3);
  //fonts and styles
  countryTitle = createFont("QLRG.TTF", 32);
  labels = createFont("Roboto-Regular.ttf", 10);

  //=============== loading tables
  wordsTable = loadTable("syriaWords.csv", "header");

  for (TableRow row : wordsTable.rows()) {
    hm.put(row.getString("word"), row.getInt("count"));
  }



// Using just the path of this sketch to demonstrate,
// but you can list any directory you like.
String path = sketchPath();

println("Listing all filenames in a directory: ");
String[] filenames = listFileNames(path+"/data");
printArray(filenames);

println("\nListing info about all files in a directory: ");
File[] files = listFiles(path);
for (int i = 0; i < files.length; i++) {
  File f = files[i];    
  println("Name: " + f.getName());
  println("Is directory: " + f.isDirectory());
  println("Size: " + f.length());
  String lastModified = new Date(f.lastModified()).toString();
  println("Last Modified: " + lastModified);
  println("-----------------------");
}

println("\nListing info about all files in a directory and all subdirectories: ");
ArrayList<File> allFiles = listFilesRecursive(path);

img = new PImage[allFiles.size()];
int i = 0;
for (File f : allFiles) {
  println("Name: " + f.getName());
  println("Full path: " + f.getAbsolutePath());
  println("Is directory: " + f.isDirectory());
  println("Size: " + f.length());
  String lastModified = new Date(f.lastModified()).toString();
  println("Last Modified: " + lastModified);
  println("-----------------------");
  println("Is Image: " + isItAnImage(f.getName()));
  if (isItAnImage(f.getAbsolutePath())==true) {
    img[i] = loadImage(f.getAbsolutePath());
    println(f.getAbsolutePath());
    println (img[i]);
    img[i].resize(0, galleryImageHeight);
    i++;
  }
}

noLoop();
}

// Nothing is drawn in this program and the draw() doesn't loop because
// of the noLoop() in setup()
void draw() {
  background(255, 255, 255);
  for (int i=1; i<img.length; i++) {
    println("i = "+i);
    println(img[i]);
    for (int r=0; r<(img.length)/5; r++) {         //rows
      int playHead = 30;
      for (int c=0; c<15; c++) {      //columns
        if (i<img.length-30) {
          image(img[i], playHead, r*(galleryImageHeight)+30);
          playHead = playHead + img[i].width;
          i++;
        }
      }
    }
  }
  textFont(countryTitle);
  textAlign(LEFT);
  fill(200);
  text("SYRIA", 30, height/2);
  textFont(labels);
  int t=1;
  for (Map.Entry me : hm.entrySet()) {
    int freq = hm.get(me.getKey());
    String s = "";
    for (int i=0; i<freq; i++){
      s = s+" "+me.getKey();
      }
    fill(10*t);
    text(s.toLowerCase(), 30, t*10+400);  // Text wraps within text box
    t++;
    print(me.getKey() + " is ");
    println(me.getValue());
    }

}

// This function returns all the files in a directory as an array of Strings  
String[] listFileNames(String dir) {
  File file = new File(dir);
  if (file.isDirectory()) {
    String names[] = file.list();
    return names;
  } else {
    // If it's not a directory
    return null;
  }
}

// This function returns all the files in a directory as an array of File objects
// This is useful if you want more info about the file
File[] listFiles(String dir) {
  File file = new File(dir);
  if (file.isDirectory()) {
    File[] files = file.listFiles();
    return files;
  } else {
    // If it's not a directory
    return null;
  }
}

// Function to get a list of all files in a directory and all subdirectories
ArrayList<File> listFilesRecursive(String dir) {
  ArrayList<File> fileList = new ArrayList<File>(); 
  recurseDir(fileList, dir);
  return fileList;
}

// Recursive function to traverse subdirectories
void recurseDir(ArrayList<File> a, String dir) {
  File file = new File(dir);
  if (file.isDirectory()) {
    // If you want to include directories in the list
    a.add(file);  
    File[] subfiles = file.listFiles();
    for (int i = 0; i < subfiles.length; i++) {
      // Call this function on all files in this directory
      recurseDir(a, subfiles[i].getAbsolutePath());
    }
  } else {
    a.add(file);
  }
}

boolean isItAnImage(String loadPath) {
  return (
    loadPath.endsWith(".jpg") ||
    loadPath.endsWith(".jpeg") ||
    loadPath.endsWith(".png")  ) ;
}

Answers

  • Only idea for a workaround

    PM me when you need it

  • My attempt. To use the following code, you need to create a file inside the sketch folder called MapUtilities.java and add the following code:

    //REFERENCE: https://stackoverflow.com/questions/109383/sort-a-mapkey-value-by-values-java
    //REFERENCE: https://stackoverflow.com/questions/16370305/error-in-eclipse-the-method-from-the-type-refers-to-the-missing-type-en
    //REFERENCE: https://memorynotfound.com/sorted-map-example/
    
    import java.util.*;
    import java.util.Map.Entry;
    
    public class MapUtilities {
    
      public static <K, V extends Comparable<V>> List<Entry<K, V>> sortByValue(Map<K, V> map) {
        List<Entry<K, V>> entries = new ArrayList<Entry<K, V>>(map.entrySet());
        Collections.sort(entries, new ByValue<K, V>());
        Collections.reverse(entries);
        return entries;
      }
    

    To test it, in the sketch tab add the code below.

    Kf

    import java.util.Map;
    import java.util.*;
    
    
    final int N=10;
    HashMap<String, Integer> hm = new HashMap<String, Integer>();
    
    void setup() {
    
      for (int i=0; i<N; i++) {   
        hm.put(""+i, (int)random(N));
      }
    
    
      for (Map.Entry<String, Integer> entry : hm.entrySet()) {
        println(entry.getKey()+" ==== "+entry.getValue());
      }
    
      List<Map.Entry<String, Integer> > sorted = MapUtilities.sortByValue(hm);
    
    
      Iterator it = sorted.iterator();
    
      println("\n\n\nSorting results");
    
      while (it.hasNext()) {
        Map.Entry<String, Integer> r = (Map.Entry<String, Integer>)it.next();
        println( r.getKey()+" " + r.getValue()        );
      }
    }
    
    
      private static class ByValue<K, V extends Comparable<V>> implements Comparator<Entry<K, V>> {
        public int compare(Entry<K, V> o1, Entry<K, V> o2) {
          return o1.getValue().compareTo(o2.getValue());
        }
      }
    
    }
    
  • Sorry @kfrajer I'm confused as to how to implement this in my code With your instructions I got a lot of errors :(

  • cut your example down to the basics of what you need - no images, no fonts, no trundling through directories looking for things, nothing like that.

    and include (some of) the data that you want sorted. and an example of how you want the output to look.

  • edited April 2018

    @koogs

    thanks, here's the reduced code:

    import java.util.Map;
    Table wordsTable;
    
    //============== Words Hashmaps
    HashMap<String, Integer> hm = new HashMap<String, Integer>();
    
    void setup() {
      size(1280,720);
       wordsTable = loadTable("syriaWords.csv", "header");
      for (TableRow row : wordsTable.rows()) {
        hm.put(row.getString("word"), row.getInt("count"));
        print(row.getString("word"));
      }
    }
    
    void draw() {
      background(255, 255, 255);
      fill(100);
      int t=1;
      float textBoxHeight = 0;
      for (Map.Entry me : hm.entrySet()) {
        String s = "";
        int freq = hm.get(me.getKey());
        for (int i=0; i<freq; i++){
          s = s+me.getKey()+" ";
        }
    
        text(s.toLowerCase(), 30, 30+textBoxHeight, 300, 500);  // Text wraps within text box
        textBoxHeight = 14*ceil(textWidth(s)/300)+textBoxHeight+7;
    
      }
    }
    

    and here's the data (it's a pretty small csv): https://expirebox.com/download/0c3ce07fbe02d91eb10026f7335d7811.html

  • Thank you, it looks like a possible solution I tried looking around but I'm not quite sure - how do I iterate it?

  • ok I think I've solved it. It's kind of ugly and I'm sure there's a smarter way to do it, but this worked for me:

          labelsDict = new IntDict();
          for (TableRow row : wordsTable.rows()) {
            labelsDict.set(row.getString("word"), row.getInt("count"));
          }
          labelsDict.sortValuesReverse();
    

    and then:

      for (int p = 0; p<labelsDict.size(); p++) {
    
        String s = "";
        int freq = labelsDict.valueArray()[p];
        for (int i=0; i<freq; i++){
          s = s+labelsDict.keyArray()[p]+" ";
        }
    
        text(s.toLowerCase(), 30, 400+textBoxHeight, columnWidth, 500);  // Text wraps within text box
        textBoxHeight = labelsLeading*ceil(textWidth(s)/columnWidth)+textBoxHeight+7;
    
      }
    

    Thx @GoToLoop

  • edited May 2018

    If you follow the instructions as before, you should be able to run the code below

    Kf

    import java.util.Map;
    import java.util.*;
    
    Table wordsTable;
    HashMap<String, Integer> hm = new HashMap<String, Integer>();
    List<Map.Entry<String, Integer> > sorted;
    
    void setup() {
      size(1280, 720);
      background(255, 255, 255);
      fill(100);
    
      wordsTable = loadTable(dataPath("data.csv"), "header");
      for (TableRow row : wordsTable.rows()) {
        hm.put(row.getString("word"), row.getInt("count"));
        println(row.getString("word"));
      }
    
      sorted = MapUtilities.sortByValue(hm);    
    
    
      float textBoxHeight = 0;
    
      for (Map.Entry<String, Integer> me : sorted) {
    
        //println(me.getKey()+" ==== "+me.getValue());  
        Integer freq = me.getValue();//hm.get(me.getKey());
    
        String s = "";
        for (int i=0; i<freq; i++) {
          s = s+me.getKey()+"";
        }
        println(freq, s);
        textBoxHeight = 14*ceil(textWidth(s)/300)+textBoxHeight+7;
        text(s.toLowerCase(), 30, 30+textBoxHeight, 300, 500);  
      }
    }
    
    void draw(){  }
    
  • edited April 2018

    I've perused the IntDict class a lil' here:
    https://GitHub.com/processing/processing/blob/master/core/src/processing/data/IntDict.java

    I haven't tested yet; but it seems like we can iterate over it (both keys & values) by calling its entries() method:
    https://GitHub.com/processing/processing/blob/master/core/src/processing/data/IntDict.java#L176

    That will return an Iterable for IntDict.Entry objects:
    https://GitHub.com/processing/processing/blob/master/core/src/processing/data/IntDict.java#L165

    So in theory you can use a loop like this: for (final IntDict.Entry entry : labelsDict.entries()) {}

    And each IntDict.Entry got exactly 2 fields: key & value:
    https://GitHub.com/processing/processing/blob/master/core/src/processing/data/IntDict.java#L166-L167

    for (final IntDict.Entry entry : labelsDict.entries()) println(entry.key, entry.value);

    For purely full display's reasons, you can simply invoke its toJSON() method instead:
    https://GitHub.com/processing/processing/blob/master/core/src/processing/data/IntDict.java#L792

    And pass that to Processing's text(): text(labelsDict.toJSON(), 30, 400); *-:)

  • The best way would be to create a simple inner class which has just two attributes, the word and the frequency the word appears. The class should implement comparable so we can easily order the words by popularity.

    Simply iterate over the map and add each map entry (key-value pair) to a list. Finally sort the list.

    This is what the sketch below does

    import java.util.*;
    
    Map<String, Integer> map;
    
    void setup() {
      map = new HashMap<String, Integer>();
      createDummyData(map);
      List<WordCountFreqItem> list = getWordsInPopularityOrder(map);
      // Display word counts
      for (WordCountFreqItem wcfi : list) {
        println(wcfi);
      }
    }
    
    /**
      For a given map create a list of word frequencies in popularity order
    */
    List<WordCountFreqItem> getWordsInPopularityOrder(Map<String, Integer> map) {
      List<WordCountFreqItem> list = new ArrayList<WordCountFreqItem>();
      for (Map.Entry<String, Integer> entry : map.entrySet()) {
        list.add(new WordCountFreqItem(entry.getKey(), entry.getValue()));
      }
      Collections.sort(list);
      return list;
    }
    
    /**
      Utility class to hold the frequency a word appears in some text.
    */
    class WordCountFreqItem implements Comparable<WordCountFreqItem> {
      public final String word;
      public final Integer count;
    
      public WordCountFreqItem(String word, int count) {
        this.word = word;
        this.count = count;
      }
    
      // This will order the words so the most popular come first. Words that
      // occur with the same frequency are placed in alphabetical order
      public int compareTo(WordCountFreqItem wco) {
        int result = wco.count.compareTo(this.count);
        return result != 0 ? result : this.word.compareToIgnoreCase(wco.word);
      }
    
      public String toString() {
        return count + "\t" + word;
      }
    }
    
    void createDummyData(Map<String, Integer> map) {
      map.put("Peter", 12);
      map.put("quark", 9);
      map.put("the", 22);
      map.put("and", 10);
      map.put("popular", 3);
      map.put("wonderful", 2);
      map.put("ocean", 1);
      map.put("place", 5);
      map.put("whales", 7);
      map.put("a", 24);
      map.put("blue", 8);
      map.put("majesty", 7);
    }
    
  • edited April 2018

    @quark, Java already offers a SortedMap interface:
    https://Docs.Oracle.com/javase/10/docs/api/java/util/SortedMap.html

    Implemented as a TreeMap class container:
    https://Docs.Oracle.com/javase/10/docs/api/java/util/TreeMap.html

    And when creating a TreeMap:
    https://Docs.Oracle.com/javase/10/docs/api/java/util/TreeMap.html#<init>(java.util.Comparator)

    We can pass a Comparator for sorting its keys automatically:
    https://Docs.Oracle.com/javase/10/docs/api/java/util/Comparator.html

    By implementing its method compareTo():
    https://Docs.Oracle.com/javase/10/docs/api/java/util/Comparator.html#compare(T,T)

    Although it'd be slightly quirky b/c in this case we wish to sort the keys by its corresponding value.

    So inside compareTo() we're gonna need to access the SortedMap container, and call its method get() passing the received parameter String key, in order to reach its mapped Integer value, so we can order the keys based on their value: #:-S
    https://Docs.Oracle.com/javase/10/docs/api/java/util/Map.html#get(java.lang.Object)

    That's why it's much easier to simply use Processing's IntDict when dealing w/ String keys mapped to Integer values: \m/
    https://Processing.org/reference/IntDict.html

  • edited April 2018

    @GoToLoop As you say SortedMap and TreeMap maintain their entries in key order so are irrelevant to this discussion.

    I admit that I had forgotten about IntDict (probably because I have never had the need to use it ;) ). On investigation it would seem that using 'IntDict' to store the word frequencies is by far the best way to go.

    I will leave my code for two reasons.
    1) It works :)
    2) it demonstrates a valuable technique to get a list of hashmap entries sorted by value while retaining the associated key

  • edited April 2018

    @quark, you're right: No direct way to sort a SortedMap by value w/o resorting to some extra container and other boilerplates. X_X

    When we call its put() method, it triggers its associated Comparator before the value has been assigned to the key. :-&

    What I didn't notice until now is that the Table class also got sorting already builtin: @-)
    https://processing.org/reference/Table_sort_.html

    "gn.csv":

    word,count
    gnawing,14283
    gnarled,10192
    gnu,35632
    gnome,39034
    gnomes,13585
    gnostic,17020
    

    "Word_Freqs_Table.pde":

    /**
     * Word Freqs Table (v1.0)
     * GoToLoop (2018/Apr/30)
     * Forum.Processing.org/two/discussion/27863/sorting-a-hashmap-by-value#Item_14
     */
    
    static final String CSV_FILENAME = "gn.csv";
    Table t;
    
    void setup() {
      t = loadTable(CSV_FILENAME, "header");
    
      t.setColumnType(0, Table.STRING);
      t.setColumnType(1, Table.INT);
    
      t.print(); // unsorted
      println();
    
      t.sort(1);
      t.print(); // ascending sorted
      println();
    
      t.sortReverse(1);
      t.print(); // descending sorted
    
      exit();
    }
    
Sign In or Register to comment.