We are about to switch to a new forum software. Until then we have removed the registration on this forum.
Hi, I'm overriding hashcode for some classes. Well I don't really know how to properly do this, but I boldly :P tried.
The first class is a data holder (Data) , it has an int that has the range of 0-9 and a float from 0.0-1.0 (for now they are only 0 or 1). So I came up with ,
int hashcode = myInt * 100 + int(myFloat); // e.g. 7 and 1.0 = 701
// it will need adjust if float are fully used..
And happily found that my tests indicates that this is working.
Now the second class have an int[] + an arrayLIst of the elements of previous class (Data). Looking into how to do this I found a lot of different information...
Is that correct to assume that listofData.hashCode()
will work if Data hashCode is fine? Same for the array can I just use arrayOfInts.hashCode() ?
And then how to put both together safely?
I did this, and my firsts tests seems to show it is working, but...
//@ override
public int hashCode() {
//scenario
return arrayOfInts.hashCode() * listOfData.hashCode();
}//hashCode
I feel that I should look for a more robust stuff here. Then I found this link that has a solution to implement hashcode in any situation... Looks good doesn't it?
http://www.javapractices.com/topic/TopicAction.do?Id=28
Any thoughts and advices are welcome.
thanks
Answers
Unfortunately in Java, hashCode() is a mere 4-byte/32-bit
int
value.Unless a class doesn't have much to store, the whole hash will never fit in such tiny 4-byte space!
That's why a hashCode() isn't expect to be unique. Simply an attempt for it!
The better the attempt algorithm, the less hashCode() collision among instances w/ diff. field values.
In your 1st class, you said you only needed a range from 0 to 9 + a range from 0.0 to 1.0.
For such tiny ranges, even an 8-bit
byte
could comfortably store both!That'd still leave ya w/ 24 bits vacant from an
int
! :PHere's a class example which generates a unique hashCode() for its 3 fields:
@quark, String's own hashCode() is pretty good, but not unique.
There exists a slightly chance that diff. combo values of
myInt + "" + myFloat
yield same hash! L-)But w/ limited ranges, perhaps it ends up being unique! Who knows? 8->
Either way, as aforementioned, uniqueness isn't required. But gr8 when achievable! :-bd
Unless there are two objects that have exactly the same values for myInt and myFloat then there is no more chance of two objects having the same hashCode than any other method. The advantage of using the String class hashCode is obvious, it takes the responsibility of ensuring a good code distribution from us, and we can assume that the Java developers have done a reasonable job of it.
Writing your own hashCode method from scratch is not recommended because you would have make sure you have an even distribution so they can be used effectively in hash maps and sets.
Creating a unique String from the objects attribute values and using the String hashCode is a good way of getting round it.
Also if you provide your own hashCode you should override the equals method because any two objects that are considered equal MUST produce the same hashCode.
BTW if you have two objects that are considered unequal, in other words equals() returns false, it is NOT a requirements that their hashCodes are different.
The hasCode value should NOT be used be used to check for equality between objects, that is the purpose of the equals() method
@quark, dunno if you paid attention, but
myByte<<030 | (int)(myFloat*100)<<020 | myShort;
yields a unique hashCode() if expected range is obeyed! Those 4 fields together fit in 32 bits!
Like I said, even if String's hashCode() algorithm is good, it's very far from the definition of unique!
And they were developed for texts, not numerical values: 3:-O
http://docs.oracle.com/javase/8/docs/api/java/lang/String.html#hashCode--
@GoToLoop
You seem to be under the impression that it is important that unequal objects produce a different hashCodes. That is NOT true, what is important is the distribution of values produced, they should be evenly distributed over the 32 bit range. The only reason most hashCodes are usually unique is that they are evenly distributed over 2^32 unique values, difficult to get duplicates even if you have a million unequal objects.
The sole purpose of the hashCode is to ensure efficient object storage in hash structures, they should NEVER be used to compare objects for equality.
I saw your hashCode example but the only way to guarentee an even distribution is if the values for the individual attributes are evenly dstributed over their respective ranges. For instance the values of myByte must be evenly distributed over the range -128 to 127; if myByte is always positive then bit 32 of the hash would always be 0. A good hasCode implementation would give it a 50/50 chance.
Converting the values to a long String then using the String class hashCode avoids that probem.
Well, in my simple case, my hashCode() is equivalent to equals().
I simply fail to see why a unique hash should be lowered to a non-unique hash! 8-}
As soon it's not possible to fit a hash into a 32 bit
int
value,that's where we should start looking for a good distribution hash algorithm.
I've seen many online examples of it, but never 1 that turned itself into a String! ~:>
From my 2nd reply I said:
Thanks @quark and @GoToLoop. I think this String method is what I was looking for.
I'am overriding
equals()
as well.But the hash seems to be important to avoid duplicates in Sets. Without overriding it my Sets keep having duplicated elements. Once
hashCode()
was override it starts to work as expected.BTW what about the linked page, did you guys had time to look at it? What about that?
cheers
vk
Just a little. It contains diff. algorithms for diff. types. Very diff. than make everything a String! :-$
quark: "The sole purpose of the hashCode is to ensure efficient object storage in hash structures, they should NEVER be used to compare objects for equality."
Not the "sole purpose", I think.
I believe it is used for quick comparison, not for equality, but for inequality. If hash codes are different, then the objects must be different. If they are equals, then the algorithm MUST use equals() to ensure a real equality.
That's why good practices tell us to always write both equals() and hashCode() at the same time, and to use the same fields for them. Although some people tell hashCode can use a subset of equals' fields.
quark, I am not sure I would use your algorithm of hashCode(). This function is called a lot, eg. when walking a tree structure, and called on each step of the algorithm. But your method creates a StringBuffer (or Builder) on each call, converts the fields, compute the hash. It is very costly in resources.
If you want a really good algorithm; import the Guava library and use their methods for hashCode / equals.
Note that they have also helpers for compareTo and toString:
If you ask Eclipse to generate these functions for you, it will come with a robust method itself, purely based on integer algebra...
Note that for Map and Set, if there is a conflict of hashCode (ie. same hash code, but equals says they are different), they are still stored in the collection, but in a chained list, so it is less efficient because retrieval must iterate on the lists. Thus the importance of a good algorithm with an even distribution, as quark said.
@PhiLho
True hashCode might be used to test for inequality, but surely that's the purpose of equals or !equals. Also I am not convinced that it would be quicker to generate the hashCode rather than use !equals(...).
Yes Eclipse provides boiler plate hashCode and equals methods but that doesn't help in Processing IDE.
I am not familiar with the Guava library but at 2.3Mb seems a lot if you are only going to use the hashCode and equals implementations. I suspect there is a lot more useful stuff in there so I must have a look. :)
That is true for tree structures but not for case for maps and sets where the hash is calculated once for a
put
and aget
and is used to determine the index (bucket) for the object, after thatget
uses 'equals' to locate the matching object.@_vk
If you don't override hashCode then it will use the default version in the Object class - extract from java.lang.Object
It means that if you create 2 distinct objects and assign the same values to both objects they will have different hashCodes for instance the output from this program
Produced the hashcodes 2083592302 and 2067471732 so both will be stored in a HashSet or HashMap
I agree w/ @PhiLho too. When I perused about hashCode() some years ago, I've learnt stored calculated hash value is tested before equals() for Map & Set data structure types. If hashes don't match, equals() test is skipped.
Hence hashCode() is indeed a quicker inequality test! :-B
And of course, unnecessary &/or redundant object creations should be avoided as much as possible.
Especially to something that is called many times! :-SS
But like I said, If some class is able to uniquely fit in its hash into a 32-bit
int
value,mathematically its
hashCode() == equals()
! \m/Are you suggesting that given two objects a and b then line 1 is faster than line 2 in all cases? Surely it depends on how they are implemented?
What I suggested is that storing the result of an object's hashCode() is faster than keep calling equals() all the time! That is, if saved hash values don't match, no need to bother going to equals() as well!
@GoToLoop
PhiLho stated
My interpretation of this is some people are using
a.hashCode() != b.hashCode();
as an alternative to!a.equals(b);
even if they were not using a HashMap or HashSet collection. In this situation I don't see how anyone can claim one is faster than the other it will depend on their implementation.Your statement
This is certainly true inside the HashMap or HashSet but is not relevant outside of these collections, which is what I was getting at.
BTW I had a look at your DataHolder class
This would certainly meet the hashCode() contract but it would need extra code to ensure that the value of myFloat is ALWAYS constrained to >=0 and <1. At the very least myFloat should be declared private (although this won't work in an inner class, in which case it would need to be final) and if not final the class would need a public setter that constrains its value.
Also what happens when myFloat differs in the third decimal point for instance 0.000 and 0.004 would result in the same hashcode.
That's why I haven't made any extra effort to protect the myFloat field!
Just left some comments about its working range which gotta be respected for whomever uses that class! 8-|
Well, those warnings seem to work for Python & JS programmers after all! 8-}
What does that mean :-?
Although I dunno Python very well, it seems it doesn't feature any
private
or similar keywords! :O)Hey guys, you are amazing :). @PhiLho, thanks I will take a look in guava. I don't really wanted to bother about hashCode() creation, I mean I was just trying to have a Set to avoid duplicates... here is where I end up... :)
I thought that I should go for the tools the language provides... Let's see where it goes... Anyway thanks a bunch you all. I certainly will be needing more help in those maters.
:)