Home

BlogFace.org

Free Whitepapers!

nolacoaster

FDL

View

Advertisement

September 25th, 2009

Separating Java: Objects

Add to Memories Tell a Friend
FDL
Now, what about objects? Certainly Java, as an object-oriented language, provides many features for creating an using objects. Many of those those features, unfortunately, seem to obfuscate the essence of objects. In some ways the most important features of objects are that they implement an interface, and that we can treat them as merely their interface. Here is an interface, and means of constructing objects that implement it. Notes to follow.

public interface IntSet2 {
  public IntSet2 add(int i);
  public boolean contains(int i);
}

final public class IntSet2s {
  private static IntSet2 append(final int i, final IntSet2 set) {
    return new IntSet2() {
 
      public IntSet2 add(int i) {
        return append(i, this);
      }
 
      public boolean contains(int j) {
        return i == j || set.contains(j);
      }  
    };
  }
  public IntSet2 singleItemIntSet(final int i) {
    return new IntSet2() {
      public IntSet2 add(int i) {
        return append(i, this);
      }
 
      public boolean contains(int j) {
        return i == j;
      }
    };
  }
  public IntSet2 evenIntSet() {
        return new IntSet2() {
      public IntSet2 add(int i) {
        return append(i, this);
      }
 
      public boolean contains(int i) {
        return i % 2 == 0;
      }
    };
  }
  public IntSet2 emptyIntSet() {
    return new IntSet2(){
      public IntSet2 add(int i) {
        return singleItemIntSet(i);
      }
 
      public boolean contains(int i) {
        return false;
      }
    };
  }
}
 
Notes
  • All the objects created in this program are anonymous instances of the IntSet2 interface. Any code that want to can implement this interface! 
  • We have completely side-stepped classes. We use constructor functions to create instances of IntSet2.
  • Code cannot use an instanceof test to tell which implementation of the IntSet2 is being used. In fact, it has no way of telling the difference between the different implementations other than by observing their behavior through the interface.
  • Each implementation of interface can have a completely different representation, and it is impossible, even for instances of the same interface, to see the representation of an IntSet2.
  • Benefits: Programming to interfaces can give great flexibility by allowing any piece of code to implement it, so long as the implementation conforms to the documented behavior. 
  • Caveats: It can sometimes be hard to implement efficient code in some cases where this pure object style is used.
     

Separating Java: ADTs

Add to Memories Tell a Friend
FDL
In Java, Abstract Data Types and Objects are rather intertwined, often to the detriment of programmers. In this post and the next, I'll try to separate the two concepts as best as I understand. Note that most of these ideas, and the Set example specifically, come to me from William Cook's recent essay. The ideas are also well-covered in TAPL, without the Java code.

Here, I will use a subset of Java's features to express an ADT, such as one would have in a language like ML or Ada. It's a (not very efficient) set. Notes follow.

final public class IntSet {
 
  private int[] members;
  private IntSet() {
    this.members = new int[0];  
  }
  public static IntSet empty() {
    return new IntSet();
  }
  public static IntSet add(IntSet set, int i) {
    if( Arrays.binarySearch(set.members, i) != -1 ) {
      return set;
    }
    else {
      IntSet result = new IntSet();
      int old_length = set.members.length;
      result.members = Arrays.copyOf(set.members, old_length + 1);
      result.members[old_length] = i;
      Arrays.sort(result.members);
      return result;
    }
  }
  public static boolean contains(IntSet set, int i) {
    return Arrays.binarySearch(set.members, i) != -1;
  }

  public static IntSet union(IntSet s1, IntSet s2) { /* Performs merge sort. */ }
}

Notes:
  • The class Set is the abstract data type. It has been declared final, which tells the programmer that the representation is fixed and known completely inside of the class.
  • The members field is private to hide the representation. 
  • The constructor is private so that new Sets must be created through one of the existing introduction means, namely empty().
  • All methods are static. No dynamic dispatch is being used here. We do not want to send messages to objects that can handle that message in an arbitrary way, rather we want to perform some operation on a hidden representation by functions inside of the abstraction that are allowed to view the hidden representation.
  • Benefits: Functions inside the Set class can operate on sets with complete knowledge of a set's representation. This can occasionally be beneficial, particularly in cases where performance is important. Knowing that all sets are implemented as sorted arrays of ints can sometimes allow me to write better performing methods. (Binary methods in particular, but other methods as well.) For instance, in the union method above which is incomplete, can perform its task using merge sort if that is more efficient. Still, the representation is hidden from the rest of the program, so we can change things internally without affecting code as long as we don't change the public methods.
  • Caveats: No one else can define a Set that is compatible with this Set! If a certain method expects an IntSet of the sort we've just defined, then the IntSet can only be created through this class, and not through any other means. The representation for all IntSets is fixed (but hidden).

April 30th, 2009

I just encountered something in Java that I thought was worth mentioning. It's not exactly a gotcha, but it's good to be aware of.

If you have a nested instance class, synchronizing on this in the inner class is not the same as synchronizing on this in the outer class. This may seem obvious, but it is worth noting because the inner class can refer to methods of either the inner or the outer class in a seemingly ambiguous manner (without explicit qualification). Here's an example program, and its output:

public class WhatIsSyncQualThis {
 
class IJustDriftedAway {
void superFoo() {
synchronized(IJustDriftedAway.this) {
if( Thread.holdsLock(WhatIsSyncQualThis.this) )
System.out.println("1 - held");
}
synchronized(this) {
if( Thread.holdsLock(WhatIsSyncQualThis.this) )
System.out.println("2 - held");
}
synchronized(WhatIsSyncQualThis.this) {
if( Thread.holdsLock(this) )
System.out.println("3 - held");
}
synchronized(this) {
if( Thread.holdsLock(this) )
System.out.println("4 - held");
}
}
}
void fooIt() {
(new IJustDriftedAway()).superFoo();
}
public static void main(String[] args) {
System.out.println("Running");
(new WhatIsSyncQualThis()).fooIt();
System.out.println("Done");
}
}

Output:
Running
4 - held
Done

If you know how nested instance classes are implemented, this is not at all surprising. A separate class is created that takes as a constructor argument a reference to an instance of the outer class. So there are two separate instances upon which we are synchronizing.


February 23rd, 2009

I am currently working on a document. It's kind of a manual. Maybe more like a manefesto. But really, it's a collection of the ideas and strategies that I use when writing Java code. As it turns out, many of those ideas and strategies come from my attempts (and the attempts of other's that I have learned) to use Java in a style that is more like functional languages, particularly the ML languages with which I have some experience.

This document is really targetted at people with some Java experience, and some functional experience, but people who would not consider themselves experts. In particular, it's not targetted at academics, or at least not at PL people. But hopefully it may contain some ideas that you find cool.

Part of the reason that I am posting this now rather than after I have completed a first draft is due to my relatively slow progress. I am hoping that if I post what I've got, you guys may give me some feedback and some cooler ideas, and that this enthusiasm will be enough to help me complete this in a reasonable amount of time. That being said, after this post I will not put up any more blog posts until I at least have an entire draft. If you want to wait until then to read, please be my guest.

So without futher ado, the current draft of, "How to Hack Java Like a Functional Programmer" will be posted at the following URL, and will be updated just as often as I work on it until it is complete:
http://www.nelsbeckman.com/java_like_an_fp.pdf

November 24th, 2008

The Atomic Nugget design pattern is a way of encapsulating some action that a client wants to perform atomically with respect to a provider class. The idea is that the client will write a "first-class function," in the form of an anonymous inner class, and then give that action to the provider, who will then internally acquire and release the necessary synchronization. In order to make this work, we will need an interface for our first-class functions:

public interface Lambda<T1, T2> {
public T2 eval(T1 t);
}

This interface is the type of first-class functions that take one argument and return one argument. The next thing we'll need is an interface that signifies the fact that my provider class is an atomic nugget, which will allow client's code to be run atomically. As I previously mentioned, and like the lock/unlock solution, this requires some forsight from the developer.

public interface AtomicNugget<T> {
public <S> S atomic(Lambda<T,S> operation);
I've been telling people about the Atomic Nugget design pattern for a while, but I have not yet described it in detail anywhere. Atomic Nugget is a design pattern of my own creation that allows for better encapsulation with respect to synchronization in Java and similar languages. And yes, I mostly just like it because of the name, but I do happen to think it's legitamately useful. In this post, I'll just briefly talk about why Atomic Nugget is useful. In the next post, I'll give the code.

In Java, there are simple ways to protect the integrity of your objects with respect to multi-threading if you are the class implementer; you can make all of your public methods synchronized. This will ensure that access from multiple threads to an object instance are effectively serialized. (Of course, it's not always as simple as that, but to a first approximation, this strategy works.) However, what if you are implementing a class that depends on other thread-shared classes? For example, the following worker thread references a thread-shared work queue that is internally synchronized. 

class WorkerThread { 
  private Deque workQueue = // ...
  public void run() { while(true) { if( !this.workQueue.isEmpty() ) doWork( this.workQueue.pop() ) }  }
}

Please ignore the fact that this worker class has many problems. I'm trying to keep the examples short. There is one problem I would like you to focus on, however. The problem is the race condition. If another thread removes the last item from the work queue in between the time this thread calls isEmpty, and the time it calls pop, an exception will be thrown on the call to pop. This is a symptom of a more general problem. The designer of the queue has delineated certain critical sections when they wrote the methods of the queue class. However, as a client, I want to enlarge those critical sections, but in order to do so, I need access to the locks or synchronization primitives that are used internally to the queue. How do people solve this problem now?
  1. I could try to make this work by synchronizing on workQueue. However, this is very brittle and violates encapsulation. It requires me to look inside the implementation of the work queue, and ensure that the methods I want to call are synchronized on the work queue object itself, and not some other private member object. Also, if the implementation of the work queue ever changes, to use a different synchronization strategy, my code will be broken.
  2. The implementer can provide "lock" and "unlock" methods. This requires class implementers to think ahead to imagine how their classes might be used in the future. (My solution will require this too, as you will see.) But it also painfully requires clients to remember to call the unlock method, which in the simple case might not be to much additional work, but in reality requires clients to use a "finally" block, so that exceptions will not cause locks to be held on to indefinitely. And in general, it's just not nice to have to put your synchronization into the hands of your client.
Enter the Atomic Nugget.

August 14th, 2008

Java You Weirdo!

Add to Memories Tell a Friend
FDL
The following Java is legal. What does it even mean?

public class WeirdAssign {
    static int foo() {
	int a = a = 1; //Strange line
	return a;
    }
    public static void main(String[] args) {
	System.out.println("a is: " + foo());
    }
}


The output of this test is, "A is: 1".

May 2nd, 2008

This week I discovered a "gotcha" in the Java language that goes into the 'performance' pile, rather than the 'unusual behavior' pile. It has to do with iterators.

See I've been working on this implementation of Software Transactional Memory in Java, and the performance was pretty slow. Now I'm not really a run-time guy, so this fact normally wouldn't bother me, but it was slow enough that a static optimization that I developed was getting lost in the noise. So I busted out the old profiler to see what was slowing things down. I immediately discovered that our code was spending a lot of time iterating over sets, or at least that's how I first interpreted the results. And this made since to me. The run-time uses tons of sets; read sets, write sets, undo sets. And every time a transaction aborts or commits, we're iterating over lots of them, like so:

for( TxnRecord txn_rec : writeSet ) {
  // do something
}

Seems normal enough. So I replace the HashSet implementation with a LinkedHashSet, which should improve iteration performance. I was a little surprised when this improved the performance not one bit. Then I looked at the profiler results again and realized the following: It wasn't "next()" or "hasNext()" taking all the time, which would indicate slow iteration, it was the actual call to the "iterator()" method, which Java5's enhanced for loops call behind the scenes.

After investigating the implementation of LinkedHashSet I realized something surprising. For empty sets, a call to the "iterator()" method allocates at least two new objects, one for the new keySet of the underlying HashMap, and one for the iterator object itself. All this for an empty set. It turns out that my code was absolutely full of empty sets, since most objects aren't modified in a transaction and their write and undo sets are empty. If you have enough empty sets, this allocation adds up over time. By replacing the above code with:

if( !writeSet.isEmpty ) {
  for( TxnRecord txn_rec : writeSet ) {
    // do something
   }
}

The overall running time of the benchmark went from ~11secs to ~7secs!

This is bad, because iterating over empty collections should essentially have no run-time cost. Looking at the Java implementation, it would be easy to have some kind of singleton, "empty iterator" object that the HashSet's iterator would return if the set were empty. For example:

iterator() {
  if( size() == 0 ) return Collections.<Iterator<T>>emptyIterator();
  else return map.keySet().iterator(); // This is what it currently says.
}

This singleton iterator could then always return false when "hasNext" is called. The only reason why this might not be acceptable is if there is some rule about iterators needing to reflect subsequent "add" operations to their underlying collection, but honestly that would seem a little strange of a requirement. At the very least, the singleton iterator could hold a pointer to the original collection and continue to call size() after its creation...

Josh Block, are you listening?

April 15th, 2008

Java Type Inference Sucks

Add to Memories Tell a Friend
FDL
Why doesn't this work?! Boo Java.
Variable unpacked_var =
(this_copy.unpackedVar == other.unpackedVar ?
this_copy.unpackedVar :
throw new IllegalStateException("Unhandled casee"));

January 28th, 2008

Concurrency

Add to Memories Tell a Friend
FDL
List of Concurrent Java Programs:
Just wanted to let you all know officially that I have started a page on my site where I list a bunch of concurrent programs written in Java. The idea is that they may be useful for my own research, and so far it has been hard finding large lists of concurrent programs so that I could pick out one actually relevant for my work. Hopefully this can help change that problem, except that there are only three entries so far... Dig the gnar-gnar CSS while you're at it!

Wait/Notify:
While I'm on the subject of Java concurrency, something came up twice on Friday that I wanted to bring up again; wait/notify. As you may know, Java provides monitors as its built-in all-purpose synchronization primitive. These monitors provide both locking, for mutual exclusion, and wait/notify, for thread synchronization. Other than busy-waiting and thread join, the only means by which one can cause a thread to hold in Java is to call wait and then wait around for another thread to call notify on that same object. Both a research colleague and a paper written by my advisor claimed that wait/notify are rarely used in Java programs. While this certainly may be true, does this imply that most concurrent code written in Java only needs mutual exclusion primitives? What about things like thread barriers, and other thread-ordering devices? Don't people need monitors for those sorts of things?

Performance Tension:
One strange sensation I encountered occurred while attempting to write a concurrent dataflow analysis for the Crystal analysis framework, a Java analysis framework written in Java. There was a point where I found an obvious performance problem, but didn't want to fix it, deciding instead to wait until the program itself actually showed poor performance. This action itself caused me some pause. This action was undoubtedly caused by the phrase that has been ingrained in my subconscious, "premature optimization is the root of all evil." But of course I was busy implementing a concurrent library class whose whole point (ostensibly) was to increase analysis performance. It at least made me realize that there is some tension between what we are always taught about optimization, that it is a discussion best saved until the end of the day, and concurrency, which often permeates the architecture of the program itself. In fact the changes necessitated some global restructuring of the program. "Interesting," is all I could really conclude.

September 27th, 2007

Java 'finally'

Add to Memories Tell a Friend
FDL
I have learned about a feature of Java that interests me. Actually, according to this web page, it's one of the shameful things about Java:

In a try/finally block, the finally code really is called in all situations. That's good, because we want to be able to do some in all cases clean-up code a lot of the times. But what happens if you 'intercept' normal control flow? For instance, the code inside the try block wants to return, but we don't let it? Turns out, this code will not return.

void foo() {
  while(true) {
    try {
      return;
    } finally {
      continue;
} } }


While this seems a little bit wild, it's actually really helpful for my current project. You see we're translating some new programming constructs into straight-up Java. The problem is, a user could call return in his code when, according to the semantics of our new features, the method shouldn't return. In summary, I was glad that this was the case, because it lets met avoid some gnarly rewriting of the input code.

September 18th, 2007

Java 5 Concurrency

Add to Memories Tell a Friend
FDL
Now that I have a little extra time, graciously granted to me because of my completion of two paper submissions (ICSE and WICSA), I thought I'd point out some cool things.

The Java standard library has at least two concurrency features that I was unaware of, and am now intrigued by.
The first feature (new in Java5) is atomic objects. Java's atomic objects make the development of non-blocking algorithms and data structures actually possible, by providing atomic compare and set operations for primitive and object types. The neat thing about their implementation is that they automagically use the fastest underlying implementation available on your hardware, which is neat, since sometimes really fast compare and swap primitives actually exist.

The other feature is Futures, which allow delayed, concurrent computations. Futures have been a feature of functional programming languages for a while (and honestly, are probably a more appropriate abstraction in their world), but are still neat nonetheless. They allow you to compute some value off in the background, in a parallel thread. When you need the result of that computation, the library blocks until it has finished. So in practice, you get a little bit of extra concurrency without much intellectual overhead.

The point is, if you're programming concurrent code in Java, there's a lot of stuff already available, and it's worth your time getting familiar with the library in order to avoid reimplementing anything.
Powered by LiveJournal.com