Sunday, June 14, 2009

Hashtable Performance in SAS, C++, Java, and VB

This one-credit independent study course will compare and contrast hashtable implementation and performance in four programming languages: SAS, C++, Java, and VB.

The reading for the course will include sections of the following texts dealing with hashtables: The C++ Programming Language by B. Stroustrup, Exploring Java, C# in a Nutshell, Using Hash Objects in SAS 9 by B. Fehlner, and How to Implement the SAS Data Step Hash Object by Bill Parman.

The programming component will consist of implementing a hashtable in each language and running it against multiple data files: hash table data files of 10, 100, and 1,000 rows, and files of 1,000, 10,000, 100,000, and 1,000,000 rows of data to check against the hash table. The time needed for the code to run will be collected and compared, and the results will be graphed and analyzed in SAS. The coding reference for the SAS programming needed to analyze the results will be Sharpening Your SAS Skills by S. Gupta, and the analysis will use the PROC TABULATE and PROC REPORT procedures.
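As a rough illustration of the kind of timing run described above, here is a minimal Java sketch. It is not the project code: the class and method names (HashLookupTiming, buildTable, makeProbes, countHits) are hypothetical, the keys are generated integers rather than rows read from the data files, and the probe keys are drawn so that roughly half of them are expected to be found. It is written to compile on the JDK 6 listed below.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: time hashtable lookups for several table and probe sizes.
class HashLookupTiming {

    // Build a table with keys 0..tableRows-1 (stand-in for a hash table data file).
    static Map<Integer, String> buildTable(int tableRows) {
        Map<Integer, String> table = new HashMap<Integer, String>(tableRows * 2);
        for (int key = 0; key < tableRows; key++) {
            table.put(key, "row" + key);
        }
        return table;
    }

    // Generate probe keys in [0, 2 * tableRows), so about half should be present.
    static int[] makeProbes(int probeRows, int tableRows, long seed) {
        Random rng = new Random(seed);
        int[] probes = new int[probeRows];
        for (int i = 0; i < probeRows; i++) {
            probes[i] = rng.nextInt(2 * tableRows);
        }
        return probes;
    }

    // Count how many probe keys are found in the table.
    static int countHits(Map<Integer, String> table, int[] probes) {
        int hits = 0;
        for (int key : probes) {
            if (table.containsKey(key)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] tableSizes = {10, 100, 1000};
        int[] probeSizes = {1000, 10000, 100000, 1000000};

        for (int tableRows : tableSizes) {
            Map<Integer, String> table = buildTable(tableRows);
            for (int probeRows : probeSizes) {
                int[] probes = makeProbes(probeRows, tableRows, 42L);

                // Time only the lookup loop, not the data generation.
                long start = System.nanoTime();
                int hits = countHits(table, probes);
                long elapsedMicros = (System.nanoTime() - start) / 1000;

                System.out.printf("table=%d probes=%d hits=%d time=%d us%n",
                        tableRows, probeRows, hits, elapsedMicros);
            }
        }
    }
}
```

A single pass like this is sensitive to JIT warm-up and garbage collection, so repeated runs and averaged times would likely be needed before the numbers are compared across languages.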

There will be a term paper requirement which will review some background information on hashtables as well as review the code and tools needed to implement them for this project.

The following development environments will be used on a Windows XP PC:

Sun Microsystems Java(TM) SE Development Kit 6 Update 10
Microsoft Visual Basic 2008 (part of Microsoft Visual Studio 2008 Version 9.0.30729.1, running with Microsoft .NET Framework Version 3.5 SP1)
Microsoft Visual C++ 2008 (part of Microsoft Visual Studio 2008 Version 9.0.30729.1, running with Microsoft .NET Framework Version 3.5 SP1)
SAS(TM) Learning Edition 4.1.99.492 running SAS system version 9.13040.17368.10510

A SAS Unix environment will also be used for testing large SAS data files.

The results of the performance analysis will then be presented.

In addition to the speed of execution, ease of coding, ease of using the environment, cost, and system impact of each language will be considered.

The timeline below outlines the progress of the work, month by month:

By the end of January – Complete the reading and have the SAS code complete.
By the end of February – Have the C++ code complete.
By the end of March – Have the Java code complete.
By the end of April – Have the VB code complete.
By the end of May – Have the term paper complete.

Monthly updates will be provided to the supervising professor. The grade will be based on the quality of the final paper.

A web search showed that other universities offer instruction in hashtable analysis, often as part of computer science courses in algorithms.

http://www.cs.grinnell.edu/~walker/courses/153.sp02/lab-hashtables-inheritance.html
http://www.uh.edu/grad_catalog/nsm/cosc_courses.html

One goal of the course is to help prepare for the SAS Advanced Programming for SAS 9 Certification Exam. For this purpose, the following text will also be used:

SAS Certification Prep Guide: Advanced Programming for SAS 9

  • Paperback: 992 pages
  • Publisher: SAS Publishing (November 30, 2007)
  • Language: English
  • ISBN-10: 1599945592
  • ISBN-13: 978-1599945590
  • http://www.cs.uwm.edu/~cchelwig/
