The C++ Standard Library provides the std::map container for collections of items that need to be looked up by a "key" identifier. This is much more flexible than an array container like std::vector that can only look items up by a contiguous set of integer indexes. While std::map (and its new C++11 relative std::unordered_map) have some admirable complexity guarantees, they have much worse performance than an array. For fast technical applications we would love to have the semantics and syntax of std::map but the performance of an array. Happily, there are a number of situations where, with a little work, we can craft such a solution.

The C++ map containers are designed to provide good complexity guarantees for large containers with an arbitrary mix of insertion, lookup, and deletion operations. Achieving this asymptotic complexity comes at the cost of a large time and space overhead that is wasteful for collections that don't fall in this category. Collections that are small or where lookups dominate insertion/deletion are common in scientific and engineering applications. An example that is prominent in ObjexxSISAME is vectors and matrices indexed by the axes for the model's degree of freedom set (ObjexxSISAME supports any meaningful 1, 2, or 3D model DOF set): these have only up to 6 keys and the key set is fixed once they are initialized. Or we may be modeling proteins where we need to look up the 20 canonical amino acids by a named key. Using a map for lookups on such containers is taking a massive, yes massive, performance hit.

So, how do we create this fast map-like container? Well, there are variations depending on the specifics but the basic ingredients are:
The performance impact of using a custom key map system can be dramatic, as the benchmark results shown here demonstrate. The specific design can vary depending on criteria such as whether the key-to-index map is common to many containers, whether it is small enough that repeating it in each container is worth the space cost to avoid the extra indirection, and whether it is known at compile time (and thus can be a template argument, avoiding both the space and indirection costs). Additional notes:
The takeaway from this is that use of std::map and similar containers should be limited in technical software and that there are much faster containers for many situations where maps get used. Objexx has developed a suite of fast key map containers for various scenarios that we have used with great results in our in-house and client applications.

Python has built-in support for parallel processing via the multiprocessing, subprocess, and thread packages but does not provide any tools to help use system resources effectively to run a set of jobs in parallel. That requires the ability to measure the system resource loads and start queued jobs when sufficient resources become available. It is not hard to build such a package with the right tools and some basic observations.

There are not a lot of Python packages for measuring system resources and loads but psutil is a good choice. Using psutil you can obtain metrics such as the total and available system memory, and the total or per-core CPU load. Because psutil cannot measure the I/O load on a system, there is not a good way to avoid running too many I/O-bound jobs in parallel. The best simple approach is probably to add a cap on the number of jobs that can be running in parallel. This is not an ideal approach but can suffice for most purposes until a better package becomes available.
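The idea of starting queued jobs only when resources allow, with a cap on the number of parallel jobs, can be sketched as follows. This is a minimal illustration and not the Objexx controller: the names `Job`, `run_jobs`, `available_memory`, and the per-job `mem_bytes` estimate are assumptions made for this example. The only psutil call used is `psutil.virtual_memory().available`, and the memory probe is passed in as a function so the sketch still runs where psutil is not installed.

```python
import subprocess
import sys
import time
from collections import deque

try:
    import psutil  # third-party; used only by the default memory probe
except ImportError:
    psutil = None


def available_memory():
    """Return available system memory in bytes, or None if psutil is missing."""
    if psutil is None:
        return None
    return psutil.virtual_memory().available


class Job:
    """A queued command plus a rough, user-supplied estimate of its memory need."""

    def __init__(self, args, mem_bytes=0):
        self.args = args
        self.mem_bytes = mem_bytes  # hypothetical estimate, not measured
        self.process = None

    def start(self):
        self.process = subprocess.Popen(self.args)

    def done(self):
        return self.process is not None and self.process.poll() is not None


def run_jobs(jobs, max_jobs=2, mem_probe=available_memory, poll_seconds=0.05):
    """Run jobs in FIFO order; return the peak number of jobs running at once."""
    queue = deque(jobs)
    running = []
    peak = 0
    while queue or running:
        # Drop finished jobs to free their slots
        running = [job for job in running if not job.done()]
        while queue and len(running) < max_jobs:
            free = mem_probe()
            # Always allow one job so an over-sized job cannot stall the queue
            if running and free is not None and queue[0].mem_bytes > free:
                break
            job = queue.popleft()
            job.start()
            running.append(job)
        peak = max(peak, len(running))
        if queue or running:
            time.sleep(poll_seconds)
    return peak
```

A call such as `run_jobs([Job([sys.executable, "-c", "pass"]) for _ in range(4)], max_jobs=2)` runs four trivial jobs with at most two active at a time. The `max_jobs` cap is what limits I/O-bound jobs here, since the memory probe says nothing about I/O load.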
A basic job controller and Job class system can be written with a modest amount of code. This probably has:
Some considerations in building such a system include:
Qt provides a rich GUI framework with excellent support for tables but when you need a feature that isn't directly supported things can get tricky. One such feature for tables is a so-called "frozen" column or row that should remain fixed like the headers when scrolling. Qt's table widget and view classes do not support frozen columns or rows, and enabling them can be both tricky to get right and an efficiency problem.
Frozen rows and columns are useful when they contain label-like content that doesn't belong in the header for some reason, such as being editable. In our case, we use a frozen row in numerical tables for the dimensional units of the columns as in the example shown here. As you scroll the table it is useful to keep the units in view, and sometimes the units can be changed via drop-down combo boxes to convert the values. Qt's Frozen Column Example presents the basics of an approach but we found it lacking in a few ways:
With careful attention to these details a defect-free and efficient Qt frozen column or row table can be implemented.

PySide is a Python binding for Qt. PySide is similar to the PyQt system but provides LGPL licensing and a community development model that makes it more attractive for commercial projects than PyQt. While the commercial license cost of PyQt is a factor, the bigger issue with PyQt is the need to build it manually on each development system for every release: unlike the GPL version, no binary installers are provided for the commercial PyQt.

We have been waiting for the right time to attempt migrating some projects from PyQt to PySide. The main issues were the maturity/stability of PySide and support for our full toolchain. We had experimented with early PySide releases with some applications and found that, despite a few problems and the lack of matplotlib plotting support, the GUIs ran correctly. With the recent release of PySide 1.1.0 and matplotlib 1.1.0 with PySide support it seemed that the time had come to attempt the full migration. Recently we successfully ported a couple of engineering applications from PyQt to PySide.

Basics

The basics of a PyQt to PySide port can be found in the Differences Between PySide and PyQt page. Among other items this includes:
Here are some other porting tips we learned along the way:
Packaging

Packaging with PyInstaller was expected to be an easy migration from PyQt but we hit a few speedbumps. The biggest was that PyInstaller sees matplotlib importing both PySide and PyQt modules and so it will bundle libraries from both of them if both are installed. If you have a commercial application but have the GPL PyQt installed (for non-commercial use, of course) this could create at least the appearance of a licensing issue. It took some trial and error to get the right solution because the PyInstaller documentation is a bit hand-wavy and because you have to look at the binaries and pure return values from Analysis() to see what to exclude. Adding the right excludes argument to the Analysis() constructor in the spec file does the trick:

    a = Analysis( ..., excludes=['sip', 'PyQt4', 'PyQt4.QtCore', 'PyQt4.QtGui'])

If you simply remove PyQt dependencies from a.binaries after Analysis() is built you will still have the PyQt dependencies in your binaries and they will fail to run without the PyQt libraries.

Using matplotlib's pylab/pyplot interface causes PyInstaller to build a dependency on Tk into your application executables even though Tk is not used in a PySide application. Fixing this to avoid shipping Tk libraries with your application means growing our excludes list. On Windows it looks like this:

    a = Analysis( ..., excludes=['sip', 'PyQt4', 'PyQt4.QtCore', 'PyQt4.QtGui',
                                 '_tkinter', 'tk85.dll', 'tcl85.dll'])
Since you are presumably using matplotlib's Qt4Agg backend it is nice to also eliminate the other backends. We find that you can just delete them from the binary collection PyInstaller generates, but why not exclude them from the beginning and save the unnecessary copying/deletion:

    a = Analysis( ..., excludes=['sip', 'PyQt4', 'PyQt4.QtCore', 'PyQt4.QtGui',
                                 '_tkinter', 'tk85.dll', 'tcl85.dll',
                                 'matplotlib.backends._tkagg',
                                 'matplotlib.backends._gtkagg',
                                 'matplotlib.backends._backend_gdk'])
You may also need to modify the matplotlibrc and matplotlib.conf files PyInstaller puts in the mpl-data subdirectory to change the default backends to Qt4Agg. If your system-wide versions of those files already have this change then you won't need to do it as part of the packaging process.

Another issue we found was that the generated executables don't freeze in the QT_API=pyside environment choice set during the build phase, so your application will still need QT_API to be set at runtime. To avoid making users need to know/set this it is best to move the QT_API setup inside your code before the first matplotlib import that could happen in any code path:

    import os
    os.environ[ 'QT_API' ] = 'pyside'
    import matplotlib

ObjexxSISAME uses OpenSceneGraph (OSG) for real-time 3D visualization. We integrated an OSG viewer into the Qt-based GUI being developed for ObjexxSISAME 2.0. The Qt integration "cookbook" for OSG has changed with OSG 3.0 and is still being refined. Various examples are helpful but none give the whole picture. After some effort we have this running nicely and here are notes on the approach that we found best:
With this hard-won knowledge we have OSG running happily in the ObjexxSISAME 2.0 code base.

We are using YAML style input files for a number of applications because they are easy to read and edit outside of a GUI. We were dismayed to find that loading times were quite slow for files of modest size (5K lines, 200KB) when using stock YAML parsers such as yaml-cpp and libyaml. Digging into the specification and stock implementations reveals why parsing YAML is slow: YAML supports many features and syntax variations that complicate parsing and can require super-linear lookahead processing. Here's an example from the YAML 1.2 specification:
    !!map {
      ? !!str "implicit block key"
      : !!seq [
        !!map {
          ? !!str "implicit flow key"
          : !!str "value",
        }
      ]
    }
Clearly, the readability and elegance of YAML has been lost along with good performance. Realizing that we do not want or need syntactic support for higher level capabilities (since we are building that into our file structure semantics where it belongs) we found that very simple and fast (linear time) parsers could be written (in C++ and Python for our purposes) for this simplified YAML. These custom parsers are now in use in ObjexxSISAME and in a Python-based wrapper developed as part of the reengineering/modernization of a legacy FEM application.

F2PY is a solid tool for interfacing a Python front end application with a Fortran compute engine but we found some limitations and quirks that led us to develop a nonstandard method for using it.

Using F2PY in the standard, documented fashion has some problems. If you are not using the canonical Fortran compiler, especially on Windows, you can have difficulty getting F2PY to find and use your compiler. And if you want or need to use certain compiler switches the process for doing that is not easy. The mistake F2PY makes is trying to store and maintain knowledge about all the major Fortran compilers on different platforms, down to the level of the format in which they report their versions. Inevitably, much of this information is out of date and thus support for your platform/compiler combination within the F2PY system requires a bit of hacking. This is worse on Windows, primarily because the F2PY developer does not use Windows (Objexx has contributed a few patches and problem reports for these issues). There are many F2PY mailing list reports about it failing to find and use the compiler that was requested.

After investing some time in trying to get F2PY to work with alternative compilers on Windows, Objexx went a different route: we extracted the necessary parts and do our own compiling, just using F2PY to generate the interfaces. This is not hard to do and eliminates much grief.
Here is an outline of the steps we followed:
Veneer Interface

Another recommendation for fast and reliable F2PY use is to present to F2PY only a lightweight Fortran veneer containing the interfaces that the Python needs to access, and to keep the actual computational code in separately built libraries. This interface is essentially a set of wrappers that call the real computational code that lives in these separate libraries. This has the benefits of faster F2PY processing and lower likelihood of F2PY seeing something that trips it up. It also keeps !f2py comments (that are used to get the desired interface without customizing the .pyf file by hand) out of your computational Fortran code, where they can confuse developers not familiar with F2PY and could be altered or removed, which is hard to notice in a large source file. Another benefit is that it clearly identifies and limits the interface used by Python.

Pluggable User Fortran Libraries

Using an interface veneer as described above enables pluggable dynamic/shared Fortran libraries that can be built without installing or using F2PY, which is very desirable for end user custom libraries, and swapped in and out for each run of your application. The approach is as follows:
Argument-Free F2PY Callbacks

Objexx has developed a method for using callbacks within the Fortran that does not require passing the callback routine throughout the Fortran argument lists, which is a burden and can degrade the clarity of the code. This requires use of a Fortran 2003 feature called procedure pointers, which are supported by recent versions of a number of the major Fortran compilers including both Intel Fortran and GFortran.

This method involves storing the pointer to the callback procedure in a module that is then made available in any routine by adding a USE module statement in that routine. This is a fairly obvious approach in C-based languages but until procedure pointers were added to Fortran there was no way to get the callback reference to a Fortran routine other than by passing it as an argument.

The callback routine and associated procedure pointer are defined in the callback module like this:

    ! Callback routine interface
    INTERFACE
      SUBROUTINE callback_prototype( msg, level )
        CHARACTER(*), INTENT(IN) :: msg
        INTEGER, INTENT(IN) :: level
      END SUBROUTINE callback_prototype
    END INTERFACE
    ! Callback procedure pointer
    PROCEDURE( callback_prototype ), POINTER :: callback_ptr => NULL()
On every entry into the Fortran the callback needs to be registered with the Fortran since the Fortran DLL state is not carried over. That is done by calling a routine like this:

    SUBROUTINE message_caller( message_callback )
      USE CallbackModule
      IMPLICIT NONE
      ! Arguments
      EXTERNAL :: message_callback
      !f2py character*(*) msg
      !f2py integer level
      !f2py call message_callback(msg,level)
      ! Set the message callback procedure pointer
      callback_ptr => message_callback
    END SUBROUTINE message_caller

and putting a call like this at the top of every routine that is an entry point from Python:

    CALL message_caller( message_callback )

Then within the Fortran, messages can be sent via the registered callback by adding the line:

    USE CallbackModule

to the routine and making calls like this:

    CALL message_call( msg, LogLevel_FATAL )

where message_call is a wrapper routine that lives in the callback module and looks like this (it calls through the callback_ptr procedure pointer declared in the module):

    SUBROUTINE message_call( msg, level )
      CHARACTER(*), INTENT(IN) :: msg
      INTEGER, INTENT(IN) :: level
      CALL callback_ptr( TRIM( msg ), level )
    END SUBROUTINE message_call

This setup appears a bit complicated but all the details are in the callback module and the client code just has to add one USE statement and can then do callbacks without adding arguments throughout the Fortran call tree.

F2PY Logging

Leveraging the F2PY callback capabilities, logging systems can be built in Python and Fortran that work together. First, this means that the Python code has a nicely featured logging system that can do things such as routing log messages to one or more "sinks" that can include the console in a CLI run, a log pane/widget in a GUI run, and a log file for any run. Also, the level at which logging happens, e.g., informational messages and higher or also debug messages when debugging, can be controlled by the user. Python's built-in logging package provides this capability.
The next step is to add a Fortran logging module that exploits the argument-free callback mechanism and can accept messages from anywhere in the Fortran with levels (info, warning, …). It can both act on those messages within the Fortran and pass them back to the Python where they can be wired into the Python logging mechanism. In addition to simple text messages, interfaces can be added for other message types, such as the time step completed and the current estimate or max number of time steps that the Python GUI can use to display a progress bar. This integration is also important when the Fortran is doing a STOP to terminate a run due to some problem: the Fortran can notify the Python that it is about to terminate, clean up resources, and then do the STOP. This is especially important when using Python multiprocessing so that processes are not left hanging, which happens if the Fortran dies without signaling the Python. A fairly small amount of boilerplate code is required to make this all work and the results are well worth the effort.
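On the Python side, the receiving end of such a callback might look like the sketch below. It is an illustration under stated assumptions, not the Objexx implementation: the integer level codes in `FORTRAN_TO_PYTHON_LEVEL`, the logger name "fortran", and the `ListHandler` class (standing in for a GUI log pane) are all hypothetical. Only the standard `logging` package is used; the `message_callback` function is what would be handed to the F2PY-wrapped entry routine.

```python
import logging

# Hypothetical integer codes the Fortran side is assumed to pass as `level`
FORTRAN_TO_PYTHON_LEVEL = {
    0: logging.DEBUG,
    1: logging.INFO,
    2: logging.WARNING,
    3: logging.ERROR,
    4: logging.CRITICAL,  # e.g. sent just before the Fortran does a STOP
}

logger = logging.getLogger("fortran")


def message_callback(msg, level):
    """Callback passed to the Fortran entry point; forwards one message."""
    if isinstance(msg, bytes):  # character data may arrive as bytes
        msg = msg.decode("ascii", errors="replace")
    py_level = FORTRAN_TO_PYTHON_LEVEL.get(int(level), logging.INFO)
    logger.log(py_level, msg.rstrip())


class ListHandler(logging.Handler):
    """Minimal sink that stores formatted records, standing in for a log pane."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


def configure_logging(level=logging.INFO):
    """Attach a console sink and an in-memory sink at a user-selected level."""
    pane = ListHandler()
    formatter = logging.Formatter("%(levelname)s: %(message)s")
    for handler in (logging.StreamHandler(), pane):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    logger.setLevel(level)
    return pane
```

With `pane = configure_logging(logging.INFO)`, a call such as `message_callback("step 10 complete", 1)` reaches both sinks, while a level 0 (debug) message is filtered out until the user lowers the level. A file sink could be added the same way with `logging.FileHandler`.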
Copyright © 2013 Objexx Engineering, Inc. All Rights Reserved.