Core Java and J2EE: Java and J2EE performance tips

Performance planning for managers

Include budget for performance management.
Create internal performance experts.
Set performance requirements in the specifications.
Include a performance focus in the analysis.
Require performance predictions from the design.
Create a performance test environment.
Test a simulation or skeleton system for validation.
Integrate performance logging into the application layer boundaries.
Performance test the system at multiple scales and tune using the resulting information.
Deploy the system with performance logging features.

Balancing Network Load with Priority Queues

Hardware traffic managers redirect user requests to a farm of servers based on server availability, IP address, or port number. All traffic is routed to the load balancer, then requests are fanned out to servers based on the balancing algorithm.

Popular load-balancing algorithms include: server availability (find a server with available processing capability); IP address management (route to the nearest server by IP address); port number (locate different types of servers on different machines, and route by port number); HTTP header checking (route by URI or cookie, etc.).

Size the system to handle the peak hit rate, not the average rate.

You can model hit rates using a Gaussian distribution to determine the average hit rate per time unit (e.g. per second) at peak usage; a Poisson distribution then gives the probability of a given number of users simultaneously hitting the server within that time unit. [Article gives an example with a Gaussian fitted to peak traffic of 4000 users with a standard deviation of 20 minutes, resulting in an average of 1.33 users per second at the peak, which in turn gives the probabilities of 0, 1, 2, 3, 4, 5, 6 users hitting the server within one second as 26%, 35%, 23%, 10%, 3%, 1%, 0.2%. Service time was 53 milliseconds, which means that the server can service about 19 hits per second before requests have to be queued.]
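The arithmetic in the example above can be reproduced with a short Java sketch. It assumes, as the article does, that the peak arrival rate is the height of the Gaussian scaled by the number of users, N/(σ√(2π)), and that arrivals within one second are Poisson-distributed; the class and method names are illustrative only.

```java
/** Sketch: Gaussian peak-rate estimate plus Poisson arrival probabilities (names are illustrative). */
final class PeakLoadModel {
    /** Peak arrivals per second when the users' arrival times follow a Gaussian with std dev sigmaSeconds. */
    static double peakRatePerSecond(int users, double sigmaSeconds) {
        // Height of a normal density scaled by N: N / (sigma * sqrt(2 * pi)).
        return users / (sigmaSeconds * Math.sqrt(2 * Math.PI));
    }

    /** Poisson probability of exactly k arrivals in one time unit when the mean is lambda. */
    static double poisson(int k, double lambda) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) {
            p *= lambda / i; // builds lambda^k / k! one factor at a time
        }
        return p;
    }

    /** Hits per second one server can complete back to back for a given service time. */
    static double capacityPerSecond(double serviceMillis) {
        return 1000.0 / serviceMillis;
    }

    public static void main(String[] args) {
        double lambda = peakRatePerSecond(4000, 20 * 60); // 4000 users, sigma = 20 minutes
        System.out.printf("peak rate: %.2f hits/s%n", lambda);
        for (int k = 0; k <= 6; k++) {
            System.out.printf("P(%d users in 1 s) = %.3f%n", k, poisson(k, lambda));
        }
        System.out.printf("capacity: %.1f hits/s%n", capacityPerSecond(53));
    }
}
```

With 4000 users and σ = 1200 seconds this gives a peak rate of about 1.33 hits per second, well under the roughly 19 hits per second that a 53 ms service time allows.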

System throughput, as used here, is the arrival rate divided by the service rate (in queueing theory this ratio is usually called the utilization). If the ratio becomes greater than one, requests exceed the system capability and will be lost or need to be queued.

If requests are queued because capacity is exceeded, the throughput ratio must later drop far enough below one to clear the queued requests (the service rate must increase or the arrival rate decrease), or the system will fail. If the average throughput ratio exceeds 1, the system will fail.

Sort incoming requests into different priority queues, and service the requests according to the priorities assigned to each queue. [Article gives the example where combining user and automatic requests in one queue can result in a worst case user wait of 3.5 minutes, as opposed to less than 0.1 seconds if priority queues are used].
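A minimal sketch of the idea, assuming just two request classes, interactive user requests and automatic requests (the enum and class names are hypothetical). Rather than separate physical queues, it uses one java.util.concurrent.PriorityBlockingQueue ordered by priority and then by arrival order, which gives the same strict-priority servicing.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical request classes; a lower ordinal is serviced first. */
enum Priority { USER, AUTOMATIC }

/** A queued request tagged with a priority and an arrival sequence number. */
final class Request implements Comparable<Request> {
    private static final AtomicLong SEQ = new AtomicLong();
    final Priority priority;
    final String name;
    final long seq = SEQ.getAndIncrement(); // keeps FIFO order within one priority level

    Request(Priority priority, String name) {
        this.priority = priority;
        this.name = name;
    }

    @Override
    public int compareTo(Request other) {
        int byPriority = priority.compareTo(other.priority);
        return byPriority != 0 ? byPriority : Long.compare(seq, other.seq);
    }
}

/** Dispatcher that always services waiting user requests before automatic ones. */
final class PriorityDispatcher {
    private final PriorityBlockingQueue<Request> queue = new PriorityBlockingQueue<>();

    void submit(Request r) {
        queue.put(r); // unbounded queue, so this never blocks
    }

    /** Blocks until a request is available, then returns the highest-priority one. */
    Request next() throws InterruptedException {
        return queue.take();
    }

    /** Non-blocking variant; returns null when nothing is waiting. */
    Request poll() {
        return queue.poll();
    }
}
```

Strict priority can starve automatic requests under sustained user load; servicing the queues with weights is a common alternative if that matters.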

[Note that Java application servers often do not show a constant service time. Instead the service time often increases with higher concurrency due to non-linear effects of garbage collection].

Designing remote interfaces
Remote object creation has overheads: several objects needed to support the remote object are also created and manipulated.

Remote method invocations involve a network round-trip and the marshalling and unmarshalling of parameters. Together these impose significant latency on remote method invocations.

Different object parameters can have very different marshalling and unmarshalling costs.

A poorly designed remote interface can kill a program's performance.

Excessive remote invocation network round-trips are a huge performance problem.

Calling a remote method that returns multiple values contained in a temporary object (such as a Point), rather than making multiple consecutive method calls to retrieve them individually, is likely to be more efficient. (Note that this is exactly the opposite of the advice offered for good performance of local objects.)

Avoid unnecessary round-trips: retrieve several related items simultaneously in one remote invocation, if possible.
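To make the round-trip advice concrete, here is a hedged sketch of two RMI remote interfaces (all names are hypothetical). Reading a position through ChattyPosition costs two remote calls, while CoarsePosition returns both coordinates in one serializable value object.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

/** Fine-grained: fetching one position costs two network round-trips. */
interface ChattyPosition extends Remote {
    int getX() throws RemoteException;
    int getY() throws RemoteException;
}

/** Serializable value object carrying both coordinates in a single reply. */
final class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

/** Coarse-grained: one round-trip returns everything the caller needs. */
interface CoarsePosition extends Remote {
    Point getPosition() throws RemoteException;
}

/** Server-side implementation; a real server would export it, e.g. via UnicastRemoteObject. */
final class PositionImpl implements CoarsePosition {
    private volatile Point current = new Point(0, 0);

    void moveTo(int x, int y) {
        current = new Point(x, y);
    }

    @Override
    public Point getPosition() {
        return current;
    }
}
```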

Avoid returning remote objects when the caller may not need to hold a reference to the remote object.

Avoid passing complex objects to remote methods when the remote object doesn't necessarily need to have a copy of the object.

If a common high-level operation requires many consecutive remote method calls, you need to revisit the class's interface.

A naively designed remote interface can lead to an application that has serious scalability and performance problems.

[Article gives examples showing the effect of applying the listed advice].

Detailed article on load testing systems
Internet systems should be load-tested throughout development.

Load testing can provide the basis for: comparing alternative architectural approaches; performance tuning; capacity planning.

Initially you should identify the probable performance and scalability based on the requirements. You should be asking about: numbers of users/components; component interactions; throughput and transaction rates; performance requirements.

Factor in batch requirements and performance characteristics of dependent (sub)systems. Note that additional layers, like security, add overheads to performance.

Logging and stateful EJBs can degrade performance.

After the initial identification phase, aim for a model architecture that can be load-tested, so that the results can be fed back into the design.

Scalability hotspots are more likely to exist in the tiers that are shared across multiple client sessions.

Performance measurements should be from presentation start to presentation completion, i.e. user clicks button (start) and information is displayed (completion).

Use load-test suites and frameworks to perform repeatable load testing.

J2EE Application server performance
Good performance has sub-second latency (response time) and hundreds of (e-commerce) transactions per second.

Avoid using the SingleThreadModel interface for servlets: write thread-safe code instead.

ServletRequest.getRemoteHost() is very inefficient, and can take seconds to complete the reverse DNS lookup it performs.

OutputStream can be faster than PrintWriter. JSPs are generally slower than servlets only when returning binary data, since JSPs always use a PrintWriter, whereas servlets can take advantage of a faster OutputStream.

Excessive use of custom tags may create unnecessary processing overhead.

Using multiple levels of BodyTags combined with iteration will likely slow down the processing of the page significantly.

For read-only queries involving large amounts of data, avoid EJB objects and use JavaBeans as an intermediary to access, manipulate, and store the data for JSP access.

Use stateless session EJBs to cache and manage infrequently changed data. Update the EJB occasionally.

Use a dedicated session bean to perform and cache all JNDI lookups in a minimum number of requests.

Designing Entity Beans for Improved Performance
Remember that every call of an entity bean method is potentially a remote call.

Designing with one access method per data attribute should only be used where remote access will not occur, i.e. entities are guaranteed to be in the same container.

Use a value object which encapsulates all of an entity's data attributes, and which transfers all the data in one network transfer. This may result in large objects being transferred though.

Group entity bean data attributes in subsets, and use multiple value objects to provide remote access to those subsets.

Use externalization instead of serialization.

Common issues affecting Web performance
Symptoms of performance problems include slow response times, excessive database table scans, database deadlocks, pages not available, memory leaks and high CPU usage.

Causes of performance problems can include the application design, incorrect database tuning, internal and external network bottlenecks, undersized or non-performing hardware or Web and application server configuration errors.

Root causes of performance problems are spread roughly equally across four main areas: databases, Web servers, application servers and the network, with each area typically causing about a quarter of the problems.

The most common database problems are insufficient indexing, fragmented databases, out-of-date statistics and faulty application design. Solutions include tuning the indexes, compacting the database, updating the statistics and rewriting the application so that the database server controls the query process.

The most common network problems are undersized, misconfigured or incompatible routers, switches, firewalls and load balancers, and inadequate bandwidth somewhere along the communication route.

The most common application server problems are poor cache management, unoptimized database queries, incorrect software configuration and poor concurrent handling of client requests.

The most common web server problems are poor design algorithms, incorrect configurations, poorly written code, memory problems and overloaded CPUs.

Having a testing environment that mirrors the expected real-world environment is very important in achieving good performance.

The deployed system needs to be tested and continually monitored.

Use smart proxies to monitor the performance of RMI calls.
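One way to build such a proxy is with java.lang.reflect.Proxy. The sketch below wraps any interface implementation, such as an RMI stub, and records call counts and average latency per method without changing the client code. TimingProxy and its methods are hypothetical names, and a production version would also need a way to report or reset the statistics.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Smart-proxy sketch: forwards calls to a target (e.g. an RMI stub) and records per-method timings. */
final class TimingProxy implements InvocationHandler {
    private final Object target;
    private final Map<String, LongAdder> calls = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> nanos = new ConcurrentHashMap<>();

    TimingProxy(Object target) {
        this.target = target;
    }

    /** Returns a proxy implementing iface whose calls are timed by this handler. */
    <T> T as(Class<T> iface) {
        Object proxy = Proxy.newProxyInstance(
                TimingProxy.class.getClassLoader(), new Class<?>[] {iface}, this);
        return iface.cast(proxy);
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        long start = System.nanoTime();
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // surface the real failure, e.g. a RemoteException
        } finally {
            long elapsed = System.nanoTime() - start;
            calls.computeIfAbsent(method.getName(), k -> new LongAdder()).increment();
            nanos.computeIfAbsent(method.getName(), k -> new LongAdder()).add(elapsed);
        }
    }

    long callCount(String methodName) {
        LongAdder n = calls.get(methodName);
        return n == null ? 0 : n.sum();
    }

    double averageMillis(String methodName) {
        long n = callCount(methodName);
        LongAdder total = nanos.get(methodName);
        if (n == 0 || total == null) {
            return 0.0;
        }
        return total.sum() / 1e6 / n;
    }
}
```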

Best Practices for Developing High Performance Web and Enterprise Applications

Do not store large object graphs in javax.servlet.http.HttpSession. Servlet containers may need to serialize and deserialize HttpSession objects for persistent sessions, and large sessions produce a large serialization overhead.

Use the page directive <%@ page session="false" %> to avoid creating HttpSessions in JSPs.

Minimize synchronization in Servlets to avoid multiple execution threads becoming effectively single-threaded.

Do not use javax.servlet.SingleThreadModel.

Use the HttpServlet init() method to perform expensive operations that need only be done once.

Minimize use of System.out.println.

Access entity beans from session beans, not from client or servlet code.

Reuse EJB homes.

Use read-only methods where appropriate in entity beans to avoid unnecessary calls to ejbStore().

The EJB "remote programming" model always assumes EJB calls are remote, even where this is not so. Where calls are actually local to the same JVM, try to use calling mechanisms that avoid the remote call.

Remove stateful session beans (and any other unneeded objects) when you have finished with them, to avoid extra overhead if the container needs to passivate them.

Beans.instantiate() incurs a filesystem check to create new bean instances. Use "new" to avoid this overhead.

J2EE worst practices
Directory servers are optimized for frequent reads, with few writes. If you frequently add data to a directory server, performance degrades.

Stateless session beans are much faster than stateful session beans.

J2EE Performance tuning
Call HttpSession.invalidate() to clean up a session when you no longer need to use it.

For Web pages that don't require session tracking, save resources by turning off automatic session creation using: <%@ page session="false" %>

Implement the HttpSessionBindingListener interface for all session-scoped beans, and explicitly release resources in its valueUnbound() method.

Time out sessions more quickly by setting a lower session timeout or by calling session.setMaxInactiveInterval().

Keep-Alive may be extra overhead for dynamic sites.

Use the include directive where possible, as this is a compile-time directive (include action is a runtime directive).

Use cache tagging where possible.

Always access entity beans from session beans.

If only using an entity bean for data access, use JDBC directly instead.

Mark read-only entity beans as read-only in the deployment descriptor.

Cache access to EJB homes.

Use local entity beans when beans are co-located in the same JVM.

Proprietary stubs can be used for caching and batching data.

Use a dedicated remote object to generate unique primary keys.

Caching. JNDI caching. Distributed caching with synchronization.

Optimized subsystems (RMI, JMS, JDBC drivers, JSP tags & cacheable page fragments).

EJB design
Some application server implementations (e.g., WebSphere) automatically convert remote communications to local communications to make them faster.

Low-granularity (i.e. fine-grained) methods in an EJB typically lead to poor performance of the overall system.

Local interfaces in EJB 2.0 are one attempt to improve overall performance: local interfaces allow beans in the same container to interact locally without involving RMI.

The most effective way to improve the overall performance of EJB-based applications is to minimize the amount of method invocations, making the communications overhead negligible compared with the execution time. This can be achieved by implementing coarse-grained methods.

Entity beans should not be simply mapped to database tables. Treating entity beans as such fine-grained objects which are effectively wrappers on table rows leads to increased network communications and heavier database communications than if entity beans are treated as coarse-grained components.

For optimal performance, entity beans should be designed to: have large granularity, which usually means they should contain multiple Java classes and support multiple database tables; be associated with a certain amount of persistent data, typically multiple database tables, one of which should define the primary key for the whole bean; support meaningful business methods and encapsulate business rules to access the data.

Don't use client transactions in the EJB environment, since they can lead to long-running transactions that can cause database lockup.

Entity beans are transactional resources due to their stateful nature, but application server vendors often rely on the underlying database to lock and resolve access appropriately. Although this approach greatly improves performance, it provides the potential for database lockup.

Use Command objects to automatically queue or retry RMI calls.
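A minimal sketch of the Command idea, assuming a hypothetical RemoteCommand interface that wraps a single remote call. Because the call is an object, an executor can retry it when a RemoteException occurs. Only retry operations that are safe to repeat, since a call that failed on the client may still have executed on the server.

```java
import java.rmi.RemoteException;

/** A remote call packaged as an object so it can be retried or queued (hypothetical interface). */
@FunctionalInterface
interface RemoteCommand<T> {
    T execute() throws RemoteException;
}

/** Runs commands, retrying on RemoteException with a fixed pause between attempts. */
final class RetryingExecutor {
    private final int maxAttempts;
    private final long pauseMillis;

    RetryingExecutor(int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.maxAttempts = maxAttempts;
        this.pauseMillis = pauseMillis;
    }

    <T> T run(RemoteCommand<T> command) throws RemoteException {
        RemoteException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return command.execute();
            } catch (RemoteException e) {
                last = e; // remember the failure; retry unless attempts are exhausted
                if (attempt < maxAttempts) {
                    pause();
                }
            }
        }
        throw last; // non-null: the loop ran at least once and every attempt failed
    }

    private void pause() {
        try {
            Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
        }
    }
}
```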

Caching RMI stubs.
Remote method calls are much slower than local calls, at least 1000 times slower.

Reduce the number of remote calls made by an application to improve performance.

Cache remote objects locally where possible, rather than repeatedly fetching them.

Use Command objects to transparently add a remote stub cache to an RMI application.

Caching stubs keeps them from being garbage collected, and may prevent an RMI server from closing. Use a policy to expire stubs and delete them from the cache.
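The expiry policy can be sketched as a small time-to-live cache; the class name and the fixed-TTL policy are assumptions, not the article's code. The fetch function stands in for a lookup such as java.rmi.Naming.lookup (wrapped to handle its checked exceptions), and the clock is injectable so that expiry can be tested.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Time-to-live cache for stubs; expired entries are dropped so the stubs can be garbage collected. */
final class ExpiringStubCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAt;

        Entry(T value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttl;
    private final LongSupplier clock; // e.g. System::nanoTime; injectable for testing

    ExpiringStubCache(long ttl, LongSupplier clock) {
        this.ttl = ttl;
        this.clock = clock;
    }

    /** Returns a live cached stub, or fetches and caches a fresh one if it is missing or expired. */
    V get(K key, Function<K, V> fetch) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.expiresAt >= 0) {
            // Two threads may both fetch here; the last write wins, which is harmless for stubs.
            e = new Entry<>(fetch.apply(key), now + ttl);
            entries.put(key, e);
        }
        return e.value;
    }

    /** Expiry policy: remove every entry whose lifetime has elapsed. */
    void purgeExpired() {
        long now = clock.getAsLong();
        entries.values().removeIf(e -> now - e.expiresAt >= 0);
    }

    int size() {
        return entries.size();
    }
}
```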

Website performance.
Some e-commerce consultants cite an attention span on the order of eight seconds as the threshold for abandoning a slow retail site.

Where broadband connections are the norm, pages that don't appear instantly stand a good chance of never being seen: slow pages might as well be no pages.

Systems can only be designed to meet performance goals if those goals have been identified. Determine what range of response times will be acceptable.

Try to understand the performance impacts of your design decisions. However, the performance impact of some design choices can be hard to predict and may remain unclear until testing.

Test the system under conditions that simulate real patterns of use.

Intermittent, hard-to-repeat performance problems are not worth addressing unless they occur in a business-critical part of the website that generates corporate revenue.

Use a rapid, iterative development process in combination with frequent performance testing.

Try to plan up-front rather than have to rely on late-phase tuning.

Improving J2EE performance

Set performance goals before development starts.

If supporting clients with slow connections, consider compressing data for network communication.

Minimize the number of network round trips required by the application.

For applications to scale to many users, minimize the amount of shared memory that requires updating.

Cache data to minimize lookup time, though this can reduce scalability if locks are required to access the cache.

If there are more accesses than updates to a cache, let all the readers share the access lock, though be aware that this reduces the window in which updaters can lock the cache.
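In Java this access pattern maps naturally onto java.util.concurrent.locks.ReentrantReadWriteLock. The minimal sketch below (ReadMostlyCache is a hypothetical name) lets any number of readers proceed together while a writer waits for exclusive access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Read-mostly cache: readers share the read lock; updaters take the exclusive write lock. */
final class ReadMostlyCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    V get(K key) {
        lock.readLock().lock(); // shared: concurrent readers do not block each other
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    void put(K key, V value) {
        lock.writeLock().lock(); // exclusive: waits until current readers have finished
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    void invalidate(K key) {
        lock.writeLock().lock();
        try {
            map.remove(key);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Constructing the lock with new ReentrantReadWriteLock(true) selects the fair ordering policy, which stops a steady stream of readers from starving updaters, at some cost in throughput.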

For optimum performance, avoid shared memory altogether by giving each user their own cache.

Be methodical to ensure that changes for performance do actually improve performance.

Eliminate memory leaks before tuning execution speed.

Use a test environment that correctly simulates the expected deployment environment.

Simulate the expected client activity, and compare the performance against your expected goals.

Consider which metrics to measure, such as: Max response time under heavy load; CPU utilization under heavy load; How the application scales as additional users are added.

Profile the application to find the bottlenecks. Correct bottlenecks by making one change at a time and testing for improvement.

Generate stack traces to look for bottlenecks which are multi-thread conflicts (waiting for locks).
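Thread dumps can come from jstack or from sending the JVM a quit signal, but they can also be taken in-process. This sketch uses the standard java.lang.management API to list threads that are BLOCKED on a monitor, together with the lock and its owner; LockContentionReport is a hypothetical name. Running it repeatedly under load and seeing the same lock appear points to a contention bottleneck.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Takes an in-process thread dump and reports threads blocked waiting for a monitor lock. */
final class LockContentionReport {
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // Arguments: (lockedMonitors, lockedSynchronizers); blocking lock info is included either way.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName() + " blocked on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }
}
```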

Improving the performance of a method that is called 1000 times and takes a tenth of a second on average per call (100 seconds in total) is better than improving the performance of a method that is called only 10 times but takes 1 second per call (10 seconds in total).

Don't cache data unless you know how and when to invalidate the cached entries.

Why CMP is better than BMP
Use CMP except in specific cases when BMP is necessary: fields use stored procedures; persistence is not simple JDBC (e.g. JDO); one bean maps to multiple tables; non-standard SQL is used.

CMP can make many optimizations: optimal locking; optimistic transactions; efficient lazy loading; efficiently combining multiple queries to the same table (i.e. multiple beans of the same type can be handled together); optimized multi-row deletion to handle deletion of beans and their dependents.

Scalable recoverable applications
A database caching layer in the servlet helps performance. An EJB caching layer is difficult to achieve.

A load balancing message queue may be needed for a high rate of messages (>500/sec).

Stateful to Stateless Bean
Stateless session beans are much more efficient than stateful session beans.

Stateless session beans have no state. Most containers keep pools of stateless beans. Each stateless bean instance can serve multiple clients, so the bean pool can be kept small and does not need to change in size, avoiding the main pooling overheads.

A separate stateful bean instance must exist for every client, making bean pools larger and more variable in size.

[Article discusses how to move a stateful bean implementation to a stateless bean implementation].

J2EE design patterns to improve performance
Combine multiple remote calls for state information into one call using a value object to wrap the data (the Value Object pattern, superseded by local interfaces in EJB 2.0).

Database performance
Thoughtful page design makes for a better user experience by enabling the application to seem faster than it really is.

EJB performance tip

EJB calls are expensive. A method call from the client could involve all of the following: get the Home reference from the NamingService (one network round trip); get the EJB reference (one or two network round trips plus remote creation and initialization of Home and EJB objects); call the method and return the value on the EJB object (two or more network round trips: client-server and [multiple] server-database; several costly services used, such as transactions, persistence and security; multiple serializations and deserializations).

If you don't need EJB services for an object, use a plain Java object and not an EJB object.

Use Local interfaces (from EJB2.0) if you deploy both EJB Client and EJB in the same JVM. (For EJB1.1 based applications, some vendors provide pass-by-reference EJB implementations that work like Local interfaces).

Wrap multiple entity beans in a session bean to change multiple EJB remote calls into one session bean remote call and several local calls (pattern called SessionFacade).

Change multiple remote method calls into one remote method call with all the data combined into a parameter object.

Control serialization by marking fields that need not be transferred with the 'transient' keyword, to avoid unnecessary data transfer over the network.
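A minimal sketch of the transient technique, using hypothetical OrderSummary and Wire classes: the cached display string is marked transient, so only the order ID and total are written to the stream, and the receiver rebuilds the cache on demand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Transfer object: only the fields the receiver needs are serialized. */
final class OrderSummary implements Serializable {
    private static final long serialVersionUID = 1L;
    final String orderId;
    final long totalCents;
    // Derived data the receiver can rebuild cheaply: transient, so it is never written to the stream.
    transient String displayCache;

    OrderSummary(String orderId, long totalCents) {
        this.orderId = orderId;
        this.totalCents = totalCents;
    }

    String display() {
        if (displayCache == null) { // null after deserialization, so it is rebuilt on demand
            displayCache = orderId + ": " + totalCents / 100 + "." + String.format("%02d", totalCents % 100);
        }
        return displayCache;
    }
}

/** Helpers that serialize to and from a byte array, as an RMI call would do on the wire. */
final class Wire {
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```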

Cache EJBHome references to avoid JNDI lookup overhead (pattern called ServiceLocator).
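A hedged sketch of the ServiceLocator pattern: each JNDI name is looked up once and the result is reused. The Lookup interface is a hypothetical seam so that the cache can be exercised without a naming server; in a container it could be backed by the lookup(String) method of javax.naming.InitialContext.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.naming.NamingException;

/** Hypothetical abstraction over a JNDI lookup (InitialContext.lookup has this shape). */
@FunctionalInterface
interface Lookup {
    Object lookup(String name) throws NamingException;
}

/** ServiceLocator sketch: performs each JNDI lookup once and reuses the cached home reference. */
final class ServiceLocator {
    private final Lookup naming;
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    ServiceLocator(Lookup naming) {
        this.naming = naming;
    }

    Object getHome(String jndiName) throws NamingException {
        Object home = cache.get(jndiName);
        if (home == null) {
            home = naming.lookup(jndiName); // expensive: may involve a network round trip
            Object previous = cache.putIfAbsent(jndiName, home);
            if (previous != null) {
                home = previous; // another thread cached it first; reuse that reference
            }
        }
        return home;
    }
}
```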

Declare non-transactional methods of session beans with 'NotSupported' or 'Never' transaction attributes (in the ejb-jar.xml deployment descriptor file).

Transactions should span the minimum time possible as transactions lock database rows.

Set the transaction time-out (in the ejb-jar.xml deployment descriptor file).

Use clustering for scalability.

Tune the EJB Server thread count.

Use the HttpSession object rather than a Stateful session bean to maintain client state.

Use the ECperf benchmark to help differentiate EJB server performances.

Tune the Stateless session beans pool size to minimize the creation and destruction of beans.

Use the setSessionContext() or ejbCreate() method to cache bean specific resources. Release acquired resources in the ejbRemove() method.

Tune the stateful session bean cache size and time-out to minimize activations and passivations.

Allow stateful session beans to be removed from the container cache by explicitly using the remove() method in the client.

Tune the entity beans pool size to minimize the creation and destruction of beans.

Tune the entity beans cache size to minimize the activation and passivation of beans (and associated database calls).

Use the setEntityContext() method to cache bean-specific resources and release them in the unsetEntityContext() method.

Use Lazy loading to avoid unnecessary pre-loading of child data.

Choose the lowest-cost transaction isolation level that avoids corrupting the data. Transaction levels in increasing cost are: TRANSACTION_READ_UNCOMMITTED, TRANSACTION_READ_COMMITTED, TRANSACTION_REPEATABLE_READ, TRANSACTION_SERIALIZABLE.

Use the lowest cost locking available from the database that is consistent with any transaction.

Create read-only entity beans for read only operations.

Use a dirty flag where supported by the EJB server to avoid writing unchanged EJBs to the database.

Commit the data after the transaction completes rather than after each method call (where supported by EJB server).

Do bulk updates to reduce database calls.

Use CMP rather than BMP to utilize built-in performance optimization facilities of CMP.

Use ejbHome() methods for global operations (from EJB2.0).

Tune the connection pool size to minimize the creation and destruction of database connections.

Use JDBC directly rather than using entity beans when dealing with large amounts of data such as searching a large database.

Combine business logic with the entity bean that holds the data needed for that logic to process.

Tune the Message driven beans pool size to optimize the concurrent processing of messages.

Use the setMessageDrivenContext() or ejbCreate() method to cache bean-specific resources, and release those resources in the ejbRemove() method.

Servlet performance tips
Use the servlet init() method to cache static data, and release them in the destroy() method.

Use StringBuffer rather than using + operator when you concatenate multiple strings.
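The difference matters mainly inside loops: a single expression such as a + b + c is already compiled into one efficient concatenation, but += in a loop copies the growing string on every pass. The sketch below (RowRenderer is a hypothetical name) uses StringBuilder, the unsynchronized equivalent of StringBuffer added in Java 5, which is the usual choice when the buffer is not shared between threads.

```java
import java.util.List;

final class RowRenderer {
    /** Slow: each += creates a new String and copies everything built so far. */
    static String renderWithPlus(List<String> cells) {
        String html = "<tr>";
        for (String cell : cells) {
            html += "<td>" + cell + "</td>";
        }
        return html + "</tr>";
    }

    /** Faster: appends into one growable buffer, presized to reduce reallocation. */
    static String renderWithBuilder(List<String> cells) {
        StringBuilder sb = new StringBuilder(16 + cells.size() * 32);
        sb.append("<tr>");
        for (String cell : cells) {
            sb.append("<td>").append(cell).append("</td>");
        }
        return sb.append("</tr>").toString();
    }
}
```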

Use the print() method rather than the println() method.

Use a ServletOutputStream rather than a PrintWriter to send binary data.

Initialize the PrintWriter with the optimal size for pages you write.

Flush the data in sections so that the user can see partial pages more quickly.

Minimize the synchronized block in the service method.

Implement the getLastModified() method to use the browser cache and the server cache.

Use the application server's caching facility.

Session mechanisms from fastest to slowest are: HttpSession, Hidden fields, Cookies, URL rewriting, the persistency mechanism.

Remove HttpSession objects explicitly in your program whenever you finish the session.

Set the session time-out value as low as possible.

Use transient variables to reduce serialization overheads.

Disable the servlet auto reloading feature.

Tune the thread pool size.

High load web servlets
Hand off requests for static resources directly to the web server by specifying the URL, not by redirecting from the servlet.

Use separate webservers to deliver static and dynamic content.

Cache as much as possible. Make sure you know exactly how much RAM you can spare for caches, and have the right tools for measuring memory.

Load balance the Java application using multiple JVMs.

Use ulimit to monitor the number of file descriptors available to the processes. Make sure this is high enough.

Logging is more important than the performance saved by not logging.

Monitor resources and prepare for spikes.

JSP performance tips
Use the jspInit() method to cache static data, and release them in the jspDestroy() method.

Use StringBuffer rather than using + operator when you concatenate multiple strings.

Use the print() method rather than the println() method.

Use a ServletOutputStream rather than a PrintWriter to send binary data.

Initialize the PrintWriter with the optimal size for pages you write.

Flush the data in sections so that the user can see partial pages more quickly.

Minimize the synchronized block in the service method.

Avoid creating a session object with the directive <%@ page session="false" %>.

Increase the JSP output buffer size (used by the implicit out object) with the page directive's buffer attribute.

Use the include directive instead of the include action when you want to include another page.

Minimize the scope of the 'useBean' action.

Custom tags incur a performance overhead. Use as few as possible.

Use the application server's caching facility, and the session and application objects (using getAttribute()/setAttribute()). There are also third-party caching tags available.

Session mechanisms from fastest to slowest are: session, Hidden fields, Cookies, URL rewriting, the persistency mechanism.

Remove 'session' objects explicitly in your program whenever you finish the session.

Reduce the session time-out as low as possible.

Use 'transient' variables to reduce serialization overheads.

Disable the JSP auto reloading feature.

Tune the thread pool size.

The user's view of the response time for a page view in the browser depends on download speed and on the complexity of the page, e.g. the number of graphics. A poorly designed, highly graphical dynamic website could be seen as 'slow' even if the individual downloads are quite fast.

No web application can handle an unlimited number of requests; the trick in optimization is to anticipate the likely user demand and ensure that the web site can gracefully scale up to the demand while maintaining acceptable levels of speed.

Profile the server to identify the bottlenecks. Note that profiling can be done by instrumenting the code with measurement calls if a profiler is unavailable.

One stress test methodology is: determine the maximum acceptable response time for getting a page; estimate the maximum number of simultaneous users; simulate user requests, gradually adding simulated users until the web application response delay becomes greater than the acceptable response time; optimize until you reach the desired number of users.
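The ramp-up loop in this methodology can be sketched in plain Java. StressRamp and its methods are hypothetical, and a real test would drive HTTP requests from separate client machines rather than calling a Runnable in-process. maxLatencyMillis releases a batch of simulated users at once and reports the slowest response; maxAcceptableUsers keeps adding users until that worst case exceeds the acceptable response time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.IntToLongFunction;

final class StressRamp {
    /** Runs `users` copies of `request` at the same moment and returns the slowest time in ms. */
    static long maxLatencyMillis(int users, Runnable request) {
        if (users < 1) {
            throw new IllegalArgumentException("users must be at least 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(users);
        CountDownLatch start = new CountDownLatch(1);
        Callable<Long> user = () -> {
            start.await(); // hold every simulated user until all are ready
            long t0 = System.nanoTime();
            request.run();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0);
        };
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < users; i++) {
                results.add(pool.submit(user));
            }
            start.countDown();
            long worst = 0;
            for (Future<Long> f : results) {
                worst = Math.max(worst, f.get());
            }
            return worst;
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            throw new IllegalStateException("load step failed", e);
        } finally {
            pool.shutdownNow();
        }
    }

    /** Adds `step` users at a time until the worst latency exceeds the acceptable limit. */
    static int maxAcceptableUsers(IntToLongFunction latencyForUsers, long acceptableMillis,
                                  int step, int ceiling) {
        int supported = 0;
        for (int users = step; users <= ceiling; users += step) {
            if (latencyForUsers.applyAsLong(users) > acceptableMillis) {
                break; // response delay now exceeds the target
            }
            supported = users;
        }
        return supported;
    }
}
```

For example, maxAcceptableUsers(u -> maxLatencyMillis(u, pageRequest), 2000, 10, 500) would step up ten users at a time to at most 500, where pageRequest is a hypothetical Runnable that fetches one page.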

Pay special attention to refused connections during your stress test: these indicate the servlet is overwhelmed.

There is little performance penalty to using an MVC architecture.

Use resource pools for expensive resources (like database connections).
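A resource pool can be sketched with a bounded BlockingQueue: resources are created once, then borrowed and released. ResourcePool is a hypothetical class; for database connections in a J2EE server you would normally use the container's pooled DataSource, which typically also handles connection validation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Fixed-size pool: expensive resources are created once up front, then borrowed and returned. */
final class ResourcePool<R> {
    private final BlockingQueue<R> idle;

    ResourcePool(int size, Supplier<R> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pay the creation cost once, at startup
        }
    }

    /** Borrows a resource, waiting up to the timeout; returns null if none became free. */
    R borrow(long timeout, TimeUnit unit) throws InterruptedException {
        return idle.poll(timeout, unit);
    }

    /** Non-blocking borrow; returns null if every resource is in use. */
    R tryBorrow() {
        return idle.poll();
    }

    /** Returns a borrowed resource so other requests can reuse it. */
    void release(R resource) {
        if (!idle.offer(resource)) {
            throw new IllegalStateException("released more resources than the pool holds");
        }
    }

    int available() {
        return idle.size();
    }
}
```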

Static pages are much faster than dynamic pages, where the web server handles static pages separately.

Servlet filtering has a performance cost. Test to see if it is an acceptable cost.

Ensure that the web server is configured to handle the expected number of users, for example: enough ready sockets; enough disk space; enough CPU.

Use the fastest JVM you have access to.

JMS performance tips

Start the consumer before you start the producer so that the initial messages do not need to queue.

Use a ConnectionConsumer to process messages concurrently with a ServerSessionPool.

Close resources (e.g. connections, session objects, producers, consumers) when finished with them.

DUPS_OK_ACKNOWLEDGE and AUTO_ACKNOWLEDGE perform better than CLIENT_ACKNOWLEDGE.

Use separate transactional sessions and non-transactional sessions for transactional and non-transactional messages.

Tune the Destination parameters: a smaller capacity increases message throughput; a higher redelivery delay and lower redelivery limit reduces the overhead.

Choose non-persistent delivery (NON_PERSISTENT) wherever appropriate to avoid the persistence overhead.

Set the TimeToLive value as low as feasible (default is for messages to never expire).

Receive messages asynchronously with a MessageListener implementation.

Choose the message type that minimizes memory overheads.

Use 'transient' variables to reduce serialization overheads.

Scaling middleware exposes a number of issues such as threading contention, network bottlenecks, message persistence issues, memory leaks, and overuse of object allocations.

[Article discusses questions to ask when setting up benchmarks for messaging middleware].

Message traffic under high-volume conditions is unpredictable and bursty. Messages can be produced far faster than they can be consumed, causing congestion. This condition requires message sends to be throttled with flow control (which could take the form of an exception or an automatic resend).
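The flow-control idea can be modelled with a bounded queue. This is a plain-Java sketch, not a JMS API, and ThrottledChannel is a hypothetical name. When consumers fall behind, sendOrFail signals congestion with an exception, while sendWithin blocks the producer for a bounded time and reports whether the send succeeded.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Bounded in-memory channel that throttles producers when consumers fall behind. */
final class ThrottledChannel<M> {
    private final BlockingQueue<M> buffer;

    ThrottledChannel(int capacity) {
        buffer = new ArrayBlockingQueue<>(capacity);
    }

    /** Flow control by exception: fail fast when the buffer is full. */
    void sendOrFail(M message) {
        if (!buffer.offer(message)) {
            throw new IllegalStateException("flow control: channel full, retry later");
        }
    }

    /** Flow control by blocking: wait up to the timeout for space, then report the outcome. */
    boolean sendWithin(M message, long timeout, TimeUnit unit) throws InterruptedException {
        return buffer.offer(message, timeout, unit);
    }

    /** Consumer side: takes the oldest message, or returns null if none is waiting. */
    M tryReceive() {
        return buffer.poll();
    }
}
```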

When testing performance, run tests overnight and over weekends to reveal longer-term trends. Some concerns are: testing without a real network connection can give false measures; results with few simulated users can be markedly different from results with many; network throughput in the test environment may be larger than in the deployed environment; non-persistent message performance depends on processor and memory; disk speed is crucial for persistent messages.

[Article provides a benchmark harness for testing JMS].

Sun Community chat on Java BluePrints
For very large transactions, use transaction attribute TX_REQUIRED for EJB methods to have all the method calls in a call chain use the same transaction.

Make tightly coupled components local to each other. Put remote beans primarily as facades across subsystems.

The page-by-page pattern is designed to handle cases where the result set is large, and the end-user is not interested in seeing all of the results. There is really no upper threshold for the size of result set in the pattern.

Clustering with JBoss
A hardware- or software-based HTTP load-balancer usually sits in front of the application servers within a cluster. The load balancer can decrypt HTTPS requests and distribute load.

HTTP session replication is expensive for a J2EE application server. If you can live with forcing a user to log in again after a server failure, then an HTTP load-balancer probably provides all of the fail-over and load-balancing functionality you need.

If you are storing things other than EJB Home references in your JNDI tree, then you may need clustered JNDI.

24/7 availability needs the ability to hot-deploy and undeploy new applications and new versions, and to apply patches, without bringing down the application server for maintenance.

Smart proxies can be used to implement load balancing and fail-over for EJB remote clients. Each proxy manages a list of available RMI connections and uses one of them to service each invocation.
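A sketch of such a smart proxy using java.lang.reflect.Proxy, assuming each replica is a stub implementing the same remote interface (FailoverProxy is a hypothetical name). Calls are spread round-robin across the replicas, and a call that fails with a RemoteException is retried on the next replica. As with any fail-over, only idempotent operations are safe to retry blindly.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.rmi.RemoteException;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Smart-proxy sketch: round-robins calls over replicas and fails over on RemoteException. */
final class FailoverProxy {
    static <T> T create(Class<T> iface, List<? extends T> replicas) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("no replicas");
        }
        AtomicInteger next = new AtomicInteger();
        Object proxy = Proxy.newProxyInstance(
                FailoverProxy.class.getClassLoader(), new Class<?>[] {iface},
                (self, method, args) -> {
                    // Load balancing: each call starts at the next replica in turn.
                    int start = Math.floorMod(next.getAndIncrement(), replicas.size());
                    Throwable last = null;
                    for (int i = 0; i < replicas.size(); i++) {
                        T target = replicas.get((start + i) % replicas.size());
                        try {
                            return method.invoke(target, args);
                        } catch (InvocationTargetException e) {
                            last = e.getCause();
                            if (!(last instanceof RemoteException)) {
                                throw last; // application error: do not retry elsewhere
                            }
                            // RemoteException: replica unreachable, fail over to the next one
                        }
                    }
                    throw last; // every replica failed with a RemoteException
                });
        return iface.cast(proxy);
    }
}
```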

Speeding web page downloads using compression
Browsers sending "Accept-Encoding: gzip" will accept gzip-compressed pages. Return the page compressed using GZIPOutputStream, with the header "Content-Encoding: gzip".

Use a servlet filter to transparently compress pages to browsers that accept compressed pages.
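The core of such a filter can be sketched with the JDK alone. The servlet wiring (wrapping the response and setting the header) is left out so the example runs without a servlet container, and GzipSupport is a hypothetical name. For brevity the header check ignores q-values, so a client sending gzip;q=0 would be misread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

/** Helpers a compression filter could use; servlet API wiring is omitted in this sketch. */
final class GzipSupport {
    /** True if the Accept-Encoding header lists gzip (q-values are ignored here). */
    static boolean acceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false;
        }
        for (String token : acceptEncoding.split(",")) {
            String coding = token.split(";")[0].trim().toLowerCase(Locale.ROOT);
            if (coding.equals("gzip") || coding.equals("x-gzip")) {
                return true;
            }
        }
        return false;
    }

    /** Compresses a fully rendered page; the caller then sets "Content-Encoding: gzip". */
    static byte[] gzip(byte[] body) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // read only after close() has written the gzip trailer
    }
}
```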

"Serialization" from "Java RMI"
Use transient to avoid sending data that doesn't need to be serialized.

Serialization is a generic marshalling mechanism, and generic mechanisms tend to be slow.

Serialization uses reflection extensively, and this also makes it slow.

Serialization tends to generate many bytes even for small amounts of data.

The Externalizable interface is provided to solve Serialization's performance problems.

Externalizable objects do not have their superclass state serialized, even if the superclass is Serializable. This can be used to reduce the data written out during serialization.

Use Serializable by default, then make classes Externalizable on a case-by-case basis to improve performance.
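A minimal sketch of an Externalizable class (Point and its fields are hypothetical). writeExternal() writes only the raw field values, with no per-field metadata and no reflection over fields. Externalizable requires a public no-arg constructor, because the runtime creates an empty instance and then calls readExternal().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical value class that writes its own compact wire format.
final class Point implements Externalizable {
    private static final long serialVersionUID = 1L;

    private int x;
    private int y;

    public Point() {} // required by Externalizable

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(x); // just the two ints, nothing else
        out.writeInt(y);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        x = in.readInt(); // read in the same order as written
        y = in.readInt();
    }

    // Round-trips a Point through Java serialization, for demonstration.
    static Point roundTrip(Point p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Point) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

The trade-off is that the class now owns its format: adding a field means updating both methods in step.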

Web application scalability.

Web application scalability is the ability to sustain the required number of simultaneous users and/or transactions, while maintaining adequate response times to end users.

The first solution built with new skills and new technologies will always have room for improvement.

Avoid deploying an application server that will cause embarrassment, or that could weaken customer confidence and business reputation [because of bad response times or lack of scalability].

Consider application performance throughout each phase of development and into production.

Performance testing must be an integral part of designing, building, and maintaining Web applications.

There appears to be a strong correlation between the use of performance testing tools and the likelihood that a site would scale as required.

Automated performance tests must be planned for and iteratively implemented to identify and remove bottlenecks.

Validate the architecture: decide on the maximum scaling requirements and then performance test to validate the necessary performance is achievable. This testing should be done on the prototype, before the application is built.

Have a clear understanding of how easily your configurations of Web, application, and/or database servers can be expanded.

Factor in load-balancing software and/or hardware in order to efficiently route requests to the least busy resource.

Consider the effects security will have on performance: adding a security layer to transactions will impact response times. Dedicate specific server(s) to handle secure transactions.

Select performance benchmarks and use them to quantify the scalability and determine performance targets and future performance improvements or degradations. Include all user types such as "information-gathering" visitors or "transaction" visitors in your benchmarks.

Perform "Performance Regression Testing": continuously re-test and measure against the established benchmark tests to ensure that application performance hasn't been degraded because of the changes you've made.

Performance testing must continue even after the application is deployed. For applications expected to run 24/7, seemingly inconsequential issues such as database logging can degrade performance. Continuous monitoring is key to spotting even the slightest abnormality: set performance capacity thresholds and monitor them.

When application transaction volumes reach 40% of the maximum expected volumes, it is time to start executing plans to expand the system.

Web Load Test Planning
The only reliable way to determine a system's scalability is to perform a load test in which the volume and characteristics of the anticipated traffic are simulated as realistically as possible.

It is hard to design and develop load tests that come close to matching real loads.

Characterize the anticipated load as objectively and systematically as possible: use existing log files where possible; characterize user sessions (pages viewed - number and types; duration of session; etc). Determine the range and distribution of variations in sessions. Don't use averages, use representative profiles.

Estimate target load and peak levels: estimate overall and peak loads for the server and expected growth rates.

Estimate how quickly target peak levels will be reached, and how long they will be sustained. The duration of the peak is important, and the server must be designed to handle it.

The key elements of a load test design are: test objective (e.g. can the server handle N sessions/hr peak load level?); pass/fail criteria (e.g. pass if response times stay within defined values); script description (e.g. user1: page1, page2, ...; user2: page1, page3, start transaction1, etc); scenario description (which scripts at which frequency, and how load increases).
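One way to express a pass/fail criterion, sketched under the assumption that response times are collected per request in milliseconds: judge the run on a high percentile instead of the mean, in keeping with the advice above not to rely on averages. For the samples 100, 120, 130, 150 and 900 ms the mean is 280 ms, while the 95th percentile is 900 ms. The class name and the nearest-rank percentile method are illustrative choices.

```java
import java.util.Arrays;

// Hypothetical pass/fail check: the run passes only if the chosen percentile
// of measured response times stays within the limit.
final class LoadTestCriteria {
    private LoadTestCriteria() {}

    // Nearest-rank percentile (0 < pct <= 100) of the samples.
    static long percentile(long[] samplesMillis, double pct) {
        if (samplesMillis.length == 0 || pct <= 0 || pct > 100) {
            throw new IllegalArgumentException("need samples and 0 < pct <= 100");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    static boolean passes(long[] samplesMillis, double pct, long limitMillis) {
        return percentile(samplesMillis, pct) <= limitMillis;
    }
}
```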

Factor constant computations out of loops. For servlets, push one-time computations into the init() method.
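A tiny illustration with a hypothetical PriceList class: the tax multiplier never changes between loop iterations, so it is computed once in the constructor, much as a servlet would do one-time work in init(). Only the per-item multiplication stays inside the loop.

```java
// Hypothetical example of hoisting invariant work out of a loop.
final class PriceList {
    private final double taxMultiplier; // computed once, like work done in init()

    PriceList(double taxRatePercent) {
        this.taxMultiplier = 1.0 + taxRatePercent / 100.0;
    }

    long[] grossCents(long[] netCents) {
        long[] gross = new long[netCents.length];
        for (int i = 0; i < netCents.length; i++) {
            // Only the per-item computation remains in the loop body.
            gross[i] = Math.round(netCents[i] * taxMultiplier);
        }
        return gross;
    }
}
```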

Servlet sessions expire after a settable timeout, but screens that automatically refresh can keep a session alive indefinitely, even when the screen is no longer in use.

Report of how Ace's Hardware made their SPECmine tool blazingly fast
Transform your data to minimize the costs of searching it.

If your dataset is small enough, read it all into memory or use an in-memory database (keeping the primary copy on disk for recovery).

An in-memory database avoids the following overheads: passing data in from a separate process; the extra memory allocation caused by copying data as it passes between processes and layers; and data conversion. It also makes fine-tuned sorting and filtering possible, and makes other optimizations simpler.

Pre-calculation makes some results faster by making the database data more efficient to access (by ordering it in advance for example), or by setting up extra data in advance, generated from the main data, to make calculating the results for a query simpler.

Pre-determine possible data values in queries, and use boolean arrays to access the chosen values.
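A sketch of how the boolean-array technique might look. It assumes each record stores a small integer code for a field whose possible values are known in advance (the CPU families used here are placeholders). The user's choices are turned into a boolean[] once per query, so testing each row becomes a single array access instead of a string comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter over a field with a fixed, pre-determined set of values.
final class CodeFilter {
    static final String[] FAMILIES = {"Alpha", "PA-RISC", "POWER", "SPARC", "x86"};

    private CodeFilter() {}

    // Builds the lookup table once per query from the user's chosen values.
    static boolean[] selection(String... chosen) {
        boolean[] selected = new boolean[FAMILIES.length];
        for (String name : chosen) {
            for (int code = 0; code < FAMILIES.length; code++) {
                if (FAMILIES[code].equals(name)) {
                    selected[code] = true;
                }
            }
        }
        return selected;
    }

    // Each row is tested with one array access on its stored code.
    static List<Integer> matchingRows(int[] familyCodeByRow, boolean[] selected) {
        List<Integer> rows = new ArrayList<>();
        for (int row = 0; row < familyCodeByRow.length; row++) {
            if (selected[familyCodeByRow[row]]) {
                rows.add(row);
            }
        }
        return rows;
    }
}
```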

Pre-calculate all formatting that is invariant for generated HTML pages. Cache all reused HTML fragments.

Caching many strings may consume too much memory. If memory is limited, it may be more effective to generate strings as needed.

Write out strings individually, rather than concatenating them and writing the result.
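A minimal sketch combining the two tips above, with a hypothetical RowWriter class: the invariant markup is precomputed once as constants, and each piece is written straight to the output instead of first being concatenated into a temporary row string. PrintWriter is used because that is what a servlet response's getWriter() returns; the name argument is assumed to be HTML-escaped already.

```java
import java.io.PrintWriter;

// Hypothetical table-row writer using precomputed invariant fragments.
final class RowWriter {
    private static final String ROW_START = "<tr><td class=\"name\">";
    private static final String CELL_BREAK = "</td><td class=\"score\">";
    private static final String ROW_END = "</td></tr>\n";

    private RowWriter() {}

    // Writes each piece directly; no concatenated row string is built.
    static void writeRow(PrintWriter out, String name, int score) {
        out.write(ROW_START);
        out.write(name);
        out.write(CELL_BREAK);
        out.print(score);
        out.write(ROW_END);
    }
}
```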

Share common strings as a single String object rather than holding many identical copies.

Compress generated HTML pages sent to the user if their browser supports compressed HTML. This puts a heavier load on the server, but gives significantly faster transfers for clients with limited bandwidth.

Some pages are temporarily static. Cache these pages, and only re-generate them when they change.

Caching can significantly improve the responsiveness of a website.

Sun community chat on EJBs with Pravin Tulachan
CMP (container managed persistence) is generally faster than BMP (bean managed persistence).

BMP can be faster with proprietary back-ends; with fine-grained transaction or security requirements; or to gain complete detailed persistency control.

Scalability is improved by passing primary keys rather than passing the entities across the network.

EJB 2.0 CMP is far faster than EJB 1.1 CMP. EJB 1.1 CMP was not necessarily capable of scaling to high transaction volumes.

If entity EJBs provide insufficient performance, use session beans instead.

Don't make fine-grained method calls across the network. Use value object and session facade design patterns instead.

J2EE best practices.
Executing a search against the database calls one of the finder() methods. finder() methods must return a collection of remote interfaces, not ValueObjects. Consequently the client would need to make a separate remote call for each remote interface received, to acquire data. The SessionFacade pattern suggests using a session bean to encapsulate the query and return a collection of ValueObjects, thus making the request a single transfer each way.

The Value Object Assembler pattern uses a Session EJB to aggregate all required data as various types of ValueObjects. This pattern is used to satisfy one or more queries a client might need to execute in order to display multiple data types.

EJBs are wonderful
The out-of-the-box configuration for entity EJB engines, such as WebLogic, is designed to handle read-write transactional data with the best possible performance.

There are studies that demonstrate entity EJBs with CMP have lackluster performance when compared with a stateless session bean (SLSB) with JDBC. [Author points out however that SLSB/JDBC combination is less robust, less configurable, and less maintainable].

Configure separate deployments for each entity bean for different usage patterns (e.g. typical 85% read-only, 10% read-write, 5% batch update), and partition the presentation layer to use the appropriate corresponding deployment (e.g. read requests use the read-only deployment).

EJB performance tips
Design coarse-grained EJB remote interfaces to reduce the number of network calls required.

Combine remote method calls into one call, and combine the data required for the calls into one transfer.

Reduce the number of JNDI lookups: cache the home handles.
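A sketch of a home-reference cache in the "service locator" style, assuming the lookup is passed in as a function so the example runs without a naming service. In an application, that function would wrap new InitialContext().lookup(name) and convert its checked NamingException. HomeCache is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache: each JNDI name is looked up once, and the result
// (e.g. an EJB home) is reused on later calls.
final class HomeCache {
    private final Map<String, Object> homes = new ConcurrentHashMap<>();
    private final Function<String, Object> lookup;

    HomeCache(Function<String, Object> lookup) {
        this.lookup = lookup;
    }

    Object home(String jndiName) {
        // computeIfAbsent performs the expensive lookup only on a cache miss.
        return homes.computeIfAbsent(jndiName, lookup);
    }
}
```

A real cache would also need a way to evict an entry if a cached home becomes unusable, for example after a server restart.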

Use a session bean wrapper for returning multiple data rows from an entity bean, rather than returning one row at a time.

Use session beans for database batch operations; entity beans typically operate on only one row at a time.

Use container-managed persistence (CMP) rather than bean-managed persistence (BMP).

Use entity beans when only a few rows are required for the entity, and when rows need to be frequently updated.

Use the lowest-impact isolation (transaction) level consistent with maintaining data coherency. From highest to lowest impact: TRANSACTION_SERIALIZABLE, TRANSACTION_REPEATABLE_READ, TRANSACTION_READ_COMMITTED, TRANSACTION_READ_UNCOMMITTED.

Correctly simulate the production environment to tune the application, and use profiling and other monitoring tools to identify bottlenecks.

Tune the underlying system, e.g. TCP/IP parameters, file limits, connection pool parameters, EJB pool sizes, thread counts, number of JVMs, JVM heap size, shared pool sizes, buffer sizes, indexes, SQL queries, keep-alive parameters, connection backlogs.

Use clustering to meet higher loads or consider upgrading the hardware.

RMI performance tuning
Use netperf to measure network bandwidth.

Consider altering the TcpWindowSize parameter.

Configure RMI garbage collection by setting the properties sun.rmi.dgc.client.gcInterval and sun.rmi.dgc.server.gcInterval.

Send groups of objects together rather than one object at a time.

Implementing Externalizable can speed up transfers.

Pack data to reduce the number and amount of reads and writes, and the amount of data transferred.

Have an object directly serialize its contained objects, or tell those objects to serialize themselves through their own Externalizable methods (i.e. chain writeExternal()/readExternal() calls through all contained objects).

Use special codes to handle special cases such as singleton or reusable objects.

Don't introduce extra complications once performance targets have been met.

Local entity beans
Local entity beans do not need to be marshalled, and do not incur any marshalling overhead for method calls either: parameters are passed by reference.

Local entity beans are an optimization for beans that are known to run in the same JVM as their callers.

Facade objects (wrappers) allow local entity beans to be called remotely. This pattern adds very little overhead to remote calls, while beans in the same JVM can still call each other through their local interfaces.

Enterprise Java Performance
RMI over IIOP has a higher overhead than plain RMI.

Objects that can be configured as local or remote at any time provide the flexibility to optimize performance.

Large grained remote calls [i.e. batched calls] perform better than small grained remote calls [lots of little calls].

Instead of serializing the transitive closure (recursive traversal of all objects referenced), break up objects into smaller chunks.

Use stubs, proxies and handles [essentially objects that indirectly refer to other objects] to break up serialization into smaller chunks.

Unless the application is put together with care, the remote method call costs may dominate.

Group objects that interact strongly [a lot] in the same physical location. The closer they are, the more efficient their interaction.

Cache in the client any read-only objects, for the whole session. Replicate any data needed so that queries run locally in the client.

Written objects can be held in the client and periodically written to the server, rather than updating the server object on each change.

Good partitioning of objects in distributed applications limits interactions between objects in different partitions and takes advantage of local method access for objects within each partition.

Application partitioning is best addressed early in the design.

How to use java.rmi.MarshalledObject
MarshalledObject lets you postpone deserializing objects. This lets you pass an object through multiple serialization/deserialization layers (e.g. passing an object through many JVMs), without incurring the serialization/deserialization overheads until absolutely necessary.
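A minimal sketch of the idea using java.rmi.MarshalledObject. The sender serializes once, an intermediate tier (the hypothetical Relay class) stores and forwards the MarshalledObject without deserializing it, and only the final consumer calls get(). Checked exceptions are wrapped so the example stays short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.rmi.MarshalledObject;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical middle tier that forwards payloads without deserializing them.
final class Relay {
    private final Deque<MarshalledObject<?>> pending = new ArrayDeque<>();

    // Sender side: the value is serialized once, here.
    static <T> MarshalledObject<T> wrap(T value) {
        try {
            return new MarshalledObject<>(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Middle tier: holds the serialized bytes; the payload class is not needed.
    void forward(MarshalledObject<?> payload) {
        pending.addLast(payload);
    }

    MarshalledObject<?> take() {
        return pending.removeFirst();
    }

    // Final consumer: deserializes only when the value is actually needed.
    static Object unwrap(MarshalledObject<?> payload) {
        try {
            return payload.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each get() call returns a fresh copy of the original object, not the original instance.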

J2EE challenges
Thoroughly test any framework in a production-like environment to ensure that stability and performance requirements are met.

Each component should be thoroughly reviewed and tested for its performance and security characteristics.

Using the underlying EJB container to manage complex aspects such as transactions, security, and remote communication comes with the price of additional processing overhead.

To ensure good performance use experienced J2EE builders and use proven design patterns.

Consider the impact of session size on performance.

Avoid the following common mistakes: Failure to close JDBC result sets, statements, and connections; Failure to remove unused stateful session beans; Failure to invalidate HttpSession.

Performance requirements include: the required response times for end users; the perceived steady state and peak user loads; the average and peak amount of data transferred per Web request; the expected growth in user load over the next 12 months.

Note that peak user loads are the number of concurrent sessions being managed by the application server, not the number of possible users using the system.

Applications that perform very little work can typically handle many users for a given amount of hardware, but can scale poorly as they spend a large percentage of time waiting for shared resources.

Applications that perform a great number of computations tend to require much more hardware per user, but can scale much better than those performing a small number of computations.

EJB Clustering
Four locations that can provide clustering logic for an EJB are: the JNDI naming server where the home stub is bound, the container, the home stub, and the remote stub.

J2EE Application servers
A scalable server application probably needs to be balanced across multiple JVMs (possibly pseudo-JVMs, i.e. multiple logical JVMs running in the same process).

Performance of an application server hinges on caching, load balancing, fault tolerance, and clustering.

Application server caching should include web-page caches and data access caches. Other caches include caching servers which "guard" the application server, intercepting requests and either returning those that do not need to go to the server, or rejecting or delaying those that may overload the app server.

Application servers should use connection pooling and database caching to minimize connection overheads and round-trips.

Using one thread per user can become a bottleneck if there are a large number of concurrent users.
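The tip does not say what to use instead. One common alternative, assumed here rather than taken from the article, is a fixed-size worker pool. The thread count stays bounded however many users are connected, and requests beyond the pool size wait in a queue. BoundedWorkers and its request handler are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a fixed pool of threads services many user requests.
final class BoundedWorkers {
    private BoundedWorkers() {}

    static List<String> handleAll(List<String> requests, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> replies = new ArrayList<>();
            for (String request : requests) {
                // Stand-in handler; excess requests wait in the pool's queue.
                replies.add(pool.submit(() -> "handled " + request));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> reply : replies) {
                results.add(reply.get()); // results kept in request order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("request failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```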

Hans Bergsten's top ten JSP tips
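The include directive (<%@ include file="..." %>) is faster than the include action (<jsp:include page="..." />), because the directive merges the included file into the page when the JSP is translated, while the action includes the resource at request time.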

Redirects are slower than forwards because the browser has to make a new request.

Database access is typically very expensive in terms of server resources. Use a connection pool to share database connections efficiently between all requests, but don't use the JDBC ResultSet object itself as the cache object.

Moving from JSP to EJB
Entity EJBs should contain aggregate get/set methods that return chunks of data rather than fine-grained get/set methods for individual attributes, to reduce unnecessary database, transactional, and network communication overheads.

Avoid stateful session beans as they are resource-heavy, since one instance is maintained for each client.

Under heavy loads, entity beans should do more than merely represent a table in a database. If you are merely retrieving and updating data values, consider using JDBC within session beans instead.

If you have one large database host but only a small Web and middleware host, consider moving much of your logic into stored procedures and calling them via JDBC in session beans.

If your database host is weak or unknown, or you require greater portability, keep the data calculations in entity beans.

Consider using a single stateless session bean to provide access to other EJBs (this is a façade pattern). This optimizes multiple EJB references and calls by keeping them in-process.

Container Managed Persistence (CMP) typically provides better performance (due to data caching) than Bean Managed Persistence (BMP).

Judging various aspects of Java, including performance
J2EE defines component models with high scalability potential. Maximizing scalability requires sticking to stateless session beans and handling all database interactions programmatically (through pooled JDBC connections).

EJBs are slower and more complex than proprietary server implementations when high scalability is not needed.

JMS vs RMI
RMI calls marshal and unmarshal their parameters, adding major overhead.

Every network communication has several overheads. The distance between sender and receiver sets a minimum latency, limited by the speed at which the signal travels along the wire (about two-thirds of the speed of light): London to New York, roughly 5,570 km, takes about 28 milliseconds one way. Each network router and switch also adds time to handle the data, on the order of 0.1 milliseconds per device per packet.
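A back-of-the-envelope calculation of the propagation delay. It assumes signals travel at about 200,000 km/s (two-thirds of the speed of light, i.e. 200 km per millisecond), about 0.1 ms per router or switch, and a London to New York great-circle distance of roughly 5,570 km. That gives about 27.85 ms one way before any device delays, so a synchronous round trip costs well over 50 ms.

```java
// Rough one-way network delay from distance and number of devices traversed.
final class Latency {
    static final double SIGNAL_KM_PER_MS = 200.0; // ~2/3 of the speed of light
    static final double PER_DEVICE_MS = 0.1;      // per router/switch, per packet

    private Latency() {}

    static double oneWayMillis(double distanceKm, int devices) {
        return distanceKm / SIGNAL_KM_PER_MS + devices * PER_DEVICE_MS;
    }
}
```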

Part of most network communications consists of small control packets, adding significant overhead.

One RMI call does not generally cause a noticeable delay, but even tens of RMI calls can be noticeable to the users.

Beans written with many getXXX() and setXXX() methods can incur an RMI round trip for every data attribute.

Messaging is naturally asynchronous, which lets an application decouple network communication from ongoing processing and can prevent threads from blocking on communication.

Pseudo Sessions for JSP, Servlets and HTTP
Use pseudo sessions rather than HttpSessions to improve web server scalability.

Pseudo sessions are stored in files instead of in memory, which both reduces memory use and allows sessions to be distributed across multiple servers.

Pseudo sessions do not use cookies; instead they encode the session in rewritten URLs, so browsers that decline cookies do not cause a new session object to be created on every request.

Clustering for J2EE and Java application servers. Looks at Bluestone Total-e-server, Sybase Enterprise Application Server, SilverStream Application Server, and WebLogic Application Server.
Clustering should allow failover if a machine/process crashes. For stateful sessions, this requires state replication.

Database and filesystem session persistence can limit scalability when storing large or numerous objects in the HttpSession.

To scale the static portions of your Website, add Web servers; to scale the dynamic portions of your site, add application servers.

Multicasting efficiency
When dealing with large numbers of active listeners, multicast publish/subscribe is more efficient than broadcast or multiple individual connections (unicast).

When dealing with large numbers of listeners with only a few active, or if dealing with only a few listeners, multicasting is inefficient. This scenario is common in enterprise application integration (EAI) systems. Inactive listeners require all missed messages to be resent to them in order when the listener becomes active.

A unicast-based message transport, such as message queuing organized into a hub-and-spoke model, is more efficient than multicast for most enterprise application integration (EAI) scenarios.

Weblogic's RMI framework
Use a single, multiplexed, asynchronous, bidirectional connection for RMI client-to-network traffic instead of the standard reference implementation using multiple sockets.

Try to improve the serialization mechanism for faster RMI [Externalization is better].

Use local calls for objects located in the same JVM.

Minimize distributed garbage collection.

Use smart stubs which provide data caching and localized execution in addition to the normal remote execution and data fetching capabilities.

HTTP sessions vs. stateful EJB
The comparative costs of storing data in an HTTP session object are roughly the same as storing the same data in a stateful session bean.

Failure to remove an EJB that should have been removed (from the HTTP session) carries a very high performance price: the EJB will be passivated, which is a very expensive operation.

I/O Performance
Default Serialization is slow.

Use the transient keyword to define fields to avoid having those fields serialized. Examine serialized objects to determine which fields do not need to be serialized for the application to work.

Use network probes to break down how the network is being used by the various networked applications on it.

Server performance testing
Use session beans as a façade to your entity beans to encapsulate the workflow of one entire usecase in one network call to one method on a session bean (and one transaction).

Optimizing entity beans
Use container-managed persistence when you can. An efficient container can avoid database writes when no state has changed, and reduce reads by retrieving records at the same time as find() is called.

Minimize database access in ejbStore(). Use a "dirty" flag to avoid writing the bean unless it has been changed.
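A sketch of the dirty-flag idea outside any EJB container, with a hypothetical AccountBean. Setters mark the bean as changed, and ejbStore() returns immediately when nothing has changed since the last store. writeToDatabase() stands in for the real SQL UPDATE and only counts calls here.

```java
// Hypothetical BMP-style bean showing the dirty-flag check in ejbStore().
final class AccountBean {
    private String owner;
    private long balanceCents;
    private boolean dirty;
    int writes; // counts simulated database writes, for illustration only

    void setOwner(String owner) {
        this.owner = owner;
        dirty = true;
    }

    void setBalanceCents(long balanceCents) {
        this.balanceCents = balanceCents;
        dirty = true;
    }

    // Called by the container at synchronization points.
    void ejbStore() {
        if (!dirty) {
            return; // unchanged since the last store: skip the database
        }
        writeToDatabase();
        dirty = false;
    }

    private void writeToDatabase() {
        writes++; // a real bean would issue an SQL UPDATE with owner/balanceCents here
    }
}
```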

Always cache references obtained from lookups and find calls. Define these references as instance variables and look them up in setEntityContext() (or setSessionContext() for session beans).

Avoid deadlocks. Note that the order of ejbStore() calls is not defined, so the developer has no control over the order in which database records are accessed and locked.

EJB best practices
To avoid resources being held unnecessarily for long periods, a transaction should never encompass user input or user think time.

Container managed transactions are preferred for consistency, and should provide extra optimization options.

Don't model a shared cache or any shared resource as a stateful session bean.

Stateless session beans are easier to scale than stateful session beans. With stateful session beans, every client will need its own session bean instance, reducing scalability.

Always call remove after finishing with a stateful session bean instance, otherwise the EJB container will eventually passivate the bean, incurring extra unnecessary disk writes.

J2EE clustering
To support distributed sessions, make sure that all objects referenced from the session are serializable, and that session state changes are stored in a central repository.

Avoiding memory leaks in EJBs
Make sure that any beans with session scope implement the HttpSessionBindingListener interface.

Explicitly release any resources that may be used within the bean by implementing the valueUnbound() callback.

Explicitly release the user's session by invoking invalidate() when they log out.

Try setting the session invalidation interval to a smaller value than the default 30 minutes.

Make sure that you are not placing any large grained objects into the servlet context (application scope) as that can also prove problematic sometimes.

Experiences building a servlet
Keep the size of the client tier small so that downloads are fast.

Use the servlet init() and destroy() methods to start and stop limited and expensive resources, such as database connections.

Make the servlets thread-safe and use connection pooling.

Use PreparedStatements rather than plain Statement objects.

Use database stored procedures.

RMI arguments
Some application servers can automatically pass parameters by reference if the communicating EJBs are in the same JVM. To ensure that this does not break the application, write EJB methods so that they don't modify the parameters passed to them.
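A small sketch of a method written to be safe under pass-by-reference (OrderService and the SKU format are hypothetical). It works on a private copy, so the caller's list is unchanged whether the container copies parameters or passes them by reference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical business method that never mutates its argument.
final class OrderService {
    private OrderService() {}

    static List<String> normalizedSkus(List<String> skus) {
        List<String> copy = new ArrayList<>(skus); // caller's list stays untouched
        copy.replaceAll(s -> s.trim().toUpperCase(Locale.ROOT));
        Collections.sort(copy);
        return copy;
    }
}
```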

Choosing a J2EE application server, emphasizing the importance of performance issues
Application server performance is affected by: the JDK version; connection pooling availability; JDBC version and optimized driver support; caching support; transactional efficiency; EJB component pooling mechanisms; efficiency of webserver-appserver connection; efficiency of persistence mechanisms.

Servlet Filters
Servlet Filters provide a standardized technique for wrapping servlet calls.

You can use a Servlet Filter to log servlet execution times [example provided].

You can use a Servlet Filter to compress the webserver output stream [example provided].

Implementing clustering on a J2EE web server (JBoss+Jetty)
The different EJB commit options affect database traffic and performance. Option 'A' (the container caches bean state between transactions and assumes it has exclusive access to the database) has the smallest overhead.

Hardware load balancers are a simple and fast solution to distributing HTTP requests to clustered servers.

EJB2.0 Container-Managed Persistence
EJB 2.0 Container-Managed Persistence provides local interfaces which can avoid the performance overheads of remote interfaces.

Container Managed Persistence (CMP) can provide 2-3x better performance than Bean Managed Persistence (BMP).

Stateless session beans can support multiple clients, thus increasing scalability.

An HTTP layer is not always necessary. Connecting directly to EJBs is faster and provides automatic load balancing.

Minimizing space taken by HTTP downloads
Use HttpConnection.getLength() to determine the number of bytes needed to hold the downloaded data.

Use a ByteArrayOutputStream to accumulate results if the content length is indeterminate.
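A sketch of both cases. HttpConnection.getLength() belongs to the J2ME networking API, so this standard-Java version takes the length as a parameter, assuming the usual convention that -1 means "unknown". A known length gets an exactly-sized array; otherwise the data is accumulated in a ByteArrayOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for reading a download body.
final class Download {
    private Download() {}

    static byte[] readBody(InputStream in, long contentLength) throws IOException {
        if (contentLength >= 0) {
            byte[] body = new byte[(int) contentLength]; // exactly sized, no regrowth
            new DataInputStream(in).readFully(body);     // EOFException if body is short
            return body;
        }
        ByteArrayOutputStream acc = new ByteArrayOutputStream(); // grows as needed
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            acc.write(chunk, 0, n);
        }
        return acc.toByteArray();
    }
}
```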

The best performance is obtained from an HTTP/1.1-compliant web server using persistent connections.

Faster JSP with caching
The (open source) OSCache tag library provides fast in-memory caching.

Cache pages or page sections for a set length of time, rather than update the page (section) with each request.
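A sketch of time-based fragment caching in plain Java. This is not OSCache's API; FragmentCache and its methods are hypothetical. A generated fragment is reused until it is older than the configured time-to-live, and the clock is injected so expiry can be tested. In a page, the cache would be created with System::currentTimeMillis.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical cache that re-renders a fragment only after it expires.
final class FragmentCache {
    private static final class Entry {
        final String html;
        final long createdMillis;

        Entry(String html, long createdMillis) {
            this.html = html;
            this.createdMillis = createdMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable time source

    FragmentCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Two threads may occasionally re-render the same stale entry at once;
    // that wastes some work but always returns a complete fragment.
    String get(String key, Supplier<String> render) {
        long now = clock.getAsLong();
        Entry entry = entries.get(key);
        if (entry == null || now - entry.createdMillis >= ttlMillis) {
            entry = new Entry(render.get(), now); // regenerate only when stale
            entries.put(key, entry);
        }
        return entry.html;
    }
}
```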

Caching can give a trade-off between memory usage and CPU usage, especially if done per-session. This trade-off must be balanced correctly for optimal performance.

Precompile your JSPs, one way or another, to avoid the first user having a slow experience.

Architecting and Designing Scalable, Multitier Systems
Separate the UI controller logic from the servlet business logic, and let the controllers be mobile so they can execute on the client if possible.

Validate data as close to the data entry point as possible, preferably on the client. This reduces the network and server load. Business workflow rules should be on the server (or further back than the front-end).

You can use invisible applets in a browser to validate data on the client.

Optimizing dynamic web pages
Dynamic generation of web pages is more resource intensive than delivering static web pages, and can cause serious performance problems.

Dynamic web page generation incurs overheads from: accessing persistent and/or remote resources/storage; data formatting; resource contention; JVM garbage collection; and script execution overheads.

Dynamic content caching tries to mitigate the overheads of dynamic web page generation by reusing content that has already been generated to service a request.

JSP cache tagging solutions allow page and fragment level JSP output to be automatically cached.

On highly personalized sites page-level caching results in low cache hit rates since each page instance is unique to a user.

Component-level caching applies more extensively when components are reused in many pages, but requires manual identification of bottleneck components.

Precompiling JSPs
Precompile your JSPs one way or another to avoid the first user having a slow experience.

Tuning tips intended for Sun's "Web Server" product, but actually generally applicable.
Use more server threads if multiple connections have high latency.

Use keep-alive sockets for higher throughput.

Increase server listen queues for high load or high latency servers.

Avoid or reduce logging.

Buffer logging output, so that on average each log entry costs less than one physical write.

Avoid reverse DNS lookups.

Write time stamps rather than formatted date-times.

Separate paging and application files.

A high VM heap size may result in paging, but could avoid some garbage collections.

Occasional very long GCs make the VM hang for their duration, leading to variability in service quality.

Doing GC fairly often and avoiding paging is more efficient.

Security checks consume CPU resources. You will get better performance if you can turn security checking off.

JMS & CORBA

Asynchronous messaging is a proven communication model for developing large-scale, distributed enterprise integration solutions. Messaging provides better scalability because senders and receivers of messages are decoupled and are no longer required to execute in lockstep.

Sun community chat session with Bill Shannon, Kevin Osborn, and Jim Glennon on JavaMail
You might see a performance increase by using multiple connections to your mail server. You would need to get multiple Transport objects and call connect and sendMessage on each of them, using multiple threads (one per connection) in your application.

JavaMail 1.2 includes the ability to set timeouts for the initial connection attempt to the server.

JavaMail tries to allow you to make good and efficient use of the IMAP protocol. Fetch profiles are one technique to allow you to get batches of information from the server all at once, instead of single pieces on demand. Used properly, this can make quite a difference in your performance.

JMS redelivery
Both auto-acknowledge mode (Session.AUTO_ACKNOWLEDGE) and duplicates-OK mode (Session.DUPS_OK_ACKNOWLEDGE) guarantee delivery of messages, but duplicates-OK mode can have higher throughput, at the cost of occasionally delivering a message twice.

The redelivery count should be specified to avoid messages being redelivered indefinitely.

Maximizing Servlet Performance
Try to optimize the servlet loading mechanism, e.g. by listing the servlet first in loading configurations.

Monday, June 21, 2010
