Java Feature — Concurrent Queries

A pattern for improving database query performance

In the ConcurrentQueryThreadImpl class, the runQuery() method first checks whether any previously submitted query threads have finished and need to be reaped. This matters because the list of running threads is capped so that too many queries can't run at once and overload the database server; finished threads have to be processed and taken off the list to make room for new ones. If there's room on the running threads list and there are queued queries waiting to be submitted (i.e., queries that previously had to wait because the running thread list was full), those are submitted first, and the query passed to runQuery() goes onto the end of the queue. Otherwise, if there's room on the running threads list and no queued queries, the caller's query is submitted immediately.
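
The ordering described above (reap first, then submit anything already queued, then the caller's query) can be sketched roughly as follows. This is a minimal illustration, not the ConcurrentQueryThreadImpl source from Listing 7: the class names BoundedQuerySubmitter and BoundedQuerySubmitterDemo are hypothetical, a plain Runnable stands in for a SQL query, and "reaping" here only removes finished threads rather than calling processResults().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the reap-then-submit ordering; not the article's class.
class BoundedQuerySubmitter {
    private final int maxRunning;
    private final List<Thread> running = new ArrayList<>();
    private final Deque<Runnable> queued = new ArrayDeque<>();

    BoundedQuerySubmitter(int maxRunning) {
        if (maxRunning < 1) throw new IllegalArgumentException("maxRunning must be >= 1");
        this.maxRunning = maxRunning;
    }

    // Reap finished threads, then start queued work in FIFO order.
    // The caller's query joins the end of the queue, so earlier waiters go first.
    synchronized void runQuery(Runnable query) {
        reapFinished();
        queued.addLast(query);
        submitQueued();
    }

    // Block until nothing is running and nothing is queued.
    synchronized void waitForAllQueriesToComplete() throws InterruptedException {
        while (!running.isEmpty() || !queued.isEmpty()) {
            if (!running.isEmpty()) {
                running.get(0).join();
            }
            reapFinished();
            submitQueued();
        }
    }

    synchronized int runningCount() { return running.size(); }
    synchronized int queuedCount() { return queued.size(); }

    // A real implementation would hand results to the domain object here.
    private void reapFinished() {
        running.removeIf(t -> !t.isAlive());
    }

    private void submitQueued() {
        while (running.size() < maxRunning && !queued.isEmpty()) {
            Thread t = new Thread(queued.removeFirst());
            t.start();
            running.add(t);
        }
    }
}

class BoundedQuerySubmitterDemo {
    static String run() {
        BoundedQuerySubmitter submitter = new BoundedQuerySubmitter(2);
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger completed = new AtomicInteger();
        // Fake "query" that blocks until released, so the cap is observable.
        Runnable fakeQuery = () -> {
            try {
                release.await();
                completed.incrementAndGet();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < 3; i++) {
            submitter.runQuery(fakeQuery);
        }
        String before = "running=" + submitter.runningCount()
                + " queued=" + submitter.queuedCount();
        release.countDown();
        try {
            submitter.waitForAllQueriesToComplete();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return before + " completed=" + completed.get();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With a cap of two, the third fake query waits in the queue until one of the first two finishes, which is the behavior the cap is meant to enforce against the database server.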

The ConcurrentQueryThreadImpl class contains a private QueryThread class that extends Thread. This class starts a new thread, runs the SQL query, and holds onto the results (or an SQLException, if one occurred) until the ConcurrentQueryThreadImpl processes the results and removes the thread from the list. See Listing 5.
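
To make the "holds onto the results (or an SQLException)" idea concrete, here is a small sketch of a thread that keeps its outcome until someone collects it. It is not the QueryThread from Listing 5: the names SqlWork and HoldingQueryThread are hypothetical, and a generic result type replaces the JDBC ResultSet so the example can run without a database.

```java
import java.sql.SQLException;

// Hypothetical stand-in for the JDBC work a query thread performs.
interface SqlWork<T> {
    T run() throws SQLException;
}

// Sketch of a thread that stores its result or its SQLException for later pickup.
class HoldingQueryThread<T> extends Thread {
    private final SqlWork<T> work;
    private volatile T result;
    private volatile SQLException sqlException;

    HoldingQueryThread(SqlWork<T> work) {
        this.work = work;
    }

    @Override
    public void run() {
        try {
            result = work.run();
        } catch (SQLException e) {
            // Keep the failure instead of losing it when the thread exits.
            sqlException = e;
        }
    }

    T getResult() { return result; }
    SQLException getSQLException() { return sqlException; }
}

class HoldingQueryThreadDemo {
    static String run() {
        HoldingQueryThread<String> ok =
                new HoldingQueryThread<String>(() -> "3 rows");
        HoldingQueryThread<String> bad =
                new HoldingQueryThread<String>(() -> {
                    throw new SQLException("table not found");
                });
        ok.start();
        bad.start();
        try {
            ok.join();
            bad.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ok.getResult() + " | " + bad.getResult()
                + " | " + bad.getSQLException().getMessage();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The owner of the thread can poll isAlive() and, once it returns false, read either the result or the exception, which mirrors the reaping step described above.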

Once the ConcurrentQueryThreadImpl notices that the QueryThread is finished, it calls the processResults() method of the CanResolveAConcurrentQuery interface reference that the domain object has implemented, marks the processed object as reaped via the same interface, and removes the QueryThread from the list of running threads. Besides the getInstance() method that gives visibility into the singleton class, the public user interface for this class simply consists of the runQuery() and waitForAllQueriesToComplete() methods.

A Variation Using Thread Pools
In situations where concurrent queries can be used extensively, there might be some uneasiness about starting a new thread for each query and having it exit when that query is completed. In such cases, I'd recommend using a thread pool that runs Callable tasks, available in the java.util.concurrent package. Callable tasks have an advantage over plain threads in that a) the threads that run them can be pooled, b) they can throw an exception, and c) they can return a result. As an exercise, I've implemented a callable thread pool version of the QueryThread class that the ConcurrentQueryThreadImpl class can use to run queries. This class, a private class named QueryThreadPool, implements the Callable interface, instantiates a thread pool sized to the maximum number of queries we want to have running at once, and puts the main unit of work of the thread inside the call() method. The source for the QueryThreadPool class is in Listing 6.
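
The three advantages listed above can be seen in a short sketch built on the standard java.util.concurrent classes. This is not the QueryThreadPool class from Listing 6; CallablePoolSketch and its methods are hypothetical names, and the Callables return strings instead of running real SQL.

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a fixed pool caps concurrency, Callables return or throw.
class CallablePoolSketch {

    static List<String> runAll(int maxConcurrent, List<Callable<String>> queries)
            throws InterruptedException {
        // Pool size plays the role of the "maximum running queries" constraint.
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Callable<String> query : queries) {
                futures.add(pool.submit(query));
            }
            List<String> outcomes = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    outcomes.add(future.get());          // (c) returns a result
                } catch (ExecutionException e) {
                    // (b) the exception thrown inside call() arrives as the cause
                    outcomes.add("failed: " + e.getCause().getMessage());
                }
            }
            return outcomes;
        } finally {
            pool.shutdown();
        }
    }

    static String demo() {
        List<Callable<String>> queries = new ArrayList<>();
        queries.add(() -> "countries");
        queries.add(() -> { throw new SQLException("connection lost"); });
        queries.add(() -> "cities");
        try {
            return String.join(",", runAll(2, queries));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Results are collected in submission order here for simplicity; an implementation that reaps whichever query finishes first would poll the futures with isDone() instead.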

To make it easier to switch between the two threading models, a simple interface was extracted from the original QueryThread implementation named IsAConcurrentQueryThreadRunner, mandating the following methods: getResultSet(), getSQLException(), and isAlive(). See IsAConcurrentQueryThreadRunner.java below.

package net.sourceforge.concurrentQuery.article.concurrent;

import java.sql.ResultSet;
import java.sql.SQLException;

public interface IsAConcurrentQueryThreadRunner {

    public ResultSet getResultSet();
    public SQLException getSQLException();
    public boolean isAlive();
}

This interface is the type used by the ConcurrentHashMap collections that hold the references to the running query threads. Now it's possible to change a few references from QueryThread to QueryThreadPool, and vice versa, to switch between the two threading models. Of course, a factory to create the threading model based on a properties file would be more efficient, but that is outside the immediate scope of our discussion. The entire source for the ConcurrentQueryThreadImpl class is in Listing 7.

A Second, More Robust Implementation
To demonstrate the pattern further, I've put together a more elaborate implementation that builds a large object from real database queries. This database has one table that lists cities with large populations, their districts (or states), and the countries in which they reside. For this example, I have built a single object that contains a list of countries that have more than 75 cities. Each country in the CountryList object contains a list of its districts, and each district contains a list of its cities. All of this is in one big object. Once it's built, the results are printed. Below is partial output from printing the CountryList object.

=== stuff deleted ===

Country Code: USA
     District: Alabama
         city name: Birmingham, population: 242820
         city name: Huntsville, population: 158216
         city name: Mobile, population: 198915
         city name: Montgomery, population: 201568
     District: Alaska
         city name: Anchorage, population: 260283

=== stuff deleted ===

Once built, this object contains 11 countries, 350 districts each associated with its country, and 2,233 cities each associated with its district. I've implemented the solution using a concurrent query pattern that uses a factory to create a concurrent query object with the desired threading model (normal threads, callable thread pool, or runnable thread pool). Then I created a factory broker singleton class that reads the threading model, JDBC settings, and the number of connections from a properties file and invokes the proper factory to create the concurrent query object. If I use one connection, thus simulating a serialized approach, it takes about 30 seconds on average to construct this object (this doesn't include the amount of time needed to print the results). If I use two connections concurrently, the process of constructing the object takes only about 7.7 seconds. Using three connections gets the time down to 5.2 seconds. Your mileage may vary and you will eventually hit a point of diminishing returns where adding more concurrent connections won't improve performance.
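
As a rough idea of what the properties-driven selection might look like, the sketch below parses a threading model and a connection count from a Properties object. Everything in it is an assumption made for illustration: the property keys query.threadingModel and query.connections, the ThreadingModel enum, and the ConcurrentQuerySettings class are hypothetical and are not taken from the SourceForge project, and describe() merely stands in for the point where a real broker would invoke the matching factory.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Properties;

// Hypothetical names for the three threading models mentioned in the text.
enum ThreadingModel { NORMAL_THREADS, CALLABLE_POOL, RUNNABLE_POOL }

// Hypothetical settings holder; the property keys are assumptions.
class ConcurrentQuerySettings {
    final ThreadingModel model;
    final int connections;

    private ConcurrentQuerySettings(ThreadingModel model, int connections) {
        this.model = model;
        this.connections = connections;
    }

    static ConcurrentQuerySettings from(Properties props) {
        String rawModel = props.getProperty("query.threadingModel", "NORMAL_THREADS");
        ThreadingModel model =
                ThreadingModel.valueOf(rawModel.trim().toUpperCase(Locale.ROOT));
        int connections =
                Integer.parseInt(props.getProperty("query.connections", "1").trim());
        if (connections < 1) {
            throw new IllegalArgumentException("query.connections must be >= 1");
        }
        return new ConcurrentQuerySettings(model, connections);
    }

    // Stand-in for the place where a broker would pick the matching factory.
    String describe() {
        switch (model) {
            case CALLABLE_POOL:
                return "callable pool of " + connections;
            case RUNNABLE_POOL:
                return "runnable pool of " + connections;
            default:
                return "up to " + connections + " plain threads";
        }
    }
}

class ConcurrentQuerySettingsDemo {
    static String run() {
        Properties props = new Properties();
        try {
            props.load(new StringReader(
                    "query.threadingModel=callable_pool\nquery.connections=3\n"));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        String configured = ConcurrentQuerySettings.from(props).describe();
        // An empty Properties falls back to one connection, i.e. serialized queries.
        String defaults = ConcurrentQuerySettings.from(new Properties()).describe();
        return configured + " | " + defaults;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Defaulting to a single connection keeps the serialized behavior as the fallback, so concurrency has to be turned on deliberately in the properties file.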

Consider the CountryList domain class in Listing 8 that accepts an argument for the number of cities, builds a list of countries that have more than that number of cities, and then constructs a list of the districts in each country.

Note that the processResultSet() method is defined in the ResolvableFromConcurrentQuery interface. Also, the DistrictList class, which is instantiated by the CountryList object, is a domain object that participates in concurrent queries and will invoke a CityList object, yet another concurrent query domain object. And all of this happens using threaded queries and queued queries on lists to manage them. Notice too that in this implementation I've chosen to have the domain objects explicitly call the resolve() method of the ConcurrentQuery object rather than build a notification into the interface as the previous implementation did with the isReaped() method. The resolve() method waits for all the running threads and queued queries to complete before continuing. The tradeoff is whether it's more feasible to have each getter in the domain object check to be sure it's reaped or to have the domain objects explicitly wait to be resolved.

So, in general, a concurrent query implementation will likely have a mechanism to invoke a SQL query without waiting for the SQL results, and a way to ensure that an object is properly built before it's used - either by having the business logic explicitly wait for all results to finish after invoking some concurrent queries, or by having the domain object itself recognize that it hasn't processed its SQL results and wait for them.
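
The two ways of guaranteeing a fully built object can be contrasted in a few lines. The sketch below is only an illustration under assumptions: it uses CompletableFuture in place of the article's ConcurrentQuery machinery, and the class names SelfCheckingDistrict and ExplicitlyResolvedDistrict are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Style 1: every getter checks whether the results were processed and waits if not.
class SelfCheckingDistrict {
    private final CompletableFuture<List<String>> pendingCities;
    private List<String> cities; // null until the results have been reaped

    SelfCheckingDistrict(CompletableFuture<List<String>> pendingCities) {
        this.pendingCities = pendingCities;
    }

    synchronized List<String> getCities() {
        if (cities == null) {
            cities = pendingCities.join(); // blocks only if the query is still running
        }
        return cities;
    }
}

// Style 2: the caller must resolve explicitly before using the getters.
class ExplicitlyResolvedDistrict {
    private final CompletableFuture<List<String>> pendingCities;
    private List<String> cities;

    ExplicitlyResolvedDistrict(CompletableFuture<List<String>> pendingCities) {
        this.pendingCities = pendingCities;
    }

    void resolve() {
        cities = pendingCities.join();
    }

    List<String> getCities() {
        if (cities == null) {
            throw new IllegalStateException("resolve() has not been called");
        }
        return cities;
    }
}

class ResolveStylesDemo {
    static String run() {
        CompletableFuture<List<String>> query =
                CompletableFuture.supplyAsync(() -> Arrays.asList("Anchorage"));

        SelfCheckingDistrict lazy = new SelfCheckingDistrict(query);
        String fromGetter = lazy.getCities().toString();

        ExplicitlyResolvedDistrict explicit = new ExplicitlyResolvedDistrict(query);
        String beforeResolve;
        try {
            beforeResolve = explicit.getCities().toString();
        } catch (IllegalStateException e) {
            beforeResolve = "unresolved";
        }
        explicit.resolve();
        String afterResolve = explicit.getCities().toString();

        return fromGetter + " | " + beforeResolve + " | " + afterResolve;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The first style hides the wait from callers at the cost of a check in every getter; the second keeps the getters trivial but relies on the business logic remembering to resolve.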

When To Use Concurrent Queries
I wouldn't propose using a concurrent query pattern as a general rule for all database access because of resource constraints, but I believe there are many applications that could benefit from occasional use. This pattern fits most easily with POJOs that already build and execute and process results for their own SQL queries. The following are characteristics of applications that might benefit:

  • Database and server resources are adequate and the database server isn't already under duress.
  • Your application is already using JDBC queries.
  • Your application controls when queries are run and when the results are processed (e.g., not using an external tool for building, managing, and running queries).
  • You're not having issues with the number of connections available to the database server.
If so, then it might be feasible to implement this pattern. Remember, you can always configure the number of queries allowed to run concurrently to one, essentially running your application as a regular serialized JDBC query/result model, if resource constraints become an issue.

Conclusion
Such a simple pattern can be implemented in a few hours and the results might help a project over some bumpy performance issues. A few items worth noting that didn't seem to fit in anywhere else:

  • Concurrent queries don't have to be implemented using threads. Since most database servers are multithreaded themselves they usually return control back to the client after a query has been parsed and submitted while the database server works on the query. If you hold the connection then you can check for the results later without having to use threads (e.g., set a timeout to zero and check for a result). Of course, the threading approach is pretty efficient and I personally like that model better. While it's entirely possible to use JDBC and hold the connection without immediately processing the result, the de facto standard for Java/JDBC development, up to this point, has been to submit queries and process results in one operation. But, when using a language or platform whose threading package isn't trustworthy then this pattern can be implemented without threads. In a previous project, I implemented a variation of this pattern using C and ODBC without threads.
  • If you access a singleton concurrent query implementation from threaded clients then you might need to synchronize methods or blocks strategically in the concurrent query singleton.
  • I've never implemented this pattern with objects that insert, update or delete data, but I suppose it could be done. I've never implemented this pattern to participate in a transaction, but that too should be possible.
  • Besides building query-intensive objects faster, another potential use for this pattern could be in improving front-end user response time by pre-fetching data. For instance, suppose that after a user logs in to your application, their likely next choice would be to pull a list of active orders, view a list of products, or view their account settings. Concurrent queries could be used to build objects for all three potential choices immediately after the user logs in. By the time the user decides which option to choose, the domain objects would be immediately available, or at least closer to being available than if the object started to be constructed after the user made a choice. Of course, an expiration time on the object would be in order in case the user takes 30 minutes to make a choice. Sure, you might end up building an object that you don't use, but I've had several instances where the perceived user response time was more valuable than the application resources. I don't like fast food restaurants that have my burger made before I actually order it, but I'm not as picky about my data.
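
The pre-fetching idea in the last item, including the expiration, might be sketched as follows. This is an assumption-laden illustration rather than anything from the sample project: Prefetched is a hypothetical class, CompletableFuture stands in for a concurrent query, and the clock is injected so the expiry can be demonstrated without actually waiting.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical holder that starts loading right away and reloads once stale.
class Prefetched<T> {
    private final Supplier<T> loader;
    private final long ttlMillis;
    private final LongSupplier clock;
    private CompletableFuture<T> future;
    private long startedAt;

    Prefetched(Supplier<T> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        start(); // begin building before anyone asks for the value
    }

    private void start() {
        startedAt = clock.getAsLong();
        future = CompletableFuture.supplyAsync(loader);
    }

    // Returns the prefetched value, rebuilding it first if it has expired.
    synchronized T get() {
        if (clock.getAsLong() - startedAt > ttlMillis) {
            start();
        }
        return future.join();
    }
}

class PrefetchedDemo {
    static String run() {
        AtomicLong now = new AtomicLong(0);        // fake clock, in milliseconds
        AtomicInteger loads = new AtomicInteger(); // counts how often we "query"
        Prefetched<String> orders = new Prefetched<String>(
                () -> "orders#" + loads.incrementAndGet(), 1000, now::get);

        String first = orders.get();
        now.set(500);                              // still fresh
        String second = orders.get();
        now.set(2000);                             // past the 1000 ms lifetime
        String third = orders.get();
        return first + " " + second + " " + third;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In an application, one such holder per likely next choice could be created right after login, with the real system clock (for example System::currentTimeMillis) passed in place of the fake one.
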
For More Information
All of the sources found here, plus the source for the implementation of the list of countries example, are available on sourceforge.net. Since concurrent query is more of a pattern than a packaged solution, the project on sourceforge.net is just a sample implementation intended for perusal. Sources are available for download from http://sourceforge.net/projects/concurrentquery.

The SleepyObject (ant target: run-example1) and ConcurrentSleepyObject (ant target: run-example2) are found in the article package and use a Postgres database. View the readme for instructions on creating the sleep function in Postgres. Other database servers might have a built-in function (e.g., waitfor in MS SQL) that could be substituted.

The country list example (ant target: run-ModelDriver) uses a MySQL database server. The DDL and data to create the city table are included, and instructions for loading them are also in the readme file.

More Stories By Andy Pardue

Andy Pardue is a senior software developer who has specialized in the medical software industry for over 15 years, 11 years as a telecommuter from his home office in Mesquite, Texas. He can be reached at: [email protected]

Most Recent Comments
JDJ News Desk 12/19/06 01:30:06 PM EST

Does this sound familiar? You have a domain object, perhaps for reporting purposes, that's built from a ton of JDBC queries and it takes too long to load. Nothing else happens until this object is built, so it's become a bottleneck. Even worse, each of the queries is actually well tuned, so there isn't much to gain from modifying the queries themselves - there are just too many of them. You don't want to change (or can't change) your data model, so what can be done to alleviate this problem short of a major redesign? There are several options like caching, lazy loading, resource pooling. Another worthy option would be to implement a variation of the concurrent query pattern.
