If you’re using D2 (or Smartview), you have probably noticed that the LoadOnStartup parameter, which is required for certain features (e.g. the application list), also makes D2 startup slower.
Depending on the number of repositories/applications this startup time can be very long or extremely long (we’re talking about a web application). As an example, our servers take ~5-7 minutes to complete the startup of D2, but even “funnier” is the situation with our laptops, where (thanks to all the security AV, inventory tools and such) the same war file from dev/qs/prod can take ~20 minutes to start.
How is this possible? Well, I asked myself the same question when I happened to be checking the code of some classes related to this topic. What I found is that LoadOnStartup basically does this:
- Populates the type cache
- Populates the labels for the types
- Caches the configuration objects (so if you have a lot, this will take some “seconds”)
- Populates the labels for every attribute and language (this can take minutes depending on how many languages/attributes you have, because it covers everything, without even excluding internal attributes that are never used)
You can see these in the following excerpt from D2.log in full debug mode:
Refresh cache for type : dm_document
Refresh cache for type : d2_pdfrender_config
...
Refresh format label cache
...
Refresh xml cache for object d2_portalmenu_config with object_name MenuContext
...
Refresh en attribute label cache
...
Is this really a problem? Well, not “by default”, but if you have 5-6 repositories with many types and many languages, this becomes… slow.
So, taking a look at the code, we saw that most of the time was being spent on the attribute label cache, which would stall for 30-40 seconds per language. This code is located in the com.emc.common.dctm.bundles.DfAbstractBundle.loadProperties method:
query.setDQL(dql);
col = query.execute(newSession, 0);
while(col.next()) {
    String name = col.getString("name");
    String label = col.getString("label");
    if (col.hasAttr("prefix")) {
        String prefix = col.getString("prefix");
        StringBuffer tmpName = new StringBuffer(prefix);
        tmpName.append('.');
        tmpName.append(name);
        name = tmpName.toString();
    }
The DQL is “select type_name as name, label_text as label from dmi_dd_type_info where nls_key = 'en'”, which can return tens of thousands of rows, and it is executed once for each language configured in the repository. This code is called from the com.emc.d2fs.dctm.servlets.init.LoadOnStartup class:
int count = docbaseConfig.getValueCount("dd_locales");
for (int i = 0; i < count; i++) {
    String strLocale = docbaseConfig.getRepeatingString("dd_locales", i);
    DfSessionUtil.setLocale(session, LocaleUtil.getLocale(strLocale));
    LOG.debug("Refresh {} type label cache", strLocale);
    DfTypeBundle.clearCache(session);
    DfTypeBundle.getBundle(session);
    LOG.debug("Refresh {} attribute label cache", strLocale);
    DfAttributeBundle.clearCache(session);
    DfAttributeBundle.getBundle(session);
}
The getBundle method is what ends up running that query for the labels… So, improvement possibilities? One is clear: multithreading. We modified this block of code to run with multiple threads (one per language), and what happened? We cut down the startup time by 2-3 minutes. Fantastic, right? Yes, but then we thought: the log clearly shows that LoadOnStartup is a sequential process that repeats the same steps for each repository… so could we run the “repository initialization” in parallel? Let’s see:
Iterator<IDfSession> iterator = (Iterator<IDfSession>)object;
while (iterator.hasNext()) {
    IDfSession session = iterator.next();
    try {
        if (session != null) {
            refreshCache(session);
            if (cacheBocsUrl)
                loadBocsCache(session);
        }
    } finally {
        try {
            if (session != null)
                session.getSessionManager().release(session);
        } catch (Exception exception) {}
    }
}
This block of code is what kicks off the LoadOnStartup process for each repository through the “refreshCache” method. So what happens if we also add multithreading to this block of code? Well, it works:
[pool-5-thread-2] - c.e.d.d.s.i.LoadOnStartup[ ] : Refresh cache for type : d2_subscription_config
[pool-5-thread-1] - c.e.d.d.s.i.LoadOnStartup[ ] : Refresh cache for type : d2_toolbar_config
[pool-5-thread-3] - c.e.d.d.s.i.LoadOnStartup[ ] : Refresh cache for type : d2_attribute_mapping_config
[pool-5-thread-2] - c.e.d.d.s.i.LoadOnStartup[ ] : Refresh cache for type : d2_sysobject_config
You can see how the type cache is populated in parallel by using a different thread for each repository. And what about times? Well, these are the times for the “normal” startup, parallel loading of labels, and parallel loading of the repository configuration and labels taken on my laptop:
[1271387] milliseconds
[1058355] milliseconds
[435735] milliseconds
So, original startup time without touching anything: ~21 minutes. Multithreading the attribute label cache: ~17.5 minutes. Full multithreading: ~7 minutes.
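For anyone curious about the general shape of the change, here is a minimal, self-contained sketch of the two levels of parallelism described above: one thread per repository, and inside each repository one task per language. This is not the actual D2 code or our patch. The class ParallelStartupSketch, the LabelLoader interface and the warmUp/warmUpRepository methods are hypothetical names made up for illustration, and the real DfTypeBundle/DfAttributeBundle calls are replaced by a simulated 200 ms delay.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelStartupSketch {

    /** Hypothetical stand-in for the per-locale label loading (not a D2/DFC interface). */
    interface LabelLoader {
        void load(String repository, String locale) throws Exception;
    }

    /** Runs one task per repository and returns the elapsed wall-clock time in milliseconds. */
    static long warmUp(Map<String, List<String>> localesByRepository, LabelLoader loader)
            throws InterruptedException, ExecutionException {
        long start = System.nanoTime();
        ExecutorService repoPool =
                Executors.newFixedThreadPool(Math.max(1, localesByRepository.size()));
        try {
            List<Future<?>> repoTasks = new ArrayList<>();
            for (Map.Entry<String, List<String>> entry : localesByRepository.entrySet()) {
                // Returning null makes the lambda a Callable, so checked exceptions can propagate.
                repoTasks.add(repoPool.submit(() -> {
                    warmUpRepository(entry.getKey(), entry.getValue(), loader);
                    return null;
                }));
            }
            for (Future<?> task : repoTasks) {
                task.get(); // waits for completion, rethrows failures as ExecutionException
            }
        } finally {
            repoPool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Runs one task per locale for a single repository and waits for all of them. */
    static void warmUpRepository(String repository, List<String> locales, LabelLoader loader)
            throws InterruptedException, ExecutionException {
        if (locales.isEmpty()) {
            return;
        }
        ExecutorService localePool = Executors.newFixedThreadPool(locales.size());
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (String locale : locales) {
                tasks.add(localePool.submit(() -> {
                    loader.load(repository, locale);
                    return null;
                }));
            }
            for (Future<?> task : tasks) {
                task.get();
            }
        } finally {
            localePool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> repos = new LinkedHashMap<>();
        repos.put("repo1", Arrays.asList("en", "es", "de"));
        repos.put("repo2", Arrays.asList("en", "fr"));
        repos.put("repo3", Arrays.asList("en"));

        // Simulated slow label query: 200 ms per repository/locale pair.
        LabelLoader slowLoader = (repository, locale) -> {
            Thread.sleep(200);
            System.out.println(Thread.currentThread().getName()
                    + " loaded " + locale + " labels for " + repository);
        };

        long elapsed = warmUp(repos, slowLoader);
        System.out.println("Warm-up took [" + elapsed + "] milliseconds");
    }
}
```

One caveat if you try this against the real classes: the original loop sets the locale on the session before loading each bundle, so tasks running in parallel would presumably each need their own session rather than sharing one. That is an assumption on my side based on the excerpt above, not something the sketch demonstrates. Sizing each pool to the number of repositories or languages is also only reasonable because those numbers are small.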
Don’t know, but maybe OT guys should take a look at this and consider a “performance improvement” patch for D2/Smartview…