Yoav Landman's blog
Posted by yoavl on December 16, 2010 at 6:16 AM EST
Every software project experiences the complexity of incorporating open-source and proprietary components that use a wide range of licenses. The BIG question is what can be done to avoid license violations in the face of countless dependencies.
Tracking Artifact Licenses - Why is this Hard?
Tracking licenses of third-party artifacts is not one of those tasks that get developers excited. With more interesting problems to solve than legal issues, it is not usually high on the priority list for most teams to deal with licenses during active development, so more often than not, this is left as one of the final steps before preparing a release.
Even when you do try to exercise due diligence and track those third-party licenses, making sure that all developers verify each dependency and its transitive dependencies for compatibility with your company’s license usage policy is not trivial. Eventually this means manually digging through each and every dependency in the project and attempting to keep an accurate record of the license each one uses.
Now, if you are only developing in-house projects, this may not seem like a big deal, but once you begin distributing your software, even as a cloud service, the risk of shipping a third-party dependency with an unwanted license becomes real.
License Information is Out There - Module Info to the Rescue!
Getting the initial license information for third party dependencies doesn’t have to be a manual process - with modular dependencies there is already good information out there that we can leverage!
Maven, Ivy (+Ant), and Gradle (which uses Ivy) all describe artifacts and dependencies in terms of reusable declarative modules. Both Maven POM files and Ivy descriptor files are designed to contain license information as part of the module metadata. And, in fact, many open source libraries already include valuable license information in their descriptors. Potentially, that means that extracting license information from module metadata can be fully automated!
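As a rough illustration of that automation, here is a minimal sketch that pulls license names out of a POM's licenses section using only the JDK's built-in XML parser. The class name PomLicenseReader and the sample POM are made up for this example; this is not how Artifactory does it, just one way the metadata could be read.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only.
public class PomLicenseReader {

    /** Returns the trimmed text of the first <name> under every <license> element in the POM. */
    public static List<String> licenseNames(String pomXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));

        List<String> names = new ArrayList<>();
        NodeList licenses = doc.getElementsByTagName("license");
        for (int i = 0; i < licenses.getLength(); i++) {
            Element license = (Element) licenses.item(i);
            NodeList nameNodes = license.getElementsByTagName("name");
            if (nameNodes.getLength() > 0) {
                names.add(nameNodes.item(0).getTextContent().trim());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Made-up POM fragment; a real POM carries many more elements.
        String pom =
                "<project>"
              + "  <groupId>com.example</groupId>"
              + "  <artifactId>demo-lib</artifactId>"
              + "  <version>1.0</version>"
              + "  <licenses>"
              + "    <license>"
              + "      <name>Apache License, Version 2.0</name>"
              + "      <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>"
              + "    </license>"
              + "  </licenses>"
              + "</project>";
        System.out.println(licenseNames(pom));
    }
}
```

An empty result is as informative as a populated one here: it marks a module whose descriptor carries no license information at all.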
Relying on Module Metadata - Not Quite There Yet...
In practice, there are a couple of issues with purely relying on license information from Java modules:
Managing Licenses with an Artifact Repository
Many organizations already manage their published artifacts and dependencies in a central Artifact Repository, such as JFrog's Artifactory. The repository keeps all the organization’s binaries, which are used by the developers and by the build system.
Apart from managing the binary data itself, Artifactory also manages metadata about artifacts.
Managing license information about artifacts as part of this metadata just seems the natural thing to do:
By using the artifacts repository we can tag our artifacts with license information managed at a central place. Adding this license metadata information can be fully automated and can also be controlled by users!
This is, in fact, exactly the approach taken by the License Control feature in Artifactory Pro, and it solves all previously mentioned issues related to license information extraction.
This is how it works:
Discovering Licenses Automatically - Build Servers Never Lie
Automatic license discovery and notification of possible violations are done as an integral part of the Continuous Integration process.
Whenever a developer introduces a new dependency, the next build picks it up and triggers automated license analysis. If the dependency uses an unknown or unapproved license, an email notification is sent to specified users.
This is all possible using Artifactory’s comprehensive build integration with Hudson, JetBrains TeamCity and Atlassian Bamboo and works for Maven 2 & 3, Ivy and Gradle builds on each build server.
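The check itself is conceptually simple. The sketch below shows the idea under stated assumptions: the class LicensePolicyCheck, the license identifiers and the artifact coordinates are all hypothetical, and an unknown license is modeled as null. It only illustrates the unknown/unapproved distinction described above, not Artifactory's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical policy check, for illustration only.
public class LicensePolicyCheck {

    /**
     * Returns one message per dependency whose license is unknown (null)
     * or missing from the approved set, in map iteration order.
     */
    public static List<String> findViolations(Map<String, String> licenseByDependency,
                                              Set<String> approvedLicenses) {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, String> entry : licenseByDependency.entrySet()) {
            String license = entry.getValue();
            if (license == null) {
                violations.add(entry.getKey() + ": unknown license");
            } else if (!approvedLicenses.contains(license)) {
                violations.add(entry.getKey() + ": unapproved license " + license);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // Assumed company policy: only these two licenses are approved.
        Set<String> approved = new HashSet<>(Arrays.asList("Apache-2.0", "MIT"));

        // Made-up dependencies as they might be resolved during a build.
        Map<String, String> resolved = new LinkedHashMap<>();
        resolved.put("com.example:core:1.0", "Apache-2.0");
        resolved.put("com.example:copyleft-lib:2.1", "GPL-3.0");
        resolved.put("com.example:mystery:0.3", null);

        // In a real setup these messages would go into the notification email.
        for (String violation : findViolations(resolved, approved)) {
            System.out.println(violation);
        }
    }
}
```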
When installing the Hudson Artifactory plugin, for example, you will get the options to run license checks as part of the build (identical functionality exists for TeamCity and Bamboo):
The Full Cycle - From Modules to Automated License Checks
Here is how it all works together to automatically extract and apply licensing information and conduct license violation checks on the fly:
A developer declares new dependencies in pom.xml files or ivy.xml descriptors (1). Once the changes are declared, the developer commits them to the Version Control System (2).
The CI build server monitors version controlled files, sees the changes and pulls them to its workspace (3), which triggers a build (4).
The build is run and intercepted by the Artifactory plugin (for the relevant CI server). The data intercepted is a complete BuildInfo for the build (acting as a bill of materials), including information about all resolved dependency artifacts (5).
Note: It is important to realize that the context of a build is the only reliable source of information for the actual dependencies used by your project, since dependency resolution can be dynamic and rely on dynamic aspects like version ranges, the state of the repository at the time of build, resolved properties, etc.
The Artifactory plugin publishes all modules with the captured BuildInfo to Artifactory (6). This is where things start to get interesting.
Artifactory looks at the dependencies and for each artifact attempts to figure out what licenses it uses (7). This is done by combining: license information from module metadata, previously found license information and user-set license information. It is even possible to tell Artifactory the exact build scopes/configuration for which dependencies need to be checked.
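To make the combination step concrete, here is a small sketch of how three license sources could be merged for a single artifact. The precedence used (user-set first, then previously found, then module metadata) is an assumption made for this example, as are the class and method names; the actual rules Artifactory applies are not spelled out here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical resolver, for illustration only.
public class LicenseResolver {

    /**
     * Picks the first non-null source. Assumed precedence:
     * user-set, then previously found, then module metadata.
     */
    public static Optional<String> resolve(String userSet,
                                           String previouslyFound,
                                           String fromModuleMetadata) {
        if (userSet != null) {
            return Optional.of(userSet);
        }
        if (previouslyFound != null) {
            return Optional.of(previouslyFound);
        }
        return Optional.ofNullable(fromModuleMetadata);
    }

    /** True when the dependency's scope is one of the scopes selected for checking. */
    public static boolean shouldCheck(String scope, Set<String> checkedScopes) {
        return checkedScopes.contains(scope);
    }

    public static void main(String[] args) {
        // Assumed configuration: only compile and runtime dependencies are checked.
        Set<String> checkedScopes = new HashSet<>(Arrays.asList("compile", "runtime"));

        System.out.println(shouldCheck("test", checkedScopes));
        System.out.println(resolve(null, null, "Apache-2.0").orElse("unknown"));
        System.out.println(resolve("MIT", null, "Apache-2.0").orElse("unknown"));
    }
}
```

Putting the user-set value first reflects the idea that a person who has reviewed an artifact should be able to override whatever its descriptor claims.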
At the end of the analysis an email with all license violations discovered is sent out to the configured recipient addresses (8). Normally this would be the development lead or the project lead and not someone from legal.
Although there may be license violations, the build will not fail. This approach allows development to move on naturally, while letting development leads discover possible licensing discrepancies as soon as they surface and deal with them before they become an issue. To share the information beyond the development circle, you can generate license usage reports (9) to incorporate into the legal department’s favorite Excel template. Effectively, this means that you never have a single artifact in your project that was not verified for license information prior to submitting it!
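For a sense of what a spreadsheet-friendly report could look like, the sketch below writes one CSV row per artifact. The class LicenseReportCsv, the two-column layout and the "Unknown" placeholder are assumptions for this example and do not reflect the format of Artifactory's own reports.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical report writer, for illustration only.
public class LicenseReportCsv {

    /** Wraps a field in double quotes and doubles any embedded quotes. */
    private static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Builds a header row plus one row per artifact; a null license is reported as "Unknown". */
    public static String toCsv(Map<String, String> licenseByArtifact) {
        StringBuilder csv = new StringBuilder("\"Artifact\",\"License\"\n");
        for (Map.Entry<String, String> entry : licenseByArtifact.entrySet()) {
            String license = entry.getValue() == null ? "Unknown" : entry.getValue();
            csv.append(quote(entry.getKey()))
               .append(',')
               .append(quote(license))
               .append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        // Made-up artifacts and licenses.
        Map<String, String> licenses = new LinkedHashMap<>();
        licenses.put("com.example:core:1.0", "Apache License, Version 2.0");
        licenses.put("com.example:mystery:0.3", null);
        System.out.print(toCsv(licenses));
    }
}
```

Quoting every field keeps license names that contain commas, such as "Apache License, Version 2.0", in a single column when the file is opened in a spreadsheet.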
Using the power of a central repository manager like Artifactory, we can extract important license information and combine it with user definitions in order to automate the process of license governance. This is done in the context of a project build executed automatically by the CI server upon changes in the version control system. This ensures that all possible license violations are handled immediately when new or modified dependency declarations are checked in.
The approach taken towards license control is developer-oriented - never stop the build, but let development leads decide per new dependency whether it can go into the project, before the information is generated and transferred for legal approval.
You can read more about the Artifactory License Control feature on the JFrog wiki, or watch this short video to see the full cycle described here in action.