Injecting Intelligence into PDFs with XMP (Part 2)

Bring a higher degree of efficiency to documentation workflows

In a previous article, we reviewed the benefits and uses of PDF metadata. Specifically, we looked at the emergence of XMP as a potential standard for metadata frameworks and at how it can help PDF developers store, exchange, track, and retrieve information. In this article, we explore the injection of custom XMP metadata tags into existing PDF/A documents.

PDF/A and XMP: Inevitable Convergence
In 2001, developers recognized the need to add their own customized XMP metadata tags to PDF documents. They understood that by adding their own information, they could make a document easier to retrieve and include data that would not change regardless of where or how it was processed.

However, the introduction of the PDF/A standard challenged some developers and forced them to rethink how they incorporate their XML customizations into PDF/A documents. They realized that although there were many ways to add custom XML tags, only a few of them kept the new data valid without violating the format restrictions of PDF/A.
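
One approach that stays within those restrictions, at least for PDF/A-1, is to declare every custom property in an extension schema description stored in the XMP packet itself. The sketch below builds such a description with Python's standard library. The three pdfa namespaces are the ones defined for PDF/A extension schemas; the purchase-order schema title, the `http://example.com/ns/po/` namespace, the `po` prefix, and the property names are illustrative assumptions, and the code does not use or represent the PDF Analyzer API.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFA_EXT = "http://www.aiim.org/pdfa/ns/extension/"
PDFA_SCHEMA = "http://www.aiim.org/pdfa/ns/schema#"
PDFA_PROP = "http://www.aiim.org/pdfa/ns/property#"

# Register readable prefixes so the serialized XML uses them.
for _prefix, _uri in [("rdf", RDF), ("pdfaExtension", PDFA_EXT),
                      ("pdfaSchema", PDFA_SCHEMA), ("pdfaProperty", PDFA_PROP)]:
    ET.register_namespace(_prefix, _uri)


def q(namespace, name):
    """Return a qualified tag name in ElementTree's {uri}name notation."""
    return f"{{{namespace}}}{name}"


def extension_schema(title, namespace_uri, prefix, properties):
    """Build an rdf:Description that declares one custom schema.

    properties is a list of (name, description) pairs; every property is
    declared as an external Text value to keep the sketch short.
    """
    desc = ET.Element(q(RDF, "Description"), {q(RDF, "about"): ""})
    schemas = ET.SubElement(desc, q(PDFA_EXT, "schemas"))
    bag = ET.SubElement(schemas, q(RDF, "Bag"))
    schema = ET.SubElement(bag, q(RDF, "li"), {q(RDF, "parseType"): "Resource"})
    ET.SubElement(schema, q(PDFA_SCHEMA, "schema")).text = title
    ET.SubElement(schema, q(PDFA_SCHEMA, "namespaceURI")).text = namespace_uri
    ET.SubElement(schema, q(PDFA_SCHEMA, "prefix")).text = prefix
    holder = ET.SubElement(schema, q(PDFA_SCHEMA, "property"))
    seq = ET.SubElement(holder, q(RDF, "Seq"))
    for name, description in properties:
        item = ET.SubElement(seq, q(RDF, "li"), {q(RDF, "parseType"): "Resource"})
        ET.SubElement(item, q(PDFA_PROP, "name")).text = name
        ET.SubElement(item, q(PDFA_PROP, "valueType")).text = "Text"
        ET.SubElement(item, q(PDFA_PROP, "category")).text = "external"
        ET.SubElement(item, q(PDFA_PROP, "description")).text = description
    return desc


if __name__ == "__main__":
    # Hypothetical schema for purchase-order fields.
    element = extension_schema(
        "Purchase order schema",
        "http://example.com/ns/po/",
        "po",
        [("OrderNumber", "Purchase order number"),
         ("SalesPerson", "Name of the sales person")],
    )
    print(ET.tostring(element, encoding="unicode"))
```

A validator can then recognize the `po` properties as declared rather than unknown. Whether a given file passes validation still depends on the rest of the packet and on the conformance level in use.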

Because of the rising popularity of PDF/A, Amyuni Technologies saw the need to provide developers with a tool that would help them solve specific PDF challenges, such as working within the confines of the ISO archiving standard. This tool is the PDF Analyzer.

PDF/A: Driving Archiving and Document Conformance
ERP purchasing applications offer one example of inserting customized XML information into PDF/A documents. Developers often use these applications to generate invoices or purchase orders in PDF/A for archiving and retrieval. Although these files already contain XMP metadata, they become more useful when information extracted from the text content itself is added as well.

Items such as P.O. numbers, contact details, and the names of salespeople, departments, authors, and projects are all valuable pieces of information that developers can use to create reports and enhance document retrieval. Developers can automate PDF Analyzer to:

  • Verify the internal structure of incoming PDF/A purchase orders from an ERP application for PDF/A compliance and repair them if necessary (and possible).
  • Locate and extract specific text strings (names, addresses, dates, etc.) from the PDF/A documents and convert them into customized XMP metadata (Figure 1).


Figure 1: Text String Extraction and Conversion

  • Convert these text strings into XML and use them to create customized XMP extension schemas.
  • Reinsert the new customized XMP extension schemas into the XMP streams of the PDF/A documents.
  • Resave the document as PDF/A while keeping it compliant with the ISO specifications.
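
The extraction and conversion steps in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PDF Analyzer API: it assumes the page text has already been pulled out of the PDF as a plain string, and the field labels, regular expressions, sample order, and the `http://example.com/ns/po/` namespace are all hypothetical. Writing the result back into the XMP stream of the document and resaving it as PDF/A is left to the PDF tool in use.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PO_NS = "http://example.com/ns/po/"  # hypothetical custom namespace

ET.register_namespace("rdf", RDF)
ET.register_namespace("po", PO_NS)

# Hypothetical labels; a real purchase order layout will differ.
FIELD_PATTERNS = {
    "OrderNumber": re.compile(r"P\.O\. Number:\s*(\S+)"),
    "SalesPerson": re.compile(r"Sales Person:\s*(.+)"),
    "OrderDate": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
}


def extract_fields(page_text):
    """Return the fields found in the text, skipping any that are absent."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(page_text)
        if match:
            fields[name] = match.group(1).strip()
    return fields


def to_xmp_description(fields):
    """Serialize the fields as simple properties of one rdf:Description."""
    desc = ET.Element(f"{{{RDF}}}Description", {f"{{{RDF}}}about": ""})
    for name, value in fields.items():
        ET.SubElement(desc, f"{{{PO_NS}}}{name}").text = value
    return ET.tostring(desc, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample text standing in for the content of a purchase order.
    sample_text = (
        "Purchase Order\n"
        "P.O. Number: PO-10482\n"
        "Sales Person: Dana Reyes\n"
        "Date: 2018-08-31\n"
    )
    print(to_xmp_description(extract_fields(sample_text)))
```

Keeping the patterns in a single dictionary means a new field is one added entry, and missing fields are simply omitted from the metadata instead of being written as empty values.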


Figure 2 outlines how this automated text conversion process would operate, for either a single PDF/A document or a batch of them:

Figure 2: Text String to XMP Metadata Workflow

When companies implement PDF/A, it's often because they have high volumes of documents that require archiving. Insurance companies, banks, medical institutions, and manufacturing companies are just some of the organizations where the PDF Analyzer's automation and XMP customization capabilities can bring a higher degree of efficiency to documentation workflows.

Learn more about PDF Analyzer at www.pdfanalyzer.com.

Franc Gagnon is a technical copywriter for Amyuni Technologies, a PDF solution provider. Amyuni products are integrated into applications used worldwide. The company's software tools are the PDF engines behind several leading business applications created by large software companies such as Intuit, Sage, and CaseWare.
