SQL Theory: Structured documents vs Unstructured documents

Understanding the Differences Between Structured and Unstructured Documents
Differences Between the Two Document Types
What is the difference between structured and unstructured documents? In a structured document, certain information always appears in the same location on the page. For example, on an employment application the applicant's name always appears in the same box in the same place on the document. An unstructured document has the opposite characteristic: information can appear in unexpected places on the document. Examples include a handwritten note or a white paper.
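The distinction can be made concrete with a minimal sketch. It assumes a page has already been converted to lines of text and that a "template" is simply a map from field name to a fixed line and column range. The template, its coordinates, and the sample documents are all hypothetical and chosen only for illustration.

```python
# Hypothetical template: field name -> (line index, start column, end column).
# These coordinates are illustrative assumptions, not taken from any real form.
APPLICATION_TEMPLATE = {
    "name": (0, 6, 30),
    "position": (1, 10, 30),
}


def extract_fields(page_lines, template):
    """Read each field from its fixed location on the page."""
    fields = {}
    for field, (line_no, start, end) in template.items():
        line = page_lines[line_no] if line_no < len(page_lines) else ""
        fields[field] = line[start:end].strip()
    return fields


# Structured document: the name always sits on line 0, right after "Name: ".
application = [
    "Name: Jane Smith",
    "Position: Accountant",
]

# Unstructured document: the same facts appear wherever the writer put them.
note = [
    "Hi Bob, I spoke with Jane Smith today.",
    "She is interested in the accountant role.",
]

print(extract_fields(application, APPLICATION_TEMPLATE))
print(extract_fields(note, APPLICATION_TEMPLATE))
```

The first call returns the applicant's name and position cleanly, because the form puts them where the template expects. The second call returns fragments of sentences, because the note contains the same information but not at those positions.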
Some documents, such as invoices, share the characteristics of both types. A single supplier's invoices feel like structured documents because they have a consistent appearance from one billing period to the next. However, when viewed in aggregate by an accounts payable department that receives thousands of invoices daily in a myriad of different formats, they seem more like unstructured documents.
What About Template-Based OCR Systems?
Some document imaging vendors advocate template-based OCR (optical character recognition) to capture the information needed to identify the document for later retrieval. They present this as pixie dust: you don't need to do anything with the documents other than load the automatic document feeder. Unfortunately, this solution only works well with structured documents, and it is not 100% accurate even under the best conditions. (For more information on the accuracy of OCR, read our whitepaper on that subject.)

Needless to say, you will need a different method to capture the key information needed to retrieve unstructured documents. In many organizations, unstructured documents represent the majority of the documents that will be imaged with a document imaging system.
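To see why "not 100% accurate" matters in practice, here is a rough back-of-the-envelope sketch. It assumes character errors are independent and uses made-up figures (99% per-character accuracy, a 10-character index field, 1,000 documents). These are not measured OCR results, and the function names are hypothetical.

```python
def field_accuracy(char_accuracy, field_length):
    """Probability that every character in a field is read correctly,
    assuming character errors are independent (a simplifying assumption)."""
    return char_accuracy ** field_length


def expected_fields_with_errors(char_accuracy, field_length, num_documents):
    """Expected number of documents whose field contains at least one error."""
    return num_documents * (1 - field_accuracy(char_accuracy, field_length))


# Illustrative numbers only: 99% per-character accuracy, a 10-character field.
p = field_accuracy(0.99, 10)
print(f"Chance the whole field is correct: {p:.1%}")
print(f"Expected bad fields per 1,000 documents: "
      f"{expected_fields_with_errors(0.99, 10, 1000):.0f}")
```

Under these assumptions, a seemingly high per-character accuracy still leaves roughly one index field in ten with at least one wrong character, and a document indexed under the wrong value is hard to retrieve later.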
Characteristics of Structured and Unstructured Documents

Structured documents
Type of document: Familiar data appears in the same place every time.
Examples: Insurance claim form; employment application.
Used by organizations: Low-volume operations; internally created invoices.

Unstructured documents
Type of document: Data appears in unexpected places in the document.
Examples: A letter; a handwritten note.
Used by organizations: High-volume operations; invoices received from outside the organization.

Every organization will have both structured and unstructured documents with which to contend. It is generally a good idea to purchase a document imaging system that offers the maximum capabilities to deal with both types of documents, rather than a system that caters to only a single document type.
