DocBox – The Architecture – How to build a modular Document Management System?

How to build a modular Document Management System?

The answer, of course: the DocBox software is built in a modular way and can therefore run in different configurations.

Server: This is the central web server, running on Ruby on Rails (RoR) and connecting to the database.

Scanner-Daemon: This process must be adapted per scanner; in my example I am using a Fujitsu ScanSnap S1300 (described here in GitHub).

Converter-Daemon: This guy does all the work of processing the scanned images it receives from the Scanner-Daemon, including OCR. For OCR on my local Cubietrack I am using Adobe PDF, but Tesseract is also supported (with pretty good results, too). I currently have the Converter-Daemon running on my PC and have put it into auto-start, so when I boot my PC it automatically starts converting the images and doing the OCR. Alternatively, the Converter-Daemon could run on a Raspberry Pi; it would just take longer 🙂

Hardware-Daemon: This is currently only implemented for my Cubietrack; you can see the wonderful GREEN button that triggers the scanning of the document. It's an additional abstraction layer, so the software can be adjusted more easily to new hardware.
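To make the idea of the abstraction layer a bit more concrete, here is a minimal sketch in Ruby. It is not taken from the DocBox source: the names `HardwareAdapter`, `FakeHardware`, `on_scan_button` and `show_status` are all hypothetical and only illustrate how board-specific code could be kept behind one small interface.

```ruby
# Hypothetical hardware abstraction (not the actual DocBox classes): the rest
# of the software only talks to this interface, so supporting a new board
# means writing one new subclass.
class HardwareAdapter
  def initialize
    @scan_handlers = []
  end

  # Register a block that is called whenever the scan button is pressed.
  def on_scan_button(&handler)
    @scan_handlers << handler
    self
  end

  # Board-specific subclasses implement this to show a status (e.g. via an LED).
  def show_status(_status)
    raise NotImplementedError, "#{self.class} must implement show_status"
  end

  protected

  # Subclasses call this when they detect a button press.
  def scan_button_pressed
    @scan_handlers.each(&:call)
  end
end

# Stand-in for real hardware, useful for trying things out without a board.
class FakeHardware < HardwareAdapter
  attr_reader :statuses

  def initialize
    super
    @statuses = []
  end

  # Records the status instead of driving a real LED.
  def show_status(status)
    @statuses << status
  end

  # Simulates a physical press of the button.
  def press_button
    scan_button_pressed
  end
end

hardware = FakeHardware.new
scans = []
hardware.on_scan_button do
  hardware.show_status(:scanning)
  scans << :scan_requested
end
hardware.press_button
```

With a layout like this, a Cubietrack-specific subclass would only need to watch the button and call `scan_button_pressed`; everything else stays unchanged.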

Why is this useful?

This allows scaling and using multiple scanners in different locations, e.g. with one central converter and web-server daemon. Thanks to the Avahi daemon, there is no need to configure IP addresses, as the components register themselves.

How do all these daemons communicate?

It took me quite a while to get this running reliably. But it works: the complete setup has been running for months without any issue.

3 steps... ONLY

1. An Avahi daemon runs on the same system as the web server, which allows the other modules to use avahi-discover to find the DocBox-Server in the network.
2. Using the IP address and port announced by the Avahi daemon, the external modules connect to the DocBox-Server and register their services by calling a specific URL. (There is some need for additional security here, but the assumption is that we are running on a local network; I think more security can easily be added.) The information registered at the server is the IP address of the daemon and its port.
3. Now the server has the IP address and port of each daemon and uses DRb (Distributed Ruby, the distributed object system for Ruby) to talk to it.
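The discovery and registration steps could be sketched roughly like this. Treat it as an illustration under assumptions, not as the DocBox implementation: the service type `_docbox._tcp`, the path `/daemons/register`, the query parameter names and the use of a plain GET request are all made up for this example. The parser assumes the semicolon-separated output format of `avahi-browse --resolve --parsable`, and the sample line is hand-written.

```ruby
require 'net/http'
require 'uri'

# Hypothetical service type and registration path; DocBox may use others.
SERVICE_TYPE  = '_docbox._tcp'
REGISTER_PATH = '/daemons/register'

# Command a daemon could run to look up the server on the local network.
def browse_command
  ['avahi-browse', '--resolve', '--parsable', '--terminate', SERVICE_TYPE]
end

# Parse one resolved line of the parsable avahi-browse output.
# Resolved records start with "=" and are ";"-separated:
# =;interface;protocol;name;type;domain;hostname;address;port;txt
def parse_avahi_line(line)
  fields = line.chomp.split(';')
  return nil unless fields[0] == '=' && fields.size >= 9

  { name: fields[3], host: fields[6], address: fields[7], port: Integer(fields[8]) }
end

# Build the URL a daemon would call to announce its own address and port.
# (IPv4 only; an IPv6 server address would need extra handling.)
def registration_uri(server, daemon_type:, daemon_address:, daemon_port:)
  query = URI.encode_www_form(type: daemon_type, address: daemon_address, port: daemon_port)
  URI::HTTP.build(host: server[:address], port: server[:port],
                  path: REGISTER_PATH, query: query)
end

# Perform the registration. Not called below, since it needs a running server.
def register!(uri)
  Net::HTTP.get_response(uri)
end

# Hand-written sample line instead of real avahi-browse output.
sample = '=;eth0;IPv4;DocBox;_docbox._tcp;local;docbox.local;192.168.1.20;3000;'
server = parse_avahi_line(sample)
uri = registration_uri(server, daemon_type: 'scanner',
                               daemon_address: '192.168.1.42', daemon_port: 8787)
puts uri
```

In a real daemon, the sample line would be replaced by the output of `browse_command`, and `register!(uri)` would be called once at startup.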

Because I think DRb is so nice, have a look at this code. It is so easy to make all the logic of an object available to a remote system.


# Server  ************************************
require 'drb/drb'

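# The URI the server listens on
# (named SERVER_URI so it does not shadow Ruby's own URI module)
SERVER_URI = "druby://localhost:8787"

class TimeServer
  # Returns the current time on the server
  def get_current_time
    Time.now
  end
end

# The object that handles requests on the server
FRONT_OBJECT = TimeServer.new

# Older Rubies could set $SAFE = 1 here to disable eval() and friends;
# safe levels no longer exist since Ruby 3.0.

DRb.start_service(SERVER_URI, FRONT_OBJECT)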
# Wait for the drb server thread to finish before exiting.
DRb.thread.join


# Client ************************************
require 'drb/drb'

# The URI to connect to (that was sent from the daemon)
SERVER_URI="druby://localhost:8787"

DRb.start_service

timeserver = DRbObject.new_with_uri(SERVER_URI)
puts timeserver.get_current_time
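
Applied to the DocBox setup, the same pattern might look roughly like the sketch below. The `ScannerDaemon` class and its `scan` method are hypothetical stand-ins, not the real daemon interface. To keep the example self-contained, both ends live in one process; in the real setup the proxy would be created on the server from the URI the daemon registered.

```ruby
require 'drb/drb'

# Hypothetical front object a scanner daemon might expose; the real DocBox
# daemons will have a different interface.
class ScannerDaemon
  def initialize(name)
    @name = name
    @jobs = 0
  end

  # Pretend to scan a document and return a small result hash.
  def scan
    @jobs += 1
    { scanner: @name, job: @jobs, status: 'queued' }
  end
end

# Daemon side: port 0 lets the operating system pick a free port. The
# resulting URI is what the daemon would report during registration.
service = DRb.start_service('druby://127.0.0.1:0', ScannerDaemon.new('office'))
daemon_uri = service.uri

# Server side: build a proxy from the registered URI and call it like a
# local object.
scanner = DRbObject.new_with_uri(daemon_uri)
first  = scanner.scan
second = scanner.scan
puts first.inspect

service.stop_service
```

Because the front object keeps its state on the daemon side, the job counter increases across calls even though the caller only holds a proxy.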