While AI solutions appear to be powered by magic, there is a lot going on behind the scenes. To make the show come alive, there is a need for well-thought-out architecture, the seamless integration of smart algorithms, and robust plumbing to ensure that data can flow in, be transformed into a useful state, and be made available for analysis and exploration as the need arises. Data flow matters for any AI solution because it is what lets raw (often messy) data go in and valuable, actionable insights come out. It involves breaking down the system's computational processes into separate pipelines that operate simultaneously, much like a factory production line.
Any data flow begins with the raw data, or the raw materials in the case of a production line. There must first be a way to get the data from the source into the solution. This could be manual, such as uploading an Excel spreadsheet, or automated, such as establishing a secure connection to an SQL server. The data might be structured (e.g. relational database), semi-structured (e.g. CSV or JSON file), or unstructured (e.g. PDF document). Data can come from one source or from several. It can be ingested on a schedule, in response to events, or as a continuous stream; weather data, for example, might be collected from an API at regular intervals.
“The best opportunity to check the quality of the data is the moment it enters the system!”
Quality of Data
Aside from collecting as much data as we can get our hands on, we want to check that what is coming in is actually valid and that any duplicated or erroneous information is promptly screened out. The best opportunity to check the quality of the data is the moment it enters the system; that way we avoid polluting the data repository right from the beginning. We can integrate a range of checks, from comparing the hash of a file against the hashes of existing files to see if the same content is already stored, to checking that a date or time has the correct format, to validating that an address actually exists in the real world. This is a small price to pay because better quality data leads to more useful and reliable results in the long run.
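To make the first two checks concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular implementation: the names `IngestionGate`, `content_hash`, and `is_valid_date` are hypothetical, the expected date format is assumed to be year-month-day, and the set of known hashes lives in memory where a real system would likely keep it in a database. Address validation is left out because it would normally rely on an external service.

```python
import hashlib
from datetime import datetime
from typing import Tuple


def content_hash(data: bytes) -> str:
    """Return the SHA-256 digest of the raw file content."""
    return hashlib.sha256(data).hexdigest()


def is_valid_date(text: str, fmt: str = "%Y-%m-%d") -> bool:
    """Check that a date string parses with the assumed format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


class IngestionGate:
    """Hypothetical gatekeeper that screens data as it enters the system."""

    def __init__(self) -> None:
        # Digests of content that has already been accepted.
        self.seen_hashes = set()

    def accept(self, data: bytes, record_date: str) -> Tuple[bool, str]:
        digest = content_hash(data)
        if digest in self.seen_hashes:
            return False, "duplicate content"
        if not is_valid_date(record_date):
            return False, "invalid date format"
        # Only remember content that passed every check.
        self.seen_hashes.add(digest)
        return True, "accepted"
```

Calling `accept` twice with the same bytes would reject the second upload as a duplicate, and a date such as `01/03/2024` would be rejected because it does not match the assumed format.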
Once we have plenty of good quality data stored in the system, we have the freedom and the flexibility to put it to good use. Analytical pipelines can be utilised to find out what the data is able to tell us. These pipelines can be powered by anything from a simple rules-based algorithm that matches keywords with results from the database, to complex machine learning models that raise important red flags in a contract or predict the sales performance of certain products over the coming months. These pipelines meld in so nicely that valuable insights flow seamlessly to the end-user as if by magic.
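At the simple end of that range, a rules-based step can be as small as a lookup table of keywords. The sketch below is purely illustrative: the `RULES` table, its keywords, and the `match_rules` function are assumptions made up for this example, and a production pipeline would typically load its rules from the database and use more careful text matching than a plain substring test.

```python
# Hypothetical rule table: keyword -> flag to raise when the keyword appears.
RULES = {
    "termination": "Review termination clause",
    "penalty": "Check penalty terms",
    "auto-renew": "Flag automatic renewal",
}


def match_rules(text, rules=RULES):
    """Return the flags whose keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return [flag for keyword, flag in rules.items() if keyword in lowered]


flags = match_rules("This contract will Auto-Renew unless a PENALTY is paid.")
print(flags)  # ['Check penalty terms', 'Flag automatic renewal']
```

The appeal of a rule table like this is that it stays easy to read and to extend, which makes it a sensible first pipeline before a machine learning model is brought in.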
Having a good stash of data and smart tools to make sense of it is all very well; however, we must make sure that the user can actually access, manipulate, and visualise the information. This is where REST APIs come into the picture. APIs present a secure, versatile, and scalable solution for providing rich, structured data in a broadly consumable manner. Two completely different user interfaces for separate data solutions could connect to the backend using the same API endpoint. On the other hand, multiple endpoints might be used to present a combination of different types of data in one view. Endpoints can be used for conducting security checks and data validation as well. We are not shy about building APIs for all kinds of applications.
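As a rough sketch of how one endpoint can combine data delivery with validation, the example below serves a small JSON resource using only the Python standard library. Everything in it is assumed for illustration: the `/products/<id>` route, the `PRODUCTS` sample data, and the `get_product`, `ApiHandler`, and `run` names are hypothetical, and authentication is omitted to keep the example short.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse

# Hypothetical sample data standing in for the real data store.
PRODUCTS = {
    1: {"id": 1, "name": "Widget", "forecast_units": 120},
    2: {"id": 2, "name": "Gadget", "forecast_units": 75},
}


def get_product(product_id: str):
    """Validate the id and return a (status code, JSON-ready payload) pair."""
    if not (product_id.isascii() and product_id.isdigit()):
        return 400, {"error": "product id must be numeric"}
    product = PRODUCTS.get(int(product_id))
    if product is None:
        return 404, {"error": "product not found"}
    return 200, product


class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Route GET /products/<id> to the validation and lookup logic.
        parts = urlparse(self.path).path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "products":
            status, payload = get_product(parts[1])
        else:
            status, payload = 404, {"error": "unknown endpoint"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8000) -> None:
    """Start the sketch server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ApiHandler).serve_forever()
```

Because the response is plain JSON, any number of different user interfaces could call the same endpoint, and the validation in `get_product` applies to all of them.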
At Kentivo, we have built up a lot of know-how over time, culminating in the Genie platform. This platform ties in with our can-do attitude towards taking on new challenges and providing a range of exceptional outcomes for our clients. We are able to work with diverse datasets and the end result is a rich and intuitive user experience where information comes to life. By getting the data to flow through a smart platform, we can deliver the next best thing to real magic.