Knowledge Functions boost analytical productivity by scaling human expertise and automating complex analytical workflows. They capture and scale tacit domain knowledge, enabling machines to read, learn, and understand in specialized ways.
Assess how important a piece of data is in the context of a specific domain, just like a human expert. Using next-generation natural language processing and the latest developments in deep learning, the machine can understand the meaning of a piece of information within a specific area of expertise, e.g. finance, real estate, or marketing. Articles can now be intelligently appraised, and a score can be assigned to determine how important and relevant any piece of information is. Categorize and distill domain-specific information with artificial intelligence, just like a human expert.
The power and advantage of building a model using a semi-supervised approach is the agility it gives the model to learn new domains quickly. In such cases the model learns only the specific relevant texts, while everything else is classified as irrelevant.
We combine a plethora of proprietary machine learning and deep-learning models to produce a score that signifies whether or not the content being evaluated is relevant.
The models work in tandem, in a nested fashion, to improve performance and pass context from one step to the next. As the model learns new topics and domains, the semi-supervised approach allows it to develop branches, each specializing in one domain, so the model learns in a continuous and dynamic manner.
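The nested, branch-per-domain idea can be sketched in a few lines of Python. Everything here is a toy stand-in for illustration, not Accrete's proprietary models: a coarse first stage filters for domain-relevant terms and passes that context to a second stage, which scores only what survived; text matching no branch is classified as irrelevant.

```python
# Illustrative sketch: nested relevance scoring with per-domain branches.
# The domain vocabularies below are hypothetical examples.

DOMAIN_BRANCHES = {
    "finance": {"earnings", "dividend", "ipo"},
    "real_estate": {"mortgage", "lease", "zoning"},
}

def coarse_filter(tokens, vocab):
    """First-stage model: keep matched terms and pass them as context."""
    return [t for t in tokens if t in vocab]

def relevance_score(text, domain):
    """Second-stage model: score only what the first stage kept."""
    tokens = text.lower().split()
    vocab = DOMAIN_BRANCHES.get(domain, set())
    matched = coarse_filter(tokens, vocab)
    if not matched:          # everything else is classified as irrelevant
        return 0.0
    return len(matched) / len(tokens)

print(relevance_score("IPO filing boosts dividend outlook", "finance"))  # 0.4
```

Adding a new domain amounts to adding a new branch entry, which is what gives the semi-supervised setup its agility.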
Understand and contextualize the sentiment of a piece of text just like a human. Using an ensemble of hybrid deep-learning models, comprising transformer-based architectures overlaid with sequential and feed-forward neural networks, the contextual sentiment knowledge function can classify sentiment with 96% accuracy. Near-human precision is achieved using a new class of models that are superior to older methods reliant on positive/negative word counts. This knowledge function can understand and interpret the relevant context with respect to the domain of the text, and can contextually categorize text like a human.
Available in two languages: English and Simplified Chinese. Can be translated into multiple languages using Accrete’s proprietary models.
The contextual sentiment function is built on the foundations of how humans perceive language and how we, as humans, classify things into different categories. We are good at categorizing: on average, we would agree about whether something is positive, negative, or neutral, but if asked to provide a quantitative score we would generally disagree. This is the underlying principle behind the function. The ensemble of hybrid deep-learning models is therefore first trained as a classifier for sentiment, rather than as a quantifier.
Then, the probabilistic measure of this ensemble is fine-tuned to further improve the value and performance of the function for different use cases, e.g. trading in financial applications, or measuring side effects in pharmaceuticals. These models are built to achieve maximal performance with respect to the Bayes human-error metric and to scale to problems that would be impossible for humans to comprehend.
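The classify-first, calibrate-second design can be sketched as follows. The logits, labels, and temperature parameter are illustrative assumptions, not details of Accrete's ensemble; temperature scaling stands in for the use-case-specific fine-tuning of the probabilistic measure.

```python
import math

LABELS = ["negative", "neutral", "positive"]

def softmax(logits, temperature=1.0):
    """Convert raw model outputs into a probability distribution."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    """Stage 1: the ensemble acts as a classifier, not a quantifier."""
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))]

def calibrated_score(logits, temperature):
    """Stage 2: tune the probabilistic measure for a use case (e.g. trading)."""
    return softmax(logits, temperature)

print(classify([0.2, 0.5, 2.1]))  # positive
```

A higher temperature softens the distribution, which is one simple way to adapt the same classifier's confidence to different downstream applications.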
Conduct in-depth research and stay up to date with a specific area of interest just like a human. Using the latest situationally aware machine-learning techniques, this knowledge function can keep track of and discover relevant sources of information (articles, blogs, research, financial documents, tweets, etc.) in real time. By configuring the tool to focus on a specialized area of expertise, it can continuously uncover relevant information and provide you with a domain-focused data feed as accurately as a human.
The application starts by converting the keywords into a machine-processable query format. The model then uses this query to look for relevant information across the web, crawling direct and nested links based on the user-defined keywords. The model is tuned to discover new sources, running the query in a continuous loop to surface new information in real time.
All unique new relevant information is returned by the function as it is found. The model also uses a built-in noise function that learns continuously to reduce non-relevant output. The query can be refined further by adding or modifying keywords based on the results the function returns.
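One pass of the discovery loop can be sketched as below. The document list stands in for a live crawler and the noise terms for the learned noise function; both are assumptions made for illustration.

```python
# Illustrative sketch of the discovery loop: build a query from keywords,
# scan documents, deduplicate, and filter noise.

def build_query(keywords):
    """Convert user keywords into a machine-processable query string."""
    return " OR ".join(sorted(k.lower() for k in keywords))

def discovery_pass(documents, keywords, seen, noise_terms):
    """One loop iteration: return unique, relevant, non-noisy documents."""
    fresh = []
    for doc in documents:
        text = doc.lower()
        if doc in seen:
            continue                               # already returned earlier
        if any(n in text for n in noise_terms):
            continue                               # learned noise filter
        if any(k.lower() in text for k in keywords):
            seen.add(doc)
            fresh.append(doc)
    return fresh

seen = set()
docs = ["Fed raises rates", "Buy cheap meds now", "Fed raises rates"]
print(discovery_pass(docs, ["fed"], seen, {"cheap meds"}))
```

Because `seen` persists across passes, repeated runs return only information that is genuinely new, mirroring the continuous real-time behavior described above.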
Know how reliable an individual source of information is, without bias and better than a human. Across the internet there are millions of sources of information, some popular and many lesser known. But even if a source has little recognition, that does not mean it is any less reliable. By modeling numerous factors, including recency and propagation, Accrete has developed a dynamic algorithm for scoring the reliability of an individual source independent of its popularity. Recognizing that reliable sources publish highly accurate information with greater consistency, and far in advance of market leaders, opens a gateway to an untapped resource of ‘superstars’. Reliability scoring enables you to empirically understand how reliable a source of information is across time, decoupled from biases, and better than a human.
The reliability of a source can be explained by two factors. Recency measures how fast some information is propagating, while Propagation Count measures how many sources have published the same content. Combining the two gives a dynamic view of a source’s characteristics that captures the informational-α with maximal probability.
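One hedged way to combine the two factors is an exponential recency decay multiplied by the propagation share; the half-life value and the multiplicative form are assumptions for this sketch, not Accrete's actual scoring algorithm.

```python
import math

def recency(hours_since_first_publication, half_life=24.0):
    """Decays from 1 toward 0 as the information ages (hypothetical half-life)."""
    return math.exp(-hours_since_first_publication / half_life)

def reliability(hours_after_first_report, propagation_count, total_sources):
    """Reward sources that publish early and whose content propagates widely."""
    spread = propagation_count / total_sources
    return recency(hours_after_first_report) * spread
```

Under this sketch, a source that reports immediately and whose content is later republished by half the monitored sources would score 0.5, while a late report with the same spread scores lower.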
Understand the popularity of a source of information better than a human, without bias. This function uses a variety of parameters and a computational model that accounts for average traffic, individual bounce rate, and average time spent on the source. The popularity score is made more accurate by adding data specific to the source itself, including the source’s relevance to the news, its social popularity, and its influence. By blending all of these data points, the individual popularity of a source can be quantified, giving you complete transparency into a source’s popularity.
The model looks at different factors to determine popularity based on the source type. A social media source’s popularity is calculated from its number of followers, shares, likes, and comments, and the nested count of the reach of its posts. For other sources, popularity is based on the source type, the source’s Alexa rank, number of visits, content type, and comments. The model keeps refining the weights of these factors to improve the popularity score, using a continuous learning model driven by user feedback.
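A minimal sketch of the weighted blend and the feedback-driven weight refinement follows. The factor names, weights, and update rule are assumptions for illustration, not Accrete's model.

```python
# Hypothetical factor weights for a social media source (each factor in [0, 1]).
WEIGHTS = {"followers": 0.4, "shares": 0.3, "likes": 0.2, "comments": 0.1}

def popularity(factors, weights=WEIGHTS):
    """Blend normalized factors into one popularity score."""
    return sum(weights[k] * factors.get(k, 0.0) for k in weights)

def refine_weights(weights, feedback, lr=0.1):
    """Nudge each weight toward user feedback, then renormalize to sum to 1."""
    updated = {k: max(w + lr * feedback.get(k, 0.0), 0.0)
               for k, w in weights.items()}
    total = sum(updated.values())
    return {k: w / total for k, w in updated.items()}

score = popularity({"followers": 1.0, "shares": 0.5, "likes": 0.5, "comments": 0.0})
print(round(score, 2))  # 0.65
```

Feedback indicating that, say, shares matter more than assumed increases that weight on the next iteration, which is the continuous-learning loop in miniature.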
To classify topics accurately, you need to understand language like a human. Using a deep-learning mathematical framework, each text snippet can be classified into a topic with 93% accuracy. This framework incorporates the latest advancements in natural language processing and understands the contextual meaning of an individual sentence in relation to its adjacent sentences and its placement within the entire paragraph or document. Accrete has developed AI that understands the semantic and contextual nature of an individual sentence within its surrounding environment, specific to the domain of the text itself. Using this language understanding, topic classification can provide a deeper level of understanding, as meticulously as a human, at scale.
Available in two languages: English and Simplified Chinese. Can be translated into multiple languages using Accrete’s proprietary models.
The model starts by converting the text into a multi-scale numerical representation. This representation allows the model to build a multi-scale memory from the text and thus extract the information most relevant to the task. These numerical representations are then passed to a deep-learning network module with a custom architecture.
We have been conducting extensive research to build a continuous, dynamic learning network that can learn from user feedback without suffering catastrophic forgetting, as is usually the case with online learning for deep-learning networks. Our network can semi-autonomously grow in size to alleviate this issue, becoming smarter as it sees more data, just as a child becomes an educated adult.
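To make the multi-scale idea concrete, here is a toy sketch using term counts at two scales (per sentence and whole document) feeding a trivial classifier. The real model uses learned numerical representations and a custom deep network, not counts; the topic vocabularies are invented.

```python
from collections import Counter

def multi_scale_representation(document):
    """Build representations at two scales: per sentence and whole document."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    sentence_scale = [Counter(s.lower().split()) for s in sentences]
    document_scale = Counter(document.lower().replace(".", " ").split())
    return sentence_scale, document_scale

def classify_topic(document, topic_vocab):
    """Score each topic by overlap with the document-scale representation."""
    _, doc_counts = multi_scale_representation(document)
    scores = {t: sum(doc_counts[w] for w in vocab)
              for t, vocab in topic_vocab.items()}
    return max(scores, key=scores.get)

topics = {"sports": {"match", "goal"}, "markets": {"stock", "rally"}}
print(classify_topic("The stock rally continued. Another rally tomorrow.", topics))
```

The sentence-scale representation is what lets a model relate each sentence to its neighbors and its placement in the document, as described above.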
How influential a user and their content are is very difficult to quantify. Every social media platform (Twitter, YouTube, TikTok, etc.) employs a similar model whereby users are encouraged to post information and follow other users. This generates countless interactions between users through likes, comments, reposts, shares, and various other actions. To quantify the value of these interactions, Accrete has developed a model that counts interactions and calculates the popularity of interacting users, their followers, followers of followers, and so on. By looking beyond first-degree interactions, Accrete can model the sphere of influence and quantify the virality of individual users.
The viral influence of a user is calculated from ‘impact’ factors derived from the qualitative engagement with their own content. For example, the viral influence score for an artist on a music-centric platform calculates the impact of their tracks on immediate and nested user networks. To calculate the impact of user-generated content, the algorithm considers factors such as likes, reposts, shares, comments, and other pertinent actions. A network tree is constructed for each piece of content that captures users’ activity, i.e. likes, downloads, etc.
Within a network tree, the root node represents the user and the child nodes are the interacting users. Once the tree is constructed, these interactions are scaled using Accrete’s Source Popularity knowledge function to calculate the impact. The viral influence score of the user is the average impact across all of the user’s content.
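The computation can be sketched as follows. The interaction weights and the per-user popularity table are stand-ins (the latter for Accrete's Source Popularity knowledge function), chosen only to make the example run.

```python
# Sketch: root node = content author; child nodes = interacting users.

def content_impact(interactions, user_popularity):
    """Sum interaction weights scaled by each interacting user's popularity."""
    weights = {"like": 1.0, "comment": 2.0, "repost": 3.0}  # assumed weights
    return sum(weights[kind] * user_popularity.get(user, 0.0)
               for user, kind in interactions)

def viral_influence(contents, user_popularity):
    """Average impact over all of a user's content."""
    if not contents:
        return 0.0
    return sum(content_impact(c, user_popularity) for c in contents) / len(contents)

popularity = {"alice": 0.9, "bob": 0.1}
track = [("alice", "repost"), ("bob", "like")]
print(viral_influence([track], popularity))
```

Scaling by the interactor's popularity is what makes a repost from a highly popular user count far more than a like from an obscure one.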
With millions of users sharing information online, it has become imperative for analysts to understand the authenticity of each user. With the rise of troll farms and the mass automation of bots, high levels of engagement can be manufactured. These disingenuous tools are used by organizations to spread misinformation and by individuals to unfairly garner fame. To account for these dynamics, Accrete’s team has implemented a machine learning algorithm for natural language processing that reads through comments semantically. It repeats this process for each connected user up to two degrees away. By including the network of followers, it filters out bots and increases the accuracy of the authentic engagement score.
The machine uses a character-level embedding model that converts each comment into a tensor. Character-level models are helpful where the language may not have a specific linguistic structure, as in social media comments, and they can account for the use of emojis.
The model is trained in an unsupervised manner on a subset of all available data. The embeddings are then passed through an unsupervised clustering algorithm, which outputs three clusters: (1) valid comments, (2) self-promotional comments, and (3) junk comments. The comments for each user are analyzed, and an aggregated score is computed using Accrete’s proprietary algorithm.
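A toy sketch of the pipeline: character-level features, then assignment to one of the three clusters. The real system learns embeddings and discovers the clusters unsupervised; the hand-picked centroid comments below are stand-ins for learned cluster centers.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz!"

def char_vector(comment):
    """Character-level representation; robust to loose social-media grammar."""
    counts = Counter(comment.lower())
    total = sum(counts[c] for c in ALPHABET) or 1
    return [counts[c] / total for c in ALPHABET]

def assign_cluster(vec, centroids):
    """Assign to the nearest centroid by squared Euclidean distance."""
    def dist(name):
        return sum((x - y) ** 2 for x, y in zip(vec, centroids[name]))
    return min(centroids, key=dist)

CENTROIDS = {
    "valid": char_vector("great analysis thanks for sharing"),
    "self_promotional": char_vector("check out my page and follow me"),
    "junk": char_vector("!!!!!!"),
}

print(assign_cluster(char_vector("!!! spam !!!"), CENTROIDS))  # junk
```

Per-user authenticity would then aggregate these cluster assignments over all of a user's comments (the aggregation itself is Accrete's proprietary step).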
The goal of image classification is to assign a specific label to an image. Here, we have developed an image classification model for the use case of grading collectible items.
Grading of collectible items such as baseball and other sports cards, trading cards, Pokémon cards, coins, currencies, stamps, and music and movie posters is currently done by human experts, who minutely examine each item to identify physical defects such as incorrect centering of a baseball card, a tear in a card or poster, or rusting of a coin. This procedure can lead to inconsistency in the grade assigned to an item from one human expert to another. To reduce this inconsistency and automate the process, we have developed a deep learning-based computer vision algorithm that expedites grading and provides a consistent grading scheme across all items.
The steps involved in getting the final grade of a collectible item:
Step 1: Extraction of the region of interest
The first step in the grading process is to extract the region of interest (ROI), i.e. the card region, from the image so that the grading model uses only the ROI to determine the grade of the card. Based on empirical studies conducted during this work, retraining the model requires no more than 400 labeled images.
Step 2: Grading the region of interest
Once the ROI is extracted, the next step is to grade the extracted image. We have developed a custom deep learning model that extracts hidden features from the raw ROI image and uses a classifier network to determine the final grade. Along with the hidden features extracted by the deep learning model, we can also feed in features or sub-grades defined by human experts, such as centering, corner quality, surface smoothness, and edge quality, to improve the accuracy of the model and provide human-level performance. We incorporate a continuous dynamic learning framework so that the model can improve its performance over time based on user feedback.
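The two-step pipeline can be sketched with stand-in models: ROI extraction reduced to a crop, and a grader that blends learned features with the expert sub-grades. The feature values, sub-grade names, and blending weight are illustrative assumptions, not the trained model.

```python
def extract_roi(image, bbox):
    """Step 1: crop the region of interest (image as a 2-D list of pixels)."""
    top, left, bottom, right = bbox
    return [row[left:right] for row in image[top:bottom]]

def grade(roi_features, sub_grades, feature_weight=0.5):
    """Step 2: blend hidden features with expert sub-grades (centering, etc.)."""
    feature_score = sum(roi_features) / len(roi_features)
    expert_score = sum(sub_grades.values()) / len(sub_grades)
    return feature_weight * feature_score + (1 - feature_weight) * expert_score

subs = {"centering": 9.0, "corners": 8.0, "surface": 9.0, "edges": 8.0}
print(grade([8.0, 9.0], subs))  # 8.5
```

In the real system the `roi_features` would come from the deep model's hidden layers; the point of the sketch is the fusion of learned and expert signals into one grade.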
A named entity can have many in-text name-variants, and we normalize them into one key-term (e.g. “United States”, “U.S.A”, “US”, “UNITED STATES”). When a search result includes multiple name-variants, Accrete’s Entity Normalization Knowledge Function merges them into their key-term and regards them as one (e.g. their frequencies are added up to form the combined frequency for the key-term). When we refine a search result by choosing a key-term, we expand it to its name-variants and search for all of them. We keep normalized results as a dictionary with (key, value) = (key-term, list of name-variants) for each entity type. We call each (key, value) pair an entry.
Given N entity names, we want to find groups of varying sizes that refer to the same entities. Both the number of normalized entities and the group sizes vary and are unknown. Even at the simplest level, standard approaches begin by comparing N(N-1)/2 possibilities, one for each name pair, giving time complexity O(N²). At scale, this is untenable for conventional fuzzy matching processes.
We use a computationally efficient algorithm for comparing similarities of word-vector representations as the first layer, then run the various matching algorithms on the pairs found. This method drastically speeds up the whole process and yields multi-dimensional matching information, from which an ensemble method produces one comprehensive match score for each pairing. After some quality control, we use those pairs to generate clusters, which form our normalized name groups. Choosing a key-term as a representative for each name group completes the variant-level normalization process.
Similar processes can be applied at the group level and database level to enhance and maintain the quality of normalization.
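The dictionary structure and the two operations on it (merging frequencies into a key-term, expanding a key-term for search) can be sketched directly. The dictionary entry below uses the example variants from the text; the helper names are ours.

```python
# One entry of the normalization dictionary: key-term -> name-variants.
NORMALIZATION = {"United States": ["U.S.A", "US", "UNITED STATES"]}

def merge_frequencies(frequencies, dictionary):
    """Fold each variant's frequency into its key-term's combined frequency."""
    variant_to_key = {v: k for k, vs in dictionary.items() for v in vs}
    merged = {}
    for name, count in frequencies.items():
        key = variant_to_key.get(name, name)
        merged[key] = merged.get(key, 0) + count
    return merged

def expand(key_term, dictionary):
    """Refining a search by a key-term searches all of its name-variants."""
    return [key_term] + dictionary.get(key_term, [])

print(merge_frequencies({"US": 3, "U.S.A": 2, "United States": 1}, NORMALIZATION))
```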
Accrete’s Named Entity Recognition Knowledge Function is a supervised model that detects named entities in text, taking into account the surrounding context. This model detects the following entity types: Person, Organization, Location, Date, Time, Event, Law, Product, Facility, Percent, Quantity, Money, Geopolitical entity, and NORP.
The task is addressed as a span detection problem: for each token in the text, the model predicts its entity type, or that it is not an entity at all. Typically, a neural network model’s parameters are tuned by minimizing a loss function that measures how far the predicted named entities are from the original ones. Accrete’s model’s parameters are tuned by minimizing a hybrid loss function that also measures how far the predicted part-of-speech (POS) tags are from the original ones. The assumption is that to better identify a named entity, it helps to incorporate syntactic information describing its role in the sentence, which is encoded in its POS tag.
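The hybrid loss idea can be written out as cross-entropy over entity-type predictions plus a weighted cross-entropy term over POS-tag predictions. The weight `alpha` is a hypothetical hyperparameter; the exact form of Accrete's loss is not published here.

```python
import math

def cross_entropy(predicted_probs, true_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(predicted_probs[true_index])

def hybrid_loss(entity_probs, entity_true, pos_probs, pos_true, alpha=0.3):
    """Total loss = NER loss + alpha * POS loss, summed over tokens."""
    ner = sum(cross_entropy(p, t) for p, t in zip(entity_probs, entity_true))
    pos = sum(cross_entropy(p, t) for p, t in zip(pos_probs, pos_true))
    return ner + alpha * pos
```

With `alpha = 0` this reduces to the standard NER loss; a positive `alpha` penalizes the model when its syntactic (POS) predictions drift, which is the auxiliary signal motivated above.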
Given two separate domains, each with its own entity names, we may need to connect those names. The Entity Mapping Algorithm finds a mapping from names in one domain to names in the other, based on their similarities along multiple dimensions. Currently, it is used in Argus for its Federated Search: entities in Semantic Search <--> entities in Relational Search, and entities in Investment Activity <--> entities in Relational Search.
Given entities in two domains, we can find mappings among similar entities by reusing the processes from Entity Normalization. The algorithm finds name pairs that are close to each other, where the two names in a pair come from different domains. Using those pairs, we generate a graph structure in which one node comes from one domain and its neighboring nodes come from the other. Populating every node from the name pairs provides a mapping between the entities in the two domains. The end result can be a one-to-many mapping when the domain sizes differ.
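The graph construction reduces to collecting, for each name in one domain, its similar neighbors from the other; the example entity names are invented for illustration, and the similarity judgment that produces the pairs is assumed to have already run.

```python
from collections import defaultdict

def build_mapping(pairs):
    """pairs: (name_in_domain_a, name_in_domain_b) tuples judged similar.

    Each domain-A name becomes a node whose neighbors are its domain-B
    matches, so one-to-many mappings fall out naturally."""
    mapping = defaultdict(list)
    for a, b in pairs:
        mapping[a].append(b)
    return dict(mapping)

pairs = [("Apple Inc", "AAPL"), ("Apple Inc", "Apple"), ("Ford Motor", "F")]
print(build_mapping(pairs))
```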
Semantic search enables analysts to search content using filters that reveal the context of the returned results. In addition to returning a list of articles for the search query, Semantic Search provides insight related to that context: the most significant entities that appear in the returned list, categories associated with the results, a geo heat map showing the locations mentioned in the articles, a Knowledge Graph connecting the extracted entities to the returned results, and a Semantic Activation Map to visualize the context of the returned results.
Pre-processing: Analysis of User Queries
The user queries are analyzed using Contextual Natural Language Processing. Keywords are extracted, normalized, expanded in semantic context, and the associated Semantic Fingerprints are generated.
Post-processing: Extracting Insight, Building a Knowledge Graph
The returned results are processed to highlight relevant insights and build a knowledge graph. A Knowledge Graph, built from the relationships between entities, documents, and document segments, is a way to represent and extract hidden relations and insights.
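At its simplest, such a graph can be sketched as entity-document edges, with hidden relations surfacing as entities that co-occur across documents. The entity and document names below are invented for illustration.

```python
from collections import defaultdict

def build_graph(doc_entities):
    """doc_entities: {doc_id: [entities extracted from the document]}."""
    graph = defaultdict(set)
    for doc, entities in doc_entities.items():
        for e in entities:
            graph[e].add(doc)
    return graph

def related_entities(graph, entity):
    """Entities sharing at least one document with the given entity."""
    docs = graph.get(entity, set())
    return {e for e, ds in graph.items() if e != entity and ds & docs}

docs = {"d1": ["Acme", "Berlin"], "d2": ["Acme", "Merger"]}
g = build_graph(docs)
print(sorted(related_entities(g, "Acme")))  # ['Berlin', 'Merger']
```

A production knowledge graph would type these edges (entity-to-segment, entity-to-entity) rather than treat all co-occurrence uniformly, but the traversal idea is the same.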