G. N. Shah, December 14, 2020

Incorporating NLP Capabilities into Innovatix’s WebMineR

Executive Summary

Innovatix Technology Partners' new product, WebMineR, is a state-of-the-art web scraping cloud application. It offers over 25 best-in-class capabilities and is highly scalable, efficient, secure, and configurable. The list of major features continues to grow, as shown in the latest roadmap posted on our web site, and several recent papers there discuss these 25 major capabilities in detail and compare them to other products in the marketplace.

This paper starts a dialogue on some of the newest functionality planned for WebMineR, centered in the area of NLP, specifically text summarization and topic modeling. These are clearly important features for a web scraping technology, and we expect them to be fully available by the second quarter of 2021. The purpose of this paper is to describe the NLP research we are now doing to make sure these capabilities perform well when delivered in WebMineR.

Text summarization and topic modeling are two of the most prominent use cases in the field of Natural Language Processing (NLP). Text summarization allows one to understand the basic idea of a block of text without having to read and summarize the document manually. Topic modeling, on the other hand, extracts the main topics from a block of text, again conveying the key ideas without manual reading. That, at least, is the concept behind these two important NLP capabilities. The technologies are certainly getting better with time, as AI/NLP continues to move forward at a tremendous pace. How well they actually work today for text extracted from a wide range of web sites, in many different languages, is still an open question in our minds. It is critical to know how to best configure and use the NLP algorithms we select in these two areas in order to get the best possible results out of them. That is the goal of our current research, and below we report on some of our findings to date. We are doing this testing and research to ensure these capabilities are reliable when we incorporate them into our web scraping system.

WebMineR is built to scrape public information off web sites at ultra-high throughput. Adding text summarization and topic modeling to the tail end of the WebMineR process would be a major benefit to our user community.

This paper contains a detailed background on both subjects, text summarization and topic modeling, along with results of trials and case studies of the off-the-shelf NLP systems and algorithms we have tried so far. We expect to continue this testing and research through the end of the year. The tests shown here are on healthcare-related documents, and in addition to English, we show how these capabilities can be applied to Mandarin-language documents as well. Overall, our research and testing results show reasonably good performance using NLP methods to extract summaries and topics from documents. We feel strongly this will add useful new capabilities to WebMineR. Stay tuned!

Text Summarization

Text summarization is the task of creating a concise and accurate summary that captures the most critical information and basic meaning of a longer document. It can reduce reading time, accelerate the search for information, and dramatically increase the amount of information absorbed within an allotted time frame.
Text summarization can be used for various purposes such as medical cases, financial research, and social media marketing. It can broadly be divided into two categories: extractive summarization and abstractive summarization.

Resources and libraries commonly used for text summarization include the Natural Language Toolkit (NLTK) and spaCy, as well as pre-trained models such as BERT and T5. NLTK and spaCy are open-source Python libraries for NLP. Bidirectional Encoder Representations from Transformers (BERT) is a technique for NLP pre-training developed by Google. BERT is pre-trained on a large corpus of unlabeled text, including the entire Wikipedia (2,500 million words) and Book Corpus (800 million words); you can fine-tune it by adding just a couple of additional output layers to create state-of-the-art models for a specific text mining application. T5 is an encoder-decoder model that converts all NLP problems into a text-to-text format; it is trained on a mixture of unlabeled text (the C4 collection of English web text).

In our test use cases, we used NLTK, spaCy, and BERT for extractive summarization, and T5 for abstractive summarization. Here is an example that shows the summaries we generated using the different tools: we first load a text file (PDF format) and then apply the four text summarization methods.

Text summarization using the NLTK library first uses GloVe to extract word embeddings, then uses cosine similarity to compute the similarity between sentences, and applies the PageRank algorithm to score each sentence. Based on the sentence scores, it puts together the top-n sentences as a summary. The spaCy approach tokenizes the text, extracts keywords, and then scores each sentence based on keyword appearance. Summarization using the pre-trained BERT model first embeds the sentences, then runs a clustering algorithm, and finally selects the sentences closest to the centroids as the summary. T5 converts all NLP problems into a text-to-text format, so summarization is treated as a text-to-text problem; the model we used, T5ForConditionalGeneration, loads the T5 pre-trained model to generate a summary.

The document we used to test text summarization is a summary of the European public assessment report (EPAR) for the drug Fosavance. It explains how the Committee for Medicinal Products for Human Use (CHMP) assessed the medicine to reach its opinion in favor of granting a marketing authorization, and its recommendations on the conditions of use for Fosavance. After analyzing the 4 summaries, the
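The extractive pipeline described above (sentence similarity plus PageRank) can be sketched in a few lines of Python. This is a minimal illustration rather than our production code: it substitutes TF-IDF sentence vectors for the GloVe embeddings mentioned above so the example stays self-contained, and the document text is a placeholder.

```python
# Minimal extractive-summarization sketch: score sentences with PageRank
# over a sentence-similarity graph (TF-IDF stands in for GloVe here).
import nltk
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

nltk.download("punkt", quiet=True)  # sentence-tokenizer model

def extractive_summary(text: str, top_n: int = 3) -> str:
    sentences = nltk.sent_tokenize(text)
    # Embed each sentence (TF-IDF used as a stand-in for GloVe averaging).
    vectors = TfidfVectorizer().fit_transform(sentences)
    # Pairwise cosine similarity -> weighted graph -> PageRank scores.
    sim_matrix = cosine_similarity(vectors)
    scores = nx.pagerank(nx.from_numpy_array(sim_matrix))
    # Keep the top-n highest-scoring sentences, restored to document order.
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return " ".join(sentences[i] for i in sorted(ranked))

document = "..."  # placeholder: the EPAR text extracted from the PDF
print(extractive_summary(document, top_n=3))
```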
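For the abstractive side, the T5ForConditionalGeneration class named above comes from the Hugging Face transformers library. The sketch below shows the basic call pattern; the model size ("t5-base") and the generation parameters are our own choices for illustration, not values from our trials.

```python
# Abstractive-summarization sketch using Hugging Face transformers and
# the T5ForConditionalGeneration class referenced in the text.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")  # model size assumed
model = T5ForConditionalGeneration.from_pretrained("t5-base")

document = "..."  # placeholder: the same EPAR text used above
# T5 frames every task as text-to-text, so a summary is requested simply
# by prefixing the input with "summarize: ".
inputs = tokenizer("summarize: " + document, return_tensors="pt",
                   max_length=512, truncation=True)
summary_ids = model.generate(inputs["input_ids"], num_beams=4,
                             min_length=40, max_length=150)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```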

G. N. Shah, November 4, 2020

How to Virtualize your VFP Application

The time really has come to virtualize your VFP application. We are not talking about migration, conversion, or bringing in a third-party tool to replace it. Rather, we suggest virtualizing it and giving your perfectly fine-tuned VFP application additional years of productive life, within a much more modern setting and UI. This may well be your best option! Read this paper: we go through all the major issues to keep in mind when considering such a process, and along the way you may find it is indeed your best option. The benefits are considerable and the downsides very limited, as we describe in detail below. It is a low-cost and secure option both up front and on a continuing basis. We also provide a how-to guide for virtualizing a VFP application using one of the best cloud services out there, AWS AppStream 2.0.

To start off, we describe what we mean by application virtualization, showing different examples of virtual environments. The overall virtualization market is booming (even more so since the onset of the COVID-19 pandemic), and we show some industry statistics that confirm this. The market is growing rapidly because most industry players now recognize the significant benefits of application and desktop virtualization, and as the major players continue to improve their cloud environments, we expect the benefits will only grow. A number of prominent application and desktop virtualization services are available, including Amazon's AppStream 2.0 and WorkSpaces, Azure's RemoteApp and Windows Virtual Desktop, Cybelesoft's Thinfinity VirtualUI, and FoxInCloud, among others. With a specific focus on VFP apps, we show the pros and cons of each of these options.

In considering virtualizing a VFP application, there are several major issues you need to address. We discuss these, along with a long list of the common issues involved in setting up a virtual VFP application. The final section provides a guide to VFP virtualization for the specific use case of AWS AppStream 2.0, with a step-by-step guide to application setup, including the important process steps of QA, application file sharing and database mapping, and printer and backup services. If you have not already done so, now may be the time to virtualize your VFP application!

Why Application Virtualization / Environment Today

COVID-19 is forcing organizations to move their local desktop applications to cloud/virtual servers at an accelerating rate. The imperative often heard is "virtualization is a necessity." It helps businesses with the scalability, security, and management of their applications and global IT infrastructure; in addition, businesses save significant costs by consolidating their infrastructure needs, both now and, more importantly, in the future.

What is Virtualization?

Before discussing the different categories of virtualization in detail, it is useful to define the term in the general sense. Wikipedia uses the following definition: "In computing, virtualization is a broad term that refers to the abstraction of computer resources. Virtualization hides the physical characteristics of computing resources from users, be they applications, or end users.
This includes making a single physical resource (such as a server, an operating system, an application, or a storage device) appear to function as multiple virtual resources; it can also include making multiple physical resources (such as storage devices or servers) appear as a single virtual resource…"

Types of Virtualization

Below we distinguish six different types of virtualization and provide a short summary of each.

Server virtualizations, also called hypervisors, are classified as one of two types: bare-metal (Type 1) hypervisors, which run directly on the server hardware, and hosted (Type 2) hypervisors, which run on top of a host operating system.

Virtualization is not only a server-domain technology. It is being put to a number of uses on the client side as well, at both the desktop and application levels, where it can be broken out into four categories.

Network virtualization is similar to server virtualization, except that instead of a single server it encompasses an entire network of computing elements.

Implementations of storage virtualization span several different technology options: host-based with special device drivers, array controllers, network switches, and stand-alone network appliances.

Implementations of service/application virtualization follow three options, and the benefits here are analogous to those shown earlier for desktop and application virtualization, namely high availability and optimized resource utilization.

In summary, it should now be apparent that virtualization is no longer just a server-based concept. The technique can be applied across a broad range of computing options, including virtualization of entire machines on both the server and desktop sides, applications as well as desktops, storage components, whole networks, and even application infrastructure in its entirety. Moreover, virtualization technology continues to evolve and improve in many important ways, so the impetus to virtualize will only grow stronger as time goes on.

2020 State of Virtualization Technology

The market adoption of virtualization continues to increase across a diverse range of markets and industries. In particular, there is an accelerating adoption rate among industry segments that have yet to embrace virtualization. The recent entry of Microsoft into the bare-metal hypervisor space with Hyper-V is a sign of the technology's overall maturity. The current state of company utilization of each of the different types of virtualization is shown in the bar chart below, along with projections of increased utilization over the next two years.[1] Not unexpectedly, the server virtualization adoption rate among corporations is nearly 100%, while for application virtualization the current adoption rate is 39%, expected to increase markedly (by 43%) over the next two years to 56%. The adoption rates for all six types of virtualization shown below are averages over both enterprises and small businesses. For application virtualization and desktop virtualization there are significant differences in utilization rates between these two sectors. As shown in the chart, average application virtualization adoption is expected to grow from 39% today to 56%, while average desktop virtualization goes from 32% today to 44%. For enterprises, adoption rates are significantly higher. Enterprise

G. N. Shah, October 9, 2019

CIOs' Biggest Challenges for Cloud Transformation

Cloud Transformation

I have sat across from many CIOs, from small organizations to massive enterprises, and the common challenge I observe is that CIOs overthink the cloud move.

Falling Victim to a Dated Skillset

Whether CIOs want to spearhead the transformation is a different question, but when they choose to, the biggest challenge comes from within: from the people who want to stick with what they have, who do not realize that change is a necessity, and who have become comfortable with what they know. Yes, that is the IT teams within the organizations. CIOs often fall victim to their own teams, which either have very limited knowledge of what today's cloud really means or are simply reluctant to embrace the change, without realizing that the change has already started and is in fact an opportunity to upgrade their skillset and be part of it.

The definition of cloud has evolved. Many CIOs I have talked to think they are already on the cloud because they have some application that is accessed over the internet, or because someone in their organization is moving an application to the cloud. Well, that is not a cloud transformation, not by today's standards. It also shows a casual and unclear approach toward probably the biggest technology transformation happening around the globe.

How to Successfully Move to the Cloud

Moving to the cloud requires assessment at the infrastructure, platform, security, application, access, integration, and many other levels. Without that, moving to the cloud is nothing more than rehosting, which was the cloud concept of years past. Things have changed since then! Most of the team leaders who are the eyes and ears of CIOs have been managing system operations for some time; they possess a great understanding of the systems that deliver what the organization needs today. For the future, they mostly have theoretical knowledge of technology trends, which is not enough to help a CIO draft a strategy or tactical plan. The result is either a vision with a dire tactical plan, or nothing at all. There are many tangents to the cloud, and very few organizations have the in-house skills to understand them all. What they need is help from technical experts who understand how it all works.

The Key Note

What CIOs need is to lead a cloud transformation exercise that delivers a strategy aligned with the organization's future objectives, envisions innovation that enables the organization's future growth, and lays out a roadmap for technology renovation.

"Good, bad or indifferent, if you are not investing in new technology, you are going to be left behind." – Philip Green

How to Attract and Retain the Best IT Candidates

There is an ever-increasing need for technology professionals as the US economy continues to see some of the lowest unemployment rates ever. This trend is even more prominent within the IT and software sector. IT developers to fill these on-site jobs are in growing demand, yet are becoming difficult to find and retain. As demand increases, companies need to change their approach to finding the best candidates.

There is increasing use of technology, and specifically artificial intelligence, to scour the web in search of candidates. This may produce results when considering volume, but we believe strong social networks and reputation in the market are key to finding the best candidate for hard-to-fill positions. Maintaining relationships with a pool of candidates is essential to a successful staffing practice. Someone in this pool may not be the person ultimately placed in the role, but may well be the connection to finding the right candidate. Time needs to be invested to build and nurture these relationships. Staffing companies that are purely transactional, finding candidates off the job boards, are a dime a dozen; recruiters with strong business networks across specific technology platforms are the ones that succeed.

At the same time, organizations need to adapt how they source and retain IT candidates. Companies are accustomed to working with temporary staff on either short- or long-term projects, but these temporary staff need to be treated as well as your full-time employees. If you treat them as insignificant members, you are not going to experience long-term success with them.

Innovatix Technology Partners, a Macrosoft, Inc. company, has a 32-year history of finding great candidates for great clients. We have found that even more important than the pay rate is a flexible work environment. Providing people the flexibility to maintain a great work-life balance and offering a desirable work environment ensures you can attract and retain the best talent. Topping this list is the ability to work from a home office on a regular basis. Sure, people may slip out and get a haircut, but you will quickly know from their output that people remain equally if not more productive when given the opportunity to work remotely.

At Innovatix Technology Partners, we are committed to maintaining relationships with clients and keeping a competitive pool of highly qualified candidates. We continually advise our clients to ensure that their work environment is geared to attracting and retaining these high-quality people, through competitive compensation and the opportunity to maintain work-life balance.

SQL Server On Linux, Is It Really Possible?

Microsoft is renowned for making surprising ventures; this time, they are gearing up to make SQL Server available on the Linux platform. In March 2016, Microsoft announced plans for a Linux port of SQL Server, an exciting step toward making their top-notch data management and business analytics platform available for any data, to run any application, anywhere. With SQL Server compatibility on Linux, Microsoft plans to bring its core relational database capabilities to the platform and provide customers more flexibility in their data solutions. This upcoming release of SQL Server will be built with best-in-class security and hybrid cloud innovations, and with it Microsoft is proving its commitment to being a cross-platform solution provider.

For communication between client applications and SQL Server, the Tabular Data Stream (TDS) protocol is used. ODBC provides a standard for applications to access databases, and Microsoft has supported ODBC in all versions of SQL Server. The Microsoft SQL Server Native Client contains an ODBC driver that will provide access to SQL Server for Linux applications.

Given the proven experience and capabilities of SQL Server, the Linux community is expected to welcome this move, as it brings enterprise customers increased database choice. SQL Server on Linux should pave the way to a broader set of users and innovations, and it increases Microsoft's potential market. This initiative should be seen alongside .NET Core, Linux on Azure, and other moves like the Linux R Server acquisition from Revolution Analytics. A private preview release of SQL Server on Linux is available on an invitation-only basis, and Microsoft targets the full release by mid-2017. For more information on this announcement, read the Microsoft blog.

As a certified Microsoft partner, Innovatix Technology Partners, formerly Macrosoft Inc., a NJ-based technology consulting company, is waiting eagerly to offer this extended SQL Server platform support to its clients. Innovatix has been serving customers across the globe on a variety of technologies, including CCM, legacy systems such as VFP, VB, and Classic ASP to .NET conversion, document composition services, and more.
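Because access goes through the standard ODBC interface over TDS, a Linux client can query SQL Server with any ODBC binding. Here is a minimal sketch using Python's pyodbc module; the driver name, server address, database, and credentials below are placeholder assumptions for illustration, not values from the announcement.

```python
# Minimal sketch: querying SQL Server from a Linux client via ODBC.
# The driver name and connection details are placeholder assumptions;
# substitute the values for your own environment.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"  # assumed driver name
    "SERVER=sqlserver.example.com,1433;"       # assumed host and port
    "DATABASE=SalesDb;"
    "UID=app_user;PWD=app_password;"
)
cursor = conn.cursor()
# List a few databases on the server as a connectivity check.
cursor.execute("SELECT TOP 5 name, create_date FROM sys.databases")
for name, create_date in cursor.fetchall():
    print(name, create_date)
conn.close()
```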

G. N. Shah, March 22, 2016

ASP.NET MVC 6 – A Complete Rewrite of the Framework

Microsoft has come up with ASP.NET MVC 6, transforming its popular .NET web framework. While keeping the fundamental Model View Controller concepts intact, the new release is a complete ground-up rewrite of the framework, bringing exciting changes coveted by developers. Most of the underlying layers of the framework have been reengineered in MVC 6. The key aspects attained with this rewrite are improved modularity, cross-platform adoption, and web standardization. MVC 6 allows developers to create middleware that interacts directly with the request pipeline; it is based on the Open Web Interface for .NET (OWIN), which provides the pipeline infrastructure. Cross-platform compatibility allows MVC 6 applications to be hosted on Linux or OS X. Microsoft also supports new tools and workflows in Visual Studio for MVC 6. Let's go through what's new in MVC 6. The MVC 6 project brings vital changes to the following areas.

Project Template

In the MVC 6 project template, from the root folder down, the project structure has been revised to align with the ever-changing nature of the web, and the basic elements have been improved. GruntJS is a task runner that enables you to build front-end resources such as JavaScript and CSS files. NPM and Bower, together with NuGet, give developers a plethora of options for bringing modular components into an application. The JSON format for configuration is standardized; however, the core MVC remains untouched.

Routing

In MVC 6, routing continues to improve upon the advances made in MVC 5 and comes with multiple options for mapping URIs. Routes are defined in the Configure method of startup.cs. Based on application needs, convention-based and attribute-based routes can be enabled: routes defined during configuration are convention-based, whereas attribute-based routes use attributes to define routes, providing finer control over the URIs in a web application.

Configuration

MVC 6 does not have a Web.config file; the new configuration system comprises a variety of options, including JSON-based files and environment variables. MVC 6 follows a more modular development approach, and the new configuration options are wired up in the Startup routine in startup.cs. In line with modern web practices, the JSON format is ubiquitous, and the new configuration style provides greater flexibility.

Dependency Injection

Dependency Injection (DI) existed in previous versions of MVC; however, with MVC 6 it has become easier to implement. The built-in DI container lacks robust configuration options, but it is easy to replace with feature-rich third-party containers.

TagHelpers

TagHelpers are syntactically similar to HTML: they use elements and attributes, which are then processed by Razor on the server. TagHelpers provide a better developer experience, with virtually seamless creation of client- and server-side HTML.

AngularJS

AngularJS is one of the most popular client-side frameworks for building Single Page Applications (SPAs). Visual Studio 2015 includes templates for creating AngularJS modules, controllers, directives, and factories. You can combine and minify all of your AngularJS files automatically whenever you perform a build, and you can interact with an MVC 6 controller from an AngularJS $resource using REST.

Cloud-Optimized Framework

MVC 6 supports a cloud-optimized framework and is well suited to the development of cloud-based applications. The core advantage of the cloud-optimized framework is that you will not have to upgrade the .NET version on a system for the sake of a single website.
The runtime will automatically pick up the correct version of the library when these MVC 6 applications are deployed to the cloud.

xUnit.net

In previous versions of ASP.NET MVC, the default testing framework was the Visual Studio Unit Testing Framework (sometimes called MSTest). That framework uses the [TestClass] and [TestMethod] attributes to describe a unit test. ASP.NET 5 uses xUnit.net as its unit test framework, which uses the [Fact] attribute instead of the [TestMethod] attribute (and no [TestClass] attribute).

With ASP.NET MVC 6, Microsoft leveraged familiar concepts by following the theoretical roots of MVC and built a framework for future web development. With improvements in key areas, MVC 6 helps developers create top-notch solutions for future business. For more insights on ASP.NET MVC 6 and related technologies, visit our website.