Datalog Access to Real-World Web Services
John Samuel1, Christophe Rey2
1. CPE Lyon
2. LIMOS, University Clermont-Auvergne
UNILOG 2018, Vichy, 25th June, 2018
- Web Services
- Numerous
- Heterogeneous
- Autonomous
- Evolving
- Question:
- How can these web services be integrated with minimal manual effort?
- Methodology:
- Declarative programming, in particular Datalog
1. Understanding Web Services
- Interfaces
- Web application: Manual consumption using internet browsers
- Application programming interface (API): Machine consumption
- API Operation parameters
- Operation order
- Communication protocol
Understanding Web Services
- Resource names
- Tasks vs Todos
- Updates vs Tweets
- Message Formats
- Architectural style
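The naming heterogeneity above (Tasks vs Todos) can be sketched as follows. This is a minimal illustration, assuming two hypothetical services, `service_a` and `service_b`, whose JSON payloads and field names are invented for the example; no real API is implied.

```python
import json

# Hypothetical responses: the same concept exposed under different
# resource names ("tasks" vs "todos") and different field names.
SERVICE_A_RESPONSE = '{"tasks": [{"id": 1, "title": "Write report"}]}'
SERVICE_B_RESPONSE = '{"todos": [{"todo_id": 7, "name": "Book venue"}]}'

# Per-service description of where the items live and how fields are named.
FIELD_MAPS = {
    "service_a": {"collection": "tasks", "id": "id", "title": "title"},
    "service_b": {"collection": "todos", "id": "todo_id", "title": "name"},
}


def normalize(service, raw):
    """Return (service, id, title) tuples regardless of the source's naming."""
    mapping = FIELD_MAPS[service]
    items = json.loads(raw)[mapping["collection"]]
    return [(service, item[mapping["id"]], item[mapping["title"]]) for item in items]


unified = normalize("service_a", SERVICE_A_RESPONSE) + normalize(
    "service_b", SERVICE_B_RESPONSE
)
```

Writing such a mapping by hand for every service is the manual effort the rest of the talk tries to reduce.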
Understanding Web Services
- Service Level Agreements
- At most N API calls per second
- At most N API calls from a single IP address
- Access blocked once the limit of N calls is exceeded
- Authentication and Authorization
- Basic HTTP authentication (user name, password)
- Open Authorization (OAuth)
- Custom authentication (e.g., special URLs, generation of keys)
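As a rough illustration of the first two authentication styles, the sketch below builds the HTTP `Authorization` header for Basic authentication and for an OAuth 2.0 bearer token. The function names are hypothetical, and the bearer form assumes OAuth 2.0; services using OAuth 1.0 or custom schemes need different handling.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header for Basic HTTP authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_auth_header(access_token: str) -> dict:
    """Build the Authorization header for an OAuth 2.0 bearer token (assumed)."""
    return {"Authorization": f"Bearer {access_token}"}


# Placeholder credentials, for illustration only.
example = basic_auth_header("user", "pass")
```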
- Web Services: Numerous
- Social Media
- News
- Marketing
- Project Management
- Professional Networking
- Accounting
- Human Resource Management
- Web Services: Autonomous
- Shift from self-controlled database systems to third-party managed database systems
- Users cannot modify database schema
- Users cannot modify API
- Web Services: Evolving
- Change in message formats
- Change in operations
- Change in SLA (service level agreements)
- Change in authentication/authorization
- Web Services: Focus
- Communication protocol: HTTP
- Operations: Data-providing operations
- Message formats: XML, JSON
- Authentication: Basic HTTP, OAuth
- RESTful/REST-like web services
- Integration with one web service
- Manually developed programs using API
- Integration with one web service: Current workflow
- Read Web service API documentation
- Understand business requirements
- Decide relevant operations
- Write program using procedural languages (e.g., Java, PHP)
- Problem: Not scalable for a large number of web services
- Integration with multiple web services: automated solution
- Machine readable documentation for API
- Syntax: WSDL and WADL
- Syntax and Semantics: SAWSDL, OWL-S
- Autogenerated code
- Integration with Web Services: automated solution
- There still exist web services with only human-readable API documentation
- Manual effort is therefore still required
- Question: Is it possible to reduce this manual effort?
2. Solution: Data Integration
- Data integration
- Provides uniform query interface over heterogeneous, autonomous data sources
- More than two decades of research
- Initially proposed for legacy databases
- Our proposition:
- Consider data-providing API operations as database relations
- Use the mediation approach of data integration to query web services
Mediation Approach
- Global Schema
- Set of relations with attributes
- End user exposed to global schema relations
- Hides underlying heterogeneity of data sources
- Local Schema:
- Relations of individual data sources/databases
Mediation Approach
- Mapping
- Mapping required between local and global schema
- Mapping approaches:
- GAV (Global as view): Global schema is defined using local schema relations
- LAV (Local as view): Local schema is defined using global schema relations
- GLAV (Global-Local as view)
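The three mapping styles can be written out as follows. This is only a notational sketch: the global relations Task, Deadline and Assigned and the local relations listTasksA and listTodosB are hypothetical names chosen for illustration.

```latex
\begin{align*}
\text{GAV:}\quad  & \mathit{Task}(i, t) \leftarrow \mathit{listTasksA}(i, t)\\
                  & \mathit{Task}(i, t) \leftarrow \mathit{listTodosB}(i, t, d)\\[4pt]
\text{LAV:}\quad  & \mathit{listTodosB}(i, t, d) \leftarrow \mathit{Task}(i, t),\ \mathit{Deadline}(i, d)\\[4pt]
\text{GLAV:}\quad & q_S(i, t) \subseteq q_G(i, t)\\
                  & \text{where } q_S(i, t) \leftarrow \mathit{listTodosB}(i, t, d)
                    \text{ and } q_G(i, t) \leftarrow \mathit{Task}(i, t),\ \mathit{Assigned}(i, p)
\end{align*}
```

In GAV the head of each rule is a global relation, in LAV it is a local relation, and GLAV relates a query over the local schema to a query over the global schema.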
Mediation Approach
- Languages used for Mapping
- Conjunctive query
- Union of conjunctive queries
- Datalog query
- Advantages
- Declarative languages (Focus on what and not on how)
- Similar to SELECT-PROJECT-JOIN (SPJ) SQL queries
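To make the comparison with SPJ queries concrete, here is a minimal sketch of evaluating one conjunctive query over in-memory facts. The relations `project` and `task`, their contents, and the convention that capitalized strings in the query are variables are all assumptions made for this example.

```python
def is_var(term):
    """Query terms that are capitalized strings are treated as variables."""
    return isinstance(term, str) and term[:1].isupper()


def _unify(term, value, binding):
    """Match one query term against one fact value, extending the binding."""
    if is_var(term):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def evaluate(head_vars, body, facts):
    """Evaluate a conjunctive query by extending bindings atom by atom."""
    bindings = [{}]
    for relation, args in body:
        extended = []
        for binding in bindings:
            for row in facts.get(relation, ()):
                candidate = dict(binding)
                if all(_unify(t, v, candidate) for t, v in zip(args, row)):
                    extended.append(candidate)
        bindings = extended
    return {tuple(b[v] for v in head_vars) for b in bindings}


facts = {
    "project": {(1, "Website"), (2, "Audit")},
    "task": {(10, 1, "open"), (11, 1, "done"), (12, 2, "open")},
}

# q(Name, T) :- project(P, Name), task(T, P, "open").
# Roughly: SELECT p.name, t.id FROM project p JOIN task t
#          ON t.project_id = p.id WHERE t.status = 'open'
answers = evaluate(
    ["Name", "T"],
    [("project", ["P", "Name"]), ("task", ["T", "P", "open"])],
    facts,
)
```

The shared variable `P` plays the role of the join condition, the constant `"open"` the selection, and the head variables the projection.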
Query rewriting
- Definition
- Translation of queries formulated over the global schema into queries over local schema relations
- Algorithms
- Bucket algorithm
- MiniCon algorithm
- Inverse-rules algorithm
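The core step of the inverse-rules algorithm can be sketched as below: each atom in the body of a LAV view definition yields one rule whose body is the view itself, and variables that do not appear in the view head are replaced by Skolem function terms. This is a simplified sketch, assuming all terms are variables; the view `op_tasks` and the textual rule syntax are hypothetical.

```python
def inverse_rules(view_name, head_vars, body):
    """Return the inverse rules of one LAV view definition as strings.

    The view is view_name(head_vars) :- body, where body is a list of
    (relation, args) atoms over the global schema. Body variables missing
    from the head are existential and become Skolem terms
    f_<view>_<var>(head_vars).
    """
    head = set(head_vars)

    def skolemize(term):
        if term in head:
            return term
        return f"f_{view_name}_{term}({', '.join(head_vars)})"

    view_atom = f"{view_name}({', '.join(head_vars)})"
    return [
        f"{rel}({', '.join(skolemize(a) for a in args)}) :- {view_atom}."
        for rel, args in body
    ]


# op_tasks(T, Title) :- task(T, P, Title), project(P, Name).
rules = inverse_rules(
    "op_tasks",
    ["T", "Title"],
    [("task", ["T", "P", "Title"]), ("project", ["P", "Name"])],
)
```

The resulting rules, together with the user query, form a Datalog program that a Datalog engine can evaluate over the source relations.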
Mediation approach in case of Web Services
- Global Schema
- Created after understanding business requirements
- Local Schema
- Every data-providing API operation is treated as a local schema relation with an access pattern
- Mapping
- Local and global schema relations must be mapped manually
- Query rewriting
- Queries over global schema must be translated to API operation calls
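An access pattern states which arguments of an operation must be supplied as input before it can be called. The sketch below orders the atoms of a rewritten query so that every call has its inputs bound by earlier calls. It assumes all terms are variables and encodes patterns as strings of `b` (bound, input) and `f` (free, output); the operations `list_projects` and `list_tasks` are hypothetical.

```python
def executable_order(atoms, patterns, initially_bound=()):
    """Order atoms so each call's input arguments are bound beforehand.

    atoms: list of (relation, args); patterns: relation -> string of
    'b'/'f' flags, one per argument. Returns None if no order exists.
    """
    bound = set(initially_bound)
    remaining = list(atoms)
    plan = []
    while remaining:
        for atom in remaining:
            relation, args = atom
            inputs = [a for a, mode in zip(args, patterns[relation]) if mode == "b"]
            if all(a in bound for a in inputs):
                plan.append(atom)
                bound.update(args)  # outputs of this call are now available
                remaining.remove(atom)
                break
        else:
            return None  # no remaining atom can be called
    return plan


patterns = {
    "list_projects": "ff",  # no input required
    "list_tasks": "bff",    # project id must be given as input
}
query = [
    ("list_tasks", ["P", "T", "Title"]),
    ("list_projects", ["P", "Name"]),
]
plan = executable_order(query, patterns)
```

Here `list_projects` has to be called first, because it produces the project identifiers that `list_tasks` needs as input.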
Mediation approach in case of Web Services: Query Evaluation
- Datalog Engine
- Evaluates the query generated by the query rewriting algorithm
- Wrapper:
- Web service API response (XML, JSON, etc.) is transformed into a format understood by the Datalog engine (e.g., facts)
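The wrapper's role can be sketched as follows for a JSON response. The payload, the relation name `task`, and the textual fact syntax are assumptions for illustration; the syntax accepted by a particular Datalog engine may differ.

```python
import json


def to_facts(relation, raw_json, collection, fields):
    """Turn each item of a JSON collection into one Datalog-style fact."""
    items = json.loads(raw_json)[collection]
    facts = []
    for item in items:
        # json.dumps quotes strings and leaves numbers bare.
        values = ", ".join(json.dumps(item[f]) for f in fields)
        facts.append(f"{relation}({values}).")
    return facts


# Hypothetical API response.
response = '{"tasks": [{"id": 10, "title": "Write report"}]}'
facts = to_facts("task", response, "tasks", ["id", "title"])
```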
Mediation approach in case of Web Services: Wrapper
- Response Validation
- Validating the schema of the obtained response
- Declarative languages like XSD, JSON Schema
- Response Transformation:
- Transforming the obtained response into a desired format
- Declarative languages like XSLT, JSONT
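The validate-then-transform pipeline can be sketched with the Python standard library alone. The hand-written checks below only stand in for an XSD schema and an XSLT stylesheet, which would express the same steps declaratively; the `<tasks>` document structure is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: every <task> must have these child elements.
REQUIRED_FIELDS = ("id", "title")


def validate(root):
    """Structural check standing in for XSD validation."""
    if root.tag != "tasks":
        return False
    return all(
        task.find(field) is not None
        for task in root.findall("task")
        for field in REQUIRED_FIELDS
    )


def transform(root):
    """Flatten each <task> into a tuple, standing in for an XSLT stylesheet."""
    return [
        tuple(task.findtext(field) for field in REQUIRED_FIELDS)
        for task in root.findall("task")
    ]


def wrap(raw_xml):
    """Validate a response, then transform it; reject unexpected structure."""
    root = ET.fromstring(raw_xml)
    if not validate(root):
        raise ValueError("response does not match the expected schema")
    return transform(root)


rows = wrap("<tasks><task><id>10</id><title>Write report</title></task></tasks>")
```

Validating before transforming lets a change in the service's message format surface as an explicit error rather than as silently wrong facts.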
3. Implementation
- Mapping
- LAV mapping using conjunctive queries
- Queries on global schema:
- Generic Web Service API wrapper:
- Response validation and transformation
- XSD and XSLT
- Datalog Engine:
- Modified IRIS integrated with generic wrapper
Use cases
- Feeding a data warehouse (data analysis)
- Integrated dashboard
- Web mashups
4. Future Work
- Limitations and future work
- Incomplete information
- Optimizing the number of API operation calls
- Handling errors
- Handling optional input parameters
- Handling heterogeneous SLA
5. Conclusion
- Web Services
- Growing use of specialized web services
- Personal and professional use
- Integrated solutions
- Need for solutions providing a global overview
- Mediation approach as a partially automated solution
- Fully automated solution
- Semantic web languages for describing syntax and semantics
- Use of linked open data
References
- Duschka, O.M., Genesereth, M.R., Levy, A.Y.: Recursive query plans for data integration. J. Log. Program. 43(1), 49–73 (2000)
- Espinha, T., Zaidman, A., Gross, H.: Web API growing pains: Loosely coupled yet strongly tied. Journal of Systems and Software 100, 27–43 (2015)
- Fielding, R.T.: Architectural styles and the design of network-based software architectures (2000)
- Grahne, G., Kiricenko, V.: Towards an algebraic theory of information integration. Inf. Comput. 194(2), 79–100 (2004)
- Halevy, A.Y.: Theory of answering queries using views. SIGMOD Record 29(4), 40–47 (2000)
- Halevy, A.Y.: Answering queries using views: A survey. The VLDB Journal 10(4), 270–294 (Dec 2001)
- Samuel, J.: Feeding a data warehouse with data coming from web services. A mediation approach for the DaWeS prototype. Ph.D. thesis, Blaise Pascal University, Clermont-Ferrand, France (2014)
- Samuel, J.: Towards a data warehouse fed with web services. In: Presutti, V., d’Amato, C., Gandon, F., d’Aquin, M., Staab, S., Tordai, A. (eds.) ESWC PhD Symposium. Lecture Notes in Computer Science, vol. 8465, pp. 874–884. Springer (2014)
- Samuel, J., Rey, C.: DaWeS: Data warehouse fed with web services. In: INFORSID (2014)
- Samuel, J., Rey, C.: Generic web service wrapper for mediation based data warehousing. In: Akerkar, R., Plantié, M., Ranwez, S., Harispe, S., Laurent, A., Bellot, P., Montmain, J., Trousset, F. (eds.) Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics, WIMS 2016, Nîmes, France, June 13-15, 2016. pp. 34:1–34:4. ACM (2016)
- Ullman, J.: Information integration using logical views. Theoretical Computer Science 239(2), 189–210 (2000)