Architecture of Information Systems
Search Engine

John Samuel
CPE Lyon

Year: 2017-2018
Email: john(dot)samuel(at)cpe(dot)fr

Creative Commons License

Outline: Search engine

  1. Frontend development
  2. Backend development
  3. Application programming interface

1. Frontend development: Search Engine

1. Frontend development: Search Engine Filters

More options (filters)

1. Frontend Development: Target Audience

1. Frontend development

Search interface

Personalized user experience

1. Frontend development: Simple search (One box)

Queries

Search Results



Leonardo da Vinci (October 2017 Google results)

1. Frontend development: Advanced search (filters)

Filter search results (Multiple boxes)

Artists on Histropedia

1. Frontend development: Advanced search (filters)

Location of archaeological sites (Wikidata)

1. Frontend development: Advanced search (filters)

Why filters?

1. Frontend development: Advanced Search in one-box

Operators
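A one-box advanced search lets the user type operators inside the query itself. The slide does not list specific operators, so the following is only a minimal sketch that assumes a `key:value` syntax; the function name `parse_query` and the operator names in `KNOWN_OPERATORS` are hypothetical.

```python
import shlex

# Hypothetical operator names; real search engines define their own sets.
KNOWN_OPERATORS = {"site", "filetype", "intitle"}

def parse_query(query):
    """Split a one-box query into free-text terms and key:value operators."""
    terms, operators = [], {}
    # shlex keeps a quoted phrase such as "mona lisa" together as one token.
    for token in shlex.split(query):
        key, sep, value = token.partition(":")
        if sep and key.lower() in KNOWN_OPERATORS and value:
            operators[key.lower()] = value
        else:
            terms.append(token)
    return terms, operators

terms, operators = parse_query('"leonardo da vinci" site:wikipedia.org filetype:pdf')
print(terms, operators)
```

Tokens with an unknown prefix (for example `time:now`) are kept as ordinary search terms, so a colon in a query never silently drops text.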

1. Frontend development: Advanced Search in one-box

Mnemonics

Bangs (DuckDuckGo)
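A bang is a short mnemonic prefix (for example `!w`) that sends the rest of the query to another site's search. A minimal sketch of the idea follows; the `BANGS` table and the function `resolve_bang` are assumptions made for illustration and are not DuckDuckGo's actual data or implementation.

```python
from urllib.parse import quote_plus

# Illustrative bang table (assumed for this sketch).
BANGS = {
    "w": "https://en.wikipedia.org/w/index.php?search={}",
    "gh": "https://github.com/search?q={}",
}

def resolve_bang(query):
    """Return a redirect URL if the query starts with a known !bang, else None."""
    first, _, rest = query.strip().partition(" ")
    if first.startswith("!") and first[1:] in BANGS:
        return BANGS[first[1:]].format(quote_plus(rest.strip()))
    return None

print(resolve_bang("!w Leonardo da Vinci"))
```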

1. Frontend development: Personalized user experience

Time and location (Internationalization)

Weather (weather.com)

1. Frontend development: Personalized user experience

Past user search queries

User privacy

2. Backend development

  1. Data collection
  2. Data storage
  3. Configuration
  4. Logging
  5. Dashboard
  6. Security

2.1 Backend development: Data collection

Data ownership

Data model (Data and Schema)

2.1 Backend development: Data collection

Data sources

2.1 Backend development: Data collection

Data acquisition

2.1 Backend development: Data collection

API
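import requests

# Root endpoint of the GitHub REST API; it lists the available resource URLs.
url = "https://api.github.com/"

response = requests.get(url, timeout=10)  # avoid hanging on a slow connection
response.raise_for_status()               # fail loudly on HTTP errors (4xx/5xx)
print(response.json())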

2.1 Backend development: Data collection

Open Data
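from SPARQLWrapper import SPARQLWrapper, JSON

# Wikidata asks clients for a descriptive User-Agent; this value is a placeholder.
sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="ais-course-example/0.1")
# Select ten items that are instances of (P31) programming language (Q9143).
sparql.setQuery("""
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q9143 .
}
LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Each binding maps a variable name to a dict holding its type and value.
for result in results["results"]["bindings"]:
    print(result["item"]["value"])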

2.1 Backend development: Data collection

Linked Open data cloud

2.1 Backend development: Data collection

Archived and Historical Data

2.1 Backend development: Data collection

Data cleaning and transformation
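Collected data usually needs normalising before it is stored. The sketch below shows one possible cleaning pass; the record fields `title` and `year` and the function `clean_records` are hypothetical and stand in for whatever schema the data model defines.

```python
def clean_records(records):
    """Trim text, convert years to int, and drop incomplete or duplicate rows."""
    seen, cleaned = set(), []
    for record in records:
        title = " ".join(str(record.get("title", "")).split())  # collapse whitespace
        year_text = str(record.get("year", "")).strip()
        if not title or not year_text.isdigit():
            continue  # skip rows with a missing title or a non-numeric year
        key = (title.casefold(), int(year_text))
        if key in seen:
            continue  # skip duplicates that differ only by case or spacing
        seen.add(key)
        cleaned.append({"title": title, "year": int(year_text)})
    return cleaned

raw = [
    {"title": "  Mona   Lisa ", "year": "1503"},
    {"title": "mona lisa", "year": 1503},
    {"title": "The Last Supper", "year": "unknown"},
]
print(clean_records(raw))
```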

2.2 Backend development: Data storage

2.2. Backend development: Data Model

2.2. Document indices and Query Optimization

Document indices

Database indexing

Query Optimization
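One common form of document index is the inverted index, which maps each word to the documents containing it so that a query does not have to scan every document. The following is a minimal sketch under that assumption (the slide does not name a specific index structure); `build_index`, `search`, and the sample documents are illustrative.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word of the query (AND search)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

docs = {
    1: "Leonardo da Vinci painted the Mona Lisa",
    2: "The Last Supper by Leonardo",
    3: "Vincent van Gogh painted Sunflowers",
}
index = build_index(docs)
print(search(index, "leonardo painted"))
```

The index is built once and reused for every query, which is the trade of storage and update cost for lookup speed that indexing generally makes.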

2.2. Backend development: Caching
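Caching keeps the results of recent queries so that repeated requests do not reach the backend again. A minimal in-process sketch using the standard library's `functools.lru_cache`; `search_backend` is a hypothetical stand-in for an expensive lookup, and `CALLS` exists only to make the effect visible.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the slow path actually runs

@lru_cache(maxsize=128)
def search_backend(query):
    """Stand-in for an expensive lookup; the body is a placeholder."""
    CALLS["count"] += 1
    return f"results for {query}"

search_backend("leonardo")
search_backend("leonardo")  # served from the cache, backend not called again
print(CALLS["count"], search_backend.cache_info())
```

A production search engine would more likely use a shared cache in front of several servers and would need an invalidation policy for when the underlying data changes.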

2.2. Backend development: Replication and Backup

Replication (Master-slave)

2.3. Resource management and configuration

Availability (Wikipedia)
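Assuming availability is measured as the fraction of time the service is up, an availability target translates directly into a yearly downtime budget. The helper below is an illustrative sketch of that arithmetic; the function name and the 365-day year are assumptions.

```python
def downtime_per_year_hours(availability_percent, hours_per_year=365 * 24):
    """Maximum yearly downtime (hours) allowed by an availability target."""
    return (1 - availability_percent / 100) * hours_per_year

for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_per_year_hours(target):.2f} h/year")
```

For example, a 99.9% target leaves roughly 8.76 hours of downtime per year.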

2.3. Deployment

2.3. Packaging

2.3. Load balancing
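Load balancing spreads incoming requests over several backend servers. Round-robin is one of the simplest strategies and is used here only as an example, since the slide does not name a particular algorithm; the class `RoundRobinBalancer` and the host names are hypothetical.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand out backend servers in a fixed rotating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(list(servers))

    def next_server(self):
        return next(self._servers)

# Hypothetical host names used only for illustration.
balancer = RoundRobinBalancer(["search-1", "search-2", "search-3"])
assigned = [balancer.next_server() for _ in range(4)]
print(assigned)
```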

2.3. Selective Testing

A/B Testing
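In an A/B test each user should see the same variant on every visit. One way to achieve this, sketched below as an assumption rather than a prescribed method, is to hash the user and experiment identifiers; `assign_variant` and the experiment name are hypothetical.

```python
import hashlib

def assign_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically map a user to a variant by hashing user and experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]

print(assign_variant("user-42", "new-ranking"))
```

Because the assignment depends only on the two identifiers, no per-user state has to be stored, and including the experiment name keeps assignments independent across experiments.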

2.4. Backend development: Logging

2.4. Logging

Why logs?
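Logs make it possible to see which queries were run, how long they took, and where failures occurred. A minimal sketch with Python's standard `logging` module; the logger name `search`, the wrapper `timed_search`, and the log fields are assumptions chosen for illustration.

```python
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(name)s %(message)s")
logger = logging.getLogger("search")  # logger name is an arbitrary choice

def timed_search(query, search_fn):
    """Run search_fn(query) and log the query, result count, and latency."""
    start = time.perf_counter()
    try:
        results = search_fn(query)
    except Exception:
        logger.exception("search failed query=%r", query)
        raise
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("query=%r results=%d latency_ms=%.1f", query, len(results), elapsed_ms)
    return results

timed_search("leonardo", lambda q: [q.upper()])
```

Raw queries can be sensitive, so a real deployment would have to weigh this kind of logging against user privacy.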

2.5. Backend development: Dashboard

Wikimedia (Grafana: 5th October 2017)

2.5. Backend development: Dashboard

Wikimedia (Availability: 5th October 2017)

2.6. Backend development: Security

Login (Wikipedia)

2.6. Backend development: Security

OpenID
Mozilla Persona (2011-2016)

2.6. Detecting security vulnerabilities

3. Application programming interface

3. API: Data formats

3. API: (CRUDL) Operations
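CRUDL stands for create, read, update, delete, and list. The sketch below implements the five operations on an in-memory store; the class `ResourceStore`, the `/items` paths, and the HTTP verbs in the comments reflect a common REST convention and are assumptions, not something the slide prescribes.

```python
import itertools

class ResourceStore:
    """In-memory CRUDL store; comments show the HTTP verb commonly used for each."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def create(self, data):            # POST /items
        item_id = next(self._ids)
        self._items[item_id] = dict(data)
        return item_id

    def read(self, item_id):           # GET /items/<id>
        return self._items[item_id]

    def update(self, item_id, data):   # PUT /items/<id>
        if item_id not in self._items:
            raise KeyError(item_id)
        self._items[item_id] = dict(data)

    def delete(self, item_id):         # DELETE /items/<id>
        del self._items[item_id]

    def list_ids(self):                # GET /items
        return sorted(self._items)

store = ResourceStore()
book_id = store.create({"title": "Codex Leicester"})
store.update(book_id, {"title": "Codex Leicester", "author": "Leonardo da Vinci"})
print(store.list_ids(), store.read(book_id))
```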

3. API: Examples

GitHub API: Repository Search
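GitHub exposes repository search at `https://api.github.com/search/repositories`, with the search text passed in the `q` parameter. The sketch below only builds the request URL and sends nothing over the network; the helper name `repository_search_url` and the chosen defaults are assumptions.

```python
from urllib.parse import urlencode

def repository_search_url(keywords, language=None, per_page=10, page=1):
    """Build a GitHub repository search URL; nothing is sent over the network."""
    query = keywords if language is None else f"{keywords} language:{language}"
    params = {"q": query, "sort": "stars", "order": "desc",
              "per_page": per_page, "page": page}
    return "https://api.github.com/search/repositories?" + urlencode(params)

url = repository_search_url("search engine", language="python")
print(url)
```

The resulting URL can be fetched with `requests.get(url, timeout=10)`; the matching repositories are returned under the `items` key of the JSON response.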

3. API: Examples

GitHub API: Pagination
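The GitHub API splits long result lists into pages and advertises the neighbouring pages in the HTTP `Link` response header. A minimal sketch of extracting the next page from such a header; the helper `next_page_url` and the sample header value are illustrative, and the parser assumes the URLs themselves contain no commas.

```python
import re

def next_page_url(link_header):
    """Return the rel="next" URL from an HTTP Link header, or None if absent."""
    for part in (link_header or "").split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if match and match.group(2) == "next":
            return match.group(1)
    return None

# Sample header value written for this example.
header = ('<https://api.github.com/search/repositories?q=search&page=2>; rel="next", '
          '<https://api.github.com/search/repositories?q=search&page=34>; rel="last"')
print(next_page_url(header))
```

A client keeps requesting the returned URL until no `next` link is present. With the `requests` library the same information is available already parsed as `response.links`.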

3. API: Data dumps

3. Interface definition

3. Human readable Documentation

  1. Read documentation
  2. Develop application to integrate
  3. Add business logic, if any

3. Machine-readable Documentation

  1. Fully autonomous solution to integrate
  2. Add business logic, if any

3. Quality of service

Resource usage limits
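Resource usage limits are often enforced by rate limiting each client. The token bucket below is one well-known technique, shown here as an assumption since the slide does not name one; the class `TokenBucket` and its parameters are hypothetical, and the `clock` argument exists so the behaviour can be tested without waiting.

```python
import time

class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens/second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Consume one token if available and report whether the request may pass."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=2, rate=1)
print([bucket.allow() for _ in range(3)])
```

An API server would typically keep one bucket per client key and answer rejected requests with an error status instead of processing them.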

3. Security

OAuth

Project

Virtual Library

Project

Target audience

References

Image credits